
Title: Dimensionally reduced machine learning model for predicting single component octanol–water partition coefficients

MF-LOGP, a new method for determining single-component octanol–water partition coefficients ($\log P$), is presented which uses molecular formula as the only input. Octanol–water partition coefficients are useful in many applications, ranging from environmental fate to drug delivery. Currently, partition coefficients are either experimentally measured or predicted as a function of structural fragments, topological descriptors, or thermodynamic properties known or calculated from precise molecular structures. The MF-LOGP method presented here differs from classical methods in that it does not require any structural information and uses molecular formula as the sole model input. MF-LOGP is therefore useful for situations in which the structure is unknown or where low-dimensional, easily automatable, and computationally inexpensive calculations are required. MF-LOGP is a random forest algorithm that is trained and tested on 15,377 data points, using 10 features derived from the molecular formula to make $\log P$ predictions. Using an independent validation set of 2713 data points, MF-LOGP was found to have an average RMSE = 0.77 ± 0.007, MAE = 0.52 ± 0.003, and $R^2$ = 0.83 ± 0.003. This performance fell within the spectrum of performances reported in the published literature for conventional higher dimensional models (RMSE = 0.42–1.54, MAE = 0.09–1.07, and $R^2$ = 0.32–0.95). Compared with existing models, MF-LOGP requires a maximum of ten features and no structural information, thereby providing a practical and yet predictive tool. The development of MF-LOGP provides the groundwork for development of more physical prediction models leveraging big data analytical methods or complex multicomponent mixtures.
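As a rough illustration of the idea described in the abstract, the sketch below turns a molecular formula into a fixed-length vector of element counts and computes the three reported error metrics (RMSE, MAE, $R^2$) for a set of predictions. The abstract does not list the 10 features actually used by MF-LOGP, so the element list `ELEMENTS` and the helper names `formula_features` and `regression_metrics` are assumptions made for illustration only. The random forest itself is not reproduced here; the feature vectors would simply be passed to a regressor of one's choice.

```python
import math
import re
from collections import Counter

# ASSUMPTION: a hypothetical set of 10 elements. The abstract only says that
# 10 formula-derived features are used; it does not say which ones.
ELEMENTS = ("C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I")

# One element symbol (capital letter, optional lowercase letter) and an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def formula_features(formula):
    """Return element counts for a flat formula such as 'C2H6O', ordered as ELEMENTS.

    Parentheses, charges, and hydrates are not supported and raise ValueError.
    Elements outside ELEMENTS are parsed but ignored in the output vector.
    """
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # unparsed characters between tokens
            raise ValueError(f"cannot parse formula: {formula!r}")
        pos = match.end()
        counts[match.group(1)] += int(match.group(2) or 1)
    if pos != len(formula):  # unparsed trailing characters
        raise ValueError(f"cannot parse formula: {formula!r}")
    return [counts.get(element, 0) for element in ELEMENTS]


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired sequences of measured and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    print(formula_features("C2H6O"))  # ethanol: 2 C, 6 H, 1 O
    print(formula_features("CHCl3"))  # chloroform: 1 C, 1 H, 3 Cl
```

Element counts are only one plausible choice of formula-derived features; ratios such as H/C or derived quantities such as molecular weight could be appended in the same way without needing any structural information.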

Publisher / Repository: Springer Science + Business Media
Journal Name: Journal of Cheminformatics
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Consider two half-spaces $H_1^+$ and $H_2^+$ in ${\mathbb{R}}^{d+1}$ whose bounding hyperplanes $H_1$ and $H_2$ are orthogonal and pass through the origin. The intersection ${\mathbb{S}}_{2,+}^d := {\mathbb{S}}^d \cap H_1^+ \cap H_2^+$ is a spherical convex subset of the $d$-dimensional unit sphere ${\mathbb{S}}^d$, which contains a great subsphere of dimension $d-2$ and is called a spherical wedge. Choose $n$ independent random points uniformly at random on ${\mathbb{S}}_{2,+}^d$ and consider the expected facet number of the spherical convex hull of these points. It is shown that, up to terms of lower order, this expectation grows like a constant multiple of $\log n$. A similar behaviour is obtained for the expected facet number of a homogeneous Poisson point process on ${\mathbb{S}}_{2,+}^d$. The result is compared to the corresponding behaviour of classical Euclidean random polytopes and of spherical random polytopes on a half-sphere.

  2. Abstract

    We report on a measurement of Spin Density Matrix Elements (SDMEs) in hard exclusive $\rho^0$ meson muoproduction at COMPASS using 160 GeV/$c$ polarised $\mu^{+}$ and $\mu^{-}$ beams impinging on a liquid hydrogen target. The measurement covers the kinematic range $5.0~\mathrm{GeV}/c^2 < W < 17.0~\mathrm{GeV}/c^2$, $1.0~(\mathrm{GeV}/c)^2 < Q^2 < 10.0~(\mathrm{GeV}/c)^2$ and $0.01~(\mathrm{GeV}/c)^2 < p_{\mathrm{T}}^2 < 0.5~(\mathrm{GeV}/c)^2$. Here, $W$ denotes the mass of the final hadronic system, $Q^2$ the virtuality of the exchanged photon, and $p_{\mathrm{T}}$ the transverse momentum of the $\rho^0$ meson with respect to the virtual-photon direction. The measured non-zero SDMEs for the transitions of transversely polarised virtual photons to longitudinally polarised vector mesons ($\gamma^*_T \rightarrow V_L$) indicate a violation of $s$-channel helicity conservation. Additionally, we observe a dominant contribution of natural-parity-exchange transitions and a very small contribution of unnatural-parity-exchange transitions, which is compatible with zero within experimental uncertainties. The results provide important input for modelling Generalised Parton Distributions (GPDs). In particular, they may allow one to evaluate in a model-dependent way the role of parton helicity-flip GPDs in exclusive $\rho^0$ production.

  3. Abstract

    The electric $E1$ and magnetic $M1$ dipole responses of the $N=Z$ nucleus $^{24}$Mg were investigated in an inelastic photon scattering experiment. The 13.0 MeV electrons, which were used to produce the unpolarised bremsstrahlung in the entrance channel of the $^{24}$Mg($\gamma,\gamma^{\prime}$) reaction, were delivered by the ELBE accelerator of the Helmholtz-Zentrum Dresden-Rossendorf. The collimated bremsstrahlung photons excited one $J^{\pi}=1^-$, four $J^{\pi}=1^+$, and six $J^{\pi}=2^+$ states in $^{24}$Mg. De-excitation $\gamma$ rays were detected using the four high-purity germanium detectors of the $\gamma$ELBE setup, which is dedicated to nuclear resonance fluorescence experiments. In the energy region up to 13.0 MeV a total $B(M1)\uparrow = 2.7(3)~\mu_N^2$ is observed, but this $N=Z$ nucleus exhibits only marginal $E1$ strength of less than $\sum B(E1)\uparrow \le 0.61 \times 10^{-3}~e^2\,\mathrm{fm}^2$. The $B(\Pi 1, 1^{\pi}_i \rightarrow 2^+_1)/B(\Pi 1, 1^{\pi}_i \rightarrow 0^+_{gs})$ branching ratios in combination with the expected results from the Alaga rules demonstrate that $K$ is a good approximative quantum number for $^{24}$Mg. The use of the known $\rho^2(E0, 0^+_2 \rightarrow 0^+_{gs})$ strength and the measured $B(M1, 1^+ \rightarrow 0^+_2)/B(M1, 1^+ \rightarrow 0^+_{gs})$ branching ratio of the 10.712 MeV $1^+$ level allows, in a two-state mixing model, an extraction of the difference $\Delta \beta_2^2$ between the prolate ground-state structure and shape-coexisting superdeformed structure built upon the 6432-keV $0^+_2$ level.

  4. Abstract

    Approximate integer programming is the following: For a given convex body $K \subseteq {\mathbb{R}}^n$, either determine whether $K \cap {\mathbb{Z}}^n$ is empty, or find an integer point in the convex body $2\cdot (K - c) + c$, which is $K$ scaled by 2 from its center of gravity $c$. Approximate integer programming can be solved in time $2^{O(n)}$, while the fastest known methods for exact integer programming run in time $2^{O(n)} \cdot n^n$. So far, there are no efficient methods for integer programming known that are based on approximate integer programming. Our main contributions are two such methods, each yielding novel complexity results. First, we show that an integer point $x^* \in (K \cap {\mathbb{Z}}^n)$ can be found in time $2^{O(n)}$, provided that the remainders of each component $x_i^* \bmod \ell$ of $x^*$, for some arbitrarily fixed $\ell \ge 5(n+1)$, are given. The algorithm is based on a cutting-plane technique, iteratively halving the volume of the feasible set. The cutting planes are determined via approximate integer programming. Enumeration of the possible remainders gives a $2^{O(n)} n^n$ algorithm for general integer programming. This matches the current best bound of an algorithm by Dadush (Integer programming, lattice algorithms, and deterministic volume estimation. Georgia Institute of Technology, Atlanta, 2012) that is considerably more involved. Our algorithm also relies on a new asymmetric approximate Carathéodory theorem that might be of interest on its own. Our second method concerns integer programming problems in equation-standard form $Ax = b,\ 0 \le x \le u,\ x \in {\mathbb{Z}}^n$. Such a problem can be reduced to the solution of $\prod_i O(\log u_i + 1)$ approximate integer programming problems. This implies, for example, that knapsack or subset-sum problems with polynomial variable range $0 \le x_i \le p(n)$ can be solved in time $(\log n)^{O(n)}$. For these problems, the best running time so far was $n^n \cdot 2^{O(n)}$.

  5. Abstract

    The elliptic flow ($v_2$) of $\mathrm{D}^0$ mesons from beauty-hadron decays (non-prompt $\mathrm{D}^0$) was measured in midcentral (30–50%) Pb–Pb collisions at a centre-of-mass energy per nucleon pair $\sqrt{s_{\mathrm{NN}}} = 5.02$ TeV with the ALICE detector at the LHC. The $\mathrm{D}^0$ mesons were reconstructed at midrapidity ($|y| < 0.8$) from their hadronic decay $\mathrm{D}^0 \rightarrow \mathrm{K}^- \pi^+$, in the transverse momentum interval $2 < p_{\mathrm{T}} < 12~\mathrm{GeV}/c$. The result indicates a positive $v_2$ for non-prompt $\mathrm{D}^0$ mesons with a significance of $2.7\sigma$. The non-prompt $\mathrm{D}^0$-meson $v_2$ is lower than that of prompt non-strange D mesons with $3.2\sigma$ significance in $2 < p_{\mathrm{T}} < 8~\mathrm{GeV}/c$, and compatible with the $v_2$ of beauty-decay electrons. Theoretical calculations of beauty-quark transport in a hydrodynamically expanding medium describe the measurement within uncertainties.
