Title: Dimensionally reduced machine learning model for predicting single component octanol–water partition coefficients
Abstract MF-LOGP, a new method for determining single-component octanol–water partition coefficients (LogP), is presented which uses molecular formula as the only input. Octanol–water partition coefficients are useful in many applications, ranging from environmental fate to drug delivery. Currently, partition coefficients are either experimentally measured or predicted as a function of structural fragments, topological descriptors, or thermodynamic properties known or calculated from precise molecular structures. The MF-LOGP method presented here differs from classical methods in that it does not require any structural information and uses molecular formula as the sole model input. MF-LOGP is therefore useful for situations in which the structure is unknown or where low-dimensional, easily automatable, and computationally inexpensive calculations are required. MF-LOGP is a random forest algorithm that is trained and tested on 15,377 data points, using 10 features derived from the molecular formula to make LogP predictions. Using an independent validation set of 2713 data points, MF-LOGP was found to have an average RMSE = 0.77 ± 0.007, MAE = 0.52 ± 0.003, and $R^2$ = 0.83 ± 0.003. This performance fell within the spectrum of performances reported in the published literature for conventional higher dimensional models (RMSE = 0.42–1.54, MAE = 0.09–1.07, and $R^2$ = 0.32–0.95). Compared with existing models, MF-LOGP requires a maximum of ten features and no structural information, thereby providing a practical and yet predictive tool. The development of MF-LOGP provides the groundwork for development of more physical prediction models leveraging big data analytical methods or complex multicomponent mixtures.
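As a rough illustration of the workflow the abstract describes, the sketch below turns a molecular formula into a fixed-length numeric feature vector and computes the three reported metrics (RMSE, MAE, and $R^2$) for a set of predictions. It is not the MF-LOGP implementation: the abstract does not list the ten features, so the element set in `ELEMENTS` is a hypothetical stand-in, and the names `parse_formula`, `formula_features`, and `regression_metrics` are invented for this example. The parser assumes flat formulas without brackets, charges, or isotopes.

```python
import math
import re
from collections import Counter

# Hypothetical 10-feature layout (one atom count per element). The abstract
# does not say which features MF-LOGP actually derives from the formula.
ELEMENTS = ("C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I")

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms in a flat molecular formula such as 'C6H5Cl'."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # A gap between tokens means an unsupported character was skipped.
            raise ValueError(f"cannot parse formula: {formula!r}")
        symbol, number = match.groups()
        counts[symbol] += int(number) if number else 1
        pos = match.end()
    if not formula or pos != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def formula_features(formula):
    """Map a formula to a fixed-length vector of atom counts (assumed features)."""
    counts = parse_formula(formula)
    return [float(counts.get(element, 0)) for element in ELEMENTS]


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    print(formula_features("C2H6O"))   # ethanol
    print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Vectors like these could be passed to any random forest regressor, and `regression_metrics` applied to held-out predictions would give figures comparable in kind to those quoted in the abstract.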
Award ID(s):
2021871
PAR ID:
10391990
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Journal of Cheminformatics
Volume:
15
Issue:
1
ISSN:
1758-2946
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract In this paper, we present counterexamples to maximal $L^p$-regularity for a parabolic PDE. The example is a second-order operator in divergence form with space and time-dependent coefficients. It is well-known from Lions’ theory that such operators admit maximal $L^2$-regularity on $H^{-1}$ under a coercivity condition on the coefficients, and without any regularity conditions in time and space. We show that in general one cannot expect maximal $L^p$-regularity on $H^{-1}(\mathbb{R}^d)$ or $L^2$-regularity on $L^2(\mathbb{R}^d)$
  2. Abstract In this paper we prove a higher dimensional analogue of Carleson’s $\varepsilon^{2}$ conjecture. Given two arbitrary disjoint Borel sets $\Omega^{+}, \Omega^{-} \subset \mathbb{R}^{n+1}$, and $x \in \mathbb{R}^{n+1}$, $r > 0$, we denote $$ \varepsilon_{n}(x,r) := \frac{1}{r^{n}}\, \inf_{H^{+}} \mathcal{H}^{n} \left( ((\partial B(x,r) \cap H^{+}) \setminus \Omega^{+}) \cup ((\partial B(x,r) \cap H^{-}) \setminus \Omega^{-}) \right), $$ where the infimum is taken over all open affine half-spaces $H^{+}$ such that $x \in \partial H^{+}$ and we define $H^{-} = \mathbb{R}^{n+1} \setminus \overline{H^{+}}$. Our first main result asserts that the set of points $x \in \mathbb{R}^{n+1}$ where $$ \int_{0}^{1} \varepsilon_{n}(x,r)^{2} \, \frac{dr}{r} < \infty $$ is $n$-rectifiable. For our second main result we assume that $\Omega^{+}$, $\Omega^{-}$ are open and that $\Omega^{+} \cup \Omega^{-}$ satisfies the capacity density condition. For each $x \in \partial \Omega^{+} \cup \partial \Omega^{-}$ and $r > 0$, we denote by $\alpha^{\pm}(x,r)$ the characteristic constant of the (spherical) open sets $\Omega^{\pm} \cap \partial B(x,r)$. We show that, up to a set of $\mathcal{H}^{n}$ measure zero, $x$ is a tangent point for both $\partial \Omega^{+}$ and $\partial \Omega^{-}$ if and only if $$ \int_{0}^{1} \min(1, \alpha^{+}(x,r) + \alpha^{-}(x,r) - 2) \, \frac{dr}{r} < \infty. $$ The first result is new even in the plane and the second one improves and extends to higher dimensions the $\varepsilon^{2}$ conjecture of Carleson.
  3. Abstract Consider two half-spaces $H_1^+$ and $H_2^+$ in $\mathbb{R}^{d+1}$ whose bounding hyperplanes $H_1$ and $H_2$ are orthogonal and pass through the origin. The intersection $\mathbb{S}_{2,+}^d := \mathbb{S}^d \cap H_1^+ \cap H_2^+$ is a spherical convex subset of the $d$-dimensional unit sphere $\mathbb{S}^d$, which contains a great subsphere of dimension $d-2$ and is called a spherical wedge. Choose $n$ independent random points uniformly at random on $\mathbb{S}_{2,+}^d$ and consider the expected facet number of the spherical convex hull of these points. It is shown that, up to terms of lower order, this expectation grows like a constant multiple of $\log n$. A similar behaviour is obtained for the expected facet number of a homogeneous Poisson point process on $\mathbb{S}_{2,+}^d$. The result is compared to the corresponding behaviour of classical Euclidean random polytopes and of spherical random polytopes on a half-sphere.
  4. Abstract Physical experiments and numerical simulations have observed a remarkable stabilizing phenomenon: a background magnetic field stabilizes and dampens electrically conducting fluids. This paper intends to establish this phenomenon as a mathematically rigorous fact on a magnetohydrodynamic (MHD) system with anisotropic dissipation in $\mathbb{R}^3$. The velocity equation in this system is the 3D Navier–Stokes equation with dissipation only in the $x_1$-direction, while the magnetic field obeys the induction equation with magnetic diffusion in two horizontal directions. We establish that any perturbation near the background magnetic field (0, 1, 0) is globally stable in the Sobolev setting $H^3(\mathbb{R}^3)$. In addition, explicit decay rates in $H^2(\mathbb{R}^3)$ are also obtained. In the absence of a magnetic field, the 3D anisotropic Navier–Stokes equation is not well understood, and the small data global well-posedness in $\mathbb{R}^3$ remains an intriguing open problem. This paper reveals the mechanism of how the magnetic field generates enhanced dissipation and helps to stabilize the fluid.
  5. Abstract Let $X$ be an $n$-element point set in the $k$-dimensional unit cube $[0,1]^k$ where $k \ge 2$. According to an old result of Bollobás and Meir (Oper Res Lett 11:19–21, 1992), there exists a cycle (tour) $x_1, x_2, \ldots, x_n$ through the $n$ points, such that $\left( \sum_{i=1}^n |x_i - x_{i+1}|^k \right)^{1/k} \le c_k$, where $|x-y|$ is the Euclidean distance between $x$ and $y$, and $c_k$ is an absolute constant that depends only on $k$, where $x_{n+1} \equiv x_1$. From the other direction, for every $k \ge 2$ and $n \ge 2$, there exist $n$ points in $[0,1]^k$, such that their shortest tour satisfies $\left( \sum_{i=1}^n |x_i - x_{i+1}|^k \right)^{1/k} = 2^{1/k} \cdot \sqrt{k}$. For the plane, the best constant is $c_2 = 2$ and this is the only exact value known. Bollobás and Meir showed that one can take $c_k = 9 \left( \frac{2}{3} \right)^{1/k} \cdot \sqrt{k}$ for every $k \ge 3$ and conjectured that the best constant is $c_k = 2^{1/k} \cdot \sqrt{k}$, for every $k \ge 2$. Here we significantly improve the upper bound and show that one can take $c_k = 3 \sqrt{5} \left( \frac{2}{3} \right)^{1/k} \cdot \sqrt{k}$ or $c_k = 2.91 \sqrt{k} \, (1 + o_k(1))$. Our bounds are constructive. We also show that $c_3 \ge 2^{7/6}$, which disproves the conjecture for $k = 3$. Connections to matching problems, power assignment problems, related problems, including algorithms, are discussed in this context. A slightly revised version of the Bollobás–Meir conjecture is proposed.