We demonstrate the utility of an unsupervised machine learning tool for the detection of phase transitions in off-lattice systems. We focus on the application of principal component analysis (PCA) to detect the freezing transitions of two-dimensional hard-disk and three-dimensional hard-sphere systems as well as liquid-gas phase separation in a patchy colloid model. As we demonstrate, PCA autonomously discovers order-parameter-like quantities that report on phase transitions, mitigating the need for a priori construction or identification of a suitable order parameter—thus streamlining the routine analysis of phase behavior. In a companion paper, we further develop the method established here to explore the detection of phase transitions in various model systems controlled by compositional demixing, liquid crystalline ordering, and non-equilibrium active forces.
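To illustrate the idea, the sketch below runs PCA on per-configuration feature vectors and reads off each configuration's score on the first principal component. It uses power iteration on the sample covariance matrix. This is not the authors' pipeline. The feature vectors are toy numbers standing in for whatever structural descriptors one might compute from particle coordinates, and the names `leading_component` and `pc1_scores` are hypothetical.

```python
import math
import random


def leading_component(data, iters=500, seed=0):
    """Top principal component of `data` (rows = configurations, columns = features).

    Returns (feature means, unit eigenvector, eigenvalue) using power iteration
    on the sample covariance matrix.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    xc = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance C = Xc^T Xc / (n - 1).
    cov = [[sum(r[a] * r[b] for r in xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive random start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance captured along the unit vector v.
    eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return mean, v, eigval


def pc1_scores(data, mean, v):
    """Project each centered configuration onto the leading component."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v))) for row in data]


# Toy feature vectors: three "disordered" and three "ordered" configurations.
disordered = [[0.0, 0.0, 0.1], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
ordered = [[1.0, 1.0, 0.1], [1.1, 1.0, 0.0], [1.0, 1.1, 0.0]]
data = disordered + ordered
mean, v, lam = leading_component(data)
scores = pc1_scores(data, mean, v)
print([round(s, 2) for s in scores])
```

The PC1 scores separate the two toy groups, which is the sense in which the leading component behaves like an order parameter. An eigenvector's sign is arbitrary, so only the separation between the groups is meaningful, not which group scores positive.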
Minimax rates for sparse signal detection under correlation
Abstract: We fully characterize the nonasymptotic minimax separation rate for sparse signal detection in the Gaussian sequence model with $p$ equicorrelated observations, generalizing a result of Collier, Comminges and Tsybakov. As a consequence of this characterization, we find that strong correlation is a blessing, moderate correlation is a curse, and weak correlation is irrelevant. Moreover, the threshold correlation level yielding a blessing exhibits phase transitions at the $\sqrt{p}$ and $p-\sqrt{p}$ sparsity levels. We also establish the emergence of new phase transitions in the minimax separation rate with a subtle dependence on the correlation level. Additionally, we study group-structured correlations and derive the minimax separation rate in a model including multiple random effects. The group structure fundamentally changes the detection problem relative to the equicorrelated case, and different phenomena appear in the separation rate.
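The sketch below gives a heuristic for why strong correlation can help; it is not the paper's argument. Under equicorrelation, the shared noise component cancels in within-sample contrasts. The sketch assumes the parameterization $X_i = \theta_i + \sigma(\sqrt{\rho}\,Z + \sqrt{1-\rho}\,\xi_i)$ with $Z, \xi_i$ iid standard normal. This gives every coordinate variance $\sigma^2$ and every pair correlation $\rho$. The function names are hypothetical.

```python
import math
import random


def sample_equicorrelated(theta, rho, sigma=1.0, rng=random):
    """Draw X = theta + sigma * (sqrt(rho) * Z * 1 + sqrt(1 - rho) * xi).

    Assumed parameterization: each coordinate has variance sigma^2 and every
    pair of coordinates has correlation rho (0 <= rho <= 1).
    """
    z = rng.gauss(0.0, 1.0)  # noise component shared by all coordinates
    return [t + sigma * (math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for t in theta]


def centered(x):
    """Subtract the coordinate average, which removes the shared component Z."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]


rng = random.Random(1)
p, rho = 10_000, 0.99
x = sample_equicorrelated([0.0] * p, rho, rng=rng)
r = centered(x)
resid_var = sum(v * v for v in r) / (p - 1)
print(resid_var / (1 - rho))  # close to 1: residual noise variance is about 1 - rho
```

Centering also subtracts the average of $\theta$, so this simple trick can discard signal when $\theta$ is dense. The sparsity-dependent phase transitions described in the abstract indicate that the full picture is more subtle than this heuristic.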
- PAR ID: 10474890
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Information and Inference: A Journal of the IMA
- Volume: 12
- Issue: 4
- ISSN: 2049-8772
- Format(s): Medium: X; Size: p. 2873-2969
- Size(s): p. 2873-2969
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract: Distance covariance is a popular dependence measure for two random vectors $X$ and $Y$ of possibly different dimensions and types. Recent years have witnessed concentrated efforts in the literature to understand the distributional properties of the sample distance covariance in a high-dimensional setting, with an exclusive emphasis on the null case in which $X$ and $Y$ are independent. This paper derives the first non-null central limit theorem for the sample distance covariance, and for the more general sample (Hilbert–Schmidt) kernel distance covariance, in high dimensions, within the distributional class of $(X,Y)$ with a separable covariance structure. The new non-null central limit theorem yields an asymptotically exact first-order power formula for the widely used generalized kernel distance correlation test of independence between $X$ and $Y$. In particular, the power formula unveils an interesting universality phenomenon: in the high-dimensional limit, the power of the generalized kernel distance correlation test is completely determined by $n\cdot \operatorname{dCor}^{2}(X,Y)/\sqrt{2}$ across a wide range of choices of kernels and bandwidth parameters. Furthermore, this separation rate is shown to be optimal in a minimax sense. The key step in the proof of the non-null central limit theorem is a precise expansion of the mean and variance of the sample distance covariance in high dimensions. Among other things, this expansion shows that the non-null Gaussian approximation of the sample distance covariance involves a rather subtle interplay between the dimension-to-sample ratio and the dependence between $X$ and $Y$.
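For reference, the sketch below computes the classical sample distance correlation of Székely, Rizzo and Bakirov (Euclidean distances, V-statistic form) from double-centered distance matrices. The paper's generalized kernel version and its high-dimensional analysis go beyond this. The helper names are hypothetical. The last line simply evaluates the quantity $n\cdot \operatorname{dCor}^{2}/\sqrt{2}$ named in the abstract for a toy sample.

```python
import math
import random


def double_centered_distances(points):
    """A_ij = d_ij - mean_i - mean_j + grand mean, with Euclidean d_ij."""
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(r) / n for r in d]
    grand = sum(row_means) / n
    # d is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def dcov2(a, b):
    """Squared sample distance covariance (V-statistic) from centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def dcor2(xs, ys):
    """Squared sample distance correlation; 0 if either sample is degenerate."""
    a = double_centered_distances(xs)
    b = double_centered_distances(ys)
    denom = math.sqrt(dcov2(a, a) * dcov2(b, b))
    return dcov2(a, b) / denom if denom > 0 else 0.0


rng = random.Random(0)
n, p = 50, 5
xs = [tuple(rng.gauss(0, 1) for _ in range(p)) for _ in range(n)]
ys = [tuple(2 * v + 1 for v in x) for x in xs]                     # deterministic function of X
zs = [tuple(rng.gauss(0, 1) for _ in range(p)) for _ in range(n)]  # independent of X
print(dcor2(xs, ys), dcor2(xs, zs))
print(n * dcor2(xs, zs) / math.sqrt(2))
```

Because distances scale linearly under $x \mapsto 2x + 1$, the first value is 1 up to rounding. The independent pair gives a small but positive value, reflecting the upward bias of the V-statistic.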
-
Abstract: We present improvements to the hydropathy scale (HPS) coarse-grained (CG) model for simulating the sequence-specific behavior of intrinsically disordered proteins (IDPs), including their liquid–liquid phase separation (LLPS). The previous model, based on the atomistic hydropathy scale of Kapcha and Rossky (KR scale), fails to capture some well-known LLPS trends, such as the reduced phase separation propensity upon R-to-K and Y-to-F mutations. Here, we propose to use the Urry hydropathy scale instead, which was derived from the inverse temperature transitions of a model polypeptide with guest residues X. We introduce two free parameters that shift (Δ) and scale (µ) the overall interaction strengths of the new model (HPS-Urry), and we use the experimental radii of gyration of a diverse group of IDPs to find their optimal values. Interestingly, many (Δ, µ) combinations are acceptable for typical IDPs, but the phase behavior of the FUS low-complexity (LC) sequence is well described by only one of these models. This highlights the need for a careful validation strategy based on multiple proteins. The CG HPS-Urry model should enable accurate simulations of protein LLPS and provide a microscopically detailed view of molecular interactions.
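The sketch below makes the roles of Δ and µ concrete under stated assumptions. HPS-type models commonly use an Ashbaugh–Hatch pair potential, in which a hydropathy value λ scales the attractive part of a Lennard-Jones interaction and pair values come from arithmetic mixing. The abstract does not say how HPS-Urry applies the shift and scale. The linear map λ → µλ − Δ used here is an assumption for illustration only. All function names and numerical values are hypothetical.

```python
def lennard_jones(r, sigma, eps):
    """Standard 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def ashbaugh_hatch(r, sigma, lam, eps):
    """Lennard-Jones with its attractive tail scaled by the hydropathy lam.

    Repulsive core (r <= 2^(1/6) sigma): LJ + (1 - lam) * eps.
    Attractive tail (r >  2^(1/6) sigma): lam * LJ.
    """
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    if r <= r_min:
        return lennard_jones(r, sigma, eps) + (1.0 - lam) * eps
    return lam * lennard_jones(r, sigma, eps)


def pair_lambda(lam_i, lam_j, mu, delta):
    """HYPOTHETICAL: scale each residue value by mu, shift by delta, then mix arithmetically."""
    return 0.5 * ((mu * lam_i - delta) + (mu * lam_j - delta))


sigma, eps = 0.6, 0.2  # illustrative values only
lam = pair_lambda(0.7, 0.5, mu=1.0, delta=0.08)
r_min = 2.0 ** (1.0 / 6.0) * sigma
print(lam, ashbaugh_hatch(r_min, sigma, lam, eps))  # well depth equals -lam * eps
```

With λ = 1 the curve reduces to the full Lennard-Jones potential. With λ = 0 only the purely repulsive (WCA-like) core remains. Shifting or scaling λ therefore directly tunes the overall attraction between residues.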
-
Abstract: Cumulative sum (CUSUM) statistics are widely used in change point inference and identification. For the problem of testing for the existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data dependent and has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that with a boundary removal parameter the bootstrap CUSUM test is uniformly valid in size under the null. It also achieves the minimax separation rate under sparse alternatives, even when the dimension p exceeds the sample size n. Once a change point is detected, we estimate its location by maximising the ℓ∞-norm of the generalised CUSUM statistics at two different weighting scales, corresponding to covariance-stationary and non-stationary CUSUM statistics. For both estimators, we derive rates of convergence and show that the dimension affects the rates only through logarithmic factors, which implies that the CUSUM estimators can be consistent when p is much larger than n. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm that dynamically adjusts the change point detection rule and recursively estimates the change point locations. We derive its rate of convergence under suitable signal separation and strength conditions. The results derived in this paper are non-asymptotic, and we provide extensive simulation studies to assess the finite sample performance. The empirical evidence shows encouraging agreement with our theoretical results.
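The following is a simplified sketch of a sup-norm CUSUM statistic with boundary removal, together with a Gaussian multiplier bootstrap for its critical value. It assumes the statistic $\max_{s_0 \le s \le n - s_0} \lVert \sqrt{s(n-s)/n}\,(\bar X_{1:s} - \bar X_{s+1:n}) \rVert_\infty$. The bootstrap multiplies full-sample-centered observations by iid $N(0,1)$ weights. The paper's exact centering, weighting and calibration may differ. The function names and parameters (`s0`, `n_boot`, `alpha`) are hypothetical.

```python
import math
import random


def cusum_stat(xs, s0):
    """max over s0 <= s <= n - s0 of || sqrt(s(n-s)/n) * (mean(x[:s]) - mean(x[s:])) ||_inf."""
    n, p = len(xs), len(xs[0])
    total = [sum(x[j] for x in xs) for j in range(p)]
    left = [0.0] * p  # running sums of the first s observations
    best = 0.0
    for s in range(1, n - s0 + 1):
        for j in range(p):
            left[j] += xs[s - 1][j]
        if s < s0:
            continue  # boundary removal near the start
        w = math.sqrt(s * (n - s) / n)
        for j in range(p):
            diff = left[j] / s - (total[j] - left[j]) / (n - s)
            best = max(best, w * abs(diff))
    return best


def bootstrap_critical_value(xs, s0, alpha=0.05, n_boot=200, seed=0):
    """(1 - alpha) empirical quantile of the CUSUM statistic under Gaussian multipliers."""
    rng = random.Random(seed)
    n, p = len(xs), len(xs[0])
    mean = [sum(x[j] for x in xs) / n for j in range(p)]
    centered = [[x[j] - mean[j] for j in range(p)] for x in xs]
    stats = []
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stats.append(cusum_stat([[e[i] * c for c in centered[i]] for i in range(n)], s0))
    stats.sort()
    return stats[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]


# Toy data: a mean shift of 3 in coordinate 0 starting at observation tau.
rng = random.Random(42)
n, p, tau = 60, 20, 30
xs = [[rng.gauss(0.0, 1.0) + (3.0 if (i >= tau and j == 0) else 0.0) for j in range(p)]
      for i in range(n)]
t_obs = cusum_stat(xs, s0=5)
crit = bootstrap_critical_value(xs, s0=5)
print(t_obs, crit, t_obs > crit)
```

On this toy sample the observed statistic exceeds the bootstrap critical value. Centering by the full-sample mean keeps the sketch simple. Under the alternative, however, it tends to inflate the bootstrap statistics, making this version conservative.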
-
Abstract: We consider simple mean-field continuum models for first-order liquid–liquid demixing and solid–liquid phase transitions, and we show how the Maxwell construction at phase coexistence emerges on going from finite-size closed systems to the thermodynamic limit. The theories considered are the Cahn–Hilliard model of phase separation, which is also a model for the liquid–gas transition, and the phase field crystal model of the solid–liquid transition. Our results show that the states comprising the Maxwell line depend strongly on the mean density, with spatially localized structures playing a key role in the approach to the thermodynamic limit.
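For reference, the textbook Maxwell (common-tangent) construction for a bulk free energy density $f(\rho)$ is stated below. The notation ($f$, $\rho_1$, $\rho_2$, $\bar\rho$, $\phi_2$) is introduced here and is not taken from the paper. Coexisting densities $\rho_1 < \rho_2$ share the chemical potential $\mu = f'(\rho)$ and the pressure $P = \rho f'(\rho) - f(\rho)$. Equating both yields the first condition. For a mean density $\bar\rho$ between them, the thermodynamic-limit free energy lies on the straight line joining the two coexisting states:

```latex
f'(\rho_1) = f'(\rho_2) = \frac{f(\rho_2) - f(\rho_1)}{\rho_2 - \rho_1},
\qquad
f_{\mathrm{M}}(\bar\rho) = f(\rho_1) + (\bar\rho - \rho_1)\,\frac{f(\rho_2) - f(\rho_1)}{\rho_2 - \rho_1},
\qquad
\phi_2 = \frac{\bar\rho - \rho_1}{\rho_2 - \rho_1}.
```

Here $\phi_2$ is the lever-rule volume fraction occupied by the denser phase. In a finite closed system, interfaces typically cost free energy, so states at a given $\bar\rho$ lie above this line and approach it only as the system grows.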
