

Search for: All records

Award ID contains: 2421782

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Dense gas in molecular clouds is an important signature of ongoing and future star formation. We identify and track dense cores in the Starforge simulations, following core evolution from birth through dispersal by stellar feedback for typical Milky Way cloud conditions. Only ∼8% of cores host protostars, and most disperse before forming stars. The median starless and protostellar core lifetimes are ∼0.5–0.6 Myr and ∼0.8–1.1 Myr, respectively, of which the protostellar stage lasts ∼0.1 Myr. While core evolution is stochastic, we find that virial ratios and line widths decline in prestellar cores, coincident with turbulent decay. Collapse occurs over ∼0.1 Myr, once the central density reaches ≳10⁶ cm⁻³. Only starless cores follow the line-width–size and mass–size relations, σ ∝ R^0.3 and M ∝ R^1. The median core mass, radius, and velocity dispersion scale weakly with the cloud magnetic field strength. We cluster the core properties and find that protostellar cores have >80% likelihood of belonging to three particular groups that are characterized by high central densities, compact radii, and lower virial parameters. Overall, core evolution appears to be universally set by the interplay of gravity and magnetized turbulence, while stellar feedback dictates protostellar core properties and sets the protostellar phase lifetime. 
    Free, publicly-accessible full text available March 25, 2026
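The virial ratio mentioned above is conventionally the Bertoldi & McKee (1992) virial parameter, α = 5σ²R/(GM), where σ is the 1D velocity dispersion, R the core radius, and M the core mass; α ≲ 2 indicates a gravitationally bound core. The sketch below evaluates this standard definition with made-up example values, not Starforge results.

```python
import numpy as np

# CGS constants
G = 6.674e-8      # gravitational constant [cm^3 g^-1 s^-2]
MSUN = 1.989e33   # solar mass [g]
PC = 3.086e18     # parsec [cm]

def virial_parameter(sigma_kms, radius_pc, mass_msun):
    """Standard virial parameter alpha = 5 sigma^2 R / (G M)
    (Bertoldi & McKee 1992). alpha <~ 2 suggests a bound core."""
    sigma = sigma_kms * 1e5   # km/s -> cm/s
    R = radius_pc * PC
    M = mass_msun * MSUN
    return 5.0 * sigma**2 * R / (G * M)

# Illustrative (hypothetical) core: sigma = 0.2 km/s, R = 0.05 pc, M = 1 Msun
print(virial_parameter(0.2, 0.05, 1.0))
```

At fixed size and line width, a more massive core gives a lower α and is therefore more strongly bound.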
  2. Robust statistics aims to compute quantities that represent data of which a fraction may be arbitrarily corrupted. The most essential such statistic is the mean, and in recent years there has been a flurry of theoretical advances in efficiently estimating the mean in high dimensions from corrupted data. While several algorithms have been proposed that achieve near-optimal error, they all rely on large data size requirements as a function of dimension. In this paper, we perform extensive experiments on various mean estimation techniques where the data size might not meet this requirement due to the high-dimensional setting. For data with inliers generated from a Gaussian with known covariance, we find experimentally that several robust mean estimation techniques can practically improve upon the sample mean, with the quantum entropy scaling approach from Dong et al. (NeurIPS 2019) performing consistently the best. However, this consistent improvement is conditioned on a couple of simple modifications to the outlier-pruning steps in the high-dimension, low-data setting and when the inliers deviate significantly from Gaussianity. In fact, with these modifications, they are typically able to achieve roughly the same error as taking the sample mean of the uncorrupted inlier data, even with very low data size. In addition to controlled experiments on synthetic data, we also explore these methods on large language models, deep pretrained image models, and non-contextual word embedding models that do not necessarily have an inherent Gaussian distribution. Yet, in these settings, a mean point of a set of embedded objects is a desirable quantity to learn, and the data exhibits the high-dimension, low-data setting studied in this paper. We show both the challenges of achieving this goal and that our updated robust mean estimation methods can provide significant improvement over using just the sample mean. 
We additionally publish a library of Python implementations of robust mean estimation algorithms, allowing practitioners and researchers to apply these techniques and to perform further experimentation. 
    Free, publicly-accessible full text available February 11, 2026
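To illustrate the outlier-pruning step these methods share, the sketch below implements a simplified "filter"-style robust mean: repeatedly find the direction of largest empirical variance and, if it is much larger than identity-covariance inliers would produce, drop the points that project farthest onto it. This is a minimal sketch of the general family, not the quantum entropy scaling algorithm of Dong et al. or the paper's released library; the function name, threshold, and fraction pruned per round are illustrative assumptions.

```python
import numpy as np

def robust_mean(X, eps=0.1, tol=2.0, max_iter=10):
    """Filter-style robust mean estimate (simplified sketch).

    Assumes inliers are roughly Gaussian with identity covariance.
    Each round: if the top eigenvalue of the empirical covariance
    exceeds `tol` (a loose threshold for identity-covariance inliers),
    prune the eps-fraction of points with the largest squared
    projections onto the top eigenvector, then re-estimate.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        centered = X - mu
        cov = centered.T @ centered / len(X)
        vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        if vals[-1] <= tol:
            break                          # spectrum consistent with inliers
        v = vecs[:, -1]                    # top principal direction
        scores = (centered @ v) ** 2       # squared projections
        k = max(1, int(eps * len(X)))
        X = X[np.argsort(scores)[: len(X) - k]]  # drop the k largest
    return X.mean(axis=0)
```

On data with a 10% cluster of corrupted points far from the true mean, the pruning concentrates on the corrupted cluster, so the returned estimate lands much closer to the inlier mean than the naive sample mean does.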