This paper studies the linear convergence of the subspace constrained mean shift (SCMS) algorithm, a well-known algorithm for identifying a density ridge defined by a kernel density estimator. By arguing that the SCMS algorithm is a special variant of a subspace constrained gradient ascent (SCGA) algorithm with an adaptive step size, we derive the linear convergence of such an SCGA algorithm. While existing research focuses mainly on density ridges in Euclidean space, we generalize density ridges and the SCMS algorithm to directional data. In particular, we establish a stability theorem for density ridges with directional data and prove the linear convergence of our proposed directional SCMS algorithm.
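To make the projection step concrete, here is a minimal sketch of one SCMS iteration for a Gaussian kernel density estimator; the bandwidth h, the ridge dimension, and the sample X are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def scms_step(x, X, h=0.5, ridge_dim=1):
    """One subspace constrained mean shift step (Gaussian kernel).

    A sketch: x is the current point (D,), X the sample (n, D),
    h the bandwidth, ridge_dim the dimension of the ridge sought.
    """
    diff = X - x                                         # (n, D)
    w = np.exp(-0.5 * np.sum(diff**2, axis=1) / h**2)    # kernel weights
    # Plain mean-shift vector: weighted sample mean minus x.
    ms = (w @ X) / w.sum() - x
    # Hessian of the Gaussian KDE at x, up to positive constants
    # (dropped because only the eigenvectors are needed).
    D = x.size
    H = (w[:, None, None] *
         (diff[:, :, None] * diff[:, None, :] / h**2 - np.eye(D))).sum(axis=0)
    # Constrain the step to the span of the D - ridge_dim eigenvectors
    # of H with the smallest eigenvalues.
    vals, vecs = np.linalg.eigh(H)                       # ascending order
    V = vecs[:, :D - ridge_dim]
    return x + V @ V.T @ ms
```

Iterating this step from each point of a starting mesh until the projected move is negligible traces out the estimated ridge; the paper's convergence result concerns this iteration viewed as subspace constrained gradient ascent with an adaptive step size.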
-
The latticework structure known as the cosmic web provides valuable insight into the assembly history of large-scale structures. Although a variety of methods exist to identify cosmic web structures, they mostly rely on the assumption that galaxies are embedded in a Euclidean geometric space. Here, we present a novel cosmic web identifier called sconce (Spherical and CONic Cosmic wEb finder) that inherently considers the 2D (RA, DEC) spherical or the 3D (RA, DEC, z) conic geometry. The proposed algorithms in sconce generalize the well-known subspace constrained mean shift (scms) method and primarily address the filament detection problem. They are intrinsic to the spherical/conic geometry and invariant to data rotations. We further test the efficacy of our method on an artificial cross-shaped filament example and apply it to the SDSS galaxy catalogue, revealing that the 2D spherical version of our algorithms is robust even in regions of high declination. Finally, using N-body simulations from Illustris, we show that the 3D conic version of our algorithms is more robust than the standard scms method in detecting filaments under the redshift distortions caused by the peculiar velocities of haloes. Our cosmic web finder is packaged in python as sconce-scms and has been made publicly available.
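As a hedged illustration of the spherical setting, the sketch below converts (RA, DEC) coordinates to unit vectors and performs one unconstrained mean shift step under a von Mises-Fisher kernel; the bandwidth h is an assumed smoothing parameter, and the full sconce algorithms add a subspace constraint on top of such a step (see the SCMS sketch above).

```python
import numpy as np

def radec_to_unit(ra_deg, dec_deg):
    """Map (RA, DEC) in degrees to unit vectors on the 2-sphere."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)], axis=-1)

def directional_ms_step(x, X, h=0.2):
    """One mean shift step with a von Mises-Fisher kernel.

    x: current unit vector (3,); X: sample of unit vectors (n, 3).
    Renormalizing keeps the iterate on the sphere, so the update is
    intrinsic to the spherical geometry and invariant to rotations.
    """
    w = np.exp((X @ x) / h**2)       # vMF kernel weights
    m = w @ X                        # weighted vector sum
    return m / np.linalg.norm(m)     # project back to the sphere
```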
-
Gaining timely and reliable situation awareness after hazard events such as a hurricane is crucial to emergency managers and first responders. One effective way to achieve that goal is through damage assessment. Recently, disaster researchers have been utilizing imagery captured through satellites or drones to quantify the number of flooded or damaged buildings. In this paper, we propose a mixed‐data approach, which leverages publicly available satellite imagery and geolocation features of the affected area to identify damaged buildings after a hurricane. Based on a case study of Hurricane Harvey, which affected the Greater Houston area in 2017, the method demonstrated a significant improvement over performing a similar task using imagery features alone. This result opens the door to a wide range of possibilities for unifying advances in computer vision algorithms, such as convolutional neural networks, with traditional methods in damage assessment, for example, using flood depth or bare‐earth topology. In this work, a creative choice of geolocation features was made to provide extra information beyond the imagery features, but it is up to the users to decide which other features to include to model the physical behavior of the events, depending on their domain knowledge and the type of disaster. The data set curated in this work is made openly available (DOI: 10.17603/ds2‐3cca‐f398).
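The core of a mixed-data approach is to let one model see imagery-derived features and geolocation features side by side. The sketch below shows that concatenation on synthetic stand-in features; the feature names, dimensions, and choice of classifier are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Stand-ins for CNN embeddings of satellite patches and per-building
# geolocation features (e.g., elevation, distance to a waterway);
# in practice these come from imagery models and GIS layers.
img_feats = rng.normal(size=(n, 128))
geo_feats = rng.normal(size=(n, 4))
damaged = (img_feats[:, 0] + geo_feats[:, 0] + rng.normal(size=n) > 0).astype(int)

# Mixed-data design matrix: concatenate both feature blocks so one
# classifier can weigh imagery evidence against location evidence.
X = np.hstack([img_feats, geo_feats])
X_tr, X_te, y_tr, y_te = train_test_split(X, damaged, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```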
-
Discoveries of gaps in data have been important in astrophysics. For example, there are kinematic gaps opened by resonances in dynamical systems, or exoplanets of a certain radius that are empirically rare. A gap in a data set is a kind of anomaly, but in an unusual sense: instead of being a single outlier data point, situated far from other data points, it is a region of the space, or a set of points, that is anomalous compared to its surroundings. Gaps are both interesting and hard to find and characterize, especially when they have nontrivial shapes. We present in this paper a statistic that can be used to estimate the (local) “gappiness” of a point in the data space. It uses the gradient and Hessian of the density estimate (and thus requires a twice-differentiable density estimator). This statistic can be computed at (almost) any point in the space and does not rely on optimization; it allows us to highlight underdense regions of any dimensionality and shape in a general and efficient way. We illustrate our method on the velocity distribution of nearby stars in the Milky Way disk plane, which exhibits gaps that could originate from different processes. Identifying and characterizing those gaps could help determine their origins. We provide in an appendix implementation notes and additional considerations for finding underdensities in data, using critical points and the properties of the Hessian of the density. A Python implementation of the methods presented here is available at https://github.com/contardog/FindTheGap.
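Since the statistic is built from the gradient and Hessian of a twice-differentiable density estimator, a Gaussian KDE is a natural choice; the sketch below computes those two ingredients analytically. The `gap_score` combination at the end is a crude stand-in of my own, not the paper's statistic (see the repository above for the real implementation).

```python
import numpy as np

def kde_grad_hess(x, X, h=0.5):
    """Gradient and Hessian of a Gaussian KDE at x.

    x: query point (D,); X: data (n, D); h: bandwidth.
    A twice-differentiable density estimate, as the statistic requires.
    """
    n, D = X.shape
    diff = x - X                                       # (n, D)
    w = np.exp(-0.5 * np.sum(diff**2, axis=1) / h**2)
    c = 1.0 / (n * (2 * np.pi) ** (D / 2) * h**D)      # Gaussian norm
    grad = -c * (w @ diff) / h**2
    outer = diff[:, :, None] * diff[:, None, :] / h**4
    hess = c * (w[:, None, None] * (outer - np.eye(D) / h**2)).sum(axis=0)
    return grad, hess

def gap_score(x, X, h=0.5):
    """Crude underdensity indicator: large where the gradient is small
    (near a critical point) and the Hessian has a positive eigenvalue
    (a local minimum or saddle of the density)."""
    grad, hess = kde_grad_hess(x, X, h)
    return np.max(np.linalg.eigvalsh(hess)) - np.linalg.norm(grad)
```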
-
Gaussian processes offer a flexible kernel method for regression. While Gaussian processes have many useful theoretical properties and have proven practically useful, they suffer from poor scaling in the number of observations. In particular, the cubic time complexity of updating standard Gaussian process models can be a limiting factor in applications. We propose an algorithm for sequentially partitioning the input space and fitting a localized Gaussian process to each disjoint region. The algorithm is shown to have superior time and space complexity to existing methods, and its sequential nature allows the model to be updated efficiently. The algorithm constructs a model for which the time complexity of updating is tightly bounded above by a pre-specified parameter. To the best of our knowledge, the model is the first local Gaussian process regression model to achieve linear memory complexity. Theoretical continuity properties of the model are proven. We demonstrate the efficacy of the resulting model on several multi-dimensional regression tasks.
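A hedged sketch of the localized idea: partition the input space into cells and fit an independent GP per cell, so an incoming observation refits only one small model. The fixed equal-width 1D grid here is a simplification; the paper's algorithm partitions sequentially and bounds the update cost by a pre-specified parameter.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

class LocalGP:
    """Localized GP regression on a fixed 1D grid (illustrative only)."""

    def __init__(self, lo=0.0, hi=1.0, n_cells=10):
        self.lo, self.hi, self.n = lo, hi, n_cells
        self.data = [([], []) for _ in range(n_cells)]
        self.models = [None] * n_cells

    def _cell(self, x):
        # Index of the equal-width cell containing x, clipped to range.
        i = int((x - self.lo) / (self.hi - self.lo) * self.n)
        return min(max(i, 0), self.n - 1)

    def update(self, x, y):
        i = self._cell(x)
        xs, ys = self.data[i]
        xs.append([x])
        ys.append(y)
        # Refit only this cell: the cost depends on the cell's size,
        # not on the total number of observations seen so far.
        self.models[i] = GaussianProcessRegressor().fit(np.array(xs),
                                                        np.array(ys))

    def predict(self, x):
        m = self.models[self._cell(x)]
        return m.predict([[x]])[0] if m else 0.0  # 0.0 for empty cells
```

Note that this sketch still stores every observation, whereas the paper's model achieves linear memory complexity; the point here is only the partition-and-refit structure that keeps updates cheap.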
