Stationary points embedded in the derivatives are often critical for a model to be interpretable and may be considered as key features of interest in many applications. We propose a semiparametric Bayesian model to efficiently infer the locations of stationary points of a nonparametric function, which also produces an estimate of the function. We use Gaussian processes as a flexible prior for the underlying function and impose derivative constraints to control the function's shape via conditioning. We develop an inferential strategy that intentionally restricts estimation to the case of at least one stationary point, bypassing possible mis-specifications in the number of stationary points and avoiding the varying dimension problem that often brings in computational complexity. We illustrate the proposed methods using simulations and then apply the method to the estimation of event-related potentials derived from electroencephalography (EEG) signals. We show how the proposed method automatically identifies characteristic components and their latencies at the individual level, which avoids the excessive averaging across subjects that is routinely done in the field to obtain smooth curves. By applying this approach to EEG data collected from younger and older adults during a speech perception task, we are able to demonstrate how the time course of speech perception processes changes with age.
Peak Persistence Diagrams for Signal Estimation under Additive and Warping Noise
Addressing the fundamental challenge of signal estimation from noisy data is a crucial aspect of signal processing and data analysis. Existing literature offers various estimators based on distinct observation models and criteria for estimation. This paper introduces an innovative framework that leverages topological and geometric features of the data for signal estimation. The proposed approach introduces a topological tool, the peak-persistence diagram (PPD), to analyze prominent peaks within potential solutions. Initially, the PPD estimates the unknown shape, incorporating details such as the number of internal peaks and valleys. Subsequently, a shape-constrained optimization strategy is employed to estimate the signal. This approach strikes a balance between two prior approaches: signal averaging without alignment and signal averaging with complete elastic alignment. Importantly, the proposed method provides an estimator within a statistical model where the signal is affected by both additive and warping noise. A computationally efficient procedure for implementing this solution is presented, and its effectiveness is demonstrated through simulations and real-world examples, including applications to COVID rate curves and household electricity consumption curves. The results showcase superior performance of the proposed approach compared to several current state-of-the-art techniques.
- Award ID(s):
- 1953087
- PAR ID:
- 10557061
- Publisher / Repository:
- arXiv
- Date Published:
- Format(s):
- Medium: X
- Institution:
- Florida State University
- Sponsoring Org:
- National Science Foundation
More Like this
We study the persistent homology of both functional data on compact topological spaces and structural data presented as compact metric measure spaces. One of our goals is to define persistent homology so as to capture primarily properties of the shape of a signal, eliminating otherwise highly persistent homology classes that may exist simply because of the nature of the domain on which the signal is defined. We investigate the stability of these invariants using metrics that downplay regions where signals are weak. The distance between two signals is small if they exhibit high similarity in regions where they are strong, regardless of the nature of their full domains, in particular allowing different homotopy types. Consistency and estimation of persistent homology of metric measure spaces from data are studied within this framework. We also apply the methodology to the construction of multi-scale topological descriptors for data on compact Riemannian manifolds via metric relaxations derived from the heat kernel.
Target enrichment (such as Hyb-Seq) is a well-established high throughput sequencing method that has been increasingly used for phylogenomic studies. Unfortunately, current widely used pipelines for analysis of target enrichment data do not have a vigorous procedure to remove paralogs in target enrichment data. In this study, we develop a pipeline we call Putative Paralogs Detection (PPD) to better address putative paralogs from enrichment data. The new pipeline is an add-on to the existing HybPiper pipeline, and the entire pipeline applies criteria in both sequence similarity and heterozygous sites at each locus in the identification of paralogs. Users may adjust the thresholds of sequence identity and heterozygous sites to identify and remove paralogs according to the level of phylogenetic divergence of their group of interest. The new pipeline also removes highly polymorphic sites attributed to errors in sequence assembly and gappy regions in the alignment. We demonstrated the value of the new pipeline using empirical data generated from Hyb-Seq and the Angiosperm 353 kit for two woody genera Castanea (Fagaceae, Fagales) and Hamamelis (Hamamelidaceae, Saxifragales). Comparisons of datasets showed that the PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed evident differences between data from HybPiper and data from our new PPD pipeline. We further evaluated the accuracy and error rates of PPD by BLAST mapping of putative paralogous and orthologous sequences to a reference genome sequence of Castanea mollissima. Compared to HybPiper alone, PPD identified substantially more paralogous gene sequences that mapped to multiple regions of the reference genome (31 genes for PPD compared with 4 genes for HybPiper alone).
In conjunction with HybPiper, paralogous genes identified by both pipelines can be removed, resulting in the construction of more robust orthologous gene datasets for phylogenomic and divergence time analyses. Our study demonstrates the value of Hyb-Seq with data derived from the Angiosperm 353 probe set for elucidating species relationships within a genus, and argues for the importance of additional steps to filter paralogous genes and poorly aligned regions (e.g., as occur through assembly errors), such as our new PPD pipeline described in this study.
The problem of using covariates to predict shapes of objects in a regression setting is important in many fields. A formal statistical approach, termed the geodesic regression model, is commonly used for modeling and analyzing relationships between Euclidean predictors and shape responses. Despite its popularity, this model faces several key challenges, including (i) misalignment of shapes due to pre-processing steps, (ii) difficulties in shape alignment due to imaging heterogeneity, and (iii) lack of spatial correlation in shape structures. This paper proposes a comprehensive geodesic factor regression model that addresses all these challenges. Instead of using shapes as extracted from pre-registered data, it takes a more fundamental approach, incorporating the alignment step within the proposed regression model and learning the alignments from both pre-shape and covariate data. Additionally, it specifies spatial correlation structures using low-dimensional representations, including latent factors on the tangent space and isotropic error terms. The proposed framework results in substantial improvements in regression performance, as demonstrated through simulation studies and a real data analysis on Corpus Callosum contour data obtained from the ADNI study.
Diffraction patterns from small protein crystals illuminated by highly coherent X-rays often contain measurable interference signals between Bragg peaks. This coherent 'shape transform' signal introduces enough additional information to allow the molecular densities to be determined from the diffracted intensities directly, without prior information or resolution restrictions. However, the various correlations amongst molecular occupancies/vacancies at the crystal surface result in a subtle yet critical problem in shape transform phasing whereby the sublattices of symmetry-related molecules exhibit a form of partial coherence amongst lattice sites when an average is taken over many crystal patterns. Here an iterative phase retrieval algorithm is developed which is capable of treating this problem; it is demonstrated on simulated data.
