Title: Elastic Depths for Detecting Shape Anomalies in Functional Data
We propose a new family of depth measures called the elastic depths that can be used to greatly improve shape anomaly detection in functional data. Shape anomalies are functions that have considerably different geometric forms or features from the rest of the data. Identifying them is generally more difficult than identifying magnitude anomalies because shape anomalies are often not distinguishable from the bulk of the data with visualization methods. The proposed elastic depths use the recently developed elastic distances to directly measure the centrality of functions in the amplitude and phase spaces. Measuring shape outlyingness in these spaces provides a rigorous quantification of shape, which gives the elastic depths a strong theoretical and practical advantage over other methods in detecting shape anomalies. A simple boxplot and thresholding method is introduced to identify shape anomalies using the elastic depths. We assess the elastic depths' detection skill on simulated shape outlier scenarios and compare them against popular shape anomaly detectors. Finally, we use hurricane trajectories to demonstrate the elastic depth methodology on manifold-valued functional data.
Award ID(s): 1922758, 1830312
NSF-PAR ID: 10291130
Author(s) / Creator(s):
Date Published:
Journal Name: Technometrics
ISSN: 0040-1706
Page Range / eLocation ID: 1 to 11
Format(s): Medium: X
Sponsoring Org: National Science Foundation
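To make the depth-plus-boxplot recipe concrete, here is a minimal sketch, not the paper's implementation: the true elastic depths use amplitude and phase distances from the square-root velocity framework (implemented in packages such as fdasrsf), whereas this sketch substitutes a plain L2 distance and assumes one common depth convention, the reciprocal of one plus the median pairwise distance.

```python
import numpy as np

def pairwise_depths(F, metric=None):
    """F: (n_funcs, n_points) array of discretized functions.
    Depth of each function: 1 / (1 + median distance to all others)."""
    if metric is None:
        # Stand-in: plain L2 distance; the elastic depths would use the
        # amplitude or phase distance from the SRVF framework instead.
        metric = lambda f, g: np.linalg.norm(f - g)
    n = F.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = metric(F[i], F[j])
    D[np.eye(n, dtype=bool)] = np.nan          # exclude self-distances
    return 1.0 / (1.0 + np.nanmedian(D, axis=1))

def boxplot_flag(depths, k=1.5):
    """One-sided boxplot rule: shape anomalies have unusually LOW depth."""
    q1, q3 = np.percentile(depths, [25, 75])
    return depths < q1 - k * (q3 - q1)
```

Calling `boxplot_flag(pairwise_depths(F))` marks the functions whose depth falls far below the bulk of the sample, mirroring the one-sided boxplot rule described in the abstract.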
More Like this
  1. Density estimation is a widely used method to perform unsupervised anomaly detection. By learning the density function, data points with relatively low densities are classified as anomalies. Unfortunately, the presence of anomalies in training data may significantly impact the density estimation process, thereby imposing significant challenges to the use of more sophisticated density estimation methods such as those based on deep neural networks. In this work, we propose RobustRealNVP, a deep density estimation framework that enhances the robustness of flow-based density estimation methods, enabling their application to unsupervised anomaly detection. RobustRealNVP differs from existing flow-based models in two respects. First, it discards data points with low estimated densities during optimization to prevent them from corrupting the density estimation process. Second, it imposes Lipschitz regularization on the flow-based model to enforce smoothness in the estimated density function. We demonstrate the robustness of our algorithm against anomalies in training data from both theoretical and empirical perspectives. The results show that our algorithm achieves competitive results compared to state-of-the-art unsupervised anomaly detection methods.
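The trimming step is the heart of the method, and a short sketch shows its shape. This is an illustrative reading, not the authors' code: `flow` is assumed to be any density model exposing a per-sample `log_prob` (standard in normalizing-flow libraries), `trim_frac` is a hypothetical hyperparameter, and the Lipschitz regularization term is omitted.

```python
import torch

def trimmed_nll_step(flow, optimizer, x, trim_frac=0.05):
    """One training step that drops the lowest-density fraction of the batch."""
    log_px = flow.log_prob(x)                 # per-sample log-density, shape (batch,)
    k = int((1.0 - trim_frac) * x.shape[0])   # number of points to keep
    kept, _ = torch.topk(log_px, k)           # keep the k highest-density points
    loss = -kept.mean()                       # negative log-likelihood on kept points only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```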
  2. ABSTRACT

    New time-domain surveys, such as the Vera C. Rubin Observatory Legacy Survey of Space and Time, will observe millions of transient alerts each night, making standard approaches of visually identifying new and interesting transients infeasible. We present two novel methods of automatically detecting anomalous transient light curves in real time. Both methods are based on the simple idea that if the light curves from a known population of transients can be accurately modelled, any deviations from model predictions are likely anomalies. The first modelling approach is a probabilistic neural network built using Temporal Convolutional Networks (TCNs) and the second is an interpretable Bayesian parametric model of a transient. We demonstrate our methods' ability to provide anomaly scores as a function of time on light curves from the Zwicky Transient Facility. We show that the flexibility of neural networks, the attribute that makes them such a powerful tool for many regression tasks, is what makes them less suitable for anomaly detection when compared with our parametric model. The parametric model is able to identify anomalies with respect to common supernova classes with high precision and recall scores, achieving areas under the precision-recall curve above 0.79 for most rare classes, such as kilonovae, tidal disruption events, intermediate luminosity transients, and pair-instability supernovae. Our ability to identify anomalies improves over the lifetime of the light curves. Our framework, used in conjunction with transient classifiers, will enable fast and prioritized follow-up of unusual transients from new large-scale surveys.

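Both detectors score anomalies the same way: the further the observed flux drifts from the model's prediction, the more anomalous the transient. Below is a minimal sketch of one such running score; the scoring convention is illustrative, not the paper's exact definition, and `pred_mean` and `pred_std` would come from either the TCN or the Bayesian parametric model.

```python
import numpy as np

def anomaly_score(flux, pred_mean, pred_std, flux_err=0.0):
    """Running anomaly score: mean squared standardized residual up to each epoch."""
    # Standardize each residual by the combined model and photometric uncertainty.
    z = (flux - pred_mean) / np.sqrt(pred_std**2 + flux_err**2)
    # Cumulative average of squared residuals gives a score as a function of time.
    return np.cumsum(z**2) / np.arange(1, len(flux) + 1)
```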
  3. ABSTRACT Astronomers have typically set out to solve supervised machine learning problems by creating their own representations from scratch. We show that deep learning models trained to answer every Galaxy Zoo DECaLS question learn meaningful semantic representations of galaxies that are useful for new tasks on which the models were never trained. We exploit these representations to outperform several recent approaches at practical tasks crucial for investigating large galaxy samples. The first task is identifying galaxies of similar morphology to a query galaxy. Given a single galaxy assigned a free-text tag by humans (e.g. ‘#diffuse’), we can find galaxies matching that tag for most tags. The second task is identifying the most interesting anomalies to a particular researcher. Our approach is 100 per cent accurate at identifying the most interesting 100 anomalies (as judged by Galaxy Zoo 2 volunteers). The third task is adapting a model to solve a new task using only a small number of newly labelled galaxies. Models fine-tuned from our representation are better able to identify ring galaxies than models fine-tuned from terrestrial images (ImageNet) or trained from scratch. We solve each task with very few new labels; either one (for the similarity search) or several hundred (for anomaly detection or fine-tuning). This challenges the longstanding view that deep supervised methods require new large labelled data sets for practical use in astronomy. To help the community benefit from our pretrained models, we release our fine-tuning code, zoobot. Zoobot is accessible to researchers with no prior experience in deep learning.
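The similarity-search task reduces to a nearest-neighbour lookup in the learned representation space. A minimal sketch under that assumption, where `embeddings` is an (n_galaxies, d) array already extracted from a pretrained model such as zoobot, and the cosine-similarity ranking is an illustrative choice:

```python
import numpy as np

def most_similar(embeddings, query_idx, top_k=10):
    """Rank the catalogue by cosine similarity to one query galaxy's embedding."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = E @ E[query_idx]                   # cosine similarity to the query
    order = np.argsort(-sims)                 # most similar first
    return order[order != query_idx][:top_k]  # drop the query itself
```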
  4. Abstract

    The Tell Atlas of Algeria has huge potential for hydrothermal energy from over 240 thermal springs, with temperatures of up to 98 °C in the Guelma area. The most promising region is the northeastern part, which is known to have the hottest hydrothermal systems. In this work, we use a high-resolution gravity study to identify the location and origin of the hot water and how it reaches the surface. Gravimetric data analysis shows the shapes of the anomalies arising from structures at different subsurface depths. The energy spectrum calculated from the data also indicates the depths of the bodies causing the anomalies. 3D Euler deconvolution is applied to estimate the depths of preexisting tectonic structures (faults). These preprocessing steps assist with assessing the signal attenuation that impacts the Bouguer anomaly map. The residual anomaly is used in a three-dimensional inversion to provide a subsurface density distribution model that illustrates the locations of the origin of the dominant subsurface thermal systems. Overall, the combination of these standard processing steps applied to measurements of gravity data at the surface provides new insights into the sources of the hydrothermal systems in the Hammam Debagh and Hammam Ouled Ali regions. Faults that are key to the water infiltrating from depth to the surface are also identified; these represent the pathway of the hot water in the study area.

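The energy-spectrum step rests on a standard relation (Spector and Grant, 1970): the radially averaged log power spectrum of gridded gravity data decays roughly linearly with spatial frequency f, as ln P ≈ c − 4πhf for mean source depth h. A rough sketch under that assumption, for a square grid with spacing dx in km; the binning and fit band are illustrative choices, not values from this study:

```python
import numpy as np

def spectral_depth(grid, dx, fit_band=(0.05, 0.5), n_bins=20):
    """Mean source depth (km) from the radially averaged power spectrum of a
    square gravity grid with spacing dx (km), via ln P ~ c - 4*pi*h*f."""
    P = np.abs(np.fft.fft2(grid))**2
    f = np.fft.fftfreq(grid.shape[0], d=dx)        # spatial frequency, cycles/km
    k = np.hypot(f[:, None], f[None, :]).ravel()   # radial frequency of each cell
    p = P.ravel()
    # Radially average the spectrum in bins, then fit a line over the chosen band.
    edges = np.linspace(*fit_band, n_bins)
    idx = np.digitize(k, edges)
    kk = [k[idx == i].mean() for i in range(1, n_bins) if np.any(idx == i)]
    pp = [p[idx == i].mean() for i in range(1, n_bins) if np.any(idx == i)]
    slope, _ = np.polyfit(kk, np.log(pp), 1)       # fit ln P against f
    return -slope / (4 * np.pi)                    # recover depth h from the slope
```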
  5. Anomaly detection aims at identifying data points that show systematic deviations from the majority of data in an unlabeled dataset. A common assumption is that clean training data (free of anomalies) is available, which is often violated in practice. We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models. The idea is to jointly infer binary labels for each datum (normal vs. anomalous) while updating the model parameters. Inspired by outlier exposure (Hendrycks et al., 2018), which considers synthetically created, labeled anomalies, we use a combination of two losses that share parameters: one for the normal and one for the anomalous data. We then iteratively proceed with block coordinate updates on the parameters and the most likely (latent) labels. Our experiments with several backbone models on three image datasets, 30 tabular datasets, and a video anomaly detection benchmark showed consistent and significant improvements over the baselines.
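One block-coordinate iteration can be sketched as follows. The per-sample losses `loss_normal` and `loss_anom` are hypothetical stand-ins for the paper's two parameter-sharing losses, and the fixed per-batch anomaly fraction is an assumption of this sketch, not necessarily the authors' rule for inferring the latent labels.

```python
import torch

def latent_label_step(model, optimizer, x, loss_normal, loss_anom, anom_frac=0.05):
    """One block-coordinate iteration: assign latent labels, then update parameters."""
    with torch.no_grad():                      # label step: no gradients needed
        gap = loss_anom(model, x) - loss_normal(model, x)
        k = int(anom_frac * x.shape[0])        # assumed number of anomalies per batch
        is_anom = torch.zeros(x.shape[0], dtype=torch.bool)
        if k > 0:                              # smallest gap = fits the anomaly loss best
            is_anom[torch.topk(-gap, k).indices] = True
    # Parameter step: each point contributes the loss matching its latent label.
    loss = torch.where(is_anom, loss_anom(model, x), loss_normal(model, x)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return is_anom
```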