Title: Extreme Event Analysis in Next Generation Simulation Architectures
Numerical simulations pose a challenge because they generate petabyte-scale data that must be extracted and reduced while the simulation runs. We demonstrate a seamless integration of feature extraction into a simulation of turbulent fluid dynamics that produces on the order of 6 TB per timestep. To analyze and store this data, we extract velocity data from a dilated volume of the strong vortical regions and also store a lossy-compressed representation of the data; both reduce the data volume by one or more orders of magnitude. We extract data from user checkpoints in transit, while they reside on temporary burst-buffer SSD stores, so that the analysis and compression algorithms can be designed to meet specific time constraints and do not interfere with the simulation computations. Our results demonstrate that we can perform feature extraction on a world-class direct numerical simulation of turbulence while it is running and gather meaningful scientific data for archival and post-analysis.
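As a rough illustration of the reduction step, the sketch below thresholds a vorticity-magnitude field, dilates the resulting mask, and keeps only the velocity samples inside the dilated region. The quantile threshold, dilation width, and grid size are illustrative assumptions, and the sketch runs on a single node with NumPy/SciPy rather than in transit on burst-buffer checkpoints.

    # Minimal single-node sketch of the vortical-region extraction idea, assuming
    # the velocity field is a NumPy array of shape (3, nx, ny, nz).  The quantile
    # threshold and dilation width are illustrative, not the paper's values.
    import numpy as np
    from scipy import ndimage

    def extract_vortical_regions(velocity, quantile=0.9, dilation_iters=2):
        """Return velocity samples from a dilated mask of strong vortical regions."""
        # Vorticity magnitude |curl(u)| from simple centered differences.
        du = [np.gradient(velocity[i]) for i in range(3)]   # du[i][j] = d u_i / d x_j
        wx = du[2][1] - du[1][2]
        wy = du[0][2] - du[2][0]
        wz = du[1][0] - du[0][1]
        vort_mag = np.sqrt(wx**2 + wy**2 + wz**2)

        # Keep the strongest vortical regions (here: top 10% of vorticity magnitude).
        mask = vort_mag > np.quantile(vort_mag, quantile)

        # Dilate the mask so the stored sub-volume includes a neighborhood of each feature.
        mask = ndimage.binary_dilation(mask, iterations=dilation_iters)

        # Store only the indices and velocity values inside the dilated mask.
        idx = np.argwhere(mask)
        values = velocity[:, mask]
        return idx, values

    if __name__ == "__main__":
        u = np.random.rand(3, 64, 64, 64).astype(np.float32)
        idx, vals = extract_vortical_regions(u)
        reduction = u.size / max(vals.size, 1)
        print(f"kept {vals.size} of {u.size} values (~{reduction:.1f}x reduction)")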
Award ID(s): 1633124
PAR ID: 10042382
Author(s) / Creator(s):
Date Published:
Journal Name: High Performance Computing. ISC 2017. Lecture Notes in Computer Science
Volume: 10266
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. To analyze the abundance of multidimensional data, tensor-based frameworks have been developed. Traditionally, the matrix singular value decomposition (SVD) is used to extract the most dominant features from a matrix containing the vectorized data. While the SVD is highly useful for data that can be appropriately represented as a matrix, the vectorization step loses the high-dimensional relationships intrinsic to the data. To facilitate efficient multidimensional feature extraction, we utilize a projection-based classification algorithm built on the t-SVDM, a tensor analog of the matrix SVD. Our work extends the t-SVDM framework and the classification algorithm, both initially proposed for tensors of order 3, to any number of dimensions. We then apply this algorithm to a classification task using the StarPlus fMRI dataset. Our numerical experiments demonstrate that there exists a tensor-based approach to fMRI classification that is superior to the best possible equivalent matrix-based approach. Our results illustrate the advantages of our chosen tensor framework, provide insight into beneficial choices of parameters, and could be further developed for classification of more complex imaging data. We provide our Python implementation at https://github.com/elizabethnewman/tensor-fmri
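A minimal sketch of the order-3 case, assuming the FFT as the transform defining the t-product: each class is summarized by a truncated t-SVD basis, and a test sample is assigned to the class with the smallest projection residual. Tensor sizes, the truncation rank k, and the synthetic data are illustrative assumptions, not taken from the paper or the StarPlus experiments.

    # Sketch of projection-based classification with a t-SVD under the FFT
    # transform (order-3 case).  Sizes and data are illustrative only.
    import numpy as np

    def tsvd_basis(A, k):
        """Truncated left singular slices of A (m x p x n) under the FFT-based t-product."""
        A_hat = np.fft.fft(A, axis=2)               # transform along the third mode
        U_hat = np.empty((A.shape[0], k, A.shape[2]), dtype=complex)
        for i in range(A.shape[2]):                 # matrix SVD on each frontal slice
            U, _, _ = np.linalg.svd(A_hat[:, :, i], full_matrices=False)
            U_hat[:, :, i] = U[:, :k]
        return U_hat                                # basis kept in the transform domain

    def residual(U_hat, X):
        """|| X - U * U^T * X || with the t-product applied slice-wise in Fourier space
        (proportional to the spatial-domain norm, so the argmin is unaffected)."""
        X_hat = np.fft.fft(X, axis=2)
        r = 0.0
        for i in range(X.shape[2]):
            Ui, Xi = U_hat[:, :, i], X_hat[:, :, i]
            r += np.linalg.norm(Xi - Ui @ (Ui.conj().T @ Xi)) ** 2
        return np.sqrt(r)

    def classify(bases, X):
        """Assign X (m x 1 x n) to the class whose local t-SVD basis fits it best."""
        return min(bases, key=lambda c: residual(bases[c], X))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Two synthetic classes, each a tensor of 20 training samples stored as lateral slices.
        train = {c: rng.standard_normal((16, 20, 8)) + 2.0 * c for c in (0, 1)}
        bases = {c: tsvd_basis(A, k=5) for c, A in train.items()}
        test = rng.standard_normal((16, 1, 8)) + 2.0
        print("predicted class:", classify(bases, test))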
  2. Endmember extraction plays a prominent role in a variety of data analysis problems, as endmembers often correspond to the purest or most representative examples of some feature. Identifying endmembers can then be useful for further identification and classification tasks. In settings with high-dimensional data, such as hyperspectral imagery, it can be useful to consider endmembers that are subspaces, as these are capable of capturing a wider range of variations of a signature. The endmember extraction problem in this setting thus translates to finding the vertices of the convex hull of a set of points on a Grassmannian. In the presence of noise, it can be less clear whether a point should be considered a vertex. In this paper, we propose an algorithm to extract endmembers on a Grassmannian, identify subspaces of interest that lie near the boundary of a convex hull, and demonstrate the use of the algorithm on a synthetic example and on the 220-spectral-band AVIRIS Indian Pines hyperspectral image.
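One simple heuristic in this spirit (not the paper's algorithm): embed each subspace as its projection matrix, then score each point by the residual of its best convex reconstruction from the remaining points, so that points near the vertices of the convex hull receive high scores. The subspace dimension, ambient dimension, and data below are arbitrary assumptions for illustration.

    # Illustrative vertex heuristic on a Grassmannian, not the paper's algorithm.
    import numpy as np
    from scipy.optimize import nnls

    def projection_embedding(bases):
        """Map each orthonormal basis U (n x k) to its flattened projector U U^T."""
        return np.stack([(U @ U.T).ravel() for U in bases])

    def vertex_scores(points, rho=100.0):
        """Residual of the best convex combination of the other points (larger = more vertex-like)."""
        scores = []
        for i in range(len(points)):
            others = np.delete(points, i, axis=0)
            # Append a heavily weighted row of ones so nnls approximately enforces sum(w) = 1.
            A = np.vstack([others.T, rho * np.ones((1, len(others)))])
            b = np.concatenate([points[i], [rho]])
            _, resid = nnls(A, b)
            scores.append(resid)
        return np.array(scores)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        # Random 2-dimensional subspaces of R^6, each represented by an orthonormal basis.
        bases = [np.linalg.qr(rng.standard_normal((6, 2)))[0] for _ in range(15)]
        scores = vertex_scores(projection_embedding(bases))
        print("most vertex-like points:", np.argsort(scores)[::-1][:3])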
  3. Advances in materials science require leveraging past findings and data from the vast published literature. While some materials data repositories are being built, they typically rely on newly created data in narrow domains because extracting detailed data and metadata from the enormous wealth of publications is immensely challenging. The advent of large language models (LLMs) presents a new opportunity to rapidly and accurately extract data and insights from the published literature and transform them into structured formats for easy query and reuse. In this paper, we build on initial strategies for using LLMs for rapid and autonomous data extraction from materials science articles in a format curatable by materials databases. We present the subdomain of polymer composites as our example use case and demonstrate the successes and challenges of LLMs in extracting tabular data. We explored different table representations for use with LLMs, finding that a multimodal model with an image input yielded the most promising results. This model achieved an accuracy score of 0.910 for composition information extraction and an F1 score of 0.863 for property name information extraction. With the most conservative evaluation for property extraction, requiring an exact match in all details, we obtained an F1 score of 0.419. We observed that by allowing varying degrees of flexibility in the evaluation, the score can increase to 0.769. We envision that the results and analysis from this study will promote further research directions in developing information extraction strategies from materials information sources.
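The gap between the strict and relaxed scores can be illustrated with a small scoring sketch over hypothetical (property, value, unit) records; this is not the paper's evaluation harness, and the field names and data are assumptions. The strict match requires every field to agree, while the relaxed match ignores the unit field.

    # Toy F1 scoring of extracted records against a gold set, with a strict
    # (exact) and a relaxed match criterion.  Records are hypothetical.
    def f1(extracted, gold, match):
        tp = sum(any(match(e, g) for g in gold) for e in extracted)
        precision = tp / len(extracted) if extracted else 0.0
        recall = tp / len(gold) if gold else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    exact = lambda e, g: e == g
    relaxed = lambda e, g: (e["property"], e["value"]) == (g["property"], g["value"])

    gold = [
        {"property": "tensile strength", "value": "45.2", "unit": "MPa"},
        {"property": "glass transition temperature", "value": "105", "unit": "degC"},
    ]
    extracted = [
        {"property": "tensile strength", "value": "45.2", "unit": "N/mm^2"},  # unit differs
        {"property": "glass transition temperature", "value": "105", "unit": "degC"},
    ]

    print("exact-match F1:  ", round(f1(extracted, gold, exact), 3))    # penalized by the unit mismatch
    print("relaxed-match F1:", round(f1(extracted, gold, relaxed), 3))  # unit field ignored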
  4. Imaging algorithms form powerful analysis tools for very long baseline interferometry (VLBI) data analysis. However, because of their nonparametric nature, these tools cannot measure certain image features (e.g., ring diameter). This is unfortunate since these image features are often related to astrophysically relevant quantities such as black hole mass. This paper details a new general image feature-extraction technique, variational image domain analysis, that applies to a wide variety of VLBI image reconstructions. Unlike previous tools, variational image domain analysis can be applied to any image reconstruction regardless of its structure. To demonstrate its flexibility, we analyze thousands of reconstructions from previous Event Horizon Telescope synthetic data sets and recover image features such as diameter, orientation, and ellipticity. By measuring these features, our technique can help extract astrophysically relevant quantities such as the mass and orientation of the central black hole in M87.
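For intuition only, the sketch below measures a ring-like image's diameter, orientation, and ellipticity from image moments and a brightness-weighted mean radius. These simplified formulas are assumptions for illustration; the actual variational image domain analysis instead fits a parametric template to the reconstruction by optimization.

    # Moment-based stand-in for ring feature extraction (diameter, orientation,
    # ellipticity) from a reconstructed image.  Not the paper's method.
    import numpy as np

    def ring_features(img, pixel_size=1.0):
        ny, nx = img.shape
        y, x = np.mgrid[0:ny, 0:nx].astype(float)
        w = img / img.sum()

        # Centroid and second central moments of the brightness distribution.
        cx, cy = (w * x).sum(), (w * y).sum()
        mxx = (w * (x - cx) ** 2).sum()
        myy = (w * (y - cy) ** 2).sum()
        mxy = (w * (x - cx) * (y - cy)).sum()

        # Principal axes give orientation and an ellipticity proxy (1 - minor/major).
        evals, evecs = np.linalg.eigh(np.array([[mxx, mxy], [mxy, myy]]))
        orientation = np.degrees(np.arctan2(evecs[1, -1], evecs[0, -1]))
        ellipticity = 1.0 - np.sqrt(evals[0] / evals[1])

        # Ring diameter: twice the brightness-weighted mean radius about the centroid.
        r = np.hypot(x - cx, y - cy)
        diameter = 2.0 * (w * r).sum() * pixel_size
        return diameter, orientation, ellipticity

    if __name__ == "__main__":
        ny = nx = 128
        y, x = np.mgrid[0:ny, 0:nx].astype(float)
        r = np.hypot(x - 64, y - 64)
        img = np.exp(-0.5 * ((r - 20.0) / 3.0) ** 2)   # synthetic ring of radius 20 px
        print(ring_features(img))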
  5. This method is adapted and updated from methods originally published in Grottoli et al. (2004) and is based on the original methods of Folch & Stanley (1957) and Bligh & Dyer (1959). There are five parts to extracting lipids from ground corals: 1) grind and sub-sample the coral and store it at -80 °C until ready to extract, 2) freeze-dry the sample, 3) extract the lipids from the freeze-dried sample, 4) standardize the lipid concentration to ash-free dry weight (AFDW), and 5) resuspend the extracted lipid for long-term storage and possible later analysis of lipid classes or isotopes. The lipid extraction procedure must be conducted in a fume hood with the sash as low as possible, and the researcher must wear protective eyewear, gloves, and a lab coat at all times. Important considerations regarding lipid analysis were gained from Chapter 1.3, "Lipid extraction, storage, and sample handling," of the textbook Lipid Analysis by Christie (2003). This method was originally developed by Andréa Grottoli and refined by Rowan McLachlan (06-11-18) with the guidance of Dr. Agus Muñoz-Garcia at The Ohio State University. This protocol was written by Rowan McLachlan (03-12-2020). dx.doi.org/10.17504/protocols.io.bc4qiyvw