Title: AI-ready data in space science and solar physics: problems, mitigation and action plan
In the domain of space science, numerous ground-based and space-borne data of various phenomena have been accumulating rapidly, making analysis and scientific interpretation challenging. However, recent trends in the application of artificial intelligence (AI) have been shown to be promising in the extraction of information or knowledge discovery from these extensive data sets. At the same time, preparing these data for use as inputs to the AI algorithms, referred to as AI-readiness, is one of the outstanding challenges in leveraging AI in space science. Preparation of AI-ready data includes, among other aspects: 1) collection (accessing and downloading) of appropriate data representing the various physical parameters associated with the phenomena under study from different repositories; 2) addressing data formats such as conversion from one format to another, data gaps, quality flags and labeling; 3) standardizing metadata and keywords in accordance with NASA archive requirements or other defined standards; 4) processing of raw data such as data normalization, detrending, and data modeling; and 5) documentation of technical aspects such as processing steps, operational assumptions, uncertainties, and instrument profiles. Making all existing data AI-ready within a decade is impractical, and data from future missions and investigations will exacerbate the problem. This reveals the urgency to set the standards and start implementing them now. This article presents our perspective on the AI-readiness of space science data and mitigation strategies including definition of AI-readiness for AI applications; prioritization of data sets, storage, and accessibility; and identifying the responsible entity (agencies, private sector, or funded individuals) to undertake the task.
Award ID(s):
2026579
PAR ID:
10661255
Publisher / Repository:
Frontiers in Astronomy and Space Sciences
Journal Name:
Frontiers in Astronomy and Space Sciences
Volume:
10
ISSN:
2296-987X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A team of literacy, science, and theatre educators have been working to engage children in an urban public school system in the United States through embodied performances, where students embody and dramatise science ideas. This study focuses on one fourth‐grade classroom when instruction was done remotely due to Covid‐19. Children in the class were asked to compose videos of themselves acting out and/or exploring science phenomena and concepts, and we analysed the affordances of these multimodal compositions. We situate the need for this study in claims from the Next Generation Science Standards that literacy skills are necessary to build and communicate science knowledge. In doing so, we center social semiotics perspectives that conceive of composition broadly as production‐oriented processes drawing from various semiotic resources. The multimodal compositions in Mr. M's science class included both primarily embodied compositions and primarily digital compositions, and we elaborate on one focal example of each in the findings. Intertwined affordances of the focal children and their classmates' multimodal science compositions include opportunities to creatively engage with and negotiate science ideas, to draw from personal and social knowledge during meaning‐making, and to intentionally make rhetorical choices. 
  2. A new science discipline has emerged within the last decade at the intersection of informatics, computer science and biology: Imageomics. Like most other -omics fields, Imageomics also uses emerging technologies to analyze biological data, but from images. One of the most applied data analysis methods for image datasets is Machine Learning (ML). In 2019, we started working on a United States National Science Foundation (NSF) funded project, known as Biology Guided Neural Networks (BGNN), with the purpose of extracting information about biology by using neural networks and biological guidance such as species descriptions, identifications, phylogenetic trees and morphological annotations (Bart et al. 2021). Even though the variety and abundance of biological data is satisfactory for some ML analysis and the data are openly accessible, researchers still spend up to 80% of their time preparing data into a usable, AI-ready format, leaving only 20% for exploration and modeling (Long and Romanoff 2023). For this reason, we have built a dataset composed of digitized fish specimens, taken either directly from collections or from specialized repositories. The range of digital representations we cover is broad and growing, from photographs and radiographs, to CT scans, and even illustrations. We have added new groups of vocabularies to the dataset management system including image quality metadata, extended image metadata and batch metadata. With the image quality metadata and extended image metadata, we aimed to extract information from the digital objects that can possibly help ML scientists in their research with filtering, image processing and object recognition routines.
Image quality metadata provides information about objects contained in the image, features and condition of the specimen, and some basic visual properties of the image, while extended image metadata provides information about technical properties of the digital file and the digital multimedia object (Bakış et al. 2021, Karnani et al. 2022, Leipzig et al. 2021, Pepper et al. 2021, Wang et al. 2021) (see details on Fish-AIR vocabulary web page). Batch metadata is used for separating different datasets and facilitates downloading and uploading data in batches with additional batch information and supplementary files. Additional flexibility, built into the database infrastructure using an RDF framework, will enable the system to host different taxonomic groups, which might require new metadata features (Jebbia et al. 2023). By the combination of these features, along with FAIR (Findable, Accessible, Interoperable, Reusable) principles, and reproducibility, we provide Artificial Intelligence Readiness (AIR; Long and Romanoff 2023) to the dataset. Fish-AIR provides an easy-to-access, filtered, annotated and cleaned biological dataset for researchers from different backgrounds and facilitates the integration of biological knowledge based on digitized preserved specimens into ML pipelines. Because of the flexible database infrastructure and addition of new datasets, researchers will also be able to access additional types of data—such as landmarks, specimen outlines, annotated parts, and quality scores—in the near future. Already, the dataset is the largest and most detailed AI-ready fish image dataset with integrated Image Quality Management System (Jebbia et al. 2023, Wang et al. 2021).
  3. Understanding the world around us is a growing necessity for the whole public, as citizens are required to make informed decisions in their everyday lives about complex issues. Systems thinking (ST) is a promising approach for developing solutions to various problems that society faces and has been acknowledged as a crosscutting concept that should be integrated across educational science disciplines. However, studies show that engaging students in ST is challenging, especially concerning aspects like change over time and feedback. Using computational system models and a system dynamics approach can support students in overcoming these challenges when making sense of complex phenomena. In this paper, we describe an empirical study that examines how 10th grade students engage in aspects of ST through computational system modeling as part of a Next Generation Science Standards-aligned project-based learning unit on chemical kinetics. We show students’ increased capacity to explain the underlying mechanism of the phenomenon in terms of change over time that goes beyond linear causal relationships. However, student models and their accompanying explanations were limited in scope as students did not address feedback mechanisms as part of their modeling and explanations. In addition, we describe specific challenges students encountered when evaluating and revising models. In particular, we show epistemological barriers to fruitful use of real-world data for model revision. Our findings provide insights into the opportunities of a system dynamics approach and the challenges that remain in supporting students to make sense of complex phenomena and nonlinear mechanisms. 
  4. Despite significant contributions to various aspects of cybersecurity, cyber-attacks unfortunately remain on the rise. Increasingly, internationally recognized entities such as the National Science Foundation and National Science & Technology Council have noted Artificial Intelligence can help analyze billions of log files, Dark Web data, malware, and other data sources to help execute fundamental cybersecurity tasks. Our objective for the 1st Workshop on Artificial Intelligence-enabled Cybersecurity Analytics (half-day; co-located with ACM KDD) was to gather academics and practitioners to contribute recent work pertaining to AI-enabled cybersecurity analytics. We composed an outstanding, inter-disciplinary Program Committee with significant expertise in various aspects of AI-enabled Cybersecurity Analytics to evaluate the submitted work. Significant contributions to the half-day workshop were made in the areas of CTI, vulnerability assessment, and malware analysis.
  5. Meng, X-L (Ed.)
    Many data science students and practitioners are reluctant to adopt good coding practices as long as the code ‘works.’ However, code standards are an important part of modern data science practice, and they play an essential role in the development of data acumen. Good coding practices lead to more reliable code and save more time than they cost, making them important even for beginners. We believe that principled coding is vital for quality data science practice. To effectively instill these practices within academic programs, instructors and programs need to begin establishing these practices early, to reinforce them often, and to hold themselves to a higher standard while guiding students. We describe key aspects of good coding practices for data science, illustrating with examples in R and in Python, though similar standards are applicable to other software environments. Practical coding guidelines are organized into a top ten list. 