Title: Review on functional data classification
Abstract A fundamental problem in functional data analysis is to classify a functional observation based on training data. The application of functional data classification has gained immense popularity and utility across a wide array of disciplines, encompassing biology, engineering, environmental science, medical science, neurology, social science, and beyond. The phenomenal growth of the application of functional data classification indicates the urgent need for a systematic approach to develop efficient classification methods and scalable algorithmic implementations. Therefore, we here conduct a comprehensive review of classification methods for functional data. The review aims to bridge the gap between the functional data analysis community and the machine learning community, and to inspire new principles for functional data classification. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification; Statistical Models > Classification Models; Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data
Award ID(s):
2319342
PAR ID:
10526733
Publisher / Repository:
Wiley
Journal Name:
WIREs Computational Statistics
Volume:
16
Issue:
1
ISSN:
1939-5108
Page Range / eLocation ID:
e1638
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, this also leads to the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections with foundational thoughts in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general. This article is categorized under: Data: Types and Structure > Traditional Statistical Data; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods; Statistical Models > Model Selection
  2. Abstract Optimal transport (OT) methods seek a transformation map (or plan) between two probability measures, such that the transformation has the minimum transportation cost. Such a minimum transport cost, with a certain power transform, is called the Wasserstein distance. Recently, OT methods have drawn great attention in statistics, machine learning, and computer science, especially in deep generative neural networks. Despite its broad applications, the estimation of high‐dimensional Wasserstein distances is a well‐known challenging problem owing to the curse of dimensionality. There are some cutting‐edge projection‐based techniques that tackle high‐dimensional OT problems. Three major approaches of this kind are introduced: the slicing approach, the iterative projection approach, and the projection robust OT approach. Open challenges are discussed at the end of the review. This article is categorized under: Statistical and Graphical Methods of Data Analysis > Dimension Reduction; Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning
  3. Abstract Fusion learning methods, developed for the purpose of analyzing datasets from many different sources, have become a popular research topic in recent years. Individualized inference approaches through fusion learning extend fusion learning approaches to individualized inference problems over a heterogeneous population, where similar individuals are fused together to enhance the inference over the target individual. Both classical fusion learning and individualized inference approaches through fusion learning are established based on weighted aggregation of individual information, but the weight used in the latter is localized to the target individual. This article provides a review on two individualized inference methods through fusion learning, iFusion and iGroup, that are developed under different asymptotic settings. Both procedures guarantee optimal asymptotic theoretical performance and computational scalability. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical and Graphical Methods of Data Analysis > Nonparametric Methods; Data: Types and Structure > Massive Data
  4. Abstract ChemML is an open machine learning (ML) and informatics program suite that is designed to support and advance the data‐driven research paradigm that is currently emerging in the chemical and materials domain. ChemML allows its users to perform various data science tasks and execute ML workflows that are adapted specifically for the chemical and materials context. Key features are automation, general‐purpose utility, versatility, and user‐friendliness in order to make the application of modern data science a viable and widely accessible proposition in the broader chemistry and materials community. ChemML is also designed to facilitate methodological innovation, and it is one of the cornerstones of the software ecosystem for data‐driven in silico research. This article is categorized under: Software > Simulation Methods; Computer and Information Science > Chemoinformatics; Structure and Mechanism > Computational Materials Science; Software > Molecular Modeling
  5. Abstract To date, many AI initiatives (eg, AI4K12, CS for All) developed standards and frameworks as guidance for educators to create accessible and engaging Artificial Intelligence (AI) learning experiences for K‐12 students. These efforts revealed a significant need to prepare youth to gain a fundamental understanding of how intelligence is created, applied, and its potential to perpetuate bias and unfairness. This study contributes to the growing interest in K‐12 AI education by examining student learning of modelling real‐world text data. Four students from an Advanced Placement computer science classroom at a public high school participated in this study. Our qualitative analysis reveals that the students developed nuanced and in‐depth understandings of how text classification models—a type of AI application—are trained. Specifically, we found that in modelling texts, students: (1) drew on their social experiences and cultural knowledge to create predictive features, (2) engineered predictive features to address model errors, (3) described model learning patterns from training data and (4) reasoned about noisy features when comparing models. This study contributes to an initial understanding of student learning of modelling unstructured data and offers implications for scaffolding in‐depth reasoning about model decision making. 
     Practitioner notes
     What is already known about this topic
     Scholarly attention has turned to examining Artificial Intelligence (AI) literacy in K‐12 to help students understand the working mechanism of AI technologies and critically evaluate automated decisions made by computer models.
     While efforts have been made to engage students in understanding AI through building machine learning models with data, few of them go in‐depth into teaching and learning of feature engineering, a critical concept in modelling data.
     There is a need for research to examine students' data modelling processes, particularly in the little‐researched realm of unstructured data.
     What this paper adds
     Results show that students developed nuanced understandings of models learning patterns in data for automated decision making.
     Results demonstrate that students drew on prior experience and knowledge in creating features from unstructured data in the learning task of building text classification models.
     Students needed support in performing feature engineering practices, reasoning about noisy features and exploring features in rich social contexts that the data set is situated in.
     Implications for practice and/or policy
     It is important for schools to provide hands‐on model building experiences for students to understand and evaluate automated decisions from AI technologies.
     Students should be empowered to draw on their cultural and social backgrounds as they create models and evaluate data sources.
     To extend this work, educators should consider opportunities to integrate AI learning in other disciplinary subjects (ie, outside of computer science classes).