Abstract The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, this also leads to the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections with foundational thoughts in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general. This article is categorized under: Data: Types and Structure > Traditional Statistical Data; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods; Statistical Models > Model Selection
Machine learning for hydrologic sciences: An introductory overview
Abstract The hydrologic community has experienced a surge of interest in machine learning in recent years. This interest is primarily driven by rapidly growing hydrologic data repositories, as well as the success of machine learning in various academic and commercial applications, made possible by increasingly accessible enabling hardware and software. This overview is intended for readers new to the field of machine learning. It provides a non‐technical introduction, placed within a historical context, to commonly used machine learning algorithms and deep learning architectures. Applications in hydrologic sciences are summarized next, with a focus on recent studies. They include the detection of patterns and events such as land use change, the approximation of hydrologic variables and processes such as rainfall‐runoff modeling, and the mining of relationships among variables to identify controlling factors. The use of machine learning is also discussed in the context of integration with process‐based modeling for parameterization, surrogate modeling, and bias correction. Finally, the article highlights the challenges of robustness under extrapolation, physical interpretability, and small sample size in hydrologic applications. This article is categorized under: Science of Water
- Award ID(s): 1931297
- PAR ID: 10446851
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: WIREs Water
- Volume: 8
- Issue: 5
- ISSN: 2049-1948
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Cryo‐electron microscopy (cryo‐EM) has become a major experimental technique to determine the structures of large protein complexes and molecular assemblies, as evidenced by the 2017 Nobel Prize. Although cryo‐EM has been drastically improved to generate high‐resolution three‐dimensional maps that contain detailed structural information about macromolecules, the computational methods for using the data to automatically build structure models are lagging far behind. The traditional cryo‐EM model building approach is template‐based homology modeling. Manual de novo modeling is very time‐consuming when no template model is found in the database. In recent years, de novo cryo‐EM modeling using machine learning (ML) and deep learning (DL) has ranked among the top‐performing methods in macromolecular structure modeling. DL‐based de novo cryo‐EM modeling is an important application of artificial intelligence, with impressive results and great potential for the next generation of molecular biomedicine. Accordingly, we systematically review the representative ML/DL‐based de novo cryo‐EM modeling methods. Their significance is discussed from both practical and methodological viewpoints. We also briefly describe the background of the cryo‐EM data processing workflow. Overall, this review provides an introductory guide to modern research on artificial intelligence for de novo molecular structure modeling and future directions in this emerging field. This article is categorized under: Structure and Mechanism > Molecular Structures; Structure and Mechanism > Computational Biochemistry and Biophysics; Data Science > Artificial Intelligence/Machine Learning
-
Abstract ChemML is an open machine learning (ML) and informatics program suite that is designed to support and advance the data‐driven research paradigm that is currently emerging in the chemical and materials domain. ChemML allows its users to perform various data science tasks and execute ML workflows that are adapted specifically for the chemical and materials context. Key features are automation, general‐purpose utility, versatility, and user‐friendliness in order to make the application of modern data science a viable and widely accessible proposition in the broader chemistry and materials community. ChemML is also designed to facilitate methodological innovation, and it is one of the cornerstones of the software ecosystem for data‐driven in silico research. This article is categorized under: Software > Simulation Methods; Computer and Information Science > Chemoinformatics; Structure and Mechanism > Computational Materials Science; Software > Molecular Modeling
-
Abstract The Institute for Foundations of Machine Learning (IFML) focuses on core foundational tools to power the next generation of machine learning models. Its research underpins the algorithms and data sets that make generative artificial intelligence (AI) more accurate and reliable. Headquartered at The University of Texas at Austin, IFML researchers collaborate across an ecosystem that spans University of Washington, Stanford, UCLA, Microsoft Research, the Santa Fe Institute, and Wichita State University. Over the past year, we have witnessed incredible breakthroughs in AI on topics that are at the heart of IFML's agenda, such as foundation models, LLMs, fine‐tuning, and diffusion, with game‐changing applications influencing almost every area of science and technology. In this article, we seek to highlight the application of foundational machine learning research on key use‐inspired topics: Fairness in Imaging with Deep Learning: designing the correct metrics and algorithms to make deep networks less biased; Deep proteins: using foundational machine learning techniques to advance protein engineering and launch a biomanufacturing revolution; Sounds and Space for Audio‐Visual Learning: building agents capable of audio‐visual navigation in complex 3D environments via new data augmentations; Improving Speed and Robustness of Magnetic Resonance Imaging: using deep learning algorithms to develop fast and robust MRI methods for clinical diagnostic imaging. IFML is also responding to explosive industry demand for an AI‐capable workforce. We have launched an accessible, affordable, and scalable new degree program, the MSAI, that looks to wholly reshape the AI/ML workforce pipeline.
-
Abstract The potential energy of molecular species and their conformers can be computed with a wide range of computational chemistry methods, from molecular mechanics to ab initio quantum chemistry. However, choosing a computational approach that balances computational cost against the reliability of the calculated energies is a dilemma, especially for large molecules. This dilemma is even more problematic for studies that require hundreds or thousands of calculations, such as drug discovery. On the other hand, driven by their pattern recognition capabilities, neural networks have started to gain popularity in the computational chemistry community. During the last decade, many neural network potentials have been developed to predict a variety of chemical information for different systems. Neural network potentials have been shown to predict chemical properties with accuracy comparable to quantum mechanical approaches but at a cost approaching that of molecular mechanics calculations. As a result, the development of more reliable, transferable, and extensible neural network potentials has become an attractive field of study for researchers. In this review, we give an overview of the status of current neural network potentials and of strategies to improve their accuracy. We provide recent examples of studies that demonstrate the applicability of these potentials. We also discuss the capabilities and shortcomings of the current models and the challenges and future aspects of their development and applications. It is expected that this review will provide guidance for the development of neural network potentials and the exploitation of their applicability. This article is categorized under: Data Science > Artificial Intelligence/Machine Learning; Molecular and Statistical Mechanics > Molecular Interactions; Software > Molecular Modeling
