Abstract: ChemML is an open machine learning (ML) and informatics program suite that is designed to support and advance the data‐driven research paradigm that is currently emerging in the chemical and materials domain. ChemML allows its users to perform various data science tasks and execute ML workflows that are adapted specifically for the chemical and materials context. Key features are automation, general‐purpose utility, versatility, and user‐friendliness in order to make the application of modern data science a viable and widely accessible proposition in the broader chemistry and materials community. ChemML is also designed to facilitate methodological innovation, and it is one of the cornerstones of the software ecosystem for data‐driven in silico research. This article is categorized under: Software > Simulation Methods; Computer and Information Science > Chemoinformatics; Structure and Mechanism > Computational Materials Science; Software > Molecular Modeling
A data ecosystem to support machine learning in materials science
Facilitating the application of machine learning (ML) to materials science problems requires enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connecting of data with materials-specific ML models. Here, we present two projects, the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub), that address these needs. We use examples to show how MDF and DLHub capabilities can be leveraged to link data with ML models and how users can access those capabilities through web and programmatic interfaces.
- Award ID(s): 1636950
- PAR ID: 10134745
- Date Published:
- Journal Name: MRS Communications
- Volume: 9
- Issue: 4
- ISSN: 2159-6859
- Page Range / eLocation ID: 1125 to 1133
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The predictive capabilities of computational materials science today derive from overlapping advances in simulation tools, modeling techniques, and best practices. We outline this ecosystem of molecular simulations by explaining how important contributions in each of these areas have fed into each other. The combined output of these tools, techniques, and practices is the ability for researchers to advance understanding by efficiently combining simple models with powerful software. As specific examples, we show how the prediction of organic photovoltaic morphologies has improved by orders of magnitude over the last decade, and how the processing of reacting epoxy thermosets can now be investigated with million-particle models. We discuss these two materials systems and the training of materials simulators through the lens of cognitive load theory. For students, the broad view of ecosystem components should facilitate understanding how the key parts relate to each other first, followed by targeted exploration. In this way, the paper is organized in loose analogy to a coarse-grained model: the main components provide basic framing and accelerated sampling from which deeper research is better contextualized. For mentors, this paper is organized to provide a snapshot in time of the current simulation ecosystem and an on-ramp for simulation experts into the literature on pedagogical practice.
- Rathje, E.; Montoya, B.; Wayne, M. (Ed.) The rise of data capture and storage capabilities has led to greater data granularity and sharing of data sets in geotechnical earthquake engineering. This broader shift to big data requires ways to process and extract value from it and is aided by the progress in methodologies from the computer science domain and advancements in computer hardware capabilities. General machine learning (ML) models typically receive a set of input parameters and run them through an algorithm to produce outputs with no constraints on the parameters or algorithm process. Three topic areas of ML applications in geotechnical earthquake engineering are reviewed and summarized in this paper: seismic response, liquefaction triggering analysis, and performance-based assessments (lateral displacements and settlement analysis). The current progress of ML is summarized, while the challenges and potential in adopting such approaches are addressed.
- Data leakage remains a pervasive issue in machine learning (ML), especially when applied to science, leading to overly optimistic performance estimates and irreproducible findings. Despite its prevalence, data leakage receives limited attention in ML education, in part due to the lack of accessible, hands-on teaching resources. To address this gap, we developed interactive learning modules in which students reproduce examples from academic publications that are affected by data leakage, then repeat the evaluation without the data leakage error to see how the finding is affected. These modules were deployed by the authors in two introductory machine learning courses, enabling students to explore common forms of leakage and their impact on model reliability. Following their engagement with these materials, student feedback highlighted increased awareness of subtle pitfalls that can compromise machine learning workflows.
- Organic molecules and polymers have a broad range of applications in biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally driven, guided by experience, intuition, and conceptual insights. Though they have been successfully applied to discover many important materials, these methods face significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advancements in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool to drive breakthroughs in many areas of materials science and engineering. To date, using ML-assisted approaches, the quantitative structure-property/activity relation for material property prediction can be established more accurately and efficiently. In addition, materials design can be revolutionized and accelerated through ML-enabled molecular generation and inverse molecular design. In this perspective, we review the recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in biomedical, chemical, and materials science fields. We further discuss the relevant challenges to solve in order to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for prediction of material properties, which serve as a tutorial for researchers who have little prior experience with ML and want to apply ML to various applications. Last but not least, it draws insights into the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future to meet the tremendous demand for new materials with tailored properties in different fields.

