- Publication Date:
- NSF-PAR ID:
- 10340565
- Journal Name:
- Frontiers in Big Data
- Volume:
- 5
- ISSN:
- 2624-909X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The rapid development and application of machine learning (ML) techniques in materials science have led to new tools for machine-enabled and autonomous/high-throughput materials design and discovery. In parallel, efforts to extract data from traditional experiments in the published literature with natural language processing (NLP) algorithms offer opportunities to build large data troves for these in silico design and discovery endeavors. While NLP is used throughout society, its application in materials science is still at a very early stage. This perspective provides a case study on the application of NLP to extract information related to the preparation of organic materials. We present the case study at a basic level, with the aim of discussing these technologies and processes with researchers from diverse scientific backgrounds. We also discuss the challenges faced in the case study and assess how community contributions could improve the accuracy of NLP techniques for materials science.
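The extraction task described above can be illustrated with a minimal sketch. Real materials-science pipelines use trained named-entity-recognition models; the regex patterns, example sentence, and field names below are purely illustrative of how preparation conditions (temperatures, durations) might be pulled from synthesis text.

```python
import re

# Illustrative patterns only; production NLP pipelines rely on trained
# NER models rather than hand-written regexes.
TEMP_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:°C|degC|deg C)")
TIME_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(h|hours?|min|minutes?)\b")

def extract_conditions(sentence: str) -> dict:
    """Pull temperature and duration mentions from a synthesis sentence."""
    temps = [float(m.group(1)) for m in TEMP_RE.finditer(sentence)]
    times = [(float(m.group(1)), m.group(2)) for m in TIME_RE.finditer(sentence)]
    return {"temperatures_C": temps, "durations": times}

text = "The mixture was stirred at 80 °C for 12 h, then annealed at 150 °C."
print(extract_conditions(text))
```

Even this toy version shows why accuracy is hard to achieve: unit variants, ranges ("80–90 °C"), and context ("annealed" vs. "stored") all require more than pattern matching.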
-
In 2020, the White House released the “Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset,” asking artificial intelligence experts to collect data and develop text mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rapidly growing open dataset of publications, the COVID-19 Open Research Dataset (CORD-19). As the pace of research accelerates, biomedical scientists struggle to stay current. To expedite their investigations, scientists leverage hypothesis generation systems, which can automatically inspect published papers to discover novel implicit connections. We present the automated general-purpose hypothesis generation systems AGATHA-C and AGATHA-GP for COVID-19 research. The systems are based on graph mining and transformer models. They are validated at scale through retrospective information rediscovery and proactive analysis involving human-in-the-loop expert review. Both systems achieve high-quality predictions across domains with fast computation and are released to the broad scientific community to accelerate biomedical research. In addition, through a domain-expert-curated study, we show that the systems are able to discover ongoing research findings such as the relationship between COVID-19 and the hormone oxytocin. All code, details, and more »
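The core idea behind graph-based hypothesis generation can be sketched with a toy example. AGATHA ranks candidate connections with transformer models over a large semantic graph; the common-neighbors heuristic and the miniature term lists below are only illustrative stand-ins for that machinery.

```python
from collections import defaultdict
from itertools import combinations

# Toy co-occurrence graph built from term lists standing in for abstracts.
# The terms and their groupings here are invented for illustration.
abstracts = [
    ["covid-19", "ace2", "lung"],
    ["oxytocin", "ace2", "inflammation"],
    ["covid-19", "inflammation", "cytokine"],
]

neighbors = defaultdict(set)
for terms in abstracts:
    for a, b in combinations(set(terms), 2):
        neighbors[a].add(b)
        neighbors[b].add(a)

def hypothesis_score(u: str, v: str) -> int:
    """Rank an unobserved pair by shared neighbors (common-neighbors heuristic)."""
    return len(neighbors[u] & neighbors[v])

# "covid-19" and "oxytocin" never co-occur directly, but share two neighbors
# (ace2 and inflammation), making the implicit link a candidate hypothesis.
print(hypothesis_score("covid-19", "oxytocin"))  # → 2
```

Pairs with high scores but no direct edge are exactly the "novel implicit connections" a hypothesis generation system surfaces for expert review.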
-
Abstract Plants, and the biological systems around them, are key to the future health of the planet and its inhabitants. The Plant Science Decadal Vision 2020–2030 frames our ability to perform vital and far‐reaching research in plant systems sciences, essential to how we value participants and apply emerging technologies. We outline a comprehensive vision for addressing some of our most pressing global problems through discovery, practical applications, and education. The Decadal Vision was developed by the participants at the Plant Summit 2019, a community event organized by the Plant Science Research Network. The Decadal Vision describes a holistic vision for the next decade of plant science that blends recommendations for research, people, and technology. Going beyond discoveries and applications, we, the plant science community, must implement bold, innovative changes to research cultures and training paradigms in this era of automation, virtualization, and the looming shadow of climate change. Our vision and hopes for the next decade are encapsulated in the phrase reimagining the potential of plants for a healthy and sustainable future. The Decadal Vision recognizes the vital intersection of human and scientific elements and demands an integrated implementation of strategies for research (Goals 1–4), people (Goals 5 and 6), more »
-
Zero to a trillion: Advancing surface process studies with open access to high resolution topography. Tarolli, P.; Mudd, S. (Eds.) High-resolution topography (HRT) is a powerful observational tool for studying the Earth's surface, vegetation, and urban landscapes, with broad scientific, engineering, and education-based applications. Submeter-resolution imaging is possible when data are collected with laser and photogrammetric techniques from ground-, air-, and space-based platforms. Open access to these data, and a cyberinfrastructure platform that enables users to discover, manage, share, and process them, increases the impact of investments in data collection and catalyzes scientific discovery. Furthermore, open and online access to data enables broad interdisciplinary use of HRT across academia and in communities such as education, public agencies, and the commercial sector. OpenTopography, supported by the US National Science Foundation, aims to democratize access to Earth science-oriented HRT data and processing tools. We utilize cyberinfrastructure, including large-scale data management, high-performance computing, and service-oriented architectures, to provide efficient web-based visualization and access to large HRT datasets. OT colocates data with processing tools to enable users to quickly access custom data and derived products for their applications, with the ultimate goal of making these powerful data easier to use. OT's rapidly growing data holdings currently include 283 lidar and photogrammetric point cloud datasets (>1.2 trillion points) covering 236,364 km2. As a testament to OT's more »
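A typical derived product from such point clouds is a digital elevation model (DEM). The following is a minimal sketch, assuming a simple gridding scheme that averages elevations within square cells; services like OpenTopography use far more sophisticated interpolation, and the sample points here are invented.

```python
import numpy as np

def points_to_dem(points: np.ndarray, cell: float) -> np.ndarray:
    """Grid (x, y, z) points into a DEM by averaging z within square cells.

    Cells with no points are left as NaN. Illustrative only: production
    pipelines use TIN or spline interpolation and handle ground filtering.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    ix = ((x - x.min()) // cell).astype(int)   # column index per point
    iy = ((y - y.min()) // cell).astype(int)   # row index per point
    dem = np.full((iy.max() + 1, ix.max() + 1), np.nan)
    sums = np.zeros_like(dem)
    counts = np.zeros_like(dem)
    np.add.at(sums, (iy, ix), z)    # unbuffered accumulation per cell
    np.add.at(counts, (iy, ix), 1)
    mask = counts > 0
    dem[mask] = sums[mask] / counts[mask]
    return dem

pts = np.array([[0.0, 0.0, 10.0], [0.5, 0.2, 12.0], [1.5, 1.5, 20.0]])
print(points_to_dem(pts, cell=1.0))
```

Binning with `np.add.at` rather than a Python loop is what keeps this approach viable at lidar scale, where a single dataset can hold billions of points.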
-
Abstract Benchmark datasets and benchmark problems have been a key aspect of the success of modern machine learning applications in many scientific domains. Consequently, an active discussion about benchmarks for applications of machine learning has also started in the atmospheric sciences. Such benchmarks allow machine learning tools and approaches to be compared quantitatively and enable a separation of concerns between domain and machine learning scientists. However, a clear definition of benchmark datasets for weather and climate applications is missing, with the result that many domain scientists are confused. In this paper, we equip the atmospheric sciences with a recipe for building proper benchmark datasets, present a (nonexclusive) list of domain-specific challenges for machine learning, and elaborate where and which benchmark datasets will be needed to tackle these challenges. We hope that the creation of benchmark datasets will help machine learning efforts in the atmospheric sciences become more coherent and, at the same time, direct the efforts of machine learning scientists and experts in high-performance computing toward the most imminent challenges in atmospheric sciences. We focus on benchmarks for atmospheric sciences (weather, climate, and air-quality applications). However, many aspects more »
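What makes a benchmark comparable across groups is a fixed held-out set and a fixed metric. A common choice in weather applications is a skill score relative to a climatology baseline; the sketch below assumes that setup, and the data values and function names are illustrative, not from any specific benchmark.

```python
import numpy as np

def rmse(pred: np.ndarray, truth: np.ndarray) -> float:
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def skill_score(pred, truth, baseline) -> float:
    """1 = perfect, 0 = no better than the baseline, < 0 = worse."""
    return 1.0 - rmse(pred, truth) / rmse(baseline, truth)

# Toy held-out observations and a climatology baseline (mean of the record).
truth = np.array([1.0, 2.0, 3.0, 4.0])
climatology = np.full_like(truth, truth.mean())
model = np.array([1.1, 2.1, 2.9, 3.9])

print(round(skill_score(model, truth, climatology), 3))  # → 0.911
```

Fixing the split, the baseline, and the metric in the benchmark definition itself is what lets domain and machine learning scientists work separately yet report comparable numbers.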