A vision for the development of benchmarks to bridge geoscience and data science
The massive surge in the amount of observational field data demands richer and more meaningful collaboration between data scientists and geoscientists. This document was written by members of the Working Group on Case Studies of the NSF-funded RCN on Intelligent Systems Research To Support Geosciences (IS-GEO, https://is-geo.org/) to describe our vision to build and enhance such collaboration through the use of specially designed benchmark datasets. Benchmark datasets serve as summary descriptions of problem areas, providing a simple interface between disciplines without requiring extensive background knowledge. Benchmark datasets are intended to address a number of overarching goals. First, they are concrete, identifiable, and public, which results in a natural coordination of research efforts across multiple disciplines and institutions. Second, they provide multifold opportunities for objective comparison of various algorithms in terms of computational costs, accuracy, utility, and other measurable standards, to address a particular question in geoscience. Third, as materials for education, the benchmark data cultivate future human capital and interest in geoscience problems and data science methods. Finally, a concerted effort to produce and publish benchmarks has the potential to spur the development of new data science methods, while providing deeper insights into many fundamental problems in modern geosciences. That is, similar to the critical role that genomic and molecular biology data archives play in facilitating the field of bioinformatics, we expect that the proposed geoscience data repository will serve as a “catalyst” for the new discipline of geoinformatics. We describe specifications of a high-quality geoscience benchmark dataset and discuss some of our first benchmark efforts. We invite the Climate Informatics community to join us in creating additional benchmarks that aim to address important climate science problems.
- Award ID(s): 1632211
- PAR ID: 10057023
- Date Published:
- Journal Name: 7th International Workshop on Climate Informatics
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Despite the increasingly successful application of neural networks to many problems in the geosciences, their complex and nonlinear structure makes the interpretation of their predictions difficult, which limits model trust and does not allow scientists to gain physical insights about the problem at hand. Many different methods have been introduced in the emerging field of eXplainable Artificial Intelligence (XAI), which aims at attributing the network’s prediction to specific features in the input domain. XAI methods are usually assessed by using benchmark datasets (such as MNIST or ImageNet for image classification). However, an objective, theoretically derived ground truth for the attribution is lacking for most of these datasets, making the assessment of XAI in many cases subjective. Also, benchmark datasets specifically designed for problems in geosciences are rare. Here, we provide a framework, based on the use of additively separable functions, to generate attribution benchmark datasets for regression problems for which the ground truth of the attribution is known a priori. We generate a large benchmark dataset and train a fully connected network to learn the underlying function that was used for simulation. We then compare estimated heatmaps from different XAI methods to the ground truth in order to identify examples where specific XAI methods perform well or poorly. We believe that attribution benchmarks such as the ones introduced herein are of great importance for further application of neural networks in the geosciences, and for more objective assessment and accurate implementation of XAI methods, which will increase model trust and assist in discovering new science.
- Benchmark datasets and benchmark problems have been a key aspect for the success of modern machine learning applications in many scientific domains. Consequently, an active discussion about benchmarks for applications of machine learning has also started in the atmospheric sciences. Such benchmarks allow for the comparison of machine learning tools and approaches in a quantitative way and enable a separation of concerns for domain and machine learning scientists. However, a clear definition of benchmark datasets for weather and climate applications is missing, with the result that many domain scientists are confused. In this paper, we equip the domain of atmospheric sciences with a recipe for how to build proper benchmark datasets, present a (nonexclusive) list of domain-specific challenges for machine learning, and elaborate where and what benchmark datasets will be needed to tackle these challenges. We hope that the creation of benchmark datasets will help the machine learning efforts in atmospheric sciences to be more coherent and, at the same time, target the efforts of machine learning scientists and experts of high-performance computing to the most imminent challenges in atmospheric sciences. We focus on benchmarks for atmospheric sciences (weather, climate, and air-quality applications). However, many aspects of this paper will also hold for other aspects of the Earth system sciences or are at least transferable.
  Significance Statement: Machine learning is the study of computer algorithms that learn automatically from data. Atmospheric sciences have started to explore sophisticated machine learning techniques, and the community is making rapid progress on the uptake of new methods for a large number of application areas. This paper provides a clear definition of so-called benchmark datasets for weather and climate applications that help to share data and machine learning solutions between research groups to reduce time spent in data processing, to generate synergies between groups, and to make tool developments more targeted and comparable. Furthermore, a list of benchmark datasets that will be needed to tackle important challenges for the use of machine learning in atmospheric sciences is provided.
- Traditional Knowledge (TK) is a qualitative and quantitative living body of knowledge developed locally and regionally across generations over thousands of years. This study aims to show through authentic voice the importance of centering TK systems and cultural needs to provide equitable geoscience education programs. TK can be communicated through a variety of methods, such as story and song, dance, paintings, carvings, structures, and textiles. TK is interdisciplinary within anthropological and ecological subsistence and provides enhanced cultural and spiritual context. Research findings are enhanced by the exploratory and inquiry-based design of TK and provide insight into the anthropogenic impacts on the environment, allowing researchers to gain a rich understanding of human behaviors and patterns when collecting and analyzing data. This study examines factors influencing Indigenous students’ participation and retention in the geosciences, specifically gauging opinions on the incorporation of TK systems into geoscience education. Data were collected using an electronic survey to identify factors that inform students’ decision to enter geoscience disciplines and to better understand the importance of role models and mentors for retention. Our findings indicate that Indigenous students were interested in using both TK and Western science in geoscience learning spaces, Indigenous role models played an important role in sense of belonging and identity in the geosciences, and the incorporation of culture into learning experiences played an important role in retention. Findings from this study, if operationalized, would allow geoscience departments to increase retention of Indigenous students and faculty, provide equitable educational opportunities, and better understand how to effect cultural change in the geosciences by providing a welcoming and affirming space for Indigenous scholars.
- There have been many efforts to broaden participation and diversity in the geosciences with varying degrees of success. The goal of the National Science Foundation-funded GeoScholar Program in the School of the Earth, Ocean & Environment (SEOE) at the University of South Carolina was to increase geoscience exposure and the number of geoscience undergraduate majors (environmental, geological, and marine sciences) from low-income, minority, and first-generation college backgrounds.