The era of ‘big data’ promises to provide new hydrologic insights, and open web‐based platforms are being developed and adopted by the hydrologic science community to harness these datasets and data services. This shift accompanies advances in hydrology education and the growth of web‐based hydrology learning modules, but their capacity to utilize emerging open platforms and data services to enhance student learning through data‐driven activities remains largely untapped. Given that generic equations may not easily translate into local or regional solutions, teaching students to explore how well models or equations work in particular settings or to answer specific problems using real data is essential. This article introduces an open web‐based module developed to advance data‐driven hydrologic process learning, targeting upper‐level undergraduate and early graduate students in hydrology and engineering. The module was developed and deployed on the HydroLearn open educational platform, which provides a formal pedagogical structure for developing effective problem‐based learning activities. We found that data‐driven learning activities utilizing collaborative open web platforms like CUAHSI HydroShare and JupyterHub to store and run computational notebooks allowed students to access and work with datasets for systems of personal interest and promoted critical evaluation of results and assumptions. Initial student feedback was generally positive, but also highlighted challenges, including troubleshooting and future‐proofing difficulties and some resistance to programming and new software. Opportunities to further enhance hydrology learning include better articulating the benefits of coding and open web platforms upfront, incorporating additional user‐support tools, and focusing methods and questions on implementing and adapting notebooks to explore fundamental processes rather than tools and syntax. The profound shift in the field of hydrology toward big data, open data services and reproducible research practices requires hydrology instructors to rethink traditional content delivery and focus instruction on harnessing these datasets and practices in the preparation of future hydrologists and engineers.
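The module content itself lives on HydroLearn and HydroShare; as a minimal, hypothetical sketch of the kind of data‐driven notebook activity the abstract describes, the Python snippet below retrieves daily streamflow for a USGS gage. The site number, date range, and helper name are illustrative assumptions, and the standard NWIS JSON layout is assumed.

```python
# Minimal sketch: retrieve daily streamflow from the USGS NWIS web service
# and load it into a pandas DataFrame, as a student notebook might do.
# Site number and dates are arbitrary examples, not values from the module.
import requests
import pandas as pd

def get_daily_flow(site, start, end):
    """Fetch daily mean discharge (parameter 00060, cfs) for a USGS gage."""
    url = "https://waterservices.usgs.gov/nwis/dv/"
    params = {
        "format": "json",
        "sites": site,
        "parameterCd": "00060",  # discharge, cubic feet per second
        "startDT": start,
        "endDT": end,
    }
    series = requests.get(url, params=params, timeout=30).json()
    # Assumes the standard NWIS JSON layout: value -> timeSeries -> values
    values = series["value"]["timeSeries"][0]["values"][0]["value"]
    df = pd.DataFrame(values)[["dateTime", "value"]]
    df["dateTime"] = pd.to_datetime(df["dateTime"])
    df["value"] = pd.to_numeric(df["value"])
    return df.set_index("dateTime").rename(columns={"value": "flow_cfs"})

flow = get_daily_flow("03339000", "2020-01-01", "2020-12-31")
print(flow.describe())
```

Because the gage number is just a function argument, students can point the same notebook at a system of personal interest, which is the pattern of reuse the module encourages.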
Abstract. Recently, deep learning (DL) has emerged as a revolutionary and versatile tool transforming industry applications and generating new and improved capabilities for scientific discovery and model building. The adoption of DL in hydrology has so far been gradual, but the field is now ripe for breakthroughs. This paper suggests that DL-based methods can open up a complementary avenue toward knowledge discovery in hydrologic sciences. In the new avenue, machine-learning algorithms present competing hypotheses that are consistent with data. Interrogative methods are then invoked to interpret DL models for scientists to further evaluate. However, hydrology presents many challenges for DL methods, such as data limitations, heterogeneity and co-evolution, and the general inexperience of the hydrologic field with DL. The roadmap toward DL-powered scientific advances will require the coordinated effort from a large community involving scientists and citizens. Integrating process-based models with DL models will help alleviate data limitations. The sharing of data and baseline models will improve the efficiency of the community as a whole. Open competitions could serve as the organizing events to greatly propel growth and nurture data science education in hydrology, which demands a grassroots collaboration. The area of hydrologic DL presents numerous research opportunities that could, in turn, stimulate advances in machine learning as well.
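The abstract does not name a specific interrogative method; as one hedged, illustrative example of interrogating a trained model, the sketch below applies permutation importance to a model fit on synthetic stand-in data. All variable names and data are assumptions, not material from the paper.

```python
# Illustrative sketch of one "interrogative" technique: permutation
# importance, which probes a trained model by shuffling each input and
# measuring the drop in predictive skill. Data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic "catchment" predictors standing in for real forcings
X = rng.normal(size=(1000, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Rank drivers by how much skill is lost when each input is permuted;
# a scientist can then judge whether the learned dependence is physical
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, imp in zip(["precip", "temp", "soil"], result.importances_mean):
    print(f"{name}: {imp:.3f}")
```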
- Award ID(s): 1832294
- PAR ID: 10086784
- Date Published:
- Journal Name: Hydrology and Earth System Sciences
- Volume: 22
- Issue: 11
- ISSN: 1607-7938
- Page Range / eLocation ID: 5639 to 5656
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract: When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to stratify a large domain into multiple regions (or regimes) and study each region separately. Traditional wisdom suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, each stratified model has access to fewer and less diverse data points. Here, through two hydrologic examples (soil moisture and streamflow), we show that conventional wisdom may no longer hold in the era of big data and deep learning (DL). We systematically examined an effect we call data synergy, where the results of the DL models improved when data were pooled together from characteristically different regions. The performance of the DL models benefited from modest diversity in the training data compared to a homogeneous training set, even with similar data quantity. Moreover, allowing heterogeneous training data makes eligible much larger training datasets, which is an inherent advantage of DL. A large, diverse data set is advantageous in terms of representing extreme events and future scenarios, which has strong implications for climate change impact assessment. The results here suggest the research community should place greater emphasis on data sharing.
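As a hedged illustration of the pooled-versus-stratified comparison described above (the study itself used DL models on real soil-moisture and streamflow data; the regions, models, and data here are synthetic assumptions):

```python
# Illustrative sketch of "data synergy": compare models trained per region
# against one model trained on pooled data. Whether pooling helps depends
# on the data; the study found it did for their DL models.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)

def make_region(shift, n=300):
    """Synthetic region: shared response plus a region-specific offset."""
    X = rng.normal(loc=shift, size=(n, 4))
    y = X @ np.array([1.5, -0.8, 0.3, 0.0]) + 0.1 * shift + rng.normal(scale=0.3, size=n)
    return X, y

regions = [make_region(s) for s in (-1.0, 0.0, 1.0)]
tests = [make_region(s, n=100) for s in (-1.0, 0.0, 1.0)]

# Stratified: one model per region, evaluated on that region's test set
strat_scores = []
for (Xr, yr), (Xt, yt) in zip(regions, tests):
    m = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(Xr, yr)
    strat_scores.append(r2_score(yt, m.predict(Xt)))

# Pooled: a single model trained on all regions together
X_all = np.vstack([X for X, _ in regions])
y_all = np.concatenate([y for _, y in regions])
pooled = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X_all, y_all)
pool_scores = [r2_score(yt, pooled.predict(Xt)) for Xt, yt in tests]

print("stratified R2 per region:", np.round(strat_scores, 3))
print("pooled     R2 per region:", np.round(pool_scores, 3))
```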
Abstract: Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.
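The abstract cites new Python APIs; the sketch below follows hls4ml's documented Keras conversion flow, though argument names and defaults can shift between releases, so treat it as illustrative rather than definitive.

```python
# Sketch of the hls4ml Keras-to-FPGA conversion flow. Follows the
# documented public API, but details may differ across hls4ml releases;
# the toy model is an assumption, not one from the paper.
import numpy as np
import hls4ml
from tensorflow import keras

# A small dense model standing in for a real near-sensor network
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Derive an HLS configuration (precision, reuse factors) from the model
config = hls4ml.utils.config_from_keras_model(model, granularity="model")

# Translate the network into an HLS project
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls4ml_prj",
)
hls_model.compile()  # builds a C simulation for bit-accurate emulation

# Check that the HLS emulation tracks the original Keras predictions
x = np.random.rand(4, 8).astype(np.float32)
print(model.predict(x).ravel())
print(hls_model.predict(x).ravel())
```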
Abstract: Machine learning (ML) has become a central focus of the computational chemistry community. I will first discuss my personal history in the field. Then I will provide a broader view of how this resurgence in ML interest echoes and advances upon earlier efforts. Although numerous changes have brought about this latest wave, one of the most significant is the increased accuracy and efficiency of low‐cost methods (e.g., density functional theory or DFT) that have made it possible to generate large data sets for ML models. ML has also been used to bypass, guide, or improve DFT. The field of computational chemistry thus finds itself at a crossroads as ML both augments and supersedes traditional efforts. I will present what I believe the role of the computational chemist will be in this evolving landscape, with specific focus on my experience in the development of autonomous workflows in computational materials discovery for open‐shell transition‐metal chemistry.
Abstract: Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including real dataset distributional shifts, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation and transition into biomedical and clinical implementation. TDC is an open-science initiative available at https://tdcommons.ai.