Abstract Dissolved oxygen (DO) concentration is a key water quality parameter, and changes in bottom-water DO in particular are fundamental for understanding biogeochemical processes in lake ecosystems. Building on two machine learning (ML) models, the Gradient Boost Regressor (GBR) and the long short-term memory (LSTM) network, this study developed three ML modeling approaches: direct GBR, direct LSTM, and a 2-step mixed ML workflow combining both GBR and LSTM. They were used to simulate multi-year surface and bottom DO concentrations in five lakes. All approaches were trained with readily available environmental data as predictors. Indices of lake thermal structure and mixing provided by a one-dimensional (1-D) hydrodynamic model were also included as predictors in the ML models. No single approach was consistently best across all the tested lakes, but in each lake the best-performing approach estimated DO concentration with a coefficient of determination (R2) of up to 0.6–0.7. All three approaches achieved a normalized mean absolute error (NMAE) below 0.15. In a polymictic lake, the 2-step mixed workflow represented bottom DO concentrations best, reaching a true positive rate (TPR) for hypolimnetic hypoxia detection of over 90%, whereas the other approaches reached TPRs of only around 50%. In most of the tested lakes, the predicted surface DO concentrations and variables indicating stratified conditions (i.e., Wedderburn number and the temperature difference between surface and bottom water) were essential for simulating bottom DO. The ML approaches showed promising results and could be used to support short- and long-term water management plans.
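The 2-step mixed workflow can be sketched roughly as follows. This is a hypothetical illustration on synthetic data, assuming stage one predicts surface DO from environmental drivers and stage two predicts bottom DO from the predicted surface DO plus stratification indices; a second `GradientBoostingRegressor` stands in here for the paper's LSTM stage:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic daily predictors: two environmental drivers (e.g., air temperature,
# wind speed) and two stratification indices (e.g., Wedderburn number,
# surface-bottom temperature difference). All values are illustrative.
n = 1000
X = rng.normal(size=(n, 4))
surface_do = 8.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.3, n)
bottom_do = 0.6 * surface_do - 1.2 * X[:, 2] - 0.8 * X[:, 3] + rng.normal(0, 0.3, n)

train, test = slice(0, 800), slice(800, None)

# Step 1: a GBR maps environmental drivers to surface DO.
gbr_surface = GradientBoostingRegressor(random_state=0).fit(X[train], surface_do[train])
surf_hat = gbr_surface.predict(X)

# Step 2: the paper feeds the predicted surface DO plus stratification indices
# into an LSTM; a second GBR stands in for that sequence model in this sketch.
X2 = np.column_stack([surf_hat, X[:, 2], X[:, 3]])
gbr_bottom = GradientBoostingRegressor(random_state=0).fit(X2[train], bottom_do[train])
bottom_hat = gbr_bottom.predict(X2[test])

print(round(r2_score(bottom_do[test], bottom_hat), 2))
```

The key design point is that stage two never sees raw surface DO observations, only stage one's predictions, so the chained workflow can be applied where bottom measurements are scarce.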
"We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning
Organizations rely on machine learning engineers (MLEs) to deploy models and maintain ML pipelines in production. Due to models' extensive reliance on fresh data, the operationalization of machine learning, or MLOps, requires MLEs to have proficiency in data science and engineering. When considered holistically, the job seems staggering: how do MLEs do MLOps, and what are their unaddressed challenges? To address these questions, we conducted semi-structured ethnographic interviews with 18 MLEs working on various applications, including chatbots, autonomous vehicles, and finance. We find that MLEs engage in a workflow of (i) data preparation, (ii) experimentation, (iii) evaluation throughout a multi-staged deployment, and (iv) continual monitoring and response. Throughout this workflow, MLEs collaborate extensively with data scientists, product stakeholders, and one another, supplementing routine verbal exchanges with communication tools ranging from Slack to organization-wide ticketing and reporting systems. We introduce the 3Vs of MLOps: velocity, visibility, and versioning, three virtues of successful ML deployments that MLEs learn to balance and grow as they mature. Finally, we discuss design implications and opportunities for future work.
- Award ID(s): 1940757
- PAR ID: 10531529
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of the ACM on Human-Computer Interaction
- Volume: 8
- Issue: CSCW1
- ISSN: 2573-0142
- Page Range / eLocation ID: 1 to 34
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Machine learning (ML) plays an increasingly important role in improving a user's experience. However, most UX practitioners face challenges in understanding ML's capabilities or envisioning what it might be. We interviewed 13 designers who had many years of experience designing the UX of ML-enhanced products and services. We probed them to characterize their practices. They shared that they do not view themselves as ML experts, nor do they think learning more about ML would make them better designers. Instead, our participants appeared to be most successful when they engaged in ongoing collaboration with data scientists to help envision what to make and when they embraced a data-centric culture. We discuss the implications of these findings in terms of UX education and as opportunities for additional design research in support of UX designers working with ML.
- The performance of inference with machine learning (ML) models and its integration with analytical query processing have become critical bottlenecks for data analysis in many organizations. An ML inference pipeline typically consists of a preprocessing workflow followed by prediction with an ML model. Current approaches for in-database inference implement preprocessing operators and ML algorithms in the database either natively, by transpiling code to SQL, or by executing user-defined functions in guest languages such as Python. In this work, we present a radically different approach that approximates an end-to-end inference pipeline (preprocessing plus prediction) using a lightweight embedding that discretizes a carefully selected subset of the input features and an index that maps data points in the embedding space to aggregated predictions of an ML model. We replace a complex preprocessing workflow and model-based inference with a simple feature transformation and an index lookup. Our framework improves inference latency by several orders of magnitude while maintaining similar prediction accuracy compared to the pipeline it approximates.
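The embedding-plus-index idea can be illustrated with a minimal, hypothetical sketch: discretize one selected feature into bins offline, cache an aggregated prediction per bin, and serve inference as a lookup. The stand-in `model` and the bin count are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 5000)                  # sample of the selected input feature
model = lambda v: np.sin(v) + 0.1 * v         # stand-in for a trained ML model

# Offline: discretize the feature and build the index
# (bin -> aggregated prediction over training points in that bin).
edges = np.linspace(0, 10, 101)
bin_of = np.digitize(x, edges) - 1
index = np.array([model(x[bin_of == b]).mean() for b in range(100)])

# Online: inference is a feature transformation plus an index lookup,
# with no preprocessing workflow or model evaluation on the hot path.
def fast_predict(v):
    b = min(int(np.digitize(v, edges)) - 1, 99)
    return index[b]

print(abs(fast_predict(3.7) - model(3.7)))
```

The accuracy/latency trade-off is governed by which features are discretized and how finely; the lookup cost is constant regardless of the original model's complexity.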
- Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach is that ML models are typically trained on discrete data, using ML methodologies that are not aware of underlying continuity properties. This results in models that often do not capture any underlying continuous dynamics, either of the system of interest or of any related system. To address this challenge, we develop a convergence test based on numerical analysis theory. Our test verifies whether a model has learned a function that accurately approximates an underlying continuous dynamics. Models that fail this test fail to capture relevant dynamics, rendering them of limited utility for many scientific prediction tasks, while models that pass this test enable both better interpolation and better extrapolation in multiple ways. Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
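In the spirit of that convergence test (a hypothetical sketch, not the paper's exact procedure), one can estimate the rate at which a learned one-step map approaches the exact flow as the step size is refined; a map consistent with a genuine continuous dynamics should exhibit the order predicted by numerical analysis:

```python
import math

# Exact flow of dx/dt = -x over one step of size h.
flow = lambda x, h: x * math.exp(-h)

def learned_step(x, h):
    # Stand-in "learned model": a forward-Euler-like map that is
    # first-order consistent with dx/dt = -x.
    return x * (1.0 - h)

def convergence_order(x0=1.0):
    # One-step errors at successively halved step sizes.
    errs = [abs(learned_step(x0, h) - flow(x0, h)) for h in (0.1, 0.05, 0.025)]
    # Slope of log-error vs. log-h estimates the local convergence order.
    return math.log(errs[0] / errs[2]) / math.log(0.1 / 0.025)

print(round(convergence_order(), 1))
```

A model whose error does not shrink at the expected rate under refinement has not learned any underlying continuous dynamics, which is the failure mode the test is designed to expose.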
- Abstract Recent calls have been made for equity tools and frameworks to be integrated throughout the research and design life cycle, from conception to implementation, with an emphasis on reducing inequity in artificial intelligence (AI) and machine learning (ML) applications. Simply stating that equity should be integrated throughout, however, leaves much to be desired as industrial ecology (IE) researchers, practitioners, and decision-makers attempt to employ equitable practices. In this forum piece, we use a critical review approach to explain how socioecological inequities emerge in ML applications across their life cycle stages, using the food system as an example. We exemplify the use of a comprehensive questionnaire to delineate unfair ML bias across data bias, algorithmic bias, and selection and deployment bias categories. Finally, we provide consolidated guidance and tailored strategies to help address AI/ML unfair bias and inequity in IE applications. Specifically, the guidance and tools help to address sensitivity, reliability, and uncertainty challenges. We also discuss how bias and inequity in AI/ML affect other IE research and design domains besides the food system, such as living labs and circularity. We conclude with the future directions IE should take to address unfair bias and inequity in AI/ML, and call for systemic equity to be embedded throughout IE applications to fundamentally understand domain-specific socioecological inequities, identify potential unfairness in ML, and select mitigation strategies in a manner that translates across different research domains.