-
Free, publicly-accessible full text available July 1, 2026
-
Public records requests are a central mechanism for government transparency. In practice, they are slow, complex processes that require analyzing large amounts of messy, unstructured data. In this paper, we introduce RequestAtlas, a system that helps investigative journalists review large quantities of unstructured data that result from submitting many public records requests. RequestAtlas was developed through a year-long participatory design collaboration with the California Reporting Project (CRP), a journalistic collective researching police use of force and police misconduct in California. RequestAtlas helps journalists evaluate the results of public records requests for completeness and negotiate with agencies for additional information. RequestAtlas has had significant real-world impact. It has been deployed for more than a year to identify missing data in response to public records requests and to facilitate negotiation with public records request officers. Through the process of designing and observing the use of RequestAtlas, we explore the technical challenges associated with the public records request process and the design needs of investigative journalists more generally. We argue that public records requests represent an instance of an adversarial technical relationship in which two entities engage in a prolonged, iterative, often adversarial exchange of information. Technologists can support information-gathering efforts within these adversarial technical relationships by building flexible local solutions that help both entities account for the state of the ongoing information exchange. Additionally, we offer insights on ways to design applications that can assist investigative journalists in the inevitably significant data cleaning phase of processing large documents while supporting journalistic norms of verification and human review.
Finally, we reflect on the ways that this participatory design process, despite its success, lays bare some of the limitations inherent in the public records request process and in the "request and respond" model of transparency more generally.
Free, publicly-accessible full text available May 2, 2026
-
Free, publicly-accessible full text available May 19, 2026
-
Organizations rely on machine learning engineers (MLEs) to deploy models and maintain ML pipelines in production. Due to models' extensive reliance on fresh data, the operationalization of machine learning, or MLOps, requires MLEs to have proficiency in data science and engineering. When considered holistically, the job seems staggering: how do MLEs do MLOps, and what are their unaddressed challenges? To address these questions, we conducted semi-structured ethnographic interviews with 18 MLEs working on various applications, including chatbots, autonomous vehicles, and finance. We find that MLEs engage in a workflow of (i) data preparation, (ii) experimentation, (iii) evaluation throughout a multi-staged deployment, and (iv) continual monitoring and response. Throughout this workflow, MLEs collaborate extensively with data scientists, product stakeholders, and one another, supplementing routine verbal exchanges with communication tools ranging from Slack to organization-wide ticketing and reporting systems. We introduce the 3Vs of MLOps: velocity, visibility, and versioning, three virtues of successful ML deployments that MLEs learn to balance and grow as they mature. Finally, we discuss design implications and opportunities for future work.
-
Large language models (LLMs) are being increasingly deployed as part of pipelines that repeatedly process or generate data of some sort. However, the frequent and often unpredictable errors that plague LLMs are a common barrier to deployment. Acknowledging the inevitability of these errors, we propose data quality assertions to identify when LLMs may be making mistakes. We present spade, a method for automatically synthesizing data quality assertions that identify bad LLM outputs. We make the observation that developers often identify data quality issues during prototyping prior to deployment, and attempt to address them by adding instructions to the LLM prompt over time. spade therefore analyzes histories of prompt versions over time to create candidate assertion functions and then selects a minimal set that fulfills both coverage and accuracy requirements. In testing across nine different real-world LLM pipelines, spade efficiently reduces the number of assertions by 14% and decreases false failures by 21% when compared to simpler baselines. spade has been deployed as an offering within LangSmith, LangChain's LLM pipeline hub, and has been used to generate data quality assertions for over 2000 pipelines across a spectrum of industries.
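The idea of choosing a small set of assertions that still covers the known bad outputs can be illustrated with a minimal sketch. This is not spade's actual selection procedure, which the abstract does not spell out; it assumes a greedy set-cover heuristic as a stand-in, and the assertion functions, the `select_assertions` helper, and the `max_false_failure_rate` parameter are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical assertion type: returns True when an LLM output passes.
Assertion = Callable[[str], bool]


def not_empty(output: str) -> bool:
    return bool(output.strip())


def no_placeholder(output: str) -> bool:
    return "[TODO]" not in output


def clean_text(output: str) -> bool:
    # Overlaps with the two assertions above, making them redundant.
    return not_empty(output) and no_placeholder(output)


def under_length(output: str) -> bool:
    return len(output.split()) <= 50


def failures(assertion: Assertion, outputs: List[str]) -> Set[int]:
    """Indices of the outputs that the assertion flags as bad."""
    return {i for i, out in enumerate(outputs) if not assertion(out)}


def select_assertions(
    candidates: Dict[str, Assertion],
    outputs: List[str],
    bad: Set[int],
    max_false_failure_rate: float = 0.0,
) -> List[str]:
    """Greedy set cover (an assumed heuristic, not spade's method).

    Accuracy: drop candidates that flag too many known-good outputs.
    Coverage: repeatedly pick the candidate that flags the most
    still-uncovered bad outputs until none remain or no candidate helps.
    """
    good = set(range(len(outputs))) - bad
    flagged = {name: failures(fn, outputs) for name, fn in candidates.items()}
    usable = {
        name: hits
        for name, hits in flagged.items()
        if len(hits & good) <= max_false_failure_rate * max(len(good), 1)
    }
    chosen: List[str] = []
    uncovered = set(bad)
    while uncovered:
        best = max(sorted(usable), key=lambda n: len(usable[n] & uncovered), default=None)
        if best is None or not (usable[best] & uncovered):
            break  # remaining bad outputs cannot be covered
        chosen.append(best)
        uncovered -= usable[best]
    return chosen


outputs = ["Fine answer.", "", "Draft [TODO] finish", "word " * 60]
bad = {1, 2, 3}  # indices a developer labeled as bad outputs
candidates = {
    "not_empty": not_empty,
    "no_placeholder": no_placeholder,
    "clean_text": clean_text,
    "under_length": under_length,
}
selected = select_assertions(candidates, outputs, bad)
```

In this toy setup, four candidates collapse to two because `clean_text` subsumes the two narrower checks. A greedy cover is not guaranteed to find the smallest possible set, which is one reason to treat it only as an illustration.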
-
Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development of ML applications, but sustaining these applications post-deployment is difficult due to lack of real-time feedback (i.e., labels) for predictions and silent failures that could occur at any component of the ML pipeline (e.g., data distribution shift or anomalous features). We propose a new type of data management system that offers end-to-end observability, or visibility into complex system behavior, for deployed ML pipelines through assisted (1) detection, (2) diagnosis, and (3) reaction to ML-related bugs. We describe new research challenges and suggest preliminary solution ideas in all three aspects. Finally, we introduce an example architecture for a "bolt-on" ML observability system, or one that wraps around existing tools in the stack.
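As a rough illustration of the detection step for one kind of silent failure mentioned above (data distribution shift), the sketch below compares a live feature window against a reference window. It is an assumed, deliberately simple check and not the system proposed in the paper; the function names and the threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def mean_shift_score(reference: Sequence[float], live: Sequence[float]) -> float:
    """Absolute shift of the live mean, in units of the reference std dev."""
    ref_sd = stdev(reference)  # needs at least two reference values
    if ref_sd == 0:
        # A constant reference feature: any change at all counts as a shift.
        return 0.0 if mean(live) == mean(reference) else float("inf")
    return abs(mean(live) - mean(reference)) / ref_sd


def has_drifted(
    reference: Sequence[float],
    live: Sequence[float],
    threshold: float = 3.0,  # hypothetical cutoff; tune per feature
) -> bool:
    return mean_shift_score(reference, live) > threshold


reference = [10, 11, 9, 10, 10, 11, 9, 10]  # feature values seen at training time
stable = has_drifted(reference, [10, 10, 11, 9])
shifted = has_drifted(reference, [20, 21, 19, 20])
```

A check like this needs no labels, which matters when real-time feedback on predictions is unavailable. It only catches shifts in the mean, so changes in variance or distribution shape would go unnoticed.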
-
Dataframes have become universally popular as a means to represent data in various stages of structure, and manipulate it using a rich set of operators, thereby becoming an essential tool in the data scientist's toolbox. However, dataframe systems, such as pandas, scale poorly and are non-interactive on moderate to large datasets. We discuss our experiences developing Modin, our first cut at a parallel dataframe system, which already has users across several industries and over 1M downloads. Modin translates pandas functions into a core set of operators that are individually parallelized via columnar, row-wise, or cell-wise decomposition rules that we formalize in this paper. We also introduce metadata independence to allow metadata, such as order and type, to be decoupled from the physical representation and maintained lazily. Using rule-based decomposition and metadata independence, along with careful engineering, Modin is able to support pandas operations across both rows and columns on very large dataframes, unlike Koalas and Dask DataFrames, which either break down or are unable to support such operations, while also being much faster than pandas.
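Row-wise decomposition can be sketched in plain Python: split the rows into partitions, apply a per-row operator to each partition in parallel, and concatenate the results in order. This is only an illustration of the general idea under simplifying assumptions, not Modin's implementation; `partition_rows` and `parallel_map_rows` are hypothetical helpers, and rows are modeled as plain dictionaries rather than a real dataframe.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, float]


def partition_rows(rows: List, n_parts: int) -> List[List]:
    """Split rows into at most n_parts contiguous chunks, preserving order."""
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_map_rows(
    rows: List[Row], fn: Callable[[Row], Row], n_parts: int = 4
) -> List[Row]:
    """Apply a per-row operator to each partition independently.

    This is valid because a row-wise map needs no data from other rows,
    so partitions can be processed in any order and then concatenated.
    """
    parts = partition_rows(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # pool.map yields results in the order of the input partitions.
        mapped = pool.map(lambda part: [fn(row) for row in part], parts)
    return [row for part in mapped for row in part]


rows = [{"x": float(i)} for i in range(10)]
doubled = parallel_map_rows(rows, lambda r: {"x": r["x"] * 2}, n_parts=3)
```

Threads are used here only to keep the sketch self-contained. Operators that need whole columns, such as a column sum, would instead call for a columnar split, which is why a system would choose the decomposition rule per operator.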
-
Exploratory data science largely happens in computational notebooks with dataframe APIs, such as pandas, that support flexible means to transform, clean, and analyze data. Yet, visually exploring data in dataframes remains tedious, requiring substantial programming effort for visualization and mental effort to determine what analysis to perform next. We propose Lux, an always-on framework for accelerating visual insight discovery in dataframe workflows. When users print a dataframe in their notebooks, Lux recommends visualizations to provide a quick overview of the patterns and trends and suggests promising analysis directions. Lux features a high-level language for generating visualizations on demand to encourage rapid visual experimentation with data. We demonstrate that through the use of a careful design and three system optimizations, Lux adds no more than two seconds of overhead on top of pandas for over 98% of datasets in the UCI repository. We evaluate Lux in terms of usability via interviews with early adopters, finding that Lux helps fulfill the needs of data scientists for visualization support within their dataframe workflows. Lux has already been embraced by data science practitioners, with over 3.1k stars on GitHub.
