Title: From data to information: automating data science to explore the U.S. court system
The U.S. court system is the nation's arbiter of justice, tasked with the responsibility of ensuring equal protection under the law. But hurdles to information access obscure the inner workings of the system, preventing stakeholders - from legal scholars to journalists and members of the public - from understanding the state of justice in America at scale. There is an ongoing data access argument here: U.S. court records are public data and should be freely available. But open data arguments represent a half-measure; what we really need is open information. This distinction marks the difference between downloading a zip file containing a quarter-million case dockets and getting the real-time answer to a question like "Are pro se parties more or less likely to receive fee waivers?" To help bridge that gap, we introduce a novel platform and user experience that provides users with the tools necessary to explore data and drive analysis via natural language statements. Our approach leverages an ontology configuration that adds domain-relevant data semantics to database schemas to provide support for user guidance and for search and analysis without user-entered code or SQL. The system is embodied in a "natural-language notebook" user experience, and we apply this approach to the space of case docket data from the U.S. federal court system. Additionally, we provide detail on the collection, ingestion and processing of the dockets themselves, including early experiments in the use of language modeling for docket entry classification with an initial focus on motions.
Award ID(s):
2033604
PAR ID:
10284586
Journal Name:
ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law
Page Range / eLocation ID:
119 to 128
Sponsoring Org:
National Science Foundation
More Like this
  1. We implemented a user-centered approach to the design of an artificial intelligence (AI) system that provides users with access to information about the workings of the United States federal court system regardless of their technical background. Presently, most of the records associated with the federal judiciary are provided through a federal system that does not support exploration aimed at discovering systematic patterns about court activities. In addition, many users lack the data analysis skills necessary to conduct their own analyses and convert data into information. We conducted interviews, observations, and surveys to uncover the needs of our users, and we discuss the development of an intuitive platform, informed by these needs, that makes it possible for legal scholars, lawyers, and journalists to discover answers to more advanced questions about the federal court system. We report on results from usability testing and discuss design implications for AI and law practitioners and researchers.
  2. Public organizations, including institutions in the U.S. criminal justice (CJ) system, have been rapidly releasing information pertaining to COVID-19. Even CJ institutions typically reticent to share information, like private prisons, have released vital COVID-19 information. The boon of available pandemic-related data, however, is not without problems. Unclear conceptualizations, stakeholders’ influence on data collection and release, and a lack of experience creating public dashboards on health data are just a few of the issues plaguing CJ institutions that release COVID-19 data. In this article, we detail issues that institutions in each arm of the CJ system face when releasing pandemic-related data. We conclude with a set of recommendations for researchers seeking to use the abundance of publicly available data on the effects of the pandemic.
  3. Many publicly available datasets exist that can provide factual answers to a wide range of questions that benefit the public. Indeed, datasets created by governmental and nongovernmental organizations often have a mandate to share data with the public. However, these datasets are often underutilized by knowledge workers because of the expertise and embedded implicit information that everyday users need to access, analyze, and utilize them. To seek solutions to this problem, this paper discusses the design of an automated process for generating questions that provide insight into a dataset. Given a relational dataset, our prototype system architecture follows a five-step process: data extraction, cleaning, pre-processing, entity recognition using deep learning, and question formulation. Through examples of our results, we show that the questions generated by our approach are similar to and, in some cases, more accurate than the ones generated by an AI engine like ChatGPT, whose question outputs, while more fluent, are often not true to the facts represented in the original data. We discuss key limitations of our approach and the work to be done to bring to life a fully generalized pipeline that can take any dataset and automatically provide the user with factual questions that the data can answer.
  4. Guimerà, Roger (Ed.)
    We study U.S. Supreme Court dynamics by analyzing the temporal evolution of the underlying policy positions of the Supreme Court Justices as reflected by their actual voting data, using functional data analysis methods. The proposed fully flexible nonparametric method makes it possible to dissect the time-dynamics of policy positions at the level of individual Justices, as well as providing a comprehensive view of the evolution of ideology over the history of the Supreme Court since its establishment. In addition to quantifying individual Justices’ policy positions, we uncover average changes over time and also the major patterns of change over time. Additionally, our approach allows for representing highly complex dynamic trajectories by a few principal components, which complements other models for analyzing and predicting court behavior.
  5. The significance and influence of U.S. Supreme Court majority opinions derive in large part from opinions’ roles as precedents for future opinions. A growing body of literature seeks to understand what drives the use of opinions as precedents through the study of Supreme Court case citation patterns. We raise two limitations of existing work on Supreme Court citations. First, dyadic citations are typically aggregated to the case level before they are analyzed. Second, citations are treated as if they arise independently. We present a methodology for studying citations between Supreme Court opinions at the dyadic level, as a network, that overcomes these limitations. This methodology, the citation exponential random graph model (for which we provide user-friendly software), enables researchers to account for the effects of case characteristics and complex forms of network dependence in citation formation. We then analyze a network that includes all Supreme Court cases decided between 1950 and 2015. We find evidence for dependence processes, including reciprocity, transitivity, and popularity. The dependence effects are as substantively and statistically significant as the effects of exogenous covariates, indicating that models of Supreme Court citations should incorporate both the effects of case characteristics and the structure of past citations.