Title: Using artificial intelligence to identify administrative errors in unemployment insurance
Administrative errors in unemployment insurance (UI) decisions give rise to a public values conflict between efficiency and efficacy. We analyze whether artificial intelligence (AI), in particular machine learning (ML) methods, can be used to detect administrative errors in UI claims decisions, both in terms of accuracy and normative tradeoffs. We use 16 years of US Department of Labor audit and policy data on UI claims to analyze the accuracy of seven different random forest and deep learning models. We further test weighting schemas and synthetic data approaches to correcting imbalances in the training data. A random forest model using gradient boosting is more accurate along several measures, and preferable in terms of public values, than every deep learning model tested. Adjusting model weights produces significant recall improvements for low-n outcomes, at the expense of precision. Synthetic data produces attenuated improvements and drawbacks relative to weights.
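The paper's models and Department of Labor audit data are not reproduced here, but a minimal sketch of the class-weighting idea the abstract describes, applied to a scikit-learn random forest on synthetic stand-in data, might look like the following (all feature names, sizes, and parameters are illustrative assumptions rather than the authors' configuration):

```python
# Hypothetical sketch: class-weighted tree ensemble for imbalanced
# error detection (features/labels are placeholders, not the paper's
# actual DOL audit data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 12))            # stand-in claim features
y = (rng.random(n) < 0.05).astype(int)  # rare "administrative error" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# "balanced" reweights classes inversely to their frequency, trading
# precision for recall on the rare class, as the abstract reports.
clf = RandomForestClassifier(n_estimators=500,
                             class_weight="balanced",
                             random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```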
Award ID(s): 2047224
PAR ID: 10544029
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Elsevier
Date Published:
Journal Name: Government Information Quarterly
Volume: 39
Issue: 4
ISSN: 0740-624X
Page Range / eLocation ID: 101758
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Guyon, Isabelle (Ed.)
As the size, complexity, and availability of data continue to grow, scientists increasingly rely on black-box learning algorithms that can often provide accurate predictions with minimal a priori model specification. Tools like random forests have an established track record of off-the-shelf success and even offer various strategies for analyzing the underlying relationships among variables. Here, motivated by recent insights into random forest behavior, we introduce the simple idea of augmented bagging (AugBagg), a procedure that operates exactly like classical bagging and random forests but on a larger, augmented space containing additional randomly generated noise features. Surprisingly, we demonstrate that this simple act of including extra noise variables in the model can lead to dramatic improvements in out-of-sample predictive accuracy, sometimes outperforming even an optimally tuned traditional random forest. As a result, intuitive notions of variable importance based on improved model accuracy may be deeply flawed, as even purely random noise can routinely register as statistically significant. Numerous demonstrations on both real and synthetic data are provided, along with a proposed solution.
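A minimal sketch of the AugBagg idea on synthetic data, assuming scikit-learn and an arbitrary count of 50 noise columns (the paper's own experimental settings are not reproduced here):

```python
# Hypothetical sketch of augmented bagging (AugBagg): append purely
# random noise features to X, then fit an ordinary random forest on
# the augmented space. Counts and defaults are assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=500, n_features=10, noise=1.0,
                       random_state=0)

# Augment the feature space with 50 pure-noise columns.
X_aug = np.hstack([X, rng.normal(size=(X.shape[0], 50))])

baseline = RandomForestRegressor(random_state=0)
augbagg = RandomForestRegressor(random_state=0)

print("plain RF:", cross_val_score(baseline, X, y).mean())
print("AugBagg :", cross_val_score(augbagg, X_aug, y).mean())
```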
  2. Although the application of deep learning to automatic speech recognition (ASR) has produced dramatic reductions in word error rate (WER) for languages with abundant training data, ASR for low-resource languages has yet to benefit from deep learning to the same extent. In this paper, we investigate various methods of acoustic modeling and data augmentation with the goal of improving the accuracy of a deep learning ASR framework for a low-resource language with a high baseline WER. We compare several methods of generating synthetic acoustic training data via voice transformation and signal distortion, and we explore several strategies for integrating this data into the acoustic training pipeline. We evaluate our methods on an indigenous language of North America with minimal training resources. We show that training initially via transfer learning from an existing high-resource-language acoustic model, refining the weights on a heavily concentrated synthetic dataset, and finally fine-tuning to the target language with limited synthetic data reduces WER by 15% relative to transfer learning alone with deep recurrent methods. Further, we show a 19% improvement over traditional frameworks using similar multistage training with deep convolutional approaches.
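As one illustration of the signal-distortion idea, a minimal NumPy sketch of speed perturbation plus additive noise is shown below; the perturbation rates and noise level are assumptions for illustration, not the methods evaluated in the paper:

```python
# Hypothetical sketch of simple signal-distortion augmentation for
# low-resource ASR training data: speed perturbation plus additive noise.
import numpy as np

def augment(waveform: np.ndarray, rate: float = 1.1,
            noise_db: float = -30.0) -> np.ndarray:
    """Return a distorted copy of a mono waveform (float32 in [-1, 1])."""
    # Speed perturbation by linear resampling of the time axis.
    n_out = int(len(waveform) / rate)
    idx = np.linspace(0, len(waveform) - 1, n_out)
    stretched = np.interp(idx, np.arange(len(waveform)), waveform)
    # Additive Gaussian noise at a fixed level relative to full scale.
    noise = np.random.randn(n_out) * (10.0 ** (noise_db / 20.0))
    return (stretched + noise).astype(np.float32)

# Usage: each original utterance yields several synthetic training copies.
utterance = np.random.randn(16_000).astype(np.float32) * 0.1  # 1 s at 16 kHz
copies = [augment(utterance, rate=r) for r in (0.9, 1.0, 1.1)]
```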
  3.
Intertemporal choices involve assessing options with different reward amounts available at different time delays. The similarity approach to intertemporal choice focuses on judging how similar the amounts and delays are, yet we do not fully understand the cognitive process by which these judgments are made. Here, we use machine-learning algorithms to predict similarity judgments in order to (1) investigate which algorithms best predict those judgments, (2) assess which predictors are most useful in predicting participants' judgments, and (3) determine the minimum number of judgments required to accurately predict future judgments. We applied eight algorithms to similarity judgments for reward amount and time delay made by participants in two data sets. We found that neural network, random forest, and support vector machine algorithms generated the highest out-of-sample accuracy. Though neural networks and support vector machines offer little clarity about a possible process for making similarity judgments, random forest algorithms generate decision trees that can mimic the cognitive computations of human judgment making. We also found that the numerical difference between amount values or delay values was the most important predictor of these judgments, replicating previous work. Finally, the best-performing algorithms, such as random forest, can make highly accurate predictions of judgments with relatively small sample sizes (~15), which will help minimize the number of judgments required to extrapolate to new value pairs. In summary, machine-learning algorithms provide both theoretical improvements to our understanding of the cognitive computations involved in similarity judgments and intertemporal choices, and practical improvements in designing better ways of collecting data.
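A minimal sketch of the random forest setup this abstract describes, trained on simulated amount pairs where the numerical difference drives the judgment (the data-generating rule and feature set are assumptions, not the study's materials):

```python
# Hypothetical sketch: predicting binary similarity judgments from the
# numerical difference between two reward amounts, the predictor the
# abstract identifies as most important. Data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
a1, a2 = rng.uniform(5, 100, n), rng.uniform(5, 100, n)
diff = np.abs(a1 - a2)
ratio = np.minimum(a1, a2) / np.maximum(a1, a2)
# Simulated participant: "similar" when the difference is small, plus noise.
judged_similar = (diff + rng.normal(0, 5, n) < 20).astype(int)

X = np.column_stack([a1, a2, diff, ratio])
X_tr, X_te, y_tr, y_te = train_test_split(X, judged_similar, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("out-of-sample accuracy:", clf.score(X_te, y_te))
print("importances (a1, a2, diff, ratio):", clf.feature_importances_)
```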
  4. High-flow events that significantly impact Water Resource Recovery Facility (WRRF) operations are rare, but accurately predicting these flows could improve treatment operations. Data-driven modeling approaches could be used; however, because high-flow events that impact operations occur infrequently, they provide limited data from which to learn meaningful patterns. We evaluated the performance of a statistical model (logistic regression) and two machine learning (ML) models (support vector machine and random forest) at predicting high-flow events one day ahead at two plants in different parts of the United States: one in Northern Virginia with a combined sewer and one on the Gulf Coast of Texas with a separate sewer. We compared baseline models (no synthetic data added) to models trained with synthetic data generated by two sampling techniques (SMOTE and ADASYN) that increase the representation of rare events in the training data. Both techniques enhanced the sample size of the very-high-flow class, but ADASYN, which focuses on generating synthetic samples near decision boundaries, led to greater improvements in model performance (reduced misclassification rates). Random forest combined with ADASYN achieved the best overall performance at both plants, demonstrating its robustness in identifying one-day-ahead extreme flow events. These results suggest that combining sampling techniques with ML has the potential to significantly improve the modeling of high-flow events at treatment plants, and our work will prove useful in building reliable predictive models that can inform management decisions for better control of treatment operations.
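A minimal sketch of the resampling comparison described above, assuming imbalanced-learn's SMOTE and ADASYN implementations and synthetic stand-in flow features (the plants' actual data and tuning are not reproduced):

```python
# Hypothetical sketch: oversample the rare high-flow class with SMOTE or
# ADASYN before fitting a random forest, then compare on held-out data.
import numpy as np
from imblearn.over_sampling import ADASYN, SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2_000
X = rng.normal(size=(n, 6))             # e.g. rainfall, prior flows
y = (rng.random(n) < 0.03).astype(int)  # rare "very high flow" day

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("SMOTE", SMOTE(random_state=0)),
                      ("ADASYN", ADASYN(random_state=0))]:
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)  # oversample rare class
    clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    score = balanced_accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: balanced accuracy = {score:.3f}")
```

ADASYN's focus on generating samples near decision boundaries is what the abstract credits for its larger performance gains.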
  5. A deep neural network (DNN)-based adaptive controller with a real-time, concurrent learning (CL)-based adaptive update law is developed for a class of uncertain, nonlinear dynamic systems. The DNN in the control law approximates the uncertain nonlinear dynamic model. The inner-layer weights of the DNN are trained offline, concurrent to real-time execution, using data collected in real time; once a sufficient amount of data has been collected and training is complete, the inner-layer weights are applied in batch updates. The output-layer DNN weights, in contrast, are updated online (i.e., in real time) using a Lyapunov- and CL-based adaptation law. The key development in this work is that the output-layer update law is augmented with CL-based terms to ensure that the output-layer weight estimates converge to within a ball of their optimal values. A Lyapunov-based stability analysis establishes semi-global exponential convergence to an ultimate bound for the trajectory tracking errors and the output-layer DNN weight estimation errors.
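The paper's controller and stability proof are not reproduced here, but a heavily simplified NumPy sketch of a CL-style output-layer update, an instantaneous gradient term plus a replayed history-stack term, might look like this (the gains, features, and linear-in-parameters setup are illustrative assumptions):

```python
# Hypothetical sketch of a concurrent-learning (CL) output-layer update:
# a gradient term driven by the current error plus a term summed over a
# recorded data stack, so past data keep shaping the weight estimate.
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_out = 8, 2
W_hat = np.zeros((n_feat, n_out))        # output-layer weight estimate
W_true = rng.normal(size=(n_feat, n_out))
gamma, k_cl, dt = 1.0, 0.5, 0.01         # assumed gains and step size

stack = []                               # recorded (phi, y) pairs
for step in range(2_000):
    phi = rng.normal(size=n_feat)        # stand-in inner-layer features
    y = W_true.T @ phi                   # stand-in "measured" output
    e = y - W_hat.T @ phi                # instantaneous estimation error

    if len(stack) < 20:                  # record a modest history stack
        stack.append((phi, y))

    # CL term: replay stored data so convergence does not rely on
    # persistent excitation of the current signal alone.
    cl = sum(np.outer(p, yj - W_hat.T @ p) for p, yj in stack)
    W_hat += dt * gamma * (np.outer(phi, e) + k_cl * cl)

print("weight error norm:", np.linalg.norm(W_true - W_hat))
```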