Title: A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction
Substance use is a global issue that negatively impacts millions of persons who use drugs (PWUDs). In practice, identifying vulnerable PWUDs so that appropriate resources can be allocated efficiently is challenging, both because of their complex use patterns (e.g., their tendency to change usage within months) and because of the high cost of collecting PWUD-focused substance use data. Thus, there has been a paucity of machine learning models that accurately predict the short-term substance use behaviors of PWUDs. In this paper, using longitudinal survey data from 258 PWUDs in the U.S. Great Plains collected by our team, we design a novel GAN that handles high-dimensional, low-sample-size tabular data and survey skip logic. We use it to augment the existing data and thereby improve classification models' predictions of (A) whether a PWUD will increase usage and (B) at which ordinal frequency they will use a particular drug within the next 12 months. Our evaluation results show that, when trained on data augmented by our proposed GAN, the classification models improve their predictive performance (AUROC) by up to 13.4% in Problem (A) and 15.8% in Problem (B) for usage of marijuana, meth, amphetamines, and cocaine, outperforming state-of-the-art generative models.
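The abstract does not describe how the proposed GAN encodes survey skip logic. As a minimal sketch of the constraint itself (not the authors' architecture), the code below checks and repairs a single generated row so that follow-up questions are marked as skipped whenever their gate question was answered "no". The column names, the 0/1 coding of gate answers, and the `SKIPPED = -1` sentinel are hypothetical assumptions.

```python
from typing import Dict, List

# Hypothetical sentinel meaning "question not shown because of skip logic".
SKIPPED = -1

# Hypothetical skip rules: gate question -> follow-up questions that are
# only asked when the gate answer is 1 ("yes").
SKIP_RULES: Dict[str, List[str]] = {
    "ever_used_meth": ["meth_freq_past_year", "meth_route"],
    "ever_used_cocaine": ["cocaine_freq_past_year"],
}


def violations(row: Dict[str, int], rules: Dict[str, List[str]] = SKIP_RULES) -> List[str]:
    """List follow-up columns whose value contradicts their gate answer."""
    bad = []
    for gate, followups in rules.items():
        asked = row.get(gate) == 1
        for col in followups:
            skipped = row.get(col) == SKIPPED
            # Inconsistent if asked but skipped, or not asked but answered.
            if asked == skipped:
                bad.append(col)
    return bad


def apply_skip_logic(row: Dict[str, int], rules: Dict[str, List[str]] = SKIP_RULES) -> Dict[str, int]:
    """Return a copy of a generated row with follow-ups of a 'no' gate set to SKIPPED."""
    fixed = dict(row)
    for gate, followups in rules.items():
        if fixed.get(gate) != 1:
            for col in followups:
                fixed[col] = SKIPPED
    return fixed


if __name__ == "__main__":
    synthetic = {
        "ever_used_meth": 0, "meth_freq_past_year": 3, "meth_route": 2,
        "ever_used_cocaine": 1, "cocaine_freq_past_year": 1,
    }
    print(violations(synthetic))                    # ['meth_freq_past_year', 'meth_route']
    print(violations(apply_skip_logic(synthetic)))  # []
```

A post-hoc repair like this is the simplest option; a generator could instead be constrained during training so that it never emits inconsistent rows, which may be closer to what the paper does.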
Award ID(s):
2414554 2302999
PAR ID:
10632702
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Publisher / Repository:
International Joint Conferences on Artificial Intelligence Organization
Date Published:
ISBN:
978-1-956792-04-1
Page Range / eLocation ID:
7474 to 7482
Format(s):
Medium: X
Location:
Jeju, South Korea
Sponsoring Org:
National Science Foundation
More Like this
  1. De Luca, Vincenzo (Ed.)
    Background: Substance use induces large economic and societal costs in the U.S. Understanding how the substance use behaviors of persons who use drugs (PWUDs) change over time is therefore important for informing healthcare providers, policymakers, and other stakeholders, so that limited resources can be allocated more efficiently to at-risk PWUDs. Objective: This study examines short-term (within a year) behavioral changes in the substance use of PWUDs at the population and individual levels. Methods: Our team recruited 237 PWUDs in the Great Plains of the U.S. The sample provides longitudinal survey data on their individual attributes, including drug use behaviors, at two separate time points 4–12 months apart. At the population level, we analyze our data quantitatively for 18 illicit drugs; at the individual level, we build interpretable logistic regression and decision tree models that identify relevant attributes to predict, for a given PWUD, (i) which drug(s) they are likely to use and (ii) for which drug(s) they are likely to increase usage within the next 12 months. All predictive models were evaluated by computing the (averaged) area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPR) on multiple distinct hold-out sets. Results: At the population level, the extent of usage change and the number of drugs exhibiting usage changes follow power-law distributions. At the individual level, the AUROCs of the models for the four most prevalent drugs (marijuana, methamphetamines, amphetamines, and cocaine) range from 0.756 to 0.829 for (i) (+2.88–7.66% relative to baseline models that use only current usage of the respective drug as input) and from 0.670 to 0.765 for (ii) (+4.34–18.0%). The corresponding AUPRs range from 0.729 to 0.947 for (i) (+2.49–13.6%) and from 0.348 to 0.618 for (ii) (+26.9–87.6%).
Conclusion: The observed qualitative changes in short-term substance use and the trained predictive models for (i) and (ii) can potentially inform human decision-making toward efficient allocation of appropriate resources to the PWUDs at highest risk.
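The evaluation described above (AUROC averaged over several hold-out sets and compared against a baseline) can be sketched in plain Python. This sketch assumes binary labels and real-valued model scores, computes AUROC through its rank interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half), and assumes the reported percentages are relative improvements over the baseline, which the abstract does not state explicitly. The function names are hypothetical; AUPR could be averaged over splits in the same way.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(splits: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average AUROC over several hold-out sets, each given as (labels, scores)."""
    values = [auroc(y, s) for y, s in splits]
    return sum(values) / len(values)


def relative_improvement(model: float, baseline: float) -> float:
    """Percentage improvement of a model's metric over a baseline's metric."""
    return 100.0 * (model - baseline) / baseline


if __name__ == "__main__":
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
    print(round(relative_improvement(0.80, 0.75), 2))  # 6.67
```

The quadratic pairwise loop is fine for hold-out sets of a few dozen PWUDs; larger sets would normally use a sort-based rank computation instead.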
  2. ABSTRACT Machine learning models can greatly improve the search for strong gravitational lenses in imaging surveys by reducing the amount of human inspection required. In this work, we test the performance of supervised, semi-supervised, and unsupervised learning algorithms trained with the ResNetV2 neural network architecture on their ability to efficiently find strong gravitational lenses in the Deep Lens Survey (DLS). We use galaxy images from the survey, combined with simulated lensed sources, as labeled data in our training data sets. We find that models using semi-supervised learning along with data augmentations (transformations applied to an image during training, e.g. rotation) and Generative Adversarial Network (GAN) generated images yield the best performance. They offer 5–10 times better precision across all recall values compared to supervised algorithms. Applying the best-performing models to the full 20 deg² DLS survey, we find 3 Grade-A lens candidates within the top 17 image predictions from the model. This increases to 9 Grade-A and 13 Grade-B candidates when 1 per cent (∼2500 images) of the model predictions are visually inspected. This is ≳10× the sky density of lens candidates compared to current shallower wide-area surveys (such as the Dark Energy Survey), indicating a trove of lenses awaiting discovery in upcoming deeper all-sky surveys. These results suggest that pipelines tasked with finding strong lens systems can be highly efficient, minimizing human effort. We additionally report spectroscopic confirmation of the lensing nature of two Grade-A candidates identified by our model, further validating our methods.
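The abstract above mentions data augmentations such as rotation. A minimal sketch, assuming square single-band cutouts stored as nested lists, generates the eight rotations and reflections of an image; because a lens remains a lens under these transformations, each copy keeps the original label. This particular set of transformations and the function names are assumptions for illustration, not the paper's stated pipeline.

```python
from typing import List

Image = List[List[float]]


def rotate90(img: Image) -> Image:
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in img]


def dihedral_augmentations(img: Image) -> List[Image]:
    """Return the 8 rotations and reflections of a square image (label-preserving)."""
    out: List[Image] = []
    current = img
    for _ in range(4):
        out.append([list(row) for row in current])  # copy to avoid aliasing
        out.append(flip_horizontal(current))
        current = rotate90(current)
    return out


if __name__ == "__main__":
    cutout = [[1, 2], [3, 4]]
    print(rotate90(cutout))                     # [[3, 1], [4, 2]]
    print(len(dihedral_augmentations(cutout)))  # 8
```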
  3. American Indian and Alaska Native (AI/AN) communities experience notable health disparities associated with substance use, including disproportionate rates of accidents/injuries, diabetes, liver disease, suicide, and substance use disorders. Effective treatments for substance use are needed to improve health equity for AI/AN communities. However, an unfortunate history of unethical and stigmatizing research has engendered distrust and reluctance to participate in research among many Native communities. In recent years, researchers have made progress toward engaging in ethical health disparities research by using a community-based participatory research (CBPR) framework to work in close partnership with community members throughout the research process. In this methodological process paper, we discuss the collaborative development of a quantitative survey aimed at understanding risk and protective factors for substance use among a sample of tribal members residing on a rural AI reservation with numerous systems-level barriers to recovery and limited access to treatment. By using a CBPR approach and prioritizing trust and transparency with community partners and participants, we were able to successfully recruit our target sample and collect quality data from nearly 200 tribal members who self-identified as having a substance use problem. Strategies for enhancing buy-in and recruiting a community sample are discussed.
  4. Traditional smart meters, which measure energy usage every 15 minutes or more and report it at least a few hours later, lack the granularity needed for real-time decision-making. To address this practical problem, we introduce a new method using generative adversarial networks (GAN) that enforces temporal consistency on its high-resolution outputs via hard inequality constraints using convex optimization. A unique feature of our GAN model is that it is trained solely on slow timescale aggregated historical energy data obtained from smart meters. The results demonstrate that the model can successfully create minute-by-minute temporally correlated profiles of power usage from 15-minute interval average power consumption information. This innovative approach, emphasizing inter-neuron constraints, offers a promising avenue for improved high-speed state estimation in distribution systems and enhances the applicability of data-driven solutions for monitoring and subsequently controlling such systems. 
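The abstract does not give the exact constraints used. One plausible constraint set, assumed here purely for illustration, is that the minute-level values inside one 15-minute interval are non-negative and average to the metered 15-minute value. Projecting a generated profile onto that set is a small convex quadratic program with a closed-form solution (Euclidean projection onto a scaled simplex), sketched below in plain Python. The function name is hypothetical, and this is not the paper's constraint layer.

```python
from typing import List


def project_to_interval_average(profile: List[float], avg: float) -> List[float]:
    """Euclidean projection of a fine-grained profile onto {x : x >= 0, mean(x) == avg}.

    Solves min ||x - profile||^2 subject to x >= 0 and sum(x) == n * avg via the
    sort-based simplex-projection algorithm: x_i = max(profile_i - theta, 0).
    """
    n = len(profile)
    if n == 0:
        raise ValueError("profile must be non-empty")
    if avg < 0:
        raise ValueError("average power must be non-negative")
    total = n * avg
    if total == 0:
        return [0.0] * n  # the only feasible point
    u = sorted(profile, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - total) / j
        if uj - t > 0:  # keep the threshold of the largest index satisfying this
            theta = t
    return [max(x - theta, 0.0) for x in profile]


if __name__ == "__main__":
    # A raw generator output that violates non-negativity and the average.
    print(project_to_interval_average([-1.0, 2.0, 5.0], 1.0))  # [0.0, 0.0, 3.0]
```

Because the projection is differentiable almost everywhere, a layer like this could in principle sit at the generator's output during training; whether the paper does so is not stated in the abstract.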
  5. In many real-world classification applications such as fake news detection, the training data can be extremely imbalanced, which brings challenges to existing classifiers as the majority classes dominate the loss functions of classifiers. Oversampling techniques such as SMOTE are effective approaches to tackle the class imbalance problem by producing more synthetic minority samples. Despite their success, the majority of existing oversampling methods only consider local data distributions when generating minority samples, which can result in noisy minority samples that do not fit global data distributions or interleave with majority classes. Hence, in this paper, we study the class imbalance problem by simultaneously exploring local and global data information since: (i) the local data distribution could give detailed information for generating minority samples; and (ii) the global data distribution could provide guidance to avoid generating outliers or samples that interleave with majority classes. Specifically, we propose a novel framework GL-GAN, which leverages the SMOTE method to explore local distribution in a learned latent space and employs GAN to capture the global information, so that synthetic minority samples can be generated under even extremely imbalanced scenarios. Experimental results on diverse real data sets demonstrate the effectiveness of our GL-GAN framework in producing realistic and discriminative minority samples for improving the classification performance of various classifiers on imbalanced training data. Our code is available at https://github.com/wentao-repo/GL-GAN. 
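GL-GAN applies SMOTE in a learned latent space. The sketch below shows only the classic SMOTE interpolation step on raw feature vectors, with no encoder or GAN: each synthetic sample is x + λ(x_nn − x) for a random minority sample x, one of its k nearest minority neighbors x_nn, and λ drawn uniformly from [0, 1). The function names and defaults are illustrative assumptions, not the GL-GAN code in the linked repository.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority samples by interpolating toward nearest neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        lam = rng.random()  # interpolation weight in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + lam * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    for p in smote(samples, n_new=3, k=2, seed=42):
        print(p)  # every point lies on the segment between existing samples
```

Because new points only ever lie on segments between nearby minority samples, plain SMOTE sees only local structure; the abstract's motivation for adding a GAN is to bring in global distribution information on top of this step.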