Title: A Joint Learning Approach for Semi-supervised Neural Topic Modeling
Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the best of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
Award ID(s):
1750358
PAR ID:
10398690
Journal Name:
Proceedings of the Sixth Workshop on Structured Prediction for NLP
Page Range / eLocation ID:
40 to 51
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Rationale: Nontuberculous mycobacteria (NTM) are ubiquitous environmental bacteria that may cause chronic lung disease and are one of the most difficult-to-treat infections among persons with cystic fibrosis (pwCF). Environmental factors likely contribute to increased NTM densities, with higher potential for exposure and infection. Objective: To identify water-quality constituents that influence odds of NTM infection among pwCF in Colorado. Methods: We conducted a population-based nested case–control study using patient data from the Colorado CF Center NTM database. We associated data from pwCF and water-quality data extracted from the Water Quality Portal to estimate odds of NTM infection. Using Bayesian generalized linear models with binomial-distributed discrete responses, we modeled three separate outcomes: any NTM infection, infections due to Mycobacterium avium complex species, and infections due to M. abscessus group species. Results: We observed a consistent association between molybdenum in the source water and M. abscessus group species infection among pwCF in all models. For every 1-unit increase in the log concentration of molybdenum in surface water, the odds of M. abscessus group species infection, relative to being NTM culture-negative, increased by 79%. The odds of M. abscessus group infection varied by county; the counties with the highest probability of infection are located along the major rivers. Conclusions: We have identified molybdenum in the source water as the most predictive factor of M. abscessus group infection among pwCF in Colorado. This finding will help inform patients at risk for NTM of their relative risks in residing within specific regions.
  2. Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments. 
  4. Deep learning models have been used in creating various effective image classification applications. However, they are vulnerable to adversarial attacks that seek to misguide the models into predicting incorrect classes. Our study of major adversarial attack models shows that they all specifically target and exploit the neural networking structures in their designs. This understanding led us to develop a hypothesis that most classical machine learning models, such as random forest (RF), are immune to adversarial attack models because they do not rely on neural network design at all. Our experimental study of classical machine learning models against popular adversarial attacks supports this hypothesis. Based on this hypothesis, we propose a new adversarial-aware deep learning system by using a classical machine learning model as the secondary verification system to complement the primary deep learning model in image classification. Although the secondary classical machine learning model has less accurate output, it is only used for verification purposes, which does not impact the output accuracy of the primary deep learning model, and, at the same time, can effectively detect an adversarial attack when a clear mismatch occurs. Our experiments based on the CIFAR-100 dataset show that our proposed approach outperforms current state-of-the-art adversarial defense systems. 
  5. Semrau, Jeremy D. (Ed.)
    ABSTRACT Nontuberculous mycobacteria (NTM) are opportunistic pathogens that cause chronic pulmonary disease (PD). NTM infections are thought to be acquired from the environment; however, the basal environmental factors that drive and sustain NTM prevalence are not well understood. The highest prevalence of NTM PD cases in the United States is reported from Hawai’i, which is unique in its climate and soil composition, providing an opportunity to investigate the environmental drivers of NTM prevalence. We used microbiological sampling and spatial logistic regression complemented with fine-scale soil mineralogy to model the probability of NTM presence across the natural landscape of Hawai’i. Over 7 years, we collected and microbiologically cultured 771 samples from 422 geographic sites in natural areas across the Hawaiian Islands for the presence of NTM. NTM were detected in 210 of these samples (27%), with Mycobacterium abscessus being the most frequently isolated species. The probability of NTM presence was highest in expansive soils (those that swell with water) with a high water balance (>1-m difference between rainfall and evapotranspiration) and rich in Fe-oxides/hydroxides. We observed a positive association between NTM presence and iron in wet soils, supporting past studies, but no such association in dry soils. High soil-water balance may facilitate underground movement of NTM into the aquifer system, potentially compounded by expansive capabilities allowing crack formation under drought conditions, representing further possible avenues for aquifer infiltration. These results suggest both precipitation and soil properties are mechanisms by which surface NTM may reach the human water supply.
    IMPORTANCE Nontuberculous mycobacteria (NTM) are ubiquitous in the environment, being found commonly in soils and natural bodies of freshwater. However, little is known about the environmental niches of NTM and how they relate to NTM prevalence in homes and other human-dominated areas. To characterize NTM environmental associations, we collected and cultured 771 samples from 422 geographic sites in natural areas across Hawai’i, the U.S. state with the highest prevalence of NTM pulmonary disease. We show that the environmental niches of NTM are most associated with highly expansive, moist soils containing high levels of iron oxides/hydroxides. Understanding the factors associated with NTM presence in the natural environment will be crucial for identifying potential mechanisms and risk factors associated with NTM infiltration into water supplies, which are ultimately piped into homes where most exposure risk is thought to occur.