Abstract: Neural topic modeling is a scalable automated technique for text data mining. In many downstream tasks of topic modeling, it is preferable that the discovered topics align well with labels. However, without guidance from labels, unsupervised neural topic models are less effective in this setting. Existing supervised neural topic models often adopt a label-free prior to generate the latent document-topic distributions and use these distributions to predict the labels, thereby achieving label-topic alignment only indirectly. Such a mechanism faces two issues: 1) the label-free prior leads to topics that blend the latent patterns of multiple labels; and 2) one cannot intuitively identify explicit relationships between labels and the discovered topics. To tackle these problems, we develop a novel supervised neural topic model that uses a chain-structured graphical model with a label-conditioned prior. Soft indicators are introduced to explicitly construct the label-topic relationships. To obtain well-organized label-topic relationships, we formalize an entropy-regularized optimal transport problem in the embedding space and model these relationships as the transport plan. Moreover, the proposed method can be flexibly integrated with most existing unsupervised neural topic models. Experimental results on multiple datasets demonstrate that our model can greatly enhance the alignment between labels and topics while maintaining good topic quality.
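An entropy-regularized optimal transport plan between labels and topics is commonly computed with Sinkhorn iterations. The sketch below is a generic NumPy illustration of that technique, not the authors' implementation: the function name `sinkhorn_plan`, the squared-Euclidean cost, the rescaling of the cost to [0, 1], the uniform marginals, and the default `eps` are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(label_emb, topic_emb, eps=0.1, n_iters=500):
    """Entropy-regularized OT plan between label and topic embeddings.

    label_emb: (L, d) array, topic_emb: (K, d) array. Uniform marginals on
    both sides are an illustrative assumption, not taken from the paper.
    Returns an (L, K) plan whose rows sum to 1/L and columns to 1/K.
    """
    # Squared Euclidean cost between every label and every topic embedding.
    diff = label_emb[:, None, :] - topic_emb[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    # Rescale to [0, 1] so exp(-cost / eps) does not underflow.
    max_cost = cost.max()
    if max_cost > 0:
        cost = cost / max_cost
    n_labels, n_topics = cost.shape
    a = np.full(n_labels, 1.0 / n_labels)   # label marginal
    b = np.full(n_topics, 1.0 / n_topics)   # topic marginal
    kernel = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones(n_labels)
    v = np.ones(n_topics)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    # Plan P = diag(u) @ kernel @ diag(v): P[l, k] is the label-topic weight.
    return u[:, None] * kernel * v[None, :]
```

Reading the plan row by row (for example, `plan.argmax(axis=1)`) gives, for each label, the topic it is most strongly linked to; a smaller `eps` yields a sharper, less diffuse assignment.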
A Joint Learning Approach for Semi-supervised Neural Topic Modeling
Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We build on these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the best of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled-data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
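AEVB-based neural topic models, the family that LI-NTM extends, are trained by maximizing an evidence lower bound (ELBO) using the reparameterization trick. The following is a minimal NumPy sketch of a one-sample ELBO for a single document, assuming a Gaussian encoder output, a standard-normal prior whose samples are mapped to topic proportions with a softmax, and a decoder that mixes per-topic word distributions. The function and argument names are hypothetical, and LI-NTM's label indexing and jointly learned classifier are not modeled here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ntm_elbo(bow, mu, logvar, topic_word_logits, rng):
    """One-sample ELBO estimate for a generic AEVB topic model (sketch).

    bow: (V,) word counts of one document.
    mu, logvar: (K,) encoder outputs for q(z | doc) = N(mu, diag(exp(logvar))).
    topic_word_logits: (K, V) unnormalized topic-word weights.
    """
    # Reparameterization trick: z = mu + sigma * noise keeps sampling differentiable.
    noise = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * noise
    theta = softmax(z)                          # document-topic proportions
    beta = softmax(topic_word_logits, axis=1)   # each topic is a distribution over words
    word_probs = theta @ beta                   # (V,) mixture of topic-word distributions
    recon = np.sum(bow * np.log(word_probs + 1e-12))
    # Closed-form KL(q(z | doc) || N(0, I)).
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon - kl
```

In training, the encoder producing `mu` and `logvar` and the topic-word weights would be learned jointly by gradient ascent on this quantity averaged over documents; the document reconstruction benchmarks mentioned above evaluate the `recon`-style term.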
- Award ID(s):
- 1750358
- PAR ID:
- 10398690
- Date Published:
- Journal Name:
- Proceedings of the Sixth Workshop on Structured Prediction for NLP
- Page Range / eLocation ID:
- 40 to 51
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Rationale: Nontuberculous mycobacteria (NTM) are ubiquitous environmental bacteria that may cause chronic lung disease and are one of the most difficult-to-treat infections among persons with cystic fibrosis (pwCF). Environmental factors likely contribute to increased NTM densities and, in turn, a higher potential for exposure and infection. Objective: To identify water-quality constituents that influence the odds of NTM infection among pwCF in Colorado. Methods: We conducted a population-based nested case–control study using patient data from the Colorado CF Center NTM database. We linked data from pwCF with water-quality data extracted from the Water Quality Portal to estimate the odds of NTM infection. Using Bayesian generalized linear models with binomial-distributed discrete responses, we modeled three separate outcomes: any NTM infection, infections due to Mycobacterium avium complex species, and infections due to M. abscessus group species. Results: We observed a consistent association between molybdenum in the source water and M. abscessus group species infection among pwCF in all models. For every 1-unit increase in the log concentration of molybdenum in surface water, the odds of M. abscessus group infection, relative to being NTM culture-negative, increased by 79%. The odds of M. abscessus group infection varied by county; the counties with the highest probability of infection are located along the major rivers. Conclusions: We have identified molybdenum in the source water as the most predictive factor of M. abscessus group infection among pwCF in Colorado. This finding will help inform patients at risk for NTM about their relative risk of residing in specific regions.
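A 79% increase in odds per 1-unit increase in log molybdenum concentration corresponds to an odds ratio of 1.79, or a logit-scale coefficient of ln(1.79) ≈ 0.582. The sketch below converts between these forms and shows why an odds increase is not a probability increase; the function names and the 10% baseline probability are hypothetical illustrations, not values from the study.

```python
import math

def odds_ratio_from_percent(pct_increase):
    """Convert a reported percent increase in odds into an odds ratio."""
    return 1.0 + pct_increase / 100.0

def shifted_probability(p_baseline, odds_ratio, delta_x=1.0):
    """Probability after moving the predictor by delta_x units, assuming a
    logistic model in which the odds are multiplied by odds_ratio per unit."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * odds_ratio ** delta_x
    return new_odds / (1.0 + new_odds)

OR = odds_ratio_from_percent(79)   # 1.79
beta = math.log(OR)                # logit coefficient, about 0.582
```

With a hypothetical baseline probability of 0.10, `shifted_probability(0.10, OR)` gives about 0.166, not 0.179: multiplying the odds by 1.79 raises the probability by less than 79% when the baseline is not small.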
Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
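One widely used automated coherence score built from reference-corpus co-occurrence is normalized pointwise mutual information (NPMI), averaged over pairs of a topic's top words. The sketch below assumes document-level co-occurrence (sliding windows are another common choice) and the conventions NPMI = -1 for pairs that never co-occur and +1 for pairs that always co-occur; the function name is hypothetical, and which coherence variant the study used is not stated here.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, reference_docs):
    """Average NPMI over all pairs of a topic's top words.

    reference_docs: list of sets of tokens. Probabilities are estimated as
    the fraction of reference documents containing the word(s).
    """
    n_docs = len(reference_docs)

    def doc_freq(*words):
        # Number of reference documents containing every word in `words`.
        return sum(1 for doc in reference_docs if all(w in doc for w in words))

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_i = doc_freq(wi) / n_docs
        p_j = doc_freq(wj) / n_docs
        p_ij = doc_freq(wi, wj) / n_docs
        if p_ij == 0:
            scores.append(-1.0)   # never co-occur: minimum NPMI by convention
        elif p_ij == 1:
            scores.append(1.0)    # always co-occur: maximum NPMI by convention
        else:
            pmi = math.log(p_ij / (p_i * p_j))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)
```

For example, with reference documents `[{"cat", "dog"}, {"cat", "dog"}, {"fish"}, {"bird"}]`, the topic `["cat", "dog"]` scores 1.0 because the two words always appear together, while adding `"fish"` pulls the average down to -1/3.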
Semrau, Jeremy D. (Ed.) ABSTRACT: Nontuberculous mycobacteria (NTM) are opportunistic pathogens that cause chronic pulmonary disease (PD). NTM infections are thought to be acquired from the environment; however, the underlying environmental factors that drive and sustain NTM prevalence are not well understood. The highest prevalence of NTM PD cases in the United States is reported from Hawai’i, which is unique in its climate and soil composition, providing an opportunity to investigate the environmental drivers of NTM prevalence. We used microbiological sampling and spatial logistic regression, complemented with fine-scale soil mineralogy, to model the probability of NTM presence across the natural landscape of Hawai’i. Over 7 years, we collected 771 samples from 422 geographic sites in natural areas across the Hawaiian Islands and cultured them for the presence of NTM. NTM were detected in 210 of these samples (27%), with Mycobacterium abscessus being the most frequently isolated species. The probability of NTM presence was highest in expansive soils (those that swell with water) with a high water balance (>1-m difference between rainfall and evapotranspiration) and rich in Fe-oxides/hydroxides. We observed a positive association between NTM presence and iron in wet soils, supporting past studies, but no such association in dry soils. High soil-water balance may facilitate underground movement of NTM into the aquifer system, potentially compounded by the shrink-swell behavior of expansive soils, which allows cracks to form under drought conditions and offers further possible avenues for aquifer infiltration. These results suggest that both precipitation and soil properties are mechanisms by which surface NTM may reach the human water supply.
IMPORTANCE: Nontuberculous mycobacteria (NTM) are ubiquitous in the environment and are commonly found in soils and natural bodies of freshwater. However, little is known about the environmental niches of NTM and how they relate to NTM prevalence in homes and other human-dominated areas. To characterize NTM environmental associations, we collected and cultured 771 samples from 422 geographic sites in natural areas across Hawai’i, the U.S. state with the highest prevalence of NTM pulmonary disease. We show that the environmental niches of NTM are most associated with highly expansive, moist soils containing high levels of iron oxides/hydroxides. Understanding the factors associated with NTM presence in the natural environment will be crucial for identifying potential mechanisms and risk factors associated with NTM infiltration into water supplies, which are ultimately piped into homes where most exposure risk is thought to occur.
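Two quantities in this abstract can be checked directly: the detection rate, 210 positive samples out of 771 (≈ 0.272, reported as 27%), and the "high water balance" criterion, rainfall minus evapotranspiration greater than 1 m. The sketch below encodes only these two definitions; the function names and the example site values are hypothetical, and it does not reproduce the study's spatial logistic regression or soil-mineralogy covariates.

```python
def detection_rate(n_positive, n_total):
    """Fraction of cultured samples in which NTM were detected."""
    return n_positive / n_total

def high_water_balance(rainfall_m, evapotranspiration_m, threshold_m=1.0):
    """Flag a site whose rainfall exceeds evapotranspiration by more than
    threshold_m metres (the abstract's >1-m criterion)."""
    return (rainfall_m - evapotranspiration_m) > threshold_m

rate = detection_rate(210, 771)   # about 0.272, reported as 27%
```

For a hypothetical site with 3.0 m of rainfall and 1.5 m of evapotranspiration, `high_water_balance(3.0, 1.5)` is `True`; a site with 1.2 m and 0.5 m (a 0.7-m balance) is not flagged.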

