Title: A Review of Stability in Topic Modeling: Metrics for Assessing and Techniques for Improving Stability
Topic modeling includes a variety of machine learning techniques for identifying latent themes in a corpus of documents. Generating an exact solution (i.e., finding the global optimum) is often computationally intractable. Various optimization techniques (e.g., Variational Bayes or Gibbs Sampling) are employed to generate topic solutions approximately by finding local optima. Such an approximation often begins with a random initialization, so different initializations lead to different results. The term “stability” refers to a topic model’s ability to produce solutions that are partially or completely identical across multiple runs with different random initializations. Although a variety of work has been done analyzing, measuring, or improving stability, no single paper has provided a thorough review of the different stability metrics or of the various techniques that improve the stability of a topic model. This paper fills that gap and provides a systematic review of different approaches to measure stability and of various techniques that are intended to improve stability. It also describes differences and similarities between stability measures and other metrics (e.g., generality, coherence). Finally, the paper discusses the importance of analyzing both stability and quality metrics to assess and to compare topic models.
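As a rough illustration of the definition of stability given in the abstract, the sketch below scores how similar two runs of a topic model are by comparing the top words of their topics. It is only one possible measure and is not taken from the paper: the function names `jaccard` and `run_stability`, the use of Jaccard overlap, the brute-force topic matching, and the toy word lists are all assumptions made for this example.

```python
from itertools import permutations


def jaccard(words_a, words_b):
    """Jaccard overlap between two collections of top words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def run_stability(run_a, run_b):
    """Mean Jaccard overlap under the best one-to-one matching of topics.

    Each run is a list of topics, and each topic is a list of top words.
    Topic order is arbitrary across runs, so every pairing is tried.
    Brute force over permutations: suitable only for a handful of topics.
    """
    if len(run_a) != len(run_b):
        raise ValueError("both runs must have the same number of topics")
    k = len(run_a)
    if k == 0:
        return 1.0
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(run_a[i], run_b[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


# Hypothetical top words from two runs with different random initializations.
run_1 = [["water", "river", "flood"], ["market", "stock", "crash"]]
run_2 = [["stock", "market", "index"], ["water", "river", "lake"]]

print(run_stability(run_1, run_2))  # partial agreement between the runs
print(run_stability(run_1, run_1))  # identical runs
```

A score of 1.0 means the two runs produced the same top words for every matched topic, and lower scores indicate less agreement. For realistic numbers of topics, the permutation search would need to be replaced by an assignment algorithm.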
Award ID(s):
1814909
PAR ID:
10464095
Author(s) / Creator(s):
;
Date Published:
Journal Name:
ACM Computing Surveys
ISSN:
0360-0300
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Environmental justice research has shown that people's experiences and perceptions of water differ because systematic inequalities shape the extent to which people access clean water and are exposed to water hazards. Q‐methodology is one technique that has been used to aggregate multifaceted subjective narratives and understand different perspectives on a topic. In this paper, we systematically review 77 case study articles applying Q‐methodology to water‐related topics, to inventory how people perceive their relationships with water. We create a classification system based on environmental justice theory to examine (1) distributive justice issues around alternative water sources and agricultural and urban water scarcity; (2) procedural justice issues around Integrated Water Resource Management (IWRM) and trust; and (3) recognition justice issues regarding misrecognition, underrecognition, and the intersectionality of the environmental justice principles. Notably, only eight articles in our dataset found just two factors on a topic, with most finding three or more factors, suggesting that most audiences are not polarized or opposed in a binary sense but range along a spectrum of perspectives on water issues. This finding means water conflicts are complex, but also that people may share core water values on disputed topics. Learning from people from various backgrounds can provide an understanding of the different relationships people have with water, which can help water managers predict where conflicts may occur, empathize with minority viewpoints, and innovate water solutions that could be used to advance environmental justice goals.
  2. Empirical diagnosis of stability has received considerable attention, often focused on variance metrics for early warning signals of abrupt system change or delicate techniques measuring Lyapunov spectra. The theoretical foundation for the popular early warning signal approach has been limited to relatively simple system changes such as bifurcating fixed points where variability is extrinsic to the steady state. We offer a novel measurement of stability that applies in wide ranging systems that contain variability in both internal steady state dynamics and in response to external perturbations. Utilizing connections between stability, dissipation, and phase space flow, we show that stability correlates with temporal asymmetry in a measure of phase space flow contraction. Our method is general as it reveals stability variation independent of assumptions about the nature of system variability or attractor shape. After showing efficacy in a variety of model systems, we apply our technique for measuring stability to monthly returns of the S&P 500 index in the time periods surrounding the global stock market crash of October 1987. Market stability is shown to be higher in the several years preceding and subsequent to the 1987 market crash. We anticipate our technique will have wide applicability in climate, ecological, financial, and social systems where stability is a pressing concern.
  3. Two general approaches are common for evaluating automatically generated labels in topic modeling: direct human assessment; or performance metrics that can be calculated without, but still correlate with, human assessment. However, both approaches implicitly assume that the quality of a topic label is single-dimensional. In contrast, this paper provides evidence that human assessments about the quality of topic labels consist of multiple latent dimensions. This evidence comes from human assessments of four simple labeling techniques. For each label, study participants responded to several items asking them to assess each label according to a variety of different criteria. Exploratory factor analysis shows that these human assessments of labeling quality have a two-factor latent structure. Subsequent analysis demonstrates that this multi-item, two-factor assessment can reveal nuances that would be missed using either a single-item human assessment of perceived label quality or established performance metrics. The paper concludes by suggesting future directions for the development of human-centered approaches to evaluating NLP and ML systems more broadly.
  4. In interactive IR (IIR), users often seek to achieve different goals (e.g., exploring a new topic, finding a specific known item) at different search iterations and thus may evaluate system performance differently. Without a state-aware approach, it would be extremely difficult to simulate and achieve real-time adaptive search evaluation and recommendation. To address this gap, our work identifies users' task states from interactive search sessions and meta-evaluates a series of online and offline evaluation metrics under varying states based on a user study dataset consisting of 1548 unique query segments from 450 search sessions. Our results indicate that: 1) users' individual task states can be identified and predicted from search behaviors and implicit feedback; 2) the effectiveness of mainstream evaluation measures (measured based upon their respective correlations with user satisfaction) varies significantly across task states. This study demonstrates the implicit heterogeneity in user-oriented IR evaluation and connects studies on complex search tasks with evaluation techniques. It also informs future research on the design of state-specific, adaptive user models and evaluation metrics.
  5. Assessing the stability of biological system models has aided in uncovering a plethora of new insights in genetics, neuroscience, and medicine. In this paper, we focus on analyzing the stability of neurological signals, including electroencephalogram (EEG) signals. Interestingly, spatiotemporal discrete-time linear fractional-order systems (DTLFOS) have been shown to accurately and efficiently represent a variety of neurological and physiological signals. Here, we leverage the conditions for stability of DTLFOS to assess a real-world EEG data set. By analyzing the stability of EEG signals during movement and rest tasks, we provide evidence of the usefulness of the quantification of stability as a bio-marker for cognitive motor control. 