One Rating to Rule Them All?: Evidence of Multidimensionality in Human Assessment of Topic Labeling Quality

Hosseiny Marani, Amin; Levine, Joshua; Baumer, Eric P.S.

doi:10.1145/3511808.3557410

Citation Details

Two general approaches are common for evaluating automatically generated labels in topic modeling: direct human assessment, or performance metrics that can be calculated without, but still correlate with, human assessment. However, both approaches implicitly assume that the quality of a topic label is single-dimensional. In contrast, this paper provides evidence that human assessments about the quality of topic labels consist of multiple latent dimensions. This evidence comes from human assessments of four simple labeling techniques. For each label, study participants responded to several items asking them to assess the label according to a variety of different criteria. Exploratory factor analysis shows that these human assessments of labeling quality have a two-factor latent structure. Subsequent analysis demonstrates that this multi-item, two-factor assessment can reveal nuances that would be missed using either a single-item human assessment of perceived label quality or established performance metrics. The paper concludes by suggesting future directions for the development of human-centered approaches to evaluating NLP and ML systems more broadly.

Award ID(s): 1757787

PAR ID: 10386523

Author(s) / Creator(s): Hosseiny Marani, Amin; Levine, Joshua; Baumer, Eric P.S.

Date Published: 2022-10-17

Journal Name: Proceedings of the 31st ACM International Conference on Information and Knowledge Management

Page Range / eLocation ID: 768 to 779

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1145/3511808.3557410