NSF PAR Search | NSF Public Access Repository


Judicial self fashioning: Rhetorical performance in Supreme Court opinions

https://doi.org/10.1177/14614456241281142

Thalken, Rosamond Elizabeth; Mimno, David; Wilkens, Matthew (September 2024, Discourse Studies)

Justices on the United States Supreme Court use rhetorical strategies to maintain institutional legitimacy. In the court opinion, a strategy called the monologic voice presents a flattering depiction of the Court. The monologic voice occurs through two tones, the individualistic and collective, which respectively maintain the Justices’ legitimacy through critique and the Court’s legitimacy through unification. We train large language models to identify these rhetorical features in 15,291 modern Supreme Court opinions, issued between 1946 and 2022. While the fraction of collective and individualistic tones has been relatively consistent between 1946 and 2022, the Rehnquist Court used the collective tone at a higher rate than any other Court. In recent terms, 2021 and 2022, we find suggestions of another rhetorical shift, as all Associate Justices of the Roberts Court, excluding Chief Justice Roberts, used the individualistic tone at a historically high rate.
Full Text Available
A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

https://doi.org/10.18653/v1/2024.naacl-long.179

Longpre, Shayne; Yauney, Gregory; Reif, Emily; Lee, Katherine; Roberts, Adam; Zoph, Barret; Zhou, Denny; Wei, Jason; Robinson, Kevin; Mimno, David; et al. (January 2024, Association for Computational Linguistics)

Full Text Available
Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement

https://doi.org/10.18653/v1/2023.emnlp-main.575

Thalken, Rosamond; Stiglitz, Edward; Mimno, David; Wilkens, Matthew (January 2023, Association for Computational Linguistics)

Full Text Available
Tags, Borders, and Catalogs: Social Re-Working of Genre on LibraryThing

https://doi.org/10.1145/3449103

Antoniak, Maria; Walsh, Melanie; Mimno, David (April 2021, Proceedings of the ACM on Human-Computer Interaction)

Through a computational reading of the online book reviewing community LibraryThing, we examine the dynamics of a collaborative tagging system and learn how its users refine and redefine literary genres. LibraryThing tags are overlapping and multi-dimensional, created in a shared space by thousands of users, including readers, bookstore owners, and librarians. A common understanding of genre is that it relates to the content of books, but this resource allows us to view genre as an intersection of user communities and reader values and interests. We explore different methods of computational genre measurement within the open space of user-created tags. We measure overlap between books, tags, and users, and we also measure the homogeneity of communities associated with genre tags and correlate this homogeneity with reviewing behavior. Finally, by analyzing the text of reviews, we identify the thematic signatures of genres on LibraryThing, revealing similarities and differences between them. These measurements are intended to elucidate the genre conceptions of the users, not, as in prior work, to normalize the tags or enforce a hierarchy. We find that LibraryThing users make sense of genre through a variety of values and expectations, many of which fall outside common definitions and understandings of genre.
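As an illustration of what measuring overlap between tags might look like, here is a minimal sketch that scores each pair of genre tags by the Jaccard overlap of the books they are applied to. The abstract does not say which overlap measure the authors use, so Jaccard similarity, the function names, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_tag_overlap(tag_to_books):
    """Return {(tag1, tag2): overlap} for every unordered pair of tags."""
    return {
        (s, t): jaccard(tag_to_books[s], tag_to_books[t])
        for s, t in combinations(sorted(tag_to_books), 2)
    }


# Hypothetical toy data: each tag maps to the set of book IDs it was applied to.
toy_tags = {
    "fantasy": {"b1", "b2", "b3"},
    "young adult": {"b2", "b3", "b4"},
    "mystery": {"b5"},
}
overlaps = pairwise_tag_overlap(toy_tags)
```

The same function can be applied to tag-to-user or book-to-user mappings, since it only needs sets of identifiers.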
Full Text Available
Comparing Text Representations: A Theory-Driven Approach

https://doi.org/10.18653/v1/2021.emnlp-main.449

Yauney, Gregory; Mimno, David (January 2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP))

Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels.
Full Text Available
Bad Seeds: Evaluating Lexical Methods for Bias Measurement

https://doi.org/10.18653/v1/2021.acl-long.148

Antoniak, Maria; Mimno, David (January 2021, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers))

A common factor in bias measurement methods is the use of hand-curated seed lexicons, but there remains little guidance for their selection. We gather seeds used in prior work, documenting their common sources and rationales, and in case studies of three English-language corpora, we enumerate the different types of social biases and linguistic features that, once encoded in the seeds, can affect subsequent bias measurements. Seeds developed in one context are often re-used in other contexts, but documentation and evaluation remain necessary precursors to relying on seeds for sensitive measurements.
Full Text Available
Like Two Pis in a Pod: Author Similarity Across Time in the Ancient Greek Corpus

https://doi.org/10.22148/001c.13680

Storey, Grant; Mimno, David (July 2020, Journal of Cultural Analytics)
One commonly recognized feature of the Ancient Greek corpus is that later texts frequently imitate and allude to model texts from earlier time periods, but analysis of this phenomenon is mostly done for specific author pairs based on close reading and highly visible instances of imitation. In this work, we use computational techniques to examine the similarity of a wide range of Ancient Greek authors, with a focus on similarity between authors writing many centuries apart. We represent texts and authors based on their usage of high-frequency words, to capture author signatures rather than document topics, and measure similarity using Jensen-Shannon divergence. We then analyze author similarity across centuries, finding high similarity between specific authors and across the corpus that is not common to all languages.
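The comparison described above can be sketched as follows: each author is represented by relative frequencies over a shared list of high-frequency words, and two authors are compared with Jensen-Shannon divergence. This is a minimal sketch, not the authors' implementation. The function names, the base-2 logarithm, and the toy English tokens are assumptions, and the abstract does not specify how the vocabulary is chosen or whether smoothing is applied.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary):
    """Relative frequency of each vocabulary word, ignoring all other tokens."""
    vocab_set = set(vocabulary)
    counts = Counter(t for t in tokens if t in vocab_set)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no vocabulary words found in tokens")
    return [counts[w] / total for w in vocabulary]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m the midpoint of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical toy example: two "authors" over a three-word function-word vocabulary.
vocab = ["the", "and", "of"]
author_a = word_distribution("the cat and the dog of the house".split(), vocab)
author_b = word_distribution("of mice and of men and the sea".split(), vocab)
divergence = jensen_shannon_divergence(author_a, author_b)
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared mass), which makes scores comparable across author pairs.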
Full Text Available
Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents

https://doi.org/10.18653/v1/2020.emnlp-main.160

Yauney, Gregory; Hessel, Jack; Mimno, David (January 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP))

Images can give us insights into the contextual meanings of words, but current image-text grounding approaches require detailed annotations. Such granular annotation is rare, expensive, and unavailable in most domain-specific contexts. In contrast, unlabeled multi-image, multi-sentence documents are abundant. Can lexical grounding be learned from such documents, even though they have significant lexical and visual overlap? Working with a case study dataset of real estate listings, we demonstrate the challenge of distinguishing highly correlated grounded terms, such as “kitchen” and “bedroom”, and introduce metrics to assess this document similarity. We present a simple unsupervised clustering-based method that increases precision and recall beyond object detection and image tagging baselines when evaluated on labeled subsets of the dataset. The proposed method is particularly effective for local contextual meanings of a word, for example associating “granite” with countertops in the real estate dataset and with rocky landscapes in a Wikipedia dataset.
Full Text Available
Combatting The Challenges of Local Privacy for Distributional Semantics with Compression

Schofield, Alexandra; Yauney, Gregory; Mimno, David (December 2019, PriML workshop at NeurIPS)

Traditional methods for adding locally private noise to bag-of-words features overwhelm the true signal in the text data, removing the properties of sparsity and non-negativity often relied upon by distributional semantic models. We argue the formulation of limited-precision local privacy, which guarantees privacy between documents of less than a user-specified maximum distance, is a more appropriate framework for bag-of-words features. To reduce the number of features to which we must add random noise, we also compress word features before adding noise, then decompress those features before model inference. We test randomized methods of aggregation as well as methods informed by distributional properties of words. Applying LDA and LSA to synthetic and real data, we show that these approaches produce distributional models closer to those in the original data.
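The compress, add noise, decompress pipeline can be sketched roughly as below. This is only an illustration under stated assumptions: words are aggregated into buckets uniformly at random (one of several aggregation schemes the abstract mentions), noise is Laplace-distributed, and decompression spreads each bucket's noisy value evenly over its words. The noise scale here is arbitrary and is not calibrated to any privacy guarantee, and all function names are hypothetical.

```python
import random


def assign_buckets(vocab_size, num_buckets, rng):
    """Randomly map each word index to a bucket index (assumed aggregation scheme)."""
    return [rng.randrange(num_buckets) for _ in range(vocab_size)]


def compress(counts, buckets, num_buckets):
    """Sum the counts of all words that share a bucket."""
    compressed = [0.0] * num_buckets
    for word, count in enumerate(counts):
        compressed[buckets[word]] += count
    return compressed


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_noise(compressed, scale, rng):
    """Add independent Laplace noise to each compressed feature."""
    return [value + laplace_noise(scale, rng) for value in compressed]


def decompress(values, buckets, num_buckets):
    """Spread each bucket's value evenly across the words assigned to it."""
    sizes = [0] * num_buckets
    for b in buckets:
        sizes[b] += 1
    return [values[b] / sizes[b] for b in buckets]


# Hypothetical document: bag-of-words counts over a 10-word vocabulary.
rng = random.Random(0)
doc_counts = [3, 0, 1, 0, 0, 2, 0, 0, 4, 0]
buckets = assign_buckets(len(doc_counts), num_buckets=4, rng=rng)
compressed = compress(doc_counts, buckets, 4)
noisy = add_noise(compressed, scale=1.0, rng=rng)
private_counts = decompress(noisy, buckets, 4)
```

Noise is added to 4 features instead of 10, which is the point of compressing first; the decompressed vector can then be passed to a model such as LDA or LSA, although it may contain negative or fractional values that need handling downstream.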
Full Text Available
Narrative Paths and Negotiation of Power in Birth Stories

https://doi.org/10.1145/3359190

Antoniak, Maria; Mimno, David; Levy, Karen (November 2019, Proceedings of the ACM on Human-Computer Interaction)

Full Text Available
