Compositional generalization, the ability of intelligent models to extrapolate understanding of components to novel compositions, is a fundamental yet challenging facet in AI research, especially within multimodal environments. In this work, we address this challenge by exploiting the syntactic structure of language to boost compositional generalization. This paper elevates the importance of syntactic grounding, particularly through attention masking techniques derived from text input parsing. We introduce and evaluate the merits of using syntactic information in the multimodal grounding problem. Our results on grounded compositional generalization underscore the positive impact of dependency parsing across diverse tasks when utilized with Weight Sharing across the Transformer encoder. The results push the state-of-the-art in multimodal grounding and parameter-efficient modeling and provide insights for future research.
Syntactic measurement of governance networks from textual data, with application to water management plans
Abstract: This paper demonstrates an automated workflow for extracting network data from policy documents. We use natural language processing tools, part‐of‐speech tagging, and syntactic dependency parsing, to represent relationships between real‐world entities based on how they are described in text. Using a corpus of regional groundwater management plans, we demonstrate unique graph motifs created through parsing syntactic relationships and how document‐level syntax can be aggregated to develop large‐scale graphs. This approach complements and extends existing methods in public management and governance research by (1) expanding the feasible geographic and temporal scope of data collection and (2) allowing for customized representations of governance systems to fit different research applications, particularly by creating graphs with many different node and edge types. We conclude by reflecting on the challenges, limitations, and future directions of automated, text‐based methods for governance research.
- Award ID(s): 2205239
- PAR ID: 10530476
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Policy Studies Journal
- ISSN: 0190-292X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a particular gender, but models for part-of-speech tagging and dependency parsing have still not adapted to account for these differences. To address this, we annotate the Wall Street Journal part of the Penn Treebank with the gender information of the articles' authors, and build taggers and parsers trained on this data that show performance differences in text written by men and women. Further analyses reveal numerous part-of-speech tags and syntactic relations whose prediction performances benefit from the prevalence of a specific gender in the training data. The results underscore the importance of accounting for gendered differences in syntactic tasks, and outline future venues for developing more accurate taggers and parsers. We release our data to the research community.
-
Existing automated code checking methods/tools are unable to automatically analyze and represent all types of requirements (e.g., requirements that are too complex or that require human judgement). Recent efforts in the area of augmented data analytics have proposed the use of templates to facilitate the analysis of text. However, most of these efforts have constructed such templates manually, which is labor-intensive. More importantly, it is difficult for manually-developed templates to capture the linguistic variations in building codes. More research is, thus, needed to automate the generation of templates to support the tagging and extraction of information from building codes. To address this need, this paper proposes an unsupervised machine-learning based method to extract sentence templates that describe syntactic and semantic features and patterns from building codes. The proposed method is composed of four main steps: (1) data preprocessing; (2) identifying the different groups of sentence fragments using clustering; (3) identifying the fixed parts and the slots in the templates based on the syntactic and semantic patterns of the sentence fragment groups; and (4) evaluating the extracted templates. The proposed method was implemented and tested on a corpus of text from the International Building Code. An accuracy of 0.76 was achieved.
-
Representing discourse as argument graphs facilitates robust analysis. Although computational frameworks for constructing graphs from monologues exist, there is a lack of frameworks for parsing dialogue. Inference Anchoring Theory (IAT) is a theoretical framework for extracting graphical argument structures and relationships from dialogues. Here, we introduce computational models for implementing the IAT framework for parsing dialogues. We experiment with a classification-based biaffine parser and Large Language Model (LLM)-based generative methods and compare them. Our results demonstrate the utility of finetuning LLMs for constructing IAT-based argument graphs from dialogues, which is a nuanced task.
-
The Institutional Grammar (IG) is a rigorous tool for analyzing the laws and policies governing nonprofit organizations; however, its use was limited due to the time-consuming nature of hand-coding. We introduce an advance in Natural Language Processing using a semantic role labeling (SRL) classifier that reliably codes rules governing and guiding nonprofit organizations. This paper provides guidance for how to hand-code using the IG, preprocess text for machine learning, and demonstrates the SRL classifier for automated IG coding. We then compare the hand-coding to the SRL coding to demonstrate its accuracy. The advances in machine learning now make it feasible to utilize the IG for nonprofit research questions focused on inter-organizational collaborations, government contracts, federated nonprofit organizational compliance, and nonprofit governance, among others. An added benefit is that the IG is adaptable for different languages, thus enabling cross-national comparative research. By providing examples throughout the paper, we demonstrate how to use the IG and the SRL classifier to address research questions of interest to nonprofit scholars.

