Many migrants are vulnerable because of noncitizenship, linguistic or cultural barriers, and inadequate safety-net infrastructure. Immigrant-oriented nonprofits can play an important role in improving immigrant well-being. However, systematic evaluation of these nonprofits' impact has been hampered by the difficulty of efficiently and accurately identifying immigrant-oriented nonprofits in large administrative data sets. We tackle this challenge with natural language processing (NLP) and machine learning (ML) techniques: we apply seven NLP algorithms and train them in supervised ML models. The bidirectional encoder representations from transformers (BERT) technique performs best, with an accuracy of .89. The model also outperforms two nonmachine methods used in existing research: identifying organizations by National Taxonomy of Exempt Entities (NTEE) codes and by keyword searches of nonprofit names. We thus demonstrate the viability of computer-based identification of hard-to-identify nonprofits from organizational name data, a technique that may apply to other research requiring categorization based on short labels. We also highlight limitations and areas for improvement.
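The abstract compares its BERT classifier against keyword searches of nonprofit names. Below is a minimal Python sketch of that kind of keyword baseline, with accuracy computed as the share of correctly labeled organizations. The keyword list, organization names, and hand labels are hypothetical and are not drawn from the study.

```python
# Hypothetical keywords; the abstract does not list those used in prior research.
IMMIGRANT_KEYWORDS = ("immigrant", "refugee", "migrant", "asylum", "newcomer")


def keyword_flag(name: str, keywords=IMMIGRANT_KEYWORDS) -> bool:
    """Flag a nonprofit as immigrant-oriented if its name contains any keyword."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in keywords)


def accuracy(predicted, actual) -> float:
    """Share of organizations whose predicted label matches the hand-coded label."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy, made-up names and hand labels (not data from the study).
names = [
    "Refugee Resettlement Services of Springfield",
    "Asian American Community Center",
    "Springfield Youth Soccer League",
    "Immigrant Legal Aid Project",
]
hand_labels = [True, True, False, True]

predicted = [keyword_flag(n) for n in names]
print(predicted)                         # [True, False, False, True]
print(accuracy(predicted, hand_labels))  # 0.75
```

The second name shows the weakness of a pure keyword rule. An organization can serve immigrants without any listed keyword in its name, and a classifier trained on labeled names may catch that kind of case.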
This content will become publicly available on September 11, 2026
Advancing Text Analysis for Nonprofit Research: Using Semantic Role Labeling to Automate Institutional Grammar Coding of Nonprofit Laws and Policies
The Institutional Grammar (IG) is a rigorous tool for analyzing the laws and policies governing nonprofit organizations; however, its use has been limited by the time-consuming nature of hand-coding. We introduce an advance in natural language processing: a semantic role labeling (SRL) classifier that reliably codes rules governing and guiding nonprofit organizations. This paper provides guidance on how to hand-code using the IG and how to preprocess text for machine learning, and it demonstrates the SRL classifier for automated IG coding. We then compare the hand-coding with the SRL coding to demonstrate the classifier's accuracy. Advances in machine learning now make it feasible to use the IG for nonprofit research questions focused on inter-organizational collaborations, government contracts, federated nonprofit organizational compliance, and nonprofit governance, among others. An added benefit is that the IG is adaptable to different languages, enabling cross-national comparative research. Through examples throughout the paper, we demonstrate how to use the IG and the SRL classifier to address research questions of interest to nonprofit scholars.
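A small data structure can illustrate how hand-coding and automated coding are compared. This sketch assumes the IG 2.0 ABDICO component set (Attribute, Object, Deontic, Aim, Context, Or else). It scores agreement as an exact match per component after light normalization. The abstract does not specify the paper's component set or accuracy measure. The class, function names, and example statements are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class IGStatement:
    """One institutional statement split into components (assumed ABDICO layout).

    Components absent from a statement stay None.
    """
    attribute: Optional[str] = None  # A: actor the rule applies to
    obj: Optional[str] = None        # B: Object, the receiver of the action
    deontic: Optional[str] = None    # D: prescriptive operator (must, may, must not)
    aim: Optional[str] = None        # I: the action itself
    context: Optional[str] = None    # C: activation conditions and execution constraints
    or_else: Optional[str] = None    # O: consequence of noncompliance


def _normalize(text: Optional[str]) -> Optional[str]:
    """Lowercase and collapse whitespace so trivial differences are not errors."""
    return None if text is None else " ".join(text.lower().split())


def component_agreement(hand: List[IGStatement], auto: List[IGStatement]) -> Dict[str, float]:
    """Per component, the share of aligned statements whose codes match exactly.

    Two empty (None) components count as agreement.
    """
    if len(hand) != len(auto) or not hand:
        raise ValueError("need two non-empty, aligned lists of statements")
    scores = {}
    for f in fields(IGStatement):
        matches = sum(
            _normalize(getattr(h, f.name)) == _normalize(getattr(a, f.name))
            for h, a in zip(hand, auto)
        )
        scores[f.name] = matches / len(hand)
    return scores


# Hypothetical hand codes and automated codes for two statements.
hand = [
    IGStatement("board of directors", "annual report", "must", "file",
                "within 90 days of fiscal year end"),
    IGStatement("charity", "donations", "may", "solicit", "after registration"),
]
auto = [
    IGStatement("Board of  Directors", "annual report", "must", "file"),
    IGStatement("charity", "donations", "may", "solicit", "after registration"),
]
scores = component_agreement(hand, auto)
print(scores)
```

In this toy comparison, the automated coder misses the timing constraint in the first statement. As a result, only the Context component falls below full agreement (0.5). Exact matching is deliberately strict. Token-level or partial-overlap scoring would credit near misses.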
- PAR ID: 10639621
- Publisher / Repository: AOP
- Date Published:
- Journal Name: Nonprofit Policy Forum
- ISSN: 2154-3348
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Institutions, defined as strategies, norms, and rules (Ostrom, Understanding Institutional Diversity, Princeton University Press, 2005), are omnipresent in third sector contexts. In this paper, we present the Institutional Grammar (IG) as a theoretically informed approach to institutional analysis in third sector research. More specifically, the IG coding syntax allows researchers to work systematically through rich text and (transcribed) spoken language, identifying institutional statements and dissecting them into finer syntactic segments of interest. It is a versatile method that can generate data for small- or large-N research projects and can be integrated into mixed-method research designs. After introducing and describing the IG, we present a case study illustrating how an IG-based syntactic analysis can inform third sector research. In the case, we ask: Do the rules embedded in regulatory text addressing the involuntary dissolution of charity organizations differ between bifurcated and unitary jurisdictions in the United States? Using IG’s ABDICO 2.0 syntax, we identify eleven “Activation Condition” (AC) categories that trigger action and assess variation among the 46 jurisdictions. We ultimately conclude that the rules do not differ between bifurcated and unitary jurisdictions, but that finding is not the primary concern. Rather, the case demonstrates IG as an important methodological advance that yields granular, structured analyses of rules, norms, and strategies in third sector settings that may be difficult to identify with other methods. We then highlight four areas of third sector research that could benefit from IG-based methods: (1) rule compliance, (2) inter-organizational collaboration, (3) comparative institutional design, and (4) institutional change. We close with reflections on where IG-based analysis is headed.
- Researchers in the learning sciences have demonstrated the benefits of effective self-regulated learning (SRL) for learning outcomes. The search-as-learning community aims to improve learning outcomes during search, but little of its research explores how SRL affects learning during search. The limited existing work in search-as-learning examines only participants' perceptions of SRL processes, and only after the search process has ended [crescenzi_supporting_2021]. Such analyses are limited because SRL is a dynamic, active process, and participants' perceptions of SRL can be unreliable [winne_exploring_2002, greene_domain-specificity_2015]. In this paper, we propose implementing an SRL coding framework to capture SRL processes as they unfold throughout a search session. We also offer several implications for future work using the proposed methodology.
- Qualitative coding, or content analysis, is more than labeling text: it is a reflexive interpretive practice that shapes research questions, refines theoretical insights, and illuminates subtle social dynamics. As large language models (LLMs) become increasingly adept at nuanced language tasks, questions arise about whether, and how, they can assist in large-scale coding without eroding the interpretive depth that distinguishes qualitative analysis from traditional machine learning and other quantitative approaches to natural language processing. In this paper, we present a hybrid approach that preserves hermeneutic value while using LLMs to scale the application of codes to data sets too large for manual coding. Our workflow retains the traditional cycle of codebook development and refinement, adds an iterative step that adapts code definitions for machine comprehension, and ultimately replaces manual with automated text categorization. We demonstrate how to rewrite code descriptions for LLM interpretation, and we show how structured prompts, together with prompting the model to explain its coding decisions (chain-of-thought), can substantially improve fidelity. Empirically, our case study of socio-historical codes highlights the promise of frontier AI language models for reliably interpreting paragraph-long passages representative of a humanistic study. Throughout, we emphasize ethical and practical considerations, preserving space for critical reflection, and the ongoing need for human researchers’ interpretive leadership. These strategies can guide both traditional and computational scholars who aim to harness automation effectively and responsibly, maintaining the creative, reflexive rigor of qualitative coding while capitalizing on the efficiency afforded by LLMs.
- Social networking sites (SNS) offer youth activists and youth empowerment organizations (where adults help youth address community issues) opportunities for civic action. Impression management is critical to youth empowerment organizations’ work online, as they attempt to influence the opinions of their audience. However, there is a dearth of research characterizing online impression management in the context of youth empowerment organizations. To address this gap, we conducted a qualitative study of SNS use in a youth empowerment organization. Using Goffman’s dramaturgical model, we characterized how youth tried to hack SNS algorithms and their desire to better identify their audience. Our findings reveal how youth use SNS to create authentic images and connections with their audience. We also discuss adults’ desire to convey a curated organizational image and the challenges that arose. We conclude with design implications for tools that support online impression management for youth activists.
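The hybrid workflow in the LLM-assisted qualitative coding abstract above has three parts. It rewrites code definitions for machine comprehension. It uses structured prompts that ask the model to explain its decisions. It then checks fidelity against human coding. The Python sketch below shows only the plumbing around such a workflow, under the following assumptions:

- No model is called.
- The prompt template and JSON reply format are hypothetical.
- Cohen's kappa stands in as one common agreement statistic, because the abstract does not name the measure used.

```python
import json
from collections import Counter
from typing import Sequence, Tuple


def build_prompt(code_name: str, definition: str, passage: str) -> str:
    """Assemble a structured prompt from a code definition and a passage.

    Hypothetical template; not the authors' prompt.
    """
    return (
        f"Code: {code_name}\n"
        f"Definition: {definition}\n\n"
        f"Passage:\n{passage}\n\n"
        "Explain your reasoning step by step, then decide whether the code applies.\n"
        'Reply only with JSON: {"reasoning": "<explanation>", "applies": true or false}'
    )


def parse_response(raw: str) -> Tuple[bool, str]:
    """Parse a JSON reply, keeping the explanation so a researcher can audit it."""
    data = json.loads(raw)
    applies = data["applies"]
    if not isinstance(applies, bool):
        raise ValueError("'applies' must be a JSON boolean")
    return applies, str(data["reasoning"])


def cohens_kappa(rater_a: Sequence, rater_b: Sequence) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters labeled independently at their own base rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# A made-up model reply, standing in for an actual LLM call.
raw_reply = '{"reasoning": "The passage describes a tenant protest.", "applies": true}'
applies, reasoning = parse_response(raw_reply)

# Made-up labels for six passages: human codes vs. model codes.
human = [True, True, False, False, True, False]
model = [True, False, False, False, True, False]
print(round(cohens_kappa(human, model), 3))  # 0.667
```

Keeping the model's stated reasoning next to its label preserves material for the human review the abstract emphasizes. A low kappa for a given code could also flag its definition for another round of rewriting.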
