Global policy goals for halting biodiversity loss and mitigating climate change depend on each other to succeed. Marine biodiversity and climate are intertwined through food webs that cycle and transport carbon and contribute to carbon sequestration. Yet biodiversity conservation and fisheries management seldom explicitly include ocean carbon transport and sequestration. To effectively manage and govern human activities that affect carbon cycling and sequestration, international biodiversity and climate agreements need to address both sets of issues. Agreements that address both climate and biodiversity are best poised to facilitate the protection of ocean carbon under existing policies. However, the degree to which the main international biodiversity and climate agreements refer to multiple issues has not been documented. Here, we used a text mining analysis of over 2,700 binding and non-binding policy documents from ten global ocean-related agreements to identify keywords related to biodiversity, climate, and ocean carbon. While climate references were mostly siloed within climate agreements, biodiversity references appeared in most agreements. Further, we found that six percent of policy documents (n=166) included ocean carbon keywords. In light of these results, we highlight opportunities to strengthen the protection of ocean carbon in upcoming negotiations of international agreements, and via area-based management, environmental impact assessment and strategic environmental assessment.
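As a rough illustration of the kind of keyword screening described above, the sketch below counts how many plain-text policy documents mention at least one term per theme. The keyword lists, file layout, and function name are assumptions made for illustration, not the study's actual search dictionaries.

```python
import re
from pathlib import Path

# Illustrative keyword lists -- the study's actual search terms are not reproduced here.
KEYWORDS = {
    "biodiversity": ["biodiversity", "species", "habitat", "ecosystem"],
    "climate": ["climate change", "greenhouse gas", "warming", "emissions"],
    "ocean carbon": ["blue carbon", "carbon sequestration", "carbon sink", "carbon export"],
}

def count_theme_mentions(doc_dir):
    """Count how many documents mention at least one keyword for each theme."""
    counts = {theme: 0 for theme in KEYWORDS}
    total = 0
    for path in Path(doc_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        total += 1
        for theme, terms in KEYWORDS.items():
            if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms):
                counts[theme] += 1
    counts["total documents"] = total
    return counts
```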
Inferring missing metadata from environmental policy texts
The National Environmental Policy Act (NEPA) provides a trove of data on how environmental policy decisions have been made in the United States over the last 50 years. Unfortunately, there is no central database for this information and it is too voluminous to assess manually. We describe our efforts to enable systematic research over US environmental policy by extracting and organizing metadata from the text of NEPA documents. Our contributions include collecting more than 40,000 NEPA-related documents, and evaluating rule-based baselines that establish the difficulty of three important tasks: identifying lead agencies, aligning document versions, and detecting reused text.
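The sketch below shows one plausible shape for a rule-based lead-agency baseline: scan the opening of a document for the earliest agency mention. The agency list, the 2,000-character window, and the matching rules are illustrative assumptions rather than the rules evaluated in the paper.

```python
import re

# Illustrative patterns only; NEPA filings involve far more agencies than these.
AGENCY_PATTERNS = {
    "Forest Service": r"\bForest Service\b",
    "Bureau of Land Management": r"\bBureau of Land Management\b|\bBLM\b",
    "Federal Highway Administration": r"\bFederal Highway Administration\b|\bFHWA\b",
}

def guess_lead_agency(text, window=2000):
    """Rule-based baseline: return the agency mentioned earliest in the
    opening of the document, where the lead agency is usually named."""
    head = text[:window]
    best_agency, best_pos = None, len(head) + 1
    for agency, pattern in AGENCY_PATTERNS.items():
        match = re.search(pattern, head, flags=re.IGNORECASE)
        if match and match.start() < best_pos:
            best_agency, best_pos = agency, match.start()
    return best_agency
```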
- Award ID(s): 1831551
- PAR ID: 10113356
- Date Published:
- Journal Name: Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
- Page Range / eLocation ID: 46 to 51
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
ABSTRACT Institutional arrangements that guide collective action between entities create benefits and burdens for collaborating entities and can encourage cooperation or create coordination dilemmas. There is an abundance of research in public policy, public administration, and nonprofit management on cross‐sector alliances, co‐production, and collaborative networks. We contribute to advancing this research by introducing a methodological approach that combines two text‐based methods: institutional network analysis and cost–benefit analysis. We utilize the Institutional Grammar to code policy documents that govern relationships between actors. The coded text is then used to identify Networks of Prescribed Interactions to analyze institutional relationships between policy actors. We then utilize the coded text in a cost–benefit analysis to assess benefit and burden distributive effects. This integrated methodological framework provides researchers with a tool to elucidate both the institutional patterns of interaction and distributive implications embedded in policy documents, revealing insights that single‐method approaches cannot capture. To demonstrate the utility of this integrated approach, we examine the policy design of two nonprofit open‐source software (OSS) incubation programs with contrasting characteristics: the Apache Software Foundation (ASF) and the Open Source Geospatial Foundation (OSGeo). We select these cases because: (1) they are co‐production alliances and have policy documents that articulate support for collective action; (2) their policy documents and group discussions are open access, creating an opportunity to advance text‐based policy analysis methods; and (3) they represent juxtaposed examples of high and low risk for collaboration settings, thereby providing two illustrative cases of the combined network and cost–benefit text‐based methodological approach. The network analysis finds that ASF policies, as a high‐risk setting, emphasize bonding structures, particularly higher reciprocity, which creates a context for cooperation. OSGeo, a low‐risk setting, has policies creating a context for bridging structures, evident in high brokerage efficiency, to facilitate coordination. The cost–benefit analysis finds that ASF policies balance the distribution of costs and benefits between ASF and projects, while in OSGeo, projects bear both costs and benefits. These findings demonstrate that the combination of network and cost–benefit analysis is an effective tool for utilizing text to compare policy designs.
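As a minimal sketch of how coded statements can become a Network of Prescribed Interactions, the example below builds a directed graph from hypothetical Institutional Grammar rows (actor, aim, target) and computes reciprocity with networkx. The statements and field names are invented for illustration and are not drawn from the ASF or OSGeo policy documents.

```python
import networkx as nx

# Hypothetical rows from Institutional Grammar coding: each coded statement
# names an acting entity (attribute), an action (aim), and a target (object).
coded_statements = [
    {"attribute": "Foundation board", "aim": "approves", "object": "Project committee"},
    {"attribute": "Project committee", "aim": "reports to", "object": "Foundation board"},
    {"attribute": "Mentor", "aim": "advises", "object": "Project committee"},
]

# Build the Network of Prescribed Interactions as a directed graph.
G = nx.DiGraph()
for s in coded_statements:
    G.add_edge(s["attribute"], s["object"], aim=s["aim"])

# Reciprocity is one of the bonding measures discussed in the abstract.
print("Reciprocity:", nx.reciprocity(G))
```

Brokerage measures and the cost–benefit tallies would be computed analogously from the same coded rows, which is what makes the two text-based methods easy to combine.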
Categorizing documents into a given label hierarchy is intuitively appealing due to the ubiquity of hierarchical topic structures in massive text corpora. Although related studies have achieved satisfying performance in fully supervised hierarchical document classification, they usually require massive human-annotated training data and only utilize text information. However, in many domains, (1) annotations are quite expensive where very few training samples can be acquired; (2) documents are accompanied by metadata information. Hence, this paper studies how to integrate the label hierarchy, metadata, and text signals for document categorization under weak supervision. We develop HiMeCat, an embedding-based generative framework for our task. Specifically, we propose a novel joint representation learning module that allows simultaneous modeling of category dependencies, metadata information and textual semantics, and we introduce a data augmentation module that hierarchically synthesizes training documents to complement the original, small-scale training set. Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contribution of our representation learning and data augmentation modules.
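The hierarchical augmentation idea can be sketched without the full generative model: synthesize pseudo training documents for a category by sampling keywords from that category and its ancestors in the hierarchy. This is only a schematic stand-in for HiMeCat's data augmentation module, with invented function and parameter names.

```python
import random

def synthesize_pseudo_docs(parent_of, category_keywords, docs_per_class=20, doc_len=50):
    """Generate pseudo documents for each category by sampling keywords
    from the category itself and from its ancestors in the label hierarchy."""
    pseudo_docs = []
    for category in category_keywords:
        # Collect keywords along the path from the category up to the root.
        vocab, node = list(category_keywords[category]), category
        while node in parent_of:
            node = parent_of[node]
            vocab.extend(category_keywords.get(node, []))
        for _ in range(docs_per_class):
            words = random.choices(vocab, k=doc_len)
            pseudo_docs.append((" ".join(words), category))
    return pseudo_docs
```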
Abstract Natural language processing techniques can be used to analyze the linguistic content of a document to extract missing pieces of metadata. However, accurate metadata extraction may not depend solely on the linguistics, but also on structural problems such as extremely large documents, unordered multi‐file documents, and inconsistency in manually labeled metadata. In this work, we start from two standard machine learning solutions to extract pieces of metadata from Environmental Impact Statements, environmental policy documents that are regularly produced under the US National Environmental Policy Act of 1969. We present a series of experiments where we evaluate how these standard approaches are affected by different issues derived from real‐world data. We find that metadata extraction can be strongly influenced by nonlinguistic factors such as document length and volume ordering and that the standard machine learning solutions often do not scale well to long documents. We demonstrate how such solutions can be better adapted to these scenarios, and conclude with suggestions for other NLP practitioners cataloging large document collections.
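One common way to adapt classifiers to very long documents, in the spirit of the issues described above, is to split each document into overlapping chunks and aggregate per-chunk predictions. The chunk sizes and the majority-vote aggregation below are illustrative assumptions, not necessarily the adaptation used in the paper; `classify_chunk` is a placeholder for whatever per-chunk model is applied.

```python
from collections import Counter

def chunk_document(text, max_tokens=512, stride=256):
    """Split a long document into overlapping chunks so a fixed-length
    model can see all of it instead of only the opening pages."""
    tokens = text.split()
    chunks = []
    for start in range(0, max(len(tokens) - stride, 1), stride):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
    return chunks

def predict_field(text, classify_chunk):
    """Aggregate chunk-level predictions for a metadata field by majority vote."""
    votes = Counter(classify_chunk(chunk) for chunk in chunk_document(text))
    return votes.most_common(1)[0][0]
```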
Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for text classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical text classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical text classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.
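The pre-train-then-self-train loop described above can be summarized in a short skeleton. The model interface (`fit`, `predict_with_confidence`), the confidence threshold, and the number of rounds are hypothetical placeholders rather than the paper's implementation.

```python
def self_train(model, pseudo_docs, unlabeled_docs, rounds=5, threshold=0.9):
    """Skeleton: pre-train on pseudo documents, then iteratively fold
    confident predictions on unlabeled data back into the training set.
    `model.fit` and `model.predict_with_confidence` are hypothetical stand-ins."""
    train_set = list(pseudo_docs)          # (text, label) pairs
    model.fit(train_set)                   # pre-train on generated pseudo documents
    for _ in range(rounds):
        confident = []
        for doc in unlabeled_docs:
            label, confidence = model.predict_with_confidence(doc)
            if confidence >= threshold:
                confident.append((doc, label))
        if not confident:
            break                          # nothing new to learn from this round
        model.fit(train_set + confident)   # refine on real data the model labeled itself
    return model
```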