Title: Text as Policy: Measuring Policy Similarity through Bill Text Reuse
Award ID(s):
1637089
NSF-PAR ID:
10121488
Journal Name:
Policy Studies Journal
ISSN:
0190-292X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Website privacy policies sometimes give users the option to opt out of certain collections and uses of their personal data. Unfortunately, many privacy policies bury these instructions deep in their text, and few web users have the time or skill necessary to discover them. We describe a method for automatically detecting opt-out choices in privacy policy text and presenting them to users through a web browser extension. We describe the creation of two corpora of opt-out choices, which enable the training of classifiers to identify opt-outs in privacy policies. Our overall approach for extracting and classifying opt-out choices combines heuristics that identify commonly found opt-out hyperlinks with supervised machine learning that identifies less conspicuous instances. Our approach achieves a precision of 0.93 and a recall of 0.9. We introduce Opt-Out Easy, a web browser extension designed to present available opt-out choices to users as they browse the web. We evaluate the usability of our browser extension with a user study. We also present results of a large-scale analysis of opt-outs found in the text of thousands of the most popular websites.
  2. Because science advances incrementally, scientists often need to repeat material included in their prior work when composing new texts. Such “text recycling” is a common but complex writing practice, so authors and editors need clear and consistent guidance about what constitutes appropriate practice. Unfortunately, publishers’ policies on text recycling to date have been incomplete, unclear, and sometimes internally inconsistent. Building on 4 years of research on text recycling in scientific writing, the Text Recycling Research Project has developed a model text recycling policy that should be widely applicable for research publications in scientific fields. This article lays out the challenges text recycling poses for editors and authors, describes key factors that were addressed in developing the policy, and explains the policy’s main features. 
  3. Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which is shown to correlate with firm investment, employment, and excess market returns, has had substantive impact in both the private sector and academia. Yet, as we revisit and extend the original authors’ annotations and text measurements, we find interesting text-as-data methodological research questions: (1) Are annotator disagreements a reflection of ambiguity in language? (2) Do alternative text measurements correlate with one another and with measures of external predictive validity? We find for this application that (1) some annotator disagreements about economic policy uncertainty can be attributed to ambiguity in language, and (2) switching measurements from keyword matching to supervised machine learning classifiers results in low correlation, a concerning implication for the validity of the index.