Title: Protecting Sensitive Data Early in the Research Data Lifecycle
How do researchers in fieldwork-intensive disciplines protect sensitive data in the field, how do they assess their own practices, and how do they arrive at them? This article reports the results of a qualitative study with 36 semi-structured interviews with qualitative and multi-method researchers in political science and humanitarian aid/migration studies. We find that researchers frequently feel ill-prepared to handle the management of sensitive data in the field and find that formal institutions provide little support. Instead, they use a patchwork of sources to devise strategies for protecting their informants and their data. We argue that this carries substantial risks for the security of the data as well as their potential for later sharing and re-use. We conclude with some suggestions for effectively supporting data management in fieldwork-intensive research without unduly adding to the burden on researchers conducting it.
Award ID(s):
1823950 2116935
PAR ID:
10522956
Author(s) / Creator(s):
; ;
Publisher / Repository:
Cornell Labor Dynamics Institute
Date Published:
Journal Name:
Journal of Privacy and Confidentiality
Volume:
13
Issue:
2
ISSN:
2575-8527
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This essay draws on qualitative social science to propose a critical intellectual infrastructure for data science of social phenomena. Qualitative sensibilities (interpretivism, abductive reasoning, and reflexivity in particular) could address methodological problems that have emerged in data science and help extend the frontiers of social knowledge. First, an interpretivist lens (which is concerned with the construction of meaning in a given context) can enable the deeper insights that are requisite to understanding high-level behavioral patterns from digital trace data. Without such contextual insights, researchers often misinterpret what they find in large-scale analysis. Second, abductive reasoning (which is the process of using observations to generate a new explanation, grounded in prior assumptions about the world) is common in data science, but its application often is not systematized. Incorporating norms and practices from qualitative traditions for executing, describing, and evaluating the application of abduction would allow for greater transparency and accountability. Finally, data scientists would benefit from increased reflexivity, which is the process of evaluating how researchers’ own assumptions, experiences, and relationships influence their research. Studies demonstrate such aspects of a researcher’s experience that typically are unmentioned in quantitative traditions can influence research findings. Qualitative researchers have long faced these same concerns, and their training in how to deconstruct and document personal and intellectual starting points could prove instructive for data scientists. We believe these and other qualitative sensibilities have tremendous potential to facilitate the production of data science research that is more meaningful, reliable, and ethical.
  2. The criminogenic dimensions of conservation are highly relevant to contemporary protected area management. Research on crime target suitability in the field of criminology has built new understanding regarding how the characteristics of the crime targets affect their suitability for being targeted by offenders. In the last decade, criminologists have sought to apply and adapt target suitability frameworks to explain wildlife-related crimes. This study seeks to build upon the extant knowledge base and advance adaptation and application of target suitability research. First, we drew on research, fieldwork, and empirical evidence from conservation science to develop a poaching-stage model with a focus on live specimens or wild animals, rather than a market-stage and wildlife product-focused target suitability model. Second, we collected data in the Intensive Protection Zone of Bukit Barisan Selatan National Park (BBSNP), Sumatra, Indonesia through surveys with local community members (n=400), and a three-day focus group with conservation practitioners (n=25). Our target suitability model, IPOACHED, predicts that species that are in-demand, passive, obtainable, all-purpose, conflict-prone, hideable, extractable, and disposable are more suitable species for poaching and therefore more vulnerable. When applying our IPOACHED model, we find that the most common response to species characteristics that drive poaching in BBSNP was that they are in-demand, with support for cultural or symbolic value (n=101 of respondents, 25%), ecological value (n=164, 35%), and economic value (n=234, 59%). There was moderate support for the conflict-prone dimension of the IPOACHED model (n=70, 18%). Other factors, such as a species' lack of passiveness, obtainability, and extractability, hamper poaching regardless of value. Our model serves as an explanatory or predictive tool for understanding poaching within a conservation-based management unit (e.g., a protected area) rather than for a specific use market (e.g., pets). Conservation researchers and practitioners can use and adapt our model and survey instruments to help explain and predict poaching of species through the integration of knowledge and opinions from local communities and conservation professionals, with the ultimate goal of preventing wildlife poaching.
  3. Who conducts biological research, where they do it and how results are disseminated vary among geographies and identities. Identifying and documenting these forms of bias by research communities is a critical step towards addressing them. We documented perceived and observed biases in movement ecology, a rapidly expanding sub-discipline of biology, which is strongly underpinned by fieldwork and technology use. We surveyed attendees before an international conference to assess a baseline within-discipline perceived bias (uninformed perceived bias). We analysed geographic patterns in Movement Ecology articles, finding discrepancies between the country of the authors’ affiliation and study site location, related to national economics. We analysed race-gender identities of USA biology researchers (the closest to our sub-discipline with data available), finding that they differed from national demographics. Finally, we discussed the quantitatively observed bias at the conference, to assess within-discipline perceived bias informed with observational data (informed perceived bias). Although the survey indicated most conference participants as bias-aware, conversations only covered a subset of biases. We discuss potential causes of bias (parachute-science, fieldwork accessibility), solutions and the need to evaluate mitigatory action effectiveness. Undertaking data-driven analysis of bias within sub-disciplines can help identify specific barriers and move towards the inclusion of a greater diversity of participants in the scientific process.
  4. Data sharing is increasingly an expectation in health research as part of a general move toward more open sciences. In the United States, in particular, the implementation of the 2023 National Institutes of Health Data Management and Sharing Policy has made it clear that qualitative studies are not exempt from this data sharing requirement. Recognizing this trend, the Palliative Care Research Cooperative Group (PCRC) realized the value of creating a de-identified qualitative data repository to complement its existing de-identified quantitative data repository. The PCRC Data Informatics and Statistics Core leadership partnered with the Qualitative Data Repository (QDR) to establish the first serious illness and palliative care qualitative data repository in the U.S. We describe the processes used to develop this repository, called the PCRC-QDR, as well as our outreach and education among the palliative care researcher community, which led to the first ten projects to share the data in the new repository. Specifically, we discuss how we co-designed the PCRC-QDR and created tailored guidelines for depositing and sharing qualitative data depending on the original research context, establishing uniform expectations for key components of relevant documentation, and the use of suitable access controls for sensitive data. We also describe how PCRC was able to leverage its existing community to recruit and guide early depositors and outline lessons learned in evaluating the experience. This work advances the establishment of best practices in qualitative data sharing. 
  5. Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks. The cost of training such models (and the necessity of data access to do so) coupled with their utility motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT. While most efforts have used deidentified EHR, many researchers have access to large sets of sensitive, non-deidentified EHR with which they might train a BERT model (or similar). Would it be safe to release the weights of such a model if they did? In this work, we design a battery of approaches intended to recover Personal Health Information (PHI) from a trained BERT. Specifically, we attempt to recover patient names and conditions with which they are associated. We find that simple probing methods are not able to meaningfully extract sensitive information from BERT trained over the MIMIC-III corpus of EHR. However, more sophisticated “attacks” may succeed in doing so: To facilitate such research, we make our experimental setup and baseline probing models available at https://github.com/elehman16/exposing_patient_data_release.