Title: Excavating awareness and power in data science: A manifesto for trustworthy pervasive data research
Frequent public uproar over forms of data science that rely on information about people demonstrates the challenges of defining and demonstrating trustworthy digital data research practices. This paper reviews problems of trustworthiness in what we term pervasive data research: scholarship that relies on the rich information generated about people through digital interaction. We highlight the entwined problems of participant unawareness of such research and the relationship of pervasive data research to corporate datafication and surveillance. We suggest a way forward by drawing from the history of a different methodological approach in which researchers have struggled with trustworthy practice: ethnography. To grapple with the colonial legacy of their methods, ethnographers have developed analytic lenses and researcher practices that foreground relations of awareness and power. These lenses are inspiring but also challenging for pervasive data research, given the flattening of contexts inherent in digital data collection. We propose ways that pervasive data researchers can incorporate reflection on awareness and power within their research to support the development of trustworthy data science.
Award ID(s):
1704369 1704598
PAR ID:
10547193
Publisher / Repository:
SAGE Publications
Journal Name:
Big Data & Society
Volume:
8
Issue:
2
ISSN:
2053-9517
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This essay draws on qualitative social science to propose a critical intellectual infrastructure for data science of social phenomena. Qualitative sensibilities— interpretivism, abductive reasoning, and reflexivity in particular—could address methodological problems that have emerged in data science and help extend the frontiers of social knowledge. First, an interpretivist lens—which is concerned with the construction of meaning in a given context—can enable the deeper insights that are requisite to understanding high-level behavioral patterns from digital trace data. Without such contextual insights, researchers often misinterpret what they find in large-scale analysis. Second, abductive reasoning—which is the process of using observations to generate a new explanation, grounded in prior assumptions about the world—is common in data science, but its application often is not systematized. Incorporating norms and practices from qualitative traditions for executing, describing, and evaluating the application of abduction would allow for greater transparency and accountability. Finally, data scientists would benefit from increased reflexivity—which is the process of evaluating how researchers’ own assumptions, experiences, and relationships influence their research. Studies demonstrate such aspects of a researcher’s experience that typically are unmentioned in quantitative traditions can influence research findings. Qualitative researchers have long faced these same concerns, and their training in how to deconstruct and document personal and intellectual starting points could prove instructive for data scientists. We believe these and other qualitative sensibilities have tremendous potential to facilitate the production of data science research that is more meaningful, reliable, and ethical. 
  2. The growing prevalence of data-rich networked information technologies—such as social media platforms, smartphones, wearable devices, and the internet of things—brings an increase in the flow of rich, deep, and often identifiable personal information available for researchers. More than just “big data,” these datasets reflect people’s lives and activities, bridge multiple dimensions of a person’s life, and are often collected, aggregated, exchanged, and mined without them knowing. We call this data “pervasive data,” and the increased scale, scope, speed, and depth of pervasive data available to researchers require that we confront the ethical frameworks that guide such research activities. Multiple stakeholders are embroiled in the challenges of research ethics in pervasive data research: researchers struggle with questions of privacy and consent, user communities may not even be aware of the widespread harvesting of their data for scientific study, platforms are increasingly restricting researchers’ access to data over fears of privacy and security, and ethical review boards face increasing difficulties in properly considering the complexities of research protocols relying on user data collected online. The results presented in this paper expand our understanding of how ethical review board members think about pervasive data research. It provides insights into how IRB professionals make decisions about the use of pervasive data in cases not obviously covered by traditional research ethics guidelines, and points to challenges for IRBs when reviewing research protocols relying on pervasive data. 
  3. In citizen science, data stewards and data producers are often not the same people. When those who have labored on data collection are not in control of the data, ethical problems could arise from this basic structural feature. In this Perspective, we advance the proposition that stewarding data sets generated by volunteers involves the typical technical decisions in conventional research plus a suite of ethical decisions stemming from the relationship between professionals and volunteers. Differences in power, priorities, values, and vulnerabilities are features of the relationship between professionals and volunteers. Thus, ethical decisions about open data practices in citizen science include, but are not limited to, questions grounded in respect for volunteers: who decides data governance structures, who receives attribution for a data set, which data are accessible and to whom, and whose interests are served by the data use/re-use. We highlight ethical issues that citizen science practitioners should consider when making data governance decisions, particularly with respect to open data. 
  4. While research has been conducted with and in marginalized or vulnerable groups, explicit guidelines and best practices centering on specific communities are nascent. An excellent case study to engage within this aspect of research is Black Twitter. This research project considers the history of research with Black communities, combined with empirical work that explores how people who engage with Black Twitter think about research and researchers in order to suggest potential good practices and what researchers should know when studying Black Twitter or other digital traces from marginalized or vulnerable online communities. From our interviews, we gleaned that Black Twitter users feel differently about their content contributing to a research study depending on, for example, the type of content and the positionality of the researcher. Much of the advice participants shared for researchers involved an encouragement to cultivate cultural competency, get to know the community before researching it, and conduct research transparently. Aiming to improve the experience of research for both Black Twitter and researchers, this project is a stepping stone toward future work that further establishes and expands user perceptions of research ethics for online communities composed of vulnerable populations. 
  5. This check sheet offers guidance on writing a data management plan based on National Science Foundation requirements. This guidance can help researchers who are applying for external funding and who intend to publish their data, data collection protocols, or instruments via the DesignSafe Cyberinfrastructure. About the CONVERGE Extreme Events Research Check Sheets Series: The National Science Foundation-supported CONVERGE facility at the Natural Hazards Center at the University of Colorado Boulder has developed a series of short, graphical check sheets that are meant to be used as researchers design their studies, prepare to enter the field, conduct field research, and exit the field. The series offers best practices for extreme events research and includes check sheets that are free to the research community. More information is available at: https://converge.colorado.edu/resources/check-sheets. 