

Title: Trial by File Formats: Exploring Public Defenders' Challenges Working with Novel Surveillance Data
In the United States, public defenders (lawyers assigned to people accused of crimes who cannot afford a private attorney) serve as an essential bulwark against wrongful arrest and incarceration for low-income and marginalized people. Public defenders have long been overworked and under-resourced. However, these issues have been compounded by increases in the volume and complexity of data in modern criminal cases. We explore the technology needs of public defenders through a series of semi-structured interviews with public defenders and those who work with them. We find that public defenders' ability to reason about novel surveillance data is woefully inadequate not only due to a lack of resources and knowledge, but also due to the structure of the criminal justice system, which gives prosecutors and police (in partnership with private companies) more control over the type of information used in criminal cases than defense attorneys. We find that public defenders may be able to create fairer situations for their clients with better tools for data interpretation and access. Therefore, we call on technologists to attend to the needs of public defenders and the people they represent when designing systems that collect data about people. Our findings illuminate constraints that technologists and privacy advocates should consider as they pursue solutions. In particular, our work complicates notions of individual privacy as the only value in protecting users' rights, and demonstrates the importance of data interpretation alongside data visibility. As data sources become more complex, control over the data cannot be separated from access to the experts and technology to make sense of that data. The growing surveillance data ecosystem may systematically oppress not only those who are most closely observed, but groups of people whose communities and advocates have been deprived of the storytelling power over their information.
Award ID(s): 2129008
NSF-PAR ID: 10357950
Author(s) / Creator(s): ;
Date Published:
Journal Name: Proceedings of the ACM on Human-Computer Interaction
Volume: 6
Issue: CSCW1
ISSN: 2573-0142
Page Range / eLocation ID: 1 to 26
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. People who are blind share their images and videos with companies that provide visual assistance technologies (VATs) to gain access to information about their surroundings. A challenge is that people who are blind cannot independently validate the content of the images and videos before they share them, and their visual data commonly contains private content. We examine privacy concerns for blind people who share personal visual data with VAT companies that provide descriptions authored by humans or artificial intelligence (AI). We first interviewed 18 people who are blind about their perceptions of privacy when using both types of VATs. Then we asked the participants to rate 21 types of image content according to their level of privacy concern if the information was shared knowingly versus unknowingly with human- or AI-powered VATs. Finally, we analyzed what information VAT companies communicate to users about their collection and processing of users' personal visual data through their privacy policies. Our findings have implications for the development of VATs that safeguard blind users' visual privacy, and our methods may be useful for other camera-based technology companies and their users.
  2. Within the ongoing disruption of the COVID-19 pandemic, technologically mediated health surveillance programs have vastly intensified and expanded to new spaces. Popular understandings of medical and health data protections came into question as a variety of institutions introduced new tools for symptom tracking, contact tracing, and the management of related data. These systems have raised complex questions about who should have access to health information, under what circumstances, and how people and institutions negotiate relationships between privacy, public safety, and care during times of crisis. In this paper, we take up the case of a large public university working to keep campus productive during COVID-19 through practices of placemaking, symptom screeners, and vaccine mandate compliance databases. Drawing on a multi-methods study including thirty-eight interviews, organizational documents, and discursive analysis, we show where and for whom administrative care infrastructures either misrecognized or torqued (Bowker and Star 1999) the care relationships that made life possible for people in the university community. We argue that an analysis of care—including the social relations that enable it and those that attempt to hegemonically define it—opens important questions for how people relate to data they produce about their bodies as well as to the institutions that manage them. Furthermore, we argue that privacy frameworks that rely on individual rights, essential categories of “sensitive information,” or the normative legitimacy of institutional practices are not equipped to reveal how people negotiate privacy and care in times of crisis. 
  3. The emergence of the novel SARS-CoV-2 (Covid-19) virus in 2019 has led to continuous monitoring of the outbreak, attempting to generate accurate reports of people's health information to understand the pandemic's impact. More variants are likely to emerge since not all countries and populations have been vaccinated. Thus, with SARS-CoV-2's constant mutation, researchers need to collect individuals' health data to study these variants and vaccine efficacy, especially from those who show symptoms. However, researchers have difficulties building comprehensive datasets because people are unwilling to release their health information or have no way to report their health statuses (e.g., at-home testing). This problem stems from people's lack of complete control over who accesses their health data. Hence, they cannot guarantee the security, privacy, and integrity of the disclosed health information. As the problem of building secure databases persists, researchers find it challenging to accurately report any evolving variants within a short period. In this work, we propose a blockchain architecture that can guarantee the integrity, privacy, and security of patients' health data, encouraging individuals to disclose their health information freely. This solution gives patients complete control over who accesses their health information. The framework provides access management over patients' health data for researchers and contact tracers. It classifies patient health information into different sensitivity levels and manages access based on this sensitivity. In case of unauthorized access, the proposed solution detects and prevents such access, thereby ensuring the security, integrity, and privacy of patients' health information.
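The sensitivity-classified access control described in this abstract can be sketched in a few lines. This is a minimal illustration only: the specific sensitivity levels, requester roles, and policy table below are hypothetical examples, not the scheme the paper actually specifies.

```python
# Minimal sketch of sensitivity-level access control (hypothetical
# levels, roles, and policy; not the paper's actual scheme).
from enum import IntEnum

class Sensitivity(IntEnum):
    LOW = 1       # e.g., anonymized symptom counts
    MEDIUM = 2    # e.g., test results tied to a pseudonym
    HIGH = 3      # e.g., identifying medical records

# Maximum sensitivity each requester role may read (illustrative policy:
# the abstract names researchers and contact tracers as requesters).
MAX_LEVEL = {
    "researcher": Sensitivity.LOW,
    "contact_tracer": Sensitivity.MEDIUM,
    "patient": Sensitivity.HIGH,
}

def may_access(role: str, record_level: Sensitivity) -> bool:
    """Grant access only if the role's ceiling covers the record's level;
    unknown roles are denied, modeling detection of unauthorized access."""
    return MAX_LEVEL.get(role, 0) >= record_level

print(may_access("contact_tracer", Sensitivity.MEDIUM))  # a permitted read
print(may_access("researcher", Sensitivity.HIGH))        # a denied read
```

In the proposed architecture, a check like this would be enforced by the blockchain layer rather than a single trusted server, which is what gives patients control over who reads what.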
  4. Obeid, Iyad; Picone, Joseph; Selesnick, Ivan (Eds.)
    The Neural Engineering Data Consortium (NEDC) is developing a large open source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000 image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a National Science Foundation (NSF) planning grant [4] to explore how best the community can leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public. The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio’s eSlide Manager (eSM) software [6]. We currently have digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case with one report per case. 
The data is organized by tissue type as shown below:
Filenames:
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx
Explanation:
tudp: root directory of the corpus
v1.0.0: version number of the release
svs: the image data type
gastro: the type of tissue
000001: six-digit sequence number used to control directory complexity
00123456: 8-digit patient MRN
2015_03_05: the date the specimen was captured
0s15_12345: the clinical case name
0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename, consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001), and a token number (e.g., s000)
0s15_12345_00123456.docx: the filename for the corresponding case report
We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio's ".svs" format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference. Another goal of this poster presentation is to share our experiences with the larger community, since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning many slides must be sifted through, and those with peeling or cracking discarded. Additionally, during scanning a slide can get stuck, stalling a scan session for hours and resulting in a significant loss of productivity.
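The image filename convention above is regular enough to parse mechanically. The following is a minimal sketch, not an official TUDP tool; the field names in the pattern are taken from the explanation in the text, and the function name is our own.

```python
# Sketch: unpack a TUDP .svs image filename into its documented parts
# (case name, site code, patient MRN, cut level, token number).
import re

# The case name itself contains one underscore (e.g., 0s15_12345),
# so it is matched as two underscore-free tokens joined by "_".
PATTERN = re.compile(
    r"(?P<case>[^_]+_[^_]+)_"   # clinical case name, e.g. 0s15_12345
    r"(?P<site>[^_]+)_"         # site code, e.g. 0a001
    r"(?P<mrn>\d{8})_"          # 8-digit patient MRN
    r"lvl(?P<level>\d+)_"       # type/depth of the cut, e.g. lvl0001
    r"s(?P<token>\d+)\.svs$"    # token number, e.g. s000
)

def parse_tudp_filename(name: str) -> dict:
    """Return the named fields of a TUDP image filename, or raise ValueError."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized TUDP image filename: {name}")
    return match.groupdict()

print(parse_tudp_filename("0s15_12345_0a001_00123456_lvl0001_s000.svs"))
# → {'case': '0s15_12345', 'site': '0a001', 'mrn': '00123456',
#    'level': '0001', 'token': '000'}
```

A parser like this is useful for sanity-checking a directory tree against the convention before ingesting slides into a database.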
Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes. This scanning project began in January of 2018, when the scanner was first installed. The scanning process was slow at first because of the learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019, ~20,000 slides were scanned. In the six months from May to November, we tripled that number and now hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow. The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking the software, such as the tissue finder parameter settings, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process: it takes a couple of hours to sort, clean, and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks.
To bridge the gap between hospital operations and research, we are using Aperio's eSM software. Our goal is to provide pathologists access to high-quality digital images of their patients' slides. eSM is a secure website that holds the images with their metadata labels, patient report, and path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information. Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma in situ (DCIS) or invasive breast cancer tissue. There will be approximately 1,000 images per class in this corpus.
Based on the number of features annotated, we can train on a two-class problem of DCIS vs. benign, or increase the difficulty by expanding the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic, etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv, nedc_tuh_dpath@googlegroups.com, to be kept informed of the latest developments in this project. You can learn more from our project website: https://www.isip.piconepress.com/projects/nsf_dpath.
  5. We investigate the privacy practices of labor organizers in the computing technology industry and explore the changes in these practices as a response to remote work. Our study is situated at the intersection of two pivotal shifts in workplace dynamics: (a) the increase in online workplace communications due to remote work, and (b) the resurgence of the labor movement and an increase in collective action in workplaces, especially in the tech industry, where this phenomenon has been dubbed the tech worker movement. The shift of work-related communications to online digital platforms in response to an increase in remote work is creating new opportunities for and risks to the privacy of workers. These risks are especially significant for organizers of collective action, with several well-publicized instances of retaliation against labor organizers by companies. Through a series of qualitative interviews with 29 tech workers involved in collective action, we investigate how labor organizers assess and mitigate risks to privacy while engaging in these actions. Among the most common risks that organizers experienced are retaliation from their employer, lateral worker conflict, emotional burnout, and the possibility of information about the collective effort leaking to management. Depending on the nature and source of the risk, organizers use a blend of digital security practices and community-based mechanisms. We find that digital security practices are more relevant when the threat comes from management, while community management and moderation are central to protecting organizers from lateral worker conflict. Since labor organizing is a collective rather than individual project, individual privacy and collective privacy are intertwined, sometimes in conflict and often mutually constitutive.
Notions of privacy that solely center individuals are often incompatible with the needs of organizers, who noted that safety in numbers could only be achieved when workers presented a united front to management. Based on our interviews, we identify key topics for future research, such as the growing prevalence of surveillance software and the needs of international and gig worker organizers. We conclude with design recommendations that can help create safer, more secure, and more private tools to better address the risks that organizers face.