Title: Anonymous Collocation Discovery: Harnessing Privacy to Tame the Coronavirus
Successful containment of the Coronavirus pandemic rests on the ability to quickly and reliably identify those who have been in close proximity to a contagious individual. Existing tools for doing so rely on the collection of exact location information of individuals over lengthy time periods, and combining this information with other personal information. This unprecedented encroachment on individual privacy at national scales has created an outcry and risks rejection of these tools. We propose an alternative: an extremely simple scheme for providing fine-grained and timely alerts to users who have been in the close vicinity of an infected individual. Crucially, this is done while preserving the anonymity of all individuals, and without collecting or storing any personal information or location history. Our approach is based on using short-range communication mechanisms, like Bluetooth, that are available in all modern cell phones. It can be deployed with very little infrastructure, and incurs a relatively low false-positive rate compared to other collocation methods. We also describe a number of extensions and tradeoffs. We believe that the privacy guarantees provided by the scheme will encourage quick and broad voluntary adoption. When combined with sufficient testing capacity and existing best practices from healthcare professionals, we hope that this may significantly reduce the infection rate.
Award ID(s):
1915763 1931714 1718135 1801564
NSF-PAR ID:
10156173
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ArXiv.org
Volume:
2003
Issue:
13670
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. When releasing data to the public, a vital concern is the risk of exposing personal information of the individuals who have contributed to the data set. Many mechanisms have been proposed to protect individual privacy, though less attention has been dedicated to practically conducting valid inferences on the altered privacy-protected data sets. For frequency tables, the privacy-protection-oriented perturbations often lead to negative cell counts. Releasing such tables can undermine users’ confidence in the usefulness of such data sets. This paper focuses on releasing one-way frequency tables. We recommend an optimal mechanism that satisfies ϵ-differential privacy (DP) without suffering from having negative cell counts. The procedure is optimal in the sense that the expected utility is maximized under a given privacy constraint. Valid inference procedures for testing goodness-of-fit are also developed for the DP privacy-protected data. In particular, we propose a de-biased test statistic for the optimal procedure and derive its asymptotic distribution. We also introduce testing procedures for the commonly used Laplace and Gaussian mechanisms, which provide a good finite sample approximation for the null distributions. Moreover, the decay-rate requirements for the privacy regime are provided for the inference procedures to be valid. We further consider common users’ practices such as merging related or neighboring cells or integrating statistical information obtained across different data sources and derive valid testing procedures when these operations occur. Simulation studies show that our inference results hold well even when the sample size is relatively small. Comparisons with the current field standards, including the Laplace, the Gaussian (both with and without post-processing that replaces negative cell counts with zeros), and the Binomial-Beta McClure-Reiter mechanisms, are carried out. Finally, we apply our method to the National Center for Early Development and Learning’s (NCEDL) multi-state studies data to demonstrate its practical applicability.
  2. Background: Personal privacy is a significant concern in the era of big data. In the field of health geography, personal health data are collected with geographic location information which may increase disclosure risk and threaten personal geoprivacy. Geomasking is used to protect individuals’ geoprivacy by masking the geographic location information, and spatial k-anonymity is widely used to measure the disclosure risk after geomasking is applied. With the emergence of individual GPS trajectory datasets that contain large volumes of confidential geospatial information, disclosure risk can no longer be comprehensively assessed by the spatial k-anonymity method. Methods: This study proposes and develops daily activity locations (DAL) k-anonymity as a new method for evaluating the disclosure risk of GPS data. Instead of calculating disclosure risk based on only one geographic location (e.g., home) of an individual, the new DAL k-anonymity is a composite evaluation of disclosure risk based on all activity locations of an individual and the time he/she spends at each location abstracted from GPS datasets. With a simulated individual GPS dataset, we present case studies of applying DAL k-anonymity in various scenarios to investigate its performance. The results of applying DAL k-anonymity are also compared with those obtained with spatial k-anonymity under these scenarios. Results: The results of this study indicate that DAL k-anonymity provides a better estimation of the disclosure risk than does spatial k-anonymity. In various case-study scenarios of individual GPS data, DAL k-anonymity provides a more effective method for evaluating the disclosure risk by considering the probability of re-identifying an individual’s home and all the other daily activity locations. Conclusions: This new method provides a quantitative means for understanding the disclosure risk of sharing or publishing GPS data. It also helps shed new light on the development of new geomasking methods for GPS datasets. Ultimately, the findings of this study will help to protect individual geoprivacy while benefiting the research community by promoting and facilitating geospatial data sharing.
  3. In recent years, studies in engineering education have begun to intentionally integrate disability into discussions of diversity, inclusion, and equity. To broaden and advocate for the participation of this group in engineering, researchers have identified a variety of factors that have kept people with disabilities at the margins of the field. Such factors include the underrepresentation of disabled individuals within research and industry; systemic and personal barriers; and sociocultural expectations within and beyond engineering education-related contexts. These findings provide a foundational understanding of the external and environmental influences that can shape how students with disabilities experience higher education, develop a sense of belonging, and ultimately form professional identities as engineers. Prior work examining the intersections of disability identity and professional identity is limited, with few or no studies examining the ways in which students conceptualize, define, and interpret disability as a category of identity during their undergraduate engineering experience. This lack of research poses problems for recruitment, retention, and inclusion, particularly as existing studies have shown that the ways in which students perceive and define themselves in relation to their college major are crucial for the development of a professional engineering identity. Further, due to variation in defining “disability” across national agencies (e.g., the National Institutes of Health, and the Department of Justice) and disability communities (with different models of disability), the term “disability” is broad and often misunderstood, frequently referring to a group of individuals with a wide range of conditions and experiences. Therefore, the purpose of this study is to gain deeper insights into the ways students define disability and disability identity within their own contexts as they develop professional identities. Specifically, we ask the following research question: How do students describe and conceptualize non-apparent disabilities? To answer this research question, we draw from emergent findings from an on-going grounded theory exploration of professional identity formation of undergraduate civil engineering students with disabilities. In this paper, we focus our discussion on the grounded theory analyses of 4 semi-structured interviews with participants who have disclosed a non-apparent disability. Study participants consist of students currently enrolled in undergraduate civil engineering programs, students who were initially enrolled in undergraduate civil engineering programs and transferred to another major, and students who have recently graduated from a civil engineering program within the past year. Sensitizing concepts emerged as findings from the initial grounded theory analysis to guide and initiate our inquiry: 1) the medical model of disability, 2) the social model of disability, and 3) personal experience. First, medical models of disability position physical, cognitive, and developmental difference as a “sickness” or “condition” that must be “treated”. From this perspective, disability is perceived as an impairment that must be accommodated so that individuals can obtain a dominantly-accepted sense of normality. An example of medical models within the education context includes accommodations procedures in which students must obtain an official diagnosis in order to access tools necessary for academic success. Second, social models of disability position disability as a dynamic and fluid identity that consists of a variety of physical, cognitive, or developmental differences. Dissenting from assumptions of normality and the focus on individual bodily conditions (hallmarks of the medical model), the social model focuses on the political and social structures that inherently create or construct disability. An example of a social model within the education context includes the universal design of materials and tools that are accessible to all students within a given course. In these instances, students are not required to request accommodations and may, consequently, bypass medical diagnoses. Lastly, participants referred to their own life experiences as a way to define, describe, and consider disability. Fernando considers his stutter to be a disability because he is often interrupted, spoken over, or silenced when engaging with others. In turn, he is perceived as unintelligent and unfit to be a civil engineer by his peers. In contrast, David, who identifies as autistic, does not consider himself to be disabled. These experiences highlight the complex intersections of medical and social models of disability and their contextual influences as participants navigate their lives. While these sensitizing concepts are not meant to scope the research, they provide a useful lens for initiating research and provide markers on which a deeper, emergent analysis is expanded. Findings from this work will be used to further explore the professional identity formation of undergraduate civil engineering students with disabilities. These findings will provide engineering education researchers and practitioners with insights regarding the ways individuals with disabilities interpret their in- and out-of-classroom experiences and navigate their disability identities. For higher education, broadly, this work aims to reinforce the complex and diverse nature of disability experience and identity, particularly as it relates to accommodations and accessibility within the classroom, and expand the inclusiveness of our programs and institutions.
  4. Purpose: Existing algorithms for predicting suicide risk rely solely on data from electronic health records, but such models could be improved through the incorporation of publicly available socioeconomic data – such as financial, legal, life event and sociodemographic data. The purpose of this study is to understand the complex ethical and privacy implications of incorporating sociodemographic data within the health context. This paper presents results from a survey exploring the general public’s knowledge of and concerns about such publicly available data and the appropriateness of using it in suicide risk prediction algorithms. Design/methodology/approach: A survey was developed to measure public opinion about privacy concerns with using socioeconomic data across different contexts. This paper presented respondents with multiple vignettes that described scenarios situated in medical, private business and social media contexts, and asked participants to rate their level of concern over the context and what factor contributed most to their level of concern. Specific to suicide prediction, this paper presented respondents with various data attributes that could potentially be used in the context of a suicide risk algorithm and asked participants to rate how concerned they would be if each attribute was used for this purpose. Findings: The authors found considerable concern across the various contexts represented in their vignettes, with greatest concern in vignettes that focused on the use of personal information within the medical context. Specific to the question of incorporating socioeconomic data within suicide risk prediction models, the results of this study show a clear concern from all participants about data attributes related to income, crime and court records, and assets. Data about one’s household were also of particular concern to respondents, suggesting that even if people are comfortable with their own data being used for risk modeling, data about other household members is more problematic. Originality/value: Previous studies on the privacy concerns that arise when integrating data pertaining to various contexts of people’s lives into algorithmic and related computational models have approached these questions from individual contexts. This study differs in that it captured the variation in privacy concerns across multiple contexts. Also, this study specifically assessed the ethical concerns related to a suicide prediction model and determined people’s awareness of the publicness of select data attributes, as well as which of these data attributes generated the most concern in such a context. To the best of the authors’ knowledge, this is the first study to pursue this question.
  5. This article studies the effect of corporate and personal taxes on innovation in the United States over the twentieth century. We build a panel of the universe of inventors who patented since 1920, and a historical state-level corporate tax database with corporate tax rates and tax base information, which we link to existing data on state-level personal income taxes and other economic outcomes. Our analysis focuses on the effect of personal and corporate income taxes on individual inventors (the micro level) and on states (the macro level), considering the quantity and quality of innovation, its location, and the share produced by the corporate rather than the noncorporate sector. We propose several identification strategies, all of which yield consistent results. We find that higher taxes negatively affect the quantity and the location of innovation, but not average innovation quality. The state-level elasticities to taxes are large and consistent with the aggregation of the individual-level responses of innovation produced and cross-state mobility. Corporate taxes tend to especially affect corporate inventors’ innovation production and cross-state mobility. Personal income taxes significantly affect the quantity of innovation overall and the mobility of inventors.