The computer science literature on identification of people using personal information describes a wide spectrum, from aggregate information that doesn’t contain information about individual people, to information that itself identifies a person. However, privacy laws and regulations often distinguish between only two types, often called personally identifiable information and de-identified information. We show that the collapse of this technological spectrum of identifiability into only two legal definitions results in the failure to encourage privacy-preserving practices. We propose a set of legal definitions that spans the spectrum. We start with anonymous information. Computer science has created anonymization algorithms, including differential privacy, that provide mathematical guarantees that a person cannot be identified. Although the California Consumer Privacy Act (CCPA) defines aggregate information, it treats aggregate information the same as de-identified information. We propose a definition of anonymous information based on the technological possibility of logical association of the information with other information. We argue for the exclusion of anonymous information from notice and consent requirements. We next consider de-identified information. Computer science has created de-identification algorithms, including generalization, that minimize (but do not eliminate) the risk of re-identification. GDPR defines anonymous information but not de-identified information, and CCPA defines de-identified information but not anonymous information. The definitions do not align. We propose a definition of de-identified information based on the reasonableness of association with other information. We propose legal controls to protect against re-identification. We argue for the inclusion of de-identified information in notice requirements, but the exclusion of de-identified information from choice requirements.
We next address the distinction between trackable and non-trackable information. Computer science has shown how one-time identifiers can be used to protect reasonably linkable information from being tracked over time. Although both GDPR and CCPA discuss profiling, neither formally defines it as a form of personal information, and thus both fail to adequately protect against it. We propose definitions of trackable information and non-trackable information based on the likelihood of association with information from other contexts. We propose a set of legal controls to protect against tracking. We argue for requiring stronger forms of user choice for trackable information, which will encourage the use of non-trackable information. Finally, we address the distinction between pseudonymous and reasonably identifiable information. Computer science has shown how pseudonyms can be used to reduce identification. Neither GDPR nor CCPA makes a distinction between pseudonymous and reasonably identifiable information. We propose definitions based on the reasonableness of identifiability of the information, and we propose a set of legal controls to protect against identification. We argue for requiring stronger forms of user choice for reasonably identifiable information, which will encourage the use of pseudonymous information. Our definitions of anonymous information, de-identified information, non-trackable information, trackable information, and reasonably identifiable information can replace the over-simplified distinction between personally identifiable information and de-identified information. We hope that this full spectrum of definitions can be used in a comprehensive privacy law to tailor notice and consent requirements to the characteristics of each type of information.
Ten simple rules for managing laboratory information
Information is the cornerstone of research, from experimental (meta)data and computational processes to complex inventories of reagents and equipment. These 10 simple rules discuss best practices for leveraging laboratory information management systems to transform this large information load into useful scientific findings.
- Award ID(s):
- 2123367
- PAR ID:
- 10559999
- Editor(s):
- Markel, Scott
- Publisher / Repository:
- Public Library of Science
- Date Published:
- Journal Name:
- PLOS Computational Biology
- Volume:
- 19
- Issue:
- 12
- ISSN:
- 1553-7358
- Page Range / eLocation ID:
- e1011652
- Subject(s) / Keyword(s):
- laboratory informatics, data management, reproducibility, tutorial
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Development of a comprehensive legal privacy framework in the United States should be based on identification of the common deficiencies of privacy policies. We attempt to delineate deficiencies by critically analyzing the privacy policies of mobile apps, application suites, social networks, Internet Service Providers, and Internet-of-Things devices. Whereas many studies have examined readability of privacy policies, few have specifically identified the information that should be provided in privacy policies but is not. Privacy legislation invariably starts with a definition of personally identifiable information. We find that privacy policies’ definitions of personally identifiable information are far too restrictive, excluding information that does not itself identify a person but which can be used to reasonably identify a person, and excluding information paired with a device identifier which can be reasonably linked to a person. Legislation should define personally identifiable information to include such information, and should differentiate between information paired with a name versus information paired with a device identifier. Privacy legislation often excludes anonymous and de-identified information from notice and choice requirements. We find that privacy policies’ descriptions of anonymous and de-identified information are far too broad, including information paired with advertising identifiers. Computer science has repeatedly demonstrated that such information is reasonably linkable. Legislation should define these categories of information to align with technological abilities. Legislation should also not exempt de-identified information from notice requirements, to increase transparency. Privacy legislation relies heavily on notice requirements.
We find that, because privacy policies’ disclosures of the uses of personal information are disconnected from their disclosures about the types of personal information collected, we are often unable to determine which types of information are used for which purposes. Often, we cannot determine whether location or web browsing history is used solely for functional purposes or also for advertising. Legislation should require the disclosure of the purposes for each type of personal information collected. We also find that, because privacy policies’ disclosures of sharing of personal information are disconnected from their disclosures about the types of personal information collected, we are often unable to determine which types of information are shared. Legislation should require the disclosure of the types of personal information shared. Finally, privacy legislation relies heavily on user choice. We find that free services often require the collection and sharing of personal information. As a result, users often have no choices. We find that whereas some paid services afford users a wide variety of choices, paid services in less competitive sectors often afford users few choices over use and sharing of personal information for purposes unrelated to the service. As a result, users are often unable to dictate which types of information they wish to allow to be shared, and which types they wish to allow to be used for advertising. Legislation should differentiate between take-it-or-leave-it, opt-out, and opt-in approaches based on the type of use and on whether the information is shared. Congress should consider whether user choices should be affected by the presence of market power.
-
Tauman Kalai, Yael (Ed.) We study a setting where Bayesian agents with a common prior have private information related to an event’s outcome and sequentially make public announcements relating to their information. Our main result shows that when agents’ private information is independent conditional on the event’s outcome, then whenever agents have similar beliefs about the outcome, their information is aggregated. That is, there is no false consensus. Our main result has a short proof based on a natural information-theoretic framework. A key ingredient of the framework is the equivalence between the sign of the "interaction information" and a super/sub-additive property of the value of people’s information. This provides an intuitive interpretation and an interesting application of the interaction information, which measures the amount of information shared by three random variables. We illustrate the power of this information-theoretic framework by reproving two additional results within it: 1) that agents quickly agree when announcing (summaries of) beliefs in round-robin fashion [Aaronson 2005], and 2) results from [Chen et al 2010] on when prediction market agents should release information to maximize their payment. We also interpret the information-theoretic framework and the above results in prediction markets by proving that the expected reward of revealing information is the conditional mutual information of the information revealed.
-
Building information modeling (BIM) provides a novel way of managing information across all lifecycle phases of a building project. It facilitates the processes of a construction project, such as architectural design, structural analysis, and construction management. Industry foundation classes (IFC) is an open standard for information exchange between different BIM applications in the architecture, engineering, and construction (AEC) domain. It represents project information in an interoperable way, covering the geometric information, material information, and other physical and functional information needed for analyzing and managing a project. Structural analysis aims to simulate the structural performance of a building under different types of loads to make sure the structure is safe. The information needed for structural analysis mainly includes geometric, material, and load information. This information comes from the architectural design and the selected analysis scenarios. The information should be represented in an interoperable way to allow information transfer between different phases and different stakeholders. Missing information is a crucial problem in the interoperable use of BIM: it may cause misunderstandings between stakeholders and therefore erroneous structural analysis results and misleading information fed into the construction process later on. In this paper, the authors focus on analyzing the use of IFC at three stages of structural analysis, namely, the intrinsic modeling stage, the extrinsic modeling stage, and the analysis stage. The authors compared IFC files at these three stages with the original BIM software text files in terms of information coverage, and identified cases of missing information. This is the first systematic investigation of BIM interoperability at the detailed work stages of structural analysis, and it provides insights into how BIM usage should be improved in this domain.
-
Background: People’s health-related knowledge influences health outcomes, as this knowledge may influence whether individuals follow advice from their doctors or public health agencies. Yet, little attention has been paid to where people obtain health information and how these information sources relate to the quality of knowledge. Objective: We aim to discover what information sources people use to learn about health conditions, how these sources relate to the quality of their health knowledge, and how both the number of information sources and health knowledge change over time. Methods: We surveyed 200 different individuals at 12 time points from March through September 2020. At each time point, we elicited participants’ knowledge about causes, risk factors, and preventative interventions for 8 viral (Ebola, common cold, COVID-19, Zika) and nonviral (food allergies, amyotrophic lateral sclerosis [ALS], strep throat, stroke) illnesses. Participants were further asked how they learned about each illness and to rate how much they trust various sources of health information. Results: We found that participants used different information sources to obtain health information about common illnesses (food allergies, strep throat, stroke) compared to emerging illnesses (Ebola, common cold, COVID-19, Zika). Participants relied mainly on news media, government agencies, and social media for information about emerging illnesses, while learning about common illnesses from family, friends, and medical professionals. Participants relied on social media for information about COVID-19, with their knowledge accuracy of COVID-19 declining over the course of the pandemic. The number of information sources participants used was positively correlated with health knowledge quality, though there was no relationship with the specific source types consulted.
Conclusions: Building on prior work on health information seeking and factors affecting health knowledge, we now find that people systematically consult different types of information sources by illness type and that the number of information sources people use affects the quality of individuals’ health knowledge. Interventions to disseminate health information may need to be targeted to where individuals are likely to seek out information, and these information sources differ systematically by illness type.

