

Title: The Value of Data Records
Many e-commerce platforms use buyers’ personal data to intermediate their transactions with sellers. How much value do such intermediaries derive from the data record of each single individual? We characterize this value and find that one of its key components is a novel externality between records, which arises when the intermediary pools some records to withhold the information they contain. Our analysis has several implications about compensating individuals for the use of their data, guiding companies’ investments in data acquisition, and more broadly studying the demand side of data markets. Our methods combine modern information design with classic duality theory and apply to a large class of principal-agent problems.
Award ID(s): 2149315
PAR ID: 10510221
Author(s) / Creator(s): ; ;
Publisher / Repository: Oxford Academic
Date Published:
Journal Name: The Review of Economic Studies
ISSN: 0034-6527
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More like this
  1. We conducted a meta-analysis to determine how people blindly comply with, rely on, and depend on diagnostic automation. We searched three databases using combinations of human-behavior keywords and automation keywords, covering January 1996 to June 2021. In total, we identified 8 records containing 68 data points. Because data points were nested within research records, we built multi-level models (MLM) to quantify the relationships between blind compliance and positive predictive value (PPV), blind reliance and negative predictive value (NPV), and blind dependence and overall success likelihood (OSL). Results show that as the automation’s PPV, NPV, and OSL increase, human operators are more likely to blindly follow the automation’s recommendation. Operators appear to adjust their reliance behaviors more than their compliance and dependence. We recommend that researchers report specific automation trial information (i.e., hits, false alarms, misses, and correct rejections) and human behaviors (compliance and reliance) rather than automation OSL and dependence. Future work could examine how operator behaviors change when operators are not blind to raw data. Researchers, designers, and engineers could use this understanding of operator behaviors to inform training procedures and to benefit individual operators during repeated automation use.
  2. People around the world own digital media devices that mediate, and are in close proximity to, their daily behaviours and situational contexts. These devices can be harnessed as sensing technologies to collect information from sensor and metadata logs that provide fine-grained records of everyday personality expression. In this paper, we present a conceptual framework and empirical illustration for personality sensing research, which leverages sensing technologies for personality theory development and assessment. To further empirical knowledge about the degree to which personality-relevant information is revealed via such data, we outline an agenda for three research domains that focus on the description, explanation, and prediction of personality. To illustrate the value of the personality sensing research agenda, we present findings from a large smartphone-based sensing study (N = 633) characterizing individual differences in sensed behavioural patterns (physical activity, social behaviour, and smartphone use) and mapping sensed behaviours to the Big Five dimensions. For example, the findings show associations between behavioural tendencies and personality traits, and between daily behaviours and personality states. We conclude with a discussion of best practices and provide our outlook on how personality sensing will transform our understanding of personality and the way we conduct assessment in the years to come. © 2020 European Association of Personality Psychology
  3. The ubiquity of mobile devices has recently led to increasing demand for public network services, e.g., WiFi hot spots. As part of this trend, modern transportation systems are equipped with public WiFi devices to provide Internet access for passengers, since people spend a large amount of time on public transportation in their daily lives. However, one of the key issues with public WiFi spots is privacy, owing to their open-access nature. Existing works have studied either location privacy risk in human traces or privacy leakage in private networks such as cellular networks, based on data from cellular carriers. To the best of our knowledge, none of these works has focused on bus WiFi privacy based on large-scale real-world data. In this paper, to explore the privacy risk in bus WiFi systems, we focus on two key questions: how likely bus WiFi users are to be uniquely re-identified if partial usage information is leaked, and how we can protect users from the leaked information. To answer these questions, we conduct a case study in a large-scale bus WiFi system, which contains 20 million connection records and 78 million location records from 770 thousand bus WiFi users over a two-month period. Technically, we design two models for our uniqueness analysis and protection: a PB-FIND model to estimate the probability that a user can be uniquely re-identified from leaked information, and a PB-HIDE model to protect users from potentially leaked information. Specifically, we systematically measure user uniqueness on users' finger traces (i.e., connection URLs and domains), foot traces (i.e., locations), and hybrid traces (i.e., both finger and foot traces).
    Our measurement results reveal that (i) 97.8% of users can be uniquely re-identified by 4 random domain records from their finger traces, and 96.2% of users can be uniquely re-identified by 5 random locations on buses; and (ii) 98.1% of users can be uniquely re-identified by only 2 random records if both their connection records and locations are leaked to attackers. Moreover, the evaluation results show that our PB-HIDE algorithm protects more than 95% of users from the potentially leaked information while inserting only 1.5% synthetic records into the original dataset, preserving its data utility.
  4. Premise: Digitized biodiversity data offer extensive information; however, obtaining and processing biodiversity data can be daunting. Complexities arise during data cleaning, such as identifying and removing problematic records. To address these issues, we created the R package Geographic And Taxonomic Occurrence R‐based Scrubbing (gatoRs).
    Methods and Results: The gatoRs workflow includes functions that streamline downloading records from the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio). We also created functions to clean downloaded specimen records. Unlike previous R packages, gatoRs accounts for differences in download structure between GBIF and iDigBio and allows for user control via interactive cleaning steps.
    Conclusions: Our pipeline enables the scientific community to process biodiversity data efficiently and is accessible to the R coding novice. We anticipate that gatoRs will be useful for both established and beginning users. Furthermore, we expect our package will facilitate the introduction of biodiversity‐related concepts into the classroom via the use of herbarium specimens.
  5. The healthcare sector is constantly improving patient health record systems. However, these systems face a significant challenge in handling patient health record (PHR) data because of its sensitivity. In addition, patients' data are generally stored across various healthcare facilities and providers. This distributed arrangement becomes problematic whenever patients want to access their health records and share them with their care providers, a problem that reflects the lack of interoperability among healthcare systems. Moreover, most patient health record systems adopt a centralized management structure and deploy PHRs to the cloud, which raises privacy concerns when patient information is shared over a network. Therefore, it is vital to design a framework that considers patient privacy and data security when sensitive information is shared with healthcare facilities and providers. This paper proposes a blockchain framework for secure patient health record sharing that gives patients full access to and control over their health records. Our framework combines Ethereum blockchain smart contracts, the InterPlanetary File System (IPFS) as an off-chain storage system, and the NuCypher protocol, which provides key management and blockchain-based proxy re-encryption, to create a secure, on-demand patient health record sharing system. Results show that the proposed framework is more secure than other schemes and that PHRs are not accessible to unauthorized providers or users. In addition, all encrypted data are accessible to and readable by only the verified entities designated by the patient.
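Item 1 recommends reporting raw trial counts (hits, false alarms, misses, correct rejections) instead of OSL alone. The minimal Python sketch below shows why, using the textbook definitions of PPV and NPV. It assumes that OSL is the automation's overall accuracy; that reading is our own and is not stated in the abstract. The `TrialCounts` type and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialCounts:
    """Outcome counts for a diagnostic aid over a block of trials."""
    hits: int                # aid alarmed, signal present
    false_alarms: int        # aid alarmed, signal absent
    misses: int              # aid silent, signal present
    correct_rejections: int  # aid silent, signal absent


def ppv(c: TrialCounts) -> float:
    """Positive predictive value: share of alarms that were correct."""
    return c.hits / (c.hits + c.false_alarms)


def npv(c: TrialCounts) -> float:
    """Negative predictive value: share of non-alarms that were correct."""
    return c.correct_rejections / (c.correct_rejections + c.misses)


def osl(c: TrialCounts) -> float:
    """Overall success likelihood, assumed here to be overall accuracy."""
    total = c.hits + c.false_alarms + c.misses + c.correct_rejections
    return (c.hits + c.correct_rejections) / total


if __name__ == "__main__":
    # Two hypothetical aids with identical OSL but swapped PPV and NPV.
    for name, c in [("A", TrialCounts(40, 10, 5, 45)), ("B", TrialCounts(45, 5, 10, 40))]:
        print(f"{name}: PPV={ppv(c):.2f} NPV={npv(c):.2f} OSL={osl(c):.2f}")
```

Running it prints `A: PPV=0.80 NPV=0.90 OSL=0.85` and `B: PPV=0.90 NPV=0.80 OSL=0.85`. The two aids look identical by OSL. Their PPV and NPV, which the meta-analysis links to compliance and reliance respectively, are different.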
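Item 3 reports the share of bus WiFi users who are uniquely re-identified by k random records from their traces. The Python sketch below computes a generic unicity-style version of that quantity on toy data. It is not the paper's PB-FIND model, whose details the abstract does not give. A user counts as unique for a given set of leaked points if no other user's trace contains all of them. The function name `unique_fraction` and the toy traces are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def unique_fraction(traces: Dict[str, Set[str]], k: int) -> float:
    """Average, over users with at least k distinct points, of the probability
    that k points drawn uniformly without replacement from that user's trace
    match the user's trace and no other trace.

    Every k-subset is enumerated, so this is exact but only practical for toy
    data; a large study would sample subsets instead.
    """
    per_user = []
    for user, points in traces.items():
        if len(points) < k:
            continue  # cannot draw k distinct points from this trace
        subsets = list(combinations(sorted(points), k))
        unique_hits = 0
        for subset in subsets:
            known = set(subset)
            # Users whose traces contain every leaked point.
            matching = [u for u, other in traces.items() if known <= other]
            if matching == [user]:
                unique_hits += 1
        per_user.append(unique_hits / len(subsets))
    return sum(per_user) / len(per_user) if per_user else 0.0


if __name__ == "__main__":
    # Hypothetical toy traces: each user maps to the bus stops they visited.
    traces = {
        "u1": {"A", "B", "C"},
        "u2": {"A", "B", "D"},
        "u3": {"A", "E", "F"},
    }
    for k in (1, 2, 3):
        print(k, round(unique_fraction(traces, k), 3))
```

On the toy traces this prints `1 0.444`, `2 0.778`, and `3 1.0`. Uniqueness climbs quickly as k grows, which matches the pattern item 3 describes for real traces.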
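In item 5, patients control who may read records whose encrypted contents live off-chain in IPFS, while the access rules live in Ethereum smart contracts. The toy Python model below sketches only that division of labour and rests on explicit assumptions. A SHA-256 hex digest stands in for an IPFS content identifier, and plain dictionaries stand in for contract state. Encryption, NuCypher proxy re-encryption, and consensus are not modelled at all. The class `ToyPHRRegistry` and its methods are hypothetical and are not the paper's contract interface.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyPHRRegistry:
    """Hypothetical in-memory stand-in, not the paper's contracts.

    blobs   ~ off-chain store (IPFS in the paper), keyed by a SHA-256 digest
    owners  ~ contract state: digest -> owning patient
    readers ~ contract state: digest -> identities allowed to fetch the blob
    Encryption and proxy re-encryption are deliberately not modelled.
    """
    blobs: Dict[str, bytes] = field(default_factory=dict)
    owners: Dict[str, str] = field(default_factory=dict)
    readers: Dict[str, Set[str]] = field(default_factory=dict)

    def store(self, patient: str, ciphertext: bytes) -> str:
        digest = hashlib.sha256(ciphertext).hexdigest()  # content address
        self.blobs[digest] = ciphertext
        self.owners[digest] = patient
        self.readers[digest] = {patient}  # the owner can always read
        return digest

    def _require_owner(self, caller: str, digest: str) -> None:
        if self.owners.get(digest) != caller:
            raise PermissionError("only the owning patient may change access")

    def grant(self, caller: str, digest: str, provider: str) -> None:
        self._require_owner(caller, digest)
        self.readers[digest].add(provider)

    def revoke(self, caller: str, digest: str, provider: str) -> None:
        self._require_owner(caller, digest)
        if provider != caller:  # never lock the patient out of their own record
            self.readers[digest].discard(provider)

    def fetch(self, caller: str, digest: str) -> bytes:
        if caller not in self.readers.get(digest, set()):
            raise PermissionError(f"{caller} is not authorized for this record")
        blob = self.blobs[digest]
        # Content addressing lets any reader check the blob was not altered.
        if hashlib.sha256(blob).hexdigest() != digest:
            raise ValueError("stored blob does not match its content address")
        return blob
```

A typical sequence runs `h = reg.store("alice", b"...")` and then `reg.grant("alice", h, "dr_bob")`, after which `reg.fetch("dr_bob", h)` returns the stored bytes. After `reg.revoke("alice", h, "dr_bob")`, the same fetch raises `PermissionError`. The on-chain side keeps only a small digest and an access list, and the bulky payload stays off-chain. That appears to be the motivation for the IPFS and smart-contract split in the abstract.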