Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy frameworks such as the European Union's General Data Protection Regulation (GDPR), the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI Act address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Synthetic data that retains the statistical properties of the original data but does not include actual individuals' information provides a suitable alternative to sharing sensitive datasets. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, such privacy-preserving efforts rarely comply with documented data privacy policies. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We examine several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework mitigates potential threats, such as re-identification and linkage attacks, and secures data according to applicable regulatory policy rules.
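To make the policy-constrained generation described above more concrete, the sketch below shows how rules distilled from an agricultural code of conduct might bound a simple synthetic-data generator. This is a minimal, hypothetical illustration and not the authors' implementation: the column names, rule values, and the independent Laplace-noised marginal sampler are all assumptions.

```python
# Minimal sketch (not the paper's method): policy rules extracted from a code of
# conduct constrain how synthetic records are produced. Field names, rule values,
# and the noisy-marginal sampler are illustrative assumptions.
import numpy as np
import pandas as pd

POLICY_RULES = {
    "direct_identifiers": ["farm_id"],   # rule: never release direct identifiers
    "quasi_identifiers": ["region"],     # rule: release only perturbed/aggregated values
    "epsilon": 1.0,                      # rule: total differential-privacy budget
}

def noisy_marginal(col: pd.Series, epsilon: float) -> pd.Series:
    """Laplace-noised category frequencies, clipped and renormalized."""
    counts = col.value_counts().astype(float)
    counts += np.random.laplace(scale=1.0 / epsilon, size=len(counts))
    counts = counts.clip(lower=0.0)
    return counts / counts.sum()

def generate_synthetic(df: pd.DataFrame, n_rows: int, rules: dict) -> pd.DataFrame:
    """Sample synthetic rows column-by-column under the extracted policy rules."""
    df = df.drop(columns=rules["direct_identifiers"])     # enforce the removal rule
    eps_per_col = rules["epsilon"] / len(df.columns)      # split the privacy budget
    synthetic = {}
    for name in df.columns:
        probs = noisy_marginal(df[name].astype(str), eps_per_col)
        synthetic[name] = np.random.choice(probs.index, size=n_rows, p=probs.values)
    return pd.DataFrame(synthetic)
```

Sampling independent marginals ignores cross-column correlations; a deep generative model, as studied in the paper, would preserve them, but the policy-enforcement hooks would sit in the same places.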
A STATISTICAL OVERVIEW ON DATA PRIVACY
The eruption of big data, with the increasing collection and processing of vast volumes and varieties of data, has led to breakthrough discoveries and innovation in science, engineering, medicine, commerce, criminal justice, and national security that would not have been possible in the past. While there are many benefits to the collection and usage of big data, there are also growing concerns among the general public about what personal information is collected and how it is used. In addition to legal policies and regulations, technological tools and statistical strategies exist to promote and safeguard individual privacy while releasing and sharing useful population-level information. In this overview, I introduce some of these approaches, along with the existing challenges and opportunities in statistical data privacy research and applications, to better meet the practical needs of privacy protection and information sharing.
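As one small example of the "statistical strategies" the overview refers to (my own illustration, not drawn from the article), randomized response lets an analyst estimate a population proportion for a sensitive yes/no question without ever learning any individual's true answer.

```python
# Randomized response: a classical statistical privacy mechanism. Each respondent
# answers truthfully with probability p_truth and otherwise flips a fair coin;
# the analyst later inverts the randomization to recover a population-level estimate.
import random

def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
    """One respondent's privatized report."""
    if random.random() < p_truth:
        return truth
    return random.random() < 0.5

def estimate_proportion(reports: list[bool], p_truth: float = 0.75) -> float:
    """Unbiased estimate of the true proportion from privatized reports."""
    observed = sum(reports) / len(reports)
    # E[observed] = p_truth * true_p + (1 - p_truth) * 0.5, solved for true_p
    return (observed - (1 - p_truth) * 0.5) / p_truth
```

Individual reports carry plausible deniability, yet the aggregate estimate remains useful; this is the trade-off between privacy protection and population-level information sharing that the overview discusses.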
- Award ID(s): 1717417
- PAR ID: 10187183
- Date Published:
- Journal Name: Notre Dame Journal of Law, Ethics & Public Policy
- Volume: 34
- Issue: 2
- ISSN: 0883-3648
- Page Range / eLocation ID: 477-500
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In graph machine learning, data collection, sharing, and analysis often involve multiple parties, each of which may require varying levels of data security and privacy. To this end, preserving privacy is of great importance in protecting sensitive information. In the era of big data, the relationships among data entities have become unprecedentedly complex, and more applications utilize advanced data structures (i.e., graphs) that can support network structures and relevant attribute information. To date, many graph-based AI models have been proposed (e.g., graph neural networks) for various domain tasks, like computer vision and natural language processing. In this paper, we focus on reviewing privacy-preserving techniques of graph machine learning. We systematically review related works from the data to the computational aspects. We first review methods for generating privacy-preserving graph data. Then we describe methods for transmitting privacy-preserved information (e.g., graph model parameters) to realize the optimization-based computation when data sharing among multiple parties is risky or impossible. In addition to discussing relevant theoretical methodology and software tools, we also discuss current challenges and highlight several possible future research opportunities for privacy-preserving graph machine learning. Finally, we envision a unified and comprehensive secure graph machine learning system. A schematic sketch of this parameter-sharing idea appears after the last item in this list.
-
Fitness trackers are an increasingly popular tool for tracking one's health and physical activity. While research has evaluated the potential benefits of these devices for health and well-being, few studies have empirically evaluated users' behaviors when sharing personal fitness information (PFI) and the privacy concerns that stem from the collection, aggregation, and sharing of PFI. In this study, we present findings from a survey of Fitbit and Jawbone users (N=361) to understand how concerns about privacy in general and user-generated data in particular affect users' mental models of PFI privacy, tracking, and sharing. Findings highlight the complex relationship between users' demographics, sharing behaviors, privacy concerns, and internet skills with how valuable and sensitive they rate their PFI. We conclude with a discussion of opportunities to increase user awareness of privacy and PFI.
-
One of the major challenges in ensuring global food security is the ever-changing biotic risk affecting the productivity and efficiency of the global food supply system. Biotic risks that threaten food security include pests and diseases that affect pre- and postharvest terrestrial agriculture and aquaculture. Strategies to minimize this risk depend heavily on plant and animal disease research. As data collected at high spatial and temporal resolutions become increasingly available, epidemiological models used to assess and predict biotic risks have become more accurate and, thus, more useful. However, with the advent of Big Data opportunities, a number of challenges have arisen that limit researchers' access to complex, multi-sourced, multi-scaled data collected on pathogens and their associated environments and hosts. Among these challenges, one of the most limiting factors is data privacy concerns from data owners and collectors. While solutions, such as the use of de-identifying and anonymizing tools that protect sensitive information, are recognized as effective practices for plant and animal disease researchers, there are comparatively few platforms that include data privacy by design and that are accessible to researchers. We describe how the general thinking and design used for data sharing and analysis platforms can intrinsically address a number of these data privacy-related challenges that are a barrier to researchers wanting to access data. We also describe how some of the data privacy concerns confronting plant and animal disease researchers are addressed by way of the GEMS informatics platform.
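To make the "de-identifying and anonymizing tools" mentioned in the abstract above concrete, here is a brief, hypothetical sketch; the field names (farm_id, lat, lon, sample_date) and the generalization choices are my assumptions and are not part of the GEMS platform.

```python
# Hypothetical de-identification step for disease surveillance records:
# hash the direct identifier with a secret salt and coarsen location and date
# (quasi-identifiers) before the data are shared for epidemiological modeling.
import hashlib
import pandas as pd

def deidentify(records: pd.DataFrame, salt: str) -> pd.DataFrame:
    out = records.copy()
    # Irreversible pseudonym in place of the direct identifier.
    out["farm_id"] = out["farm_id"].astype(str).map(
        lambda v: hashlib.sha256((salt + v).encode()).hexdigest()[:12]
    )
    # Generalize quasi-identifiers: exact coordinates -> coarse grid cell,
    # exact sampling date -> month.
    out["lat"] = out["lat"].round(1)
    out["lon"] = out["lon"].round(1)
    out["sample_date"] = pd.to_datetime(out["sample_date"]).dt.to_period("M").astype(str)
    return out
```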
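As noted in the graph machine learning abstract earlier in this list, one computational option is to share privacy-preserved model parameters instead of raw graph data. The sketch below is a schematic example only; the clipping threshold, noise scale, and plain NumPy parameter vectors are assumptions and do not describe any specific system from that survey.

```python
# Each party clips its local graph-model update and adds Gaussian noise before
# sending it; the coordinator only ever sees privatized updates and averages them.
import numpy as np

def privatize_update(update: np.ndarray, clip_norm: float = 1.0,
                     noise_multiplier: float = 1.1) -> np.ndarray:
    """Bound one party's influence via L2 clipping, then add calibrated noise."""
    scale = min(1.0, clip_norm / max(np.linalg.norm(update), 1e-12))
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return update * scale + noise

def aggregate(updates: list[np.ndarray]) -> np.ndarray:
    """Coordinator averages the already-privatized updates from all parties."""
    return np.mean(np.stack(updates), axis=0)
```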

