Award ID contains: 2133169

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Data sets and statistics about groups of individuals are increasingly collected and released, feeding many optimization and learning algorithms. In many cases, the released data contain sensitive information whose privacy is strictly regulated. For example, in the U.S., census data are regulated under Title 13, which requires that no individual be identified from any data released by the Census Bureau. In Europe, data release is regulated according to the General Data Protection Regulation, which addresses the control and transfer of personal data. Differential privacy has emerged as the de-facto standard to protect data privacy. In a nutshell, differentially private algorithms protect an individual's data by injecting random noise into the output of a computation that involves such data. While this process ensures privacy, it also affects the quality of data analysis: when private data sets are used as inputs to complex machine learning or optimization tasks, they may produce results that are fundamentally different from those obtained on the original data and may even raise unintended bias and fairness concerns. In this talk, I will first focus on the challenge of releasing privacy-preserving data sets for complex data analysis tasks. I will introduce the notion of Constrained-based Differential Privacy (C-DP), which allows casting the data release problem as an optimization problem whose goal is to preserve the salient features of the original data. I will review several applications of C-DP in the context of very large hierarchical census data, data streams, energy systems, and the design of federated data-sharing protocols. Next, I will discuss how errors induced by differential privacy algorithms may propagate within a decision problem, causing biases and fairness issues. This is particularly important because privacy-preserving data is often used for critical decision processes, including the allocation of funds and benefits to states and jurisdictions, which ideally should be fair and unbiased. Finally, I will conclude with a roadmap to future work and some open questions. (A minimal sketch of the noise-injection mechanism appears after this list.)
  2. Fairness and robustness are two important notions of learning models. Fairness ensures that models do not disproportionately harm (or benefit) some groups over others, while robustness measures a model's resilience against small input perturbations. Although both properties are important, this paper illustrates a dichotomy between fairness and robustness and analyzes when striving for fairness decreases a model's robustness to adversarial samples. The reported analysis sheds light on the factors causing such contrasting behavior, identifying the distance to the decision boundary across groups as a key factor. Experiments on non-linear models and different architectures validate the theoretical findings. In addition to the theoretical analysis, the paper also proposes a simple, yet effective, solution to construct models achieving good tradeoffs between fairness and robustness. (A sketch of the per-group boundary-distance comparison appears after this list.)
    Free, publicly-accessible full text available August 1, 2025
  3. Free, publicly-accessible full text available May 29, 2025
  4. This paper analyzes the privacy of traditional Statistical Disclosure Control (SDC) systems under a differential privacy interpretation. SDC methods, such as cell suppression and swapping, promise to safeguard the confidentiality of data and are routinely adopted in data analyses with profound societal and economic impacts. Through a formal analysis and an empirical evaluation of demographic data from real households in the U.S., the paper shows that widely adopted SDC systems not only induce vastly larger privacy losses than classical differential privacy mechanisms, but may also come at the cost of larger accuracy and fairness losses. (A toy illustration of cell suppression appears after this list.)
    Free, publicly-accessible full text available March 25, 2025
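
The sketches below give minimal, self-contained illustrations of some of the mechanisms described in the abstracts above. They are hedged examples under stated assumptions, not code from the cited works.

Sketch for item 1: the noise-injection idea behind differential privacy, using the standard Laplace mechanism on a counting query. The dataset, function names, and parameter values are hypothetical.

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon):
        """Return an epsilon-differentially private estimate of true_value.

        Noise is drawn from a Laplace distribution with scale sensitivity/epsilon,
        the standard calibration for a query with the given L1 sensitivity.
        """
        return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: privately release the number of households in a census block.
    # A counting query changes by at most 1 when one record is added or removed,
    # so its L1 sensitivity is 1.
    true_count = 1523  # hypothetical true count
    private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.1)
    print(f"true count: {true_count}, private release: {private_count:.1f}")

Smaller values of epsilon inject more noise; that added error is exactly what, as the abstract notes, can propagate into downstream decisions.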
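Sketch for item 2: comparing the average distance to the decision boundary across groups for a linear classifier, the quantity the paper's analysis points to as a key factor in the fairness-robustness tradeoff. The synthetic data, group labels, and margins are illustrative assumptions, not the paper's experiments.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Two hypothetical groups: group A lies farther from the true boundary than group B.
    X_a = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(200, 2))
    X_b = rng.normal(loc=[0.5, 0.5], scale=1.0, size=(200, 2))
    X = np.vstack([X_a, X_b, -X_a, -X_b])
    y = np.array([1] * 400 + [0] * 400)
    group = np.array(["A"] * 200 + ["B"] * 200 + ["A"] * 200 + ["B"] * 200)

    clf = LogisticRegression().fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]

    # Geometric distance of each point to the learned hyperplane w.x + b = 0.
    dist = np.abs(X @ w + b) / np.linalg.norm(w)

    for g in ("A", "B"):
        print(f"group {g}: mean distance to boundary = {dist[group == g].mean():.2f}")

    # A smaller mean distance means smaller adversarial perturbations suffice to flip
    # predictions for that group, i.e., lower robustness for that group.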
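Sketch for item 4: a toy version of the cell-suppression SDC method named in the abstract, which withholds tabulation cells whose counts fall below a threshold before release. The table, threshold, and labels are hypothetical, and real systems also apply secondary suppression and swapping, which this sketch omits.

    import numpy as np

    # Hypothetical counts of households by income bracket (rows) and region (columns).
    table = np.array([
        [120,   3,  45],
        [  2,  88,  61],
        [ 57,   4, 210],
    ])

    THRESHOLD = 5  # cells with fewer than 5 respondents are treated as disclosive

    # Primary suppression: mask small cells before releasing the table.
    released = table.astype(object)
    released[table < THRESHOLD] = "D"  # "D" marks a suppressed cell
    print(released)

Per the abstract, such traditional protections can incur larger privacy losses than calibrated noise mechanisms while also distorting the released statistics.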