NSF PAR Search | NSF Public Access Repository


Intervention and Conditioning in Causal Bayesian Networks

Galhotra, Sainyam; Halpern, Joseph Y (December 2024, NeurIPS)

Free, publicly-accessible full text available December 10, 2025
k-Clustering with Comparison and Distance Oracles

https://doi.org/10.1145/3695830

Galhotra, Sainyam; Raychaudhury, Rahul; Sintos, Stavros (November 2024, Proceedings of the ACM on Management of Data)

In this paper, we address clustering problems in scenarios where accurate direct access to the full dataset is impractical or impossible. Instead, we leverage oracle-based methods, which are particularly valuable in real-world applications where the data may be noisy, restricted due to privacy concerns or sheer volume. We utilize two oracles, the quadruplet and the distance oracle. The quadruplet oracle is a weaker oracle that only approximately compares the distances of two pairs of vertices. In practice, these oracles can be implemented using crowdsourcing or training classifiers or other predictive models. On the other hand, the distance oracle returns exactly the distance of two vertices, so it is a stronger and more expensive oracle to implement. We consider two noise models for the quadruplet oracle. In the adversarial noise model, if two pairs have similar distances, the response is chosen by an adversary. In the probabilistic noise model, the pair with the smaller distance is returned with a constant probability. We consider a set V of n vertices in a metric space that supports the quadruplet and the distance oracle. For each of the k-center, k-median, and k-means clustering problem on V, we design constant approximation algorithms that perform roughly O(nk) calls to the quadruplet oracle and O(k^2) calls to the distance oracle in both noise models. When the dataset has low intrinsic dimension, we significantly improve the approximation factors of our algorithms by performing a few additional calls to the distance oracle. We also show that for k-median and k-means clustering there is no hope to return any sublinear approximation using only the quadruplet oracle. Finally, we give constant approximation algorithms for estimating the clustering cost induced by any set of k vertices, performing roughly O(nk) calls to the quadruplet oracle and O(k^2) calls to the distance oracle.
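As a rough illustration of the two oracles described in this abstract, the following minimal sketch models a distance oracle and a quadruplet oracle under the probabilistic noise model. It is not the authors' implementation: the function names, the use of Euclidean distance, and the `p_correct` parameter are assumptions made for this example only.

```python
import math
import random


def distance_oracle(u, v):
    """Return the exact distance between two points.

    Euclidean distance is an assumption for this sketch; the abstract
    only requires some metric space.
    """
    return math.dist(u, v)


def quadruplet_oracle(pair_a, pair_b, p_correct=0.9, rng=random):
    """Compare the distances of two pairs of points, with noise.

    Probabilistic noise model (hypothetical parameterization): the pair
    with the smaller distance is returned with probability p_correct,
    otherwise the other pair is returned.
    """
    d_a = distance_oracle(*pair_a)
    d_b = distance_oracle(*pair_b)
    closer, farther = (pair_a, pair_b) if d_a <= d_b else (pair_b, pair_a)
    return closer if rng.random() < p_correct else farther


if __name__ == "__main__":
    near = ((0.0, 0.0), (1.0, 0.0))
    far = ((0.0, 0.0), (5.0, 0.0))
    # With p_correct=1.0 the oracle is noiseless and always picks the closer pair.
    print(quadruplet_oracle(near, far, p_correct=1.0))
```

In the adversarial noise model, by contrast, the answer would be chosen arbitrarily whenever the two distances are similar; that case is not modeled here.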
Free, publicly-accessible full text available November 4, 2025
Faster Algorithms for Fair Max-Min Diversification in R^d

https://doi.org/10.1145/3654940

Kurkure, Yash; Shamo, Miles; Wiseman, Joseph; Galhotra, Sainyam; Sintos, Stavros (May 2024, Proceedings of the ACM on Management of Data)

The task of extracting a diverse subset from a dataset, often referred to as maximum diversification, plays a pivotal role in various real-world applications that have far-reaching consequences. In this work, we delve into the realm of fairness-aware data subset selection, specifically focusing on the problem of selecting a diverse set of size k from a large collection of n data points (FairDiv). The FairDiv problem is well-studied in the data management and theory community. In this work, we develop the first constant approximation algorithm for FairDiv that runs in near-linear time using only linear space. In contrast, all previously known constant approximation algorithms run in super-linear time (with respect to n or k) and use super-linear space. Our approach achieves this efficiency by employing a novel combination of the Multiplicative Weight Update method and advanced geometric data structures to implicitly and approximately solve a linear program. Furthermore, we improve the efficiency of our techniques by constructing a coreset. Using our coreset, we also propose the first efficient streaming algorithm for the FairDiv problem whose efficiency does not depend on the distribution of data points. Empirical evaluation on million-sized datasets demonstrates that our algorithm achieves the best diversity within a minute. All prior techniques are either highly inefficient or do not generate a good solution.
Full Text Available
Causal What-If and How-To Analysis Using HypeR

https://doi.org/10.1109/ICDE55515.2023.00293

Shen, Fangzhu; Heravi, Kayvon; Gomez, Oscar; Galhotra, Sainyam; Gilad, Amir; Roy, Sudeepa; Salimi, Babak (April 2023, 2023 IEEE 39th International Conference on Data Engineering (ICDE))

Full Text Available
DataPrism: Exposing Disconnect between Data and Systems

https://doi.org/10.1145/3514221.3517864

Galhotra, Sainyam; Fariha, Anna; Lourenço, Raoni; Freire, Juliana; Meliou, Alexandra; Srivastava, Divesh (June 2022, Proceedings of the 2022 International Conference on Management of Data (SIGMOD))

Full Text Available
HypeR: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach

https://doi.org/10.1145/3514221.3526149

Galhotra, Sainyam; Gilad, Amir; Roy, Sudeepa; Salimi, Babak (January 2022, SIGMOD'22: International Conference on Management of Data)

Full Text Available
Efficient and effective ER with progressive blocking

https://doi.org/10.1007/s00778-021-00656-7

Galhotra, Sainyam; Firmani, Donatella; Saha, Barna; Srivastava, Divesh (July 2021, The VLDB Journal)
Full Text Available
BEER: Blocking for Effective Entity Resolution

https://doi.org/10.1145/3448016.3452747

Galhotra, Sainyam; Firmani, Donatella; Saha, Barna; Srivastava, Divesh (June 2021, SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data)
Full Text Available
How to Design Robust Algorithms using Noisy Comparison Oracle

https://doi.org/10.14778/3467861.3467862

Addanki, Raghavendra; Galhotra, Sainyam; Saha, Barna (January 2021, Proceedings of the VLDB Endowment)
Full Text Available
