NSF PAR Search | NSF Public Access Repository

Total Variation Distance Meets Probabilistic Inference

Bhattacharyya, Arnab; Gayen, Sutanu; Meel, Kuldeep; Myrisiotis, Dimitrios; Pavan, A.; Vinodchandran, N. V. (July 2024, Proceedings of Machine Learning Research)

Full Text Available
Optimal estimation of Gaussian (poly)trees

Wang, Yuhao; Gao, Ming; Tai, Wai Ming; Aragam, Bryon; Bhattacharyya, Arnab (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics)

We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery). The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently. The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning. We derive explicit finite-sample guarantees for both approaches, and show that both approaches are optimal by deriving matching lower bounds. Additionally, we conduct numerical experiments to compare the performance of various algorithms, providing further insights and empirical evidence.
Full Text Available
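
The Chow-Liu approach named in the abstract above can be illustrated with a minimal sketch. It assumes the pairwise correlations are already known (the paper works from samples), weights each pair by the mutual information of a bivariate Gaussian, and keeps a maximum-weight spanning tree via Kruskal's algorithm. The function names and the example matrix are hypothetical and not taken from the paper.

```python
import math


def gaussian_mutual_information(rho):
    """Mutual information (in nats) of a bivariate Gaussian with correlation rho, |rho| < 1."""
    return -0.5 * math.log(1.0 - rho * rho)


def chow_liu_tree(corr):
    """Return the edges of a maximum-weight spanning tree over the variables,
    weighting pair (i, j) by the Gaussian mutual information implied by corr[i][j]."""
    n = len(corr)
    # Candidate edges sorted from most to least informative.
    edges = sorted(
        ((gaussian_mutual_information(corr[i][j]), i, j)
         for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    parent = list(range(n))  # union-find forest used by Kruskal's algorithm

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Hypothetical correlation matrix of a chain 0 - 1 - 2 (0.4 = 0.8 * 0.5).
corr = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]
chain_edges = chow_liu_tree(corr)
```

Because Gaussian mutual information is increasing in |rho|, the weakest pair (0, 2) is dropped here and the chain is recovered.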
Outlier Robust Multivariate Polynomial Regression

https://doi.org/10.4230/LIPIcs.ESA.2024.12

Arora, Vipul; Bhattacharyya, Arnab; Boban, Mathews; Guruswami, Venkatesan; Kelman, Esty (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)
We study the problem of robust multivariate polynomial regression: let p: ℝⁿ → ℝ be an unknown n-variate polynomial of degree at most d in each variable. We are given as input a set of random samples (𝐱_i,y_i) ∈ [-1,1]ⁿ × ℝ that are noisy versions of (𝐱_i,p(𝐱_i)). More precisely, each 𝐱_i is sampled independently from some distribution χ on [-1,1]ⁿ, and for each i independently, y_i is arbitrary (i.e., an outlier) with probability at most ρ < 1/2, and otherwise satisfies |y_i-p(𝐱_i)| ≤ σ. The goal is to output a polynomial p̂, of degree at most d in each variable, within an 𝓁_∞-distance of at most O(σ) from p. Kane, Karmalkar, and Price [FOCS'17] solved this problem for n = 1. We generalize their results to the n-variate setting, showing an algorithm that achieves a sample complexity of O_n(dⁿlog d), where the hidden constant depends on n, if χ is the n-dimensional Chebyshev distribution. The sample complexity is O_n(d^{2n}log d), if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most O(σ), and the run-time depends on log(1/σ). In the setting where each 𝐱_i and y_i are known up to N bits of precision, the run-time’s dependence on N is linear. We also show that our sample complexities are optimal in terms of dⁿ. Furthermore, we show that it is possible to have the run-time be independent of 1/σ, at the cost of a higher sample complexity.
Full Text Available
Model Counting Meets F₀ Estimation

https://doi.org/10.1145/3603496

Pavan, A.; Vinodchandran, N. V.; Bhattacharyya, Arnab; Meel, Kuldeep S. (September 2023, ACM Transactions on Database Systems)

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F₀) for data streams. Our investigations lead us to observe a striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and F₀ computation. We design a recipe for translating algorithms developed for F₀ estimation to model counting, resulting in new algorithms for model counting. We also provide a recipe for transforming sampling algorithms over streams to constraint sampling algorithms. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framing F₀ estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to case-specific analysis in prior works. In particular, our view yields an algorithm for multidimensional range-efficient F₀ estimation with a simpler analysis.
Full Text Available
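
For readers unfamiliar with F₀ estimation as discussed in the abstract above, the following is a minimal sketch of one well-known minimum-based estimator (k-minimum values). It is chosen here for brevity and is not claimed to be the construction used in the paper. The helper names, the use of SHA-256 as a stand-in for a random hash function, and the default `k` are all assumptions.

```python
import hashlib
import heapq


def hash_to_unit_interval(item):
    """Map an item to a pseudo-random point in [0, 1) using SHA-256 (an assumed stand-in hash)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_f0(stream, k=256):
    """Estimate the number of distinct items by tracking the k smallest hash values."""
    smallest = []    # max-heap (negated values) holding the k smallest distinct hashes
    members = set()  # hashes currently stored in the heap
    for item in stream:
        h = hash_to_unit_interval(item)
        if h in members:
            continue  # repeated item among the tracked minima
        if len(smallest) < k:
            heapq.heappush(smallest, -h)
            members.add(h)
        elif h < -smallest[0]:
            # Replace the current k-th smallest hash with the new, smaller one.
            evicted = -heapq.heappushpop(smallest, -h)
            members.discard(evicted)
            members.add(h)
    if len(smallest) < k:
        return float(len(smallest))  # fewer than k distinct hashes: the count is exact
    kth_smallest = -smallest[0]
    return (k - 1) / kth_smallest


# Hypothetical stream with 5000 distinct values, each repeated four times.
stream = [i % 5000 for i in range(20000)]
estimate = estimate_f0(stream)
```

The memory used is O(k) regardless of stream length, and the relative error shrinks roughly like 1/√k under the idealized assumption that the hash behaves like a uniform random function.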
Constraint Optimization over Semirings

https://doi.org/10.1609/aaai.v37i4.25522

Pavan, A.; Meel, Kuldeep S.; Vinodchandran, N. V.; Bhattacharyya, Arnab (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Interpretations of logical formulas over semirings (other than the Boolean semiring) have applications in various areas of computer science including logic, AI, databases, and security. Such interpretations provide richer information beyond the truth or falsity of a statement. Examples of such semirings include the Viterbi semiring, the min-max or access control semiring, the tropical semiring, and the fuzzy semiring. The present work investigates the complexity of constraint optimization problems over semirings. The generic optimization problem we study is the following: Given a propositional formula φ over n variables and a semiring (K, +, ·, 0, 1), find the maximum value over all possible interpretations of φ over K. This can be seen as a generalization of the well-known satisfiability problem (a propositional formula is satisfiable if and only if the maximum value over all interpretations/assignments over the Boolean semiring is 1). A related problem is to find an interpretation that achieves the maximum value. In this work, we first focus on these optimization problems over the Viterbi semiring, which we call optConfVal and optConf. We first show that for general propositional formulas in negation normal form, optConfVal and optConf are in FP^NP. We then investigate optConf when the input formula φ is represented in conjunctive normal form. For CNF formulae, we first derive an upper bound on the value of optConf as a function of the maximum number of satisfiable clauses. In particular, we show that if r is the maximum number of satisfiable clauses in a CNF formula with m clauses, then its optConf value is at most 1/4^(m-r). Building on this, we establish that optConf for CNF formulae is hard for the complexity class FP^NP[log]. We also design polynomial-time approximation algorithms and establish an inapproximability result for optConfVal. We establish similar complexity results for these optimization problems over other semirings including tropical, fuzzy, and access control semirings.
Full Text Available
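
The 1/4^(m-r) bound in the abstract above can be checked on a toy instance with the sketch below. It rests on assumptions that the abstract does not spell out: that an interpretation over the Viterbi semiring gives each variable x a confidence in [0, 1] and its negation 1 minus that confidence, that disjunction is evaluated as max, and that conjunction is evaluated as product. The function names are hypothetical, and the coarse grid search only lower-bounds the true optimum in general.

```python
from itertools import product


def clause_value(clause, conf):
    """Value of a clause (disjunction) as the max over its literals.
    Literals are signed integers: k means x_k, -k means NOT x_k (assumed encoding)."""
    return max(conf[lit] if lit > 0 else 1.0 - conf[-lit] for lit in clause)


def opt_conf_val_grid(cnf, num_vars, steps=4):
    """Grid-search a lower bound on the best formula value.
    The CNF value is the product of its clause values (assumed semantics)."""
    grid = [i / steps for i in range(steps + 1)]
    best = 0.0
    for values in product(grid, repeat=num_vars):
        conf = dict(zip(range(1, num_vars + 1), values))
        value = 1.0
        for clause in cnf:
            value *= clause_value(clause, conf)
        best = max(best, value)
    return best


# x1 AND (NOT x1): m = 2 clauses, at most r = 1 satisfiable, so the bound is 1/4.
contradiction_value = opt_conf_val_grid([[1], [-1]], num_vars=1)

# (x1 OR x2) AND (NOT x1) is satisfiable, so value 1 is reachable.
satisfiable_value = opt_conf_val_grid([[1, 2], [-1]], num_vars=2)
```

Under these assumed semantics, the contradictory formula attains p(1 − p) = 1/4 at p = 1/2, which matches the stated bound with m − r = 1.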
On Approximating Total Variation Distance

https://doi.org/10.24963/ijcai.2023/387

Bhattacharyya, Arnab; Gayen, Sutanu; Meel, Kuldeep S.; Myrisiotis, Dimitrios; Pavan, A.; Vinodchandran, N. V. (August 2023, International Joint Conference on Artificial Intelligence)

Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results. 1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals leading to efficient algorithms. 2. There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions the relative approximation of their TV distance is NP-hard.
Full Text Available
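
The contrast drawn in the abstract above, between TV distance and distances that tensorize over the marginals, can be made concrete with a small sketch. It computes TV distance by enumerating all 2^n points (exponential time, for illustration only) and compares a brute-force Bhattacharyya coefficient, from which the squared Hellinger distance follows as 1 minus the coefficient, with its product-over-marginals form. The function names and example marginals are hypothetical.

```python
import math
from itertools import product


def product_prob(x, marginals):
    """Probability of bit-string x when coordinate i is 1 with probability marginals[i]."""
    prob = 1.0
    for bit, p in zip(x, marginals):
        prob *= p if bit else 1.0 - p
    return prob


def tv_distance_bruteforce(p, q):
    """TV distance between two product distributions by summing over all 2^n points."""
    return 0.5 * sum(
        abs(product_prob(x, p) - product_prob(x, q))
        for x in product((0, 1), repeat=len(p))
    )


def bhattacharyya_bruteforce(p, q):
    """Bhattacharyya coefficient sum_x sqrt(P(x) Q(x)) by full enumeration."""
    return sum(
        math.sqrt(product_prob(x, p) * product_prob(x, q))
        for x in product((0, 1), repeat=len(p))
    )


def bhattacharyya_tensorized(p, q):
    """The same coefficient as a product of per-coordinate terms (linear time in n)."""
    result = 1.0
    for pi, qi in zip(p, q):
        result *= math.sqrt(pi * qi) + math.sqrt((1.0 - pi) * (1.0 - qi))
    return result


# Hypothetical marginals for two product distributions over {0,1}^3.
p_marginals = [0.5, 0.2, 0.7]
q_marginals = [0.9, 0.4, 0.7]
tv = tv_distance_bruteforce(p_marginals, q_marginals)
bc_slow = bhattacharyya_bruteforce(p_marginals, q_marginals)
bc_fast = bhattacharyya_tensorized(p_marginals, q_marginals)
```

The two Bhattacharyya computations agree up to floating-point error, while no analogous per-coordinate product formula is used for TV distance here, consistent with the hardness result stated in the abstract.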
Sample Complexity of Distinguishing Cause from Effect

Acharya, Jayadev; Bhadane, Sourbh; Bhattacharyya, Arnab; Kandasamy, Saravanan; Sun, Ziteng (April 2023, Proceedings of Machine Learning Research)
Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables X and Y, we consider the classical scenario where either X causes Y, Y causes X, or there is an unmeasured confounder between X and Y. We show that if X and Y are over a finite domain of size k and are significantly correlated, the minimum number of interventional samples needed is sublinear in k. We give a tight characterization of the tradeoff between observational and interventional data when the number of observational samples is sufficiently large. We build upon techniques for closeness testing and for non-parametric density estimation in different regimes of observational data. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical results on property testing.
Full Text Available
Model Counting Meets Distinct Elements

https://doi.org/10.1145/3607824

Pavan, A.; Vinodchandran, N. V.; Bhattacharyya, Arnab; Meel, Kuldeep S. (August 2023, Communications of the ACM)

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and the computation of the number of distinct elements in a data stream, also known as the zeroth frequency moment (F₀) of a data stream. Our investigations lead us to observe a striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and distinct elements computation. We design a recipe for translating algorithms developed for distinct elements estimation to model counting, resulting in new algorithms for model counting. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framing distinct elements estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to case-specific analysis in prior works.
Model Counting Meets Distinct Elements in a Data Stream

https://doi.org/10.1145/3542700.3542721

Pavan, A.; Vinodchandran, N. V.; Bhattacharyya, Arnab; Meel, Kuldeep S. (May 2022, ACM SIGMOD Record)

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F₀) for data streams.
Full Text Available
Near-optimal learning of tree-structured distributions by Chow-Liu

https://doi.org/10.1145/3406325.3451066

Bhattacharyya, Arnab; Gayen, Sutanu; Price, Eric; Vinodchandran, N. V. (June 2021, STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing)
Full Text Available