Search for: All records

Creators/Authors contains: "Dasgupta, S."

« Prev Next »

Total Resources

26

Resource Type
Conference Paper

11

Conference Proceeding

1

Dataset

0

Journal Article

14

Workshop Report

0

Availability
Full Text / Resource Available

25

Citation Only

1

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

PK2G - Declarative Construction and Quality Evaluation of Knowledge Graphs from Polystores

Zheng, X ; Dasgupta S ; Gupta, A ( September 2023 , Workshop on Knowledge Graphs Analysis on a Large Scale)
Abelló, A ; Vassiliadis, P ; Romero, O ; Wrembel, R ; Bugiotti, F ; Gamper, J ; Vargas-Solar, G ; Zumpano, E (Ed.)
Constructing knowledge graphs from heterogeneous data sources and evaluating their quality and consistency are important research questions in the field of knowledge graphs. We propose mapping rules to guide users to translate data from relational and graph sources into a meaningful knowledge graph and design a user-friendly language to specify the mapping rules. Given the mapping rules and constraints on source data, equivalent constraints on the target graph can be inferred, which is referred to as data source constraints. Besides this type of constraint, we design other two types: user-specified constraints and general rules that a high-quality knowledge graph should adhere to. We translate the three types of constraints into uniform expressions in the form of graph functional dependencies and extended graph dependencies, which can be used for consistency checking. Our approach provides a systematic way to build and evaluate knowledge graphs from diverse data sources.
more » « less
Free, publicly-accessible full text available September 4, 2024
Online k-means clustering on arbitrary data streams

Bhattacharjee, R. ; Imola, J. ; Moshkovitz, M. ; Dasgupta, S. ( April 2023 , Proceedings of Machine Learning Research)

We consider k-means clustering in an online setting where each new data point is assigned to its closest cluster center and incurs a loss equal to the squared distance to that center, after which the algorithm is allowed to update its centers. The goal over a data stream X is to achieve a total loss that is not too much larger than L(X, OPT), the best possible loss using k fixed centers in hindsight. We give the first algorithm to achieve polynomial space and time complexity in the online setting.
more » « less
Full Text Available
Online k-means clustering on arbitrary data streams

Bhattacharjee, R. ; Imola, J. ; Moshkovitz, M. ; Dasgupta, S. ( January 2023 , Proceedings of Machine Learning Research)

We consider k-means clustering in an online setting where each new data point is assigned to its closest cluster center and incurs a loss equal to the squared distance to that center, after which the algorithm is allowed to update its centers. The goal over a data stream X is to achieve a total loss that is not too much larger than the best possible loss using k fixed centers in hindsight. Ours is the first algorithm to achieve polynomial space and time complexity in the online setting. We note that our results have implications to the related streaming setting, where one final clustering is outputted, and the no-substitution setting, where center selections are permanent. We show a general reduction between the no-substitution cost of a blackbox algorithm and its online cost. Finally, we translate our algorithm to the no-substitution setting and streaming settings, and it competes with and can outperform existing work in the areas.
more » « less
Full Text Available
Realizing topological edge states with Rydberg-atom synthetic dimensions

https://doi.org/10.1038/s41467-022-28550-y

Kanungo, S. K. ; Whalen, J. D. ; Lu, Y. ; Yuan, M. ; Dasgupta, S. ; Dunning, F. B. ; Hazzard, K. R. ; Killian, T. C. ( December 2022 , Nature Communications)

Abstract A discrete degree of freedom can be engineered to match the Hamiltonian of particles moving in a real-space lattice potential. Such synthetic dimensions are powerful tools for quantum simulation because of the control they offer and the ability to create configurations difficult to access in real space. Here, in an ultracold 84 Sr atom, we demonstrate a synthetic-dimension based on Rydberg levels coupled with millimeter waves. Tunneling amplitudes between synthetic lattice sites and on-site potentials are set by the millimeter-wave amplitudes and detunings respectively. Alternating weak and strong tunneling in a one-dimensional configuration realizes the single-particle Su-Schrieffer-Heeger (SSH) Hamiltonian, a paradigmatic model of topological matter. Band structure is probed through optical excitation from the ground state to Rydberg levels, revealing symmetry-protected topological edge states at zero energy. Edge-state energies are robust to perturbations of tunneling-rates that preserve chiral symmetry, but can be shifted by the introduction of on-site potentials.
more » « less
Full Text Available
Dataland: An Informed, Situated, and Critical Approach to Data Literacy

Wang, L. ; Dasgupta, S. ( April 2022 , General Proceedings of the 2nd Annual Meeting of the International Society of the Learning Sciences 2022)
Oshima, J. ; Mochizuki, T. ; Hayashi, Y. (Ed.)
Full Text Available
Constants Matter: The Performance Gains of Active Learning

Mussmann, S. ; Dasgupta, S. ( January 2022 , Proceedings of the 39th International Conference on Machine Learning)

Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling have the same inverse proportional dependence on the number of samples. Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.
more » « less
Full Text Available
Convergence of online k-means

So, G. ; Mahajan, G. ; Dasgupta, S. ( January 2022 , Proceedings of The 25th International Conference on Artificial Intelligence and Statistics)

We prove asymptotic convergence for a general class of k-means algorithms performed over streaming data from a distribution— the centers asymptotically converge to the set of stationary points of the k-means objective function. To do so, we show that online k-means over a distribution can be interpreted as stochastic gradient descent with a stochastic learning rate schedule. Then, we prove convergence by extending techniques used in optimization literature to handle settings where center-specific learning rates may depend on the past trajectory of the centers.
more » « less
Full Text Available
Robust learning from discriminative feature feedback

Dasgupta, S. ; Sabato, S. ( July 2020 , International Conference on Artificial Intelligence and Statistics)

Recent work introduced the model of learning from discriminative feature feedback, in which a human annotator not only provides labels of instances, but also identifies discriminative features that highlight important differences between pairs of instances. It was shown that such feedback can be conducive to learning, and makes it possible to efficiently learn some concept classes that would otherwise be in- tractable. However, these results all relied upon perfect annotator feedback. In this pa- per, we introduce a more realistic, robust ver- sion of the framework, in which the annotator is allowed to make mistakes. We show how such errors can be handled algorithmically, in both an adversarial and a stochastic setting. In particular, we derive regret bounds in both settings that, as in the case of a perfect an- notator, are independent of the number of features. We show that this result cannot be obtained by a naive reduction from the robust setting to the non-robust setting.
more » « less
Full Text Available
A non-parametric test to detect data-copying in generative models

Meehan, C. ; Chaudhuri, K. ; Dasgupta, S. ( July 2020 , International Conference on Artificial Intelligence and Statistics)

Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying – where the gener- ative model memorizes and outputs training samples or small variations thereof. We pro- vide a three sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target dis- tribution, and a generated sample from the model, and study the performance of our test on several canonical models and datasets.
more » « less
Full Text Available
What relations are reliably embeddable in Euclidean space?

Bhattacharjee, R. ; Dasgupta, S. ( March 2020 , International Conference on Algorithmic Learning Theory)

We consider the problem of embedding a relation, represented as a directed graph, into Euclidean space. For three types of embeddings motivated by the recent literature on knowledge graphs, we obtain characterizations of which relations they are able to capture, as well as bounds on the minimal dimensionality and precision needed.
more » « less
Full Text Available

« Prev Next »