NSF PAR Search | NSF Public Access Repository

Optimizing Nested Recursive Queries

https://doi.org/10.1145/3639271

Shaikhha, Amir; Suciu, Dan; Schleich, Maximilian; Ngo, Hung (March 2024, Proceedings of the ACM on Management of Data)

Datalog is a declarative programming language that has gained popularity in various domains due to its simplicity, expressiveness, and efficiency. But pure Datalog is limited to monotone queries, and cannot be used in most practical applications. For that reason, newer systems are relaxing the language by allowing non-monotone queries to be freely combined with recursion. But by departing from the elegant fixpoint semantics of pure Datalog, these systems often result in inefficient query execution; for example, they perform redundant computations or use redundant storage. In this paper, we propose Temporel, a system that allows recursion to be freely combined with non-monotone operators. Temporel optimizes the program by compiling it into a novel intermediate representation that we call TempoDL. Our experimental results show that our system outperforms a state-of-the-art Datalog engine as well as a vectorized and a compiled in-memory database system for a wide range of applications from machine learning to graph processing.
Full Text Available
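
Illustrative note on the record above: the "fixpoint semantics of pure Datalog" mentioned in the abstract can be sketched with a small monotone recursive query. The Python below is only a minimal sketch of semi-naive evaluation for transitive closure; it is not Temporel or TempoDL, and the function name `transitive_closure` and the sample edge list are assumptions made for illustration.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the pure Datalog program

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).

    Hypothetical helper for illustration only.
    """
    edges = set(edges)
    path = set(edges)    # all facts derived so far
    delta = set(edges)   # facts that were new in the previous round
    while delta:
        # Join only the newly derived facts with edge; this avoids
        # re-deriving facts from older rounds.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path
        path |= delta
    # The loop stops when a round adds nothing new: the least fixpoint.
    return path


closure = transitive_closure([(1, 2), (2, 3), (3, 4)])
```

Because the rules are monotone, facts are only ever added, so the loop is guaranteed to terminate on a finite input. Non-monotone operators such as aggregation or negation break this guarantee, which is the difficulty the abstract refers to.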
Optimizing Tensor Programs on Flexible Storage

https://doi.org/10.1145/3588717

Schleich, Maximilian; Shaikhha, Amir; Suciu, Dan (May 2023, Proceedings of the ACM on Management of Data)

Tensor programs often need to process large tensors (vectors, matrices, or higher order tensors) that require a specialized storage format for their memory layout. Several such layouts have been proposed in the literature, such as the Coordinate Format, the Compressed Sparse Row format, and many others, that were especially designed to optimally store tensors with specific sparsity properties. However, existing tensor processing systems require specialized extensions in order to take advantage of every new storage format. In this paper we describe a system that allows users to define flexible storage formats in a declarative tensor query language, similar to the language used by the tensor program. The programmer only needs to write storage mappings, which describe, in a declarative way, how the tensors are laid out in main memory. Then, we describe a cost-based optimizer that optimizes the tensor program for the specific memory layout. We demonstrate empirically significant performance improvements compared to state-of-the-art tensor processing systems.
Full Text Available
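
Illustrative note on the record above: the abstract names the Coordinate Format and the Compressed Sparse Row format as examples of storage layouts. The Python below is a minimal sketch of those two layouts for a sparse matrix, with a matrix-vector product over each. It does not reproduce the paper's declarative storage mappings or its optimizer; the function names and the sample matrix are assumptions made for illustration.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert coordinate (COO) triples into CSR arrays (indptr, indices, data)."""
    # Sort entry positions by (row, column) so each row's entries are contiguous.
    order = sorted(range(len(vals)), key=lambda i: (rows[i], cols[i]))
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1            # count the entries in each row
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]    # prefix sums give the row start offsets
    indices = [cols[i] for i in order]
    data = [vals[i] for i in order]
    return indptr, indices, data


def coo_matvec(n_rows, rows, cols, vals, x):
    """y = A @ x with A stored as unordered (row, col, value) triples."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def csr_matvec(indptr, indices, data, x):
    """y = A @ x with A stored in CSR; row r owns data[indptr[r]:indptr[r + 1]]."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[r], indptr[r + 1]))
        for r in range(len(indptr) - 1)
    ]


# Sample 3x3 matrix with entries A[0][0]=1, A[0][2]=2, A[2][1]=3 (COO, unsorted).
rows, cols, vals = [2, 0, 0], [1, 0, 2], [3.0, 1.0, 2.0]
x = [1.0, 10.0, 100.0]
indptr, indices, data = coo_to_csr(3, rows, cols, vals)
y_coo = coo_matvec(3, rows, cols, vals, x)
y_csr = csr_matvec(indptr, indices, data, x)
```

Both layouts describe the same matrix and give the same product, but the loop structure differs: COO scans all triples in any order, while CSR walks one contiguous slice per row.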
On the Tractability of SHAP Explanations

https://doi.org/10.1613/jair.1.13283

Van den Broeck, Guy; Lykov, Anton; Schleich, Maximilian; Suciu, Dan (May 2022, Journal of Artificial Intelligence Research)

SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we establish the complexity of computing the SHAP explanation in three important settings. First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model. This fully-factorized setting is often used to simplify the SHAP computation, yet our results show that the computation can be intractable for commonly used models such as logistic regression. Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting: computing SHAP explanations of trivial classifiers over naive Bayes distributions. Finally, we show that even computing SHAP over the empirical distribution is #P-hard.
Full Text Available
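
Illustrative note on the record above: the first setting in the abstract, SHAP under a fully-factorized data distribution, can be sketched by brute force for a handful of binary features. The Python below uses the standard Shapley-value formula with the value of a feature subset defined as the model's expected output when those features are fixed to the instance's values. It is a minimal sketch, not the paper's algorithm, and it runs in exponential time; the function names, the toy model `f`, and the probabilities are assumptions made for illustration.

```python
from itertools import combinations, product
from math import factorial


def expected_value(f, probs, fixed):
    """E[f(X)] for independent binary features with P(X_j = 1) = probs[j],
    where the features in `fixed` (index -> value) are held constant."""
    n = len(probs)
    free = [j for j in range(n) if j not in fixed]
    total = 0.0
    for bits in product([0, 1], repeat=len(free)):
        x = [0] * n
        for j, v in fixed.items():
            x[j] = v
        weight = 1.0
        for j, b in zip(free, bits):
            x[j] = b
            weight *= probs[j] if b == 1 else 1.0 - probs[j]
        total += weight * f(x)
    return total


def shap_values(f, probs, instance):
    """Exact SHAP values by enumerating every subset of the other features."""
    n = len(probs)
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            coeff = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = {j: instance[j] for j in subset}
                with_i = dict(without_i)
                with_i[i] = instance[i]
                phi += coeff * (
                    expected_value(f, probs, with_i)
                    - expected_value(f, probs, without_i)
                )
        result.append(phi)
    return result


def f(x):
    # Hypothetical toy model: an interaction term plus an additive term.
    return x[0] * x[1] + x[2]


probs = [0.5, 0.5, 0.5]
instance = [1, 1, 1]
phi = shap_values(f, probs, instance)
```

Every term in the sum is an expected value of the model under the product distribution, which gives some intuition for the abstract's statement that the two computations have the same complexity in this setting. A useful sanity check is that the values sum to `f(instance)` minus the unconditional expectation of `f`.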
GeCo: quality counterfactual explanations in real time

https://doi.org/10.14778/3461535.3461555

Schleich, Maximilian; Geng, Zixuan; Zhang, Yihong; Suciu, Dan (May 2021, Proceedings of the VLDB Endowment)

Machine learning is increasingly applied in high-stakes decision making that directly affects people's lives, and this leads to an increased demand for systems to explain their decisions. Explanations often take the form of counterfactuals, which consist of conveying to the end user what she/he needs to change in order to improve the outcome. Computing counterfactual explanations is challenging, because of the inherent tension between a rich semantics of the domain, and the need for real time response. In this paper we present GeCo, the first system that can compute plausible and feasible counterfactual explanations in real time. At its core, GeCo relies on a genetic algorithm, which is customized to favor searching counterfactual explanations with the smallest number of changes. To achieve real-time performance, we introduce two novel optimizations: Δ-representation of candidate counterfactuals, and partial evaluation of the classifier. We empirically compare GeCo against five other systems described in the literature, and show that it is the only system that can achieve both high quality explanations and real time answers.
Full Text Available
On the Tractability of SHAP Explanations

Van den Broeck, Guy; Lykov, Anton; Schleich, Maximilian; Suciu, Dan (January 2021, Proceedings of the AAAI Conference on Artificial Intelligence)
SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we establish the complexity of computing the SHAP explanation in three important settings. First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model. This fully-factorized setting is often used to simplify the SHAP computation, yet our results show that the computation can be intractable for commonly used models such as logistic regression. Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting: computing SHAP explanations of trivial classifiers over naive Bayes distributions. Finally, we show that even computing SHAP over the empirical distribution is #P-hard.
Full Text Available
Causality-based Explanation of Classification Outcomes

https://doi.org/10.1145/3399579.3399865

Bertossi, Leopoldo; Li, Jordan; Schleich, Maximilian; Suciu, Dan; Vagena, Zografoula (January 2020, DEEM'20: Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning)
Full Text Available
Rk-means: Fast Clustering for Relational Data

Curtin, Ryan; Moseley, Benjamin; Ngo, Hung; Nguyen, XuanLong; Olteanu, Dan; Schleich, Maximilian (January 2020, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics)
Full Text Available
On Functional Aggregate Queries with Additive Inequalities

https://doi.org/10.1145/3294052.3319694

Abo Khamis, Mahmoud; Curtin, Ryan R.; Moseley, Benjamin; Ngo, Hung Q.; Nguyen, XuanLong; Olteanu, Dan; Schleich, Maximilian (January 2019, Symposium on Principles of Database Systems)

Full Text Available
