NSF PAR Search | NSF Public Access Repository


Stochastic SketchRefine: Scaling In-Database Decision-Making under Uncertainty to Millions of Tuples

Haque, Riddho R; Mai, Anh L; Brucato, Matteo; Abouzied, Azza; Haas, Peter J; Meliou, Alexandra (September 2025, Proceedings of the VLDB Endowment)

Decision making under uncertainty often requires choosing packages, or bags of tuples, that collectively optimize expected outcomes while limiting risks. Processing Stochastic Package Queries (SPQs) involves solving very large optimization problems on uncertain data. Monte Carlo methods create numerous scenarios, or sample realizations of the stochastic attributes of all the tuples, and generate packages with optimal objective values across these scenarios. The number of scenarios needed for accurate approximation (and hence the size of the optimization problem when using prior methods) increases with variance in the data, and the search space of the optimization problem increases exponentially with the number of tuples in the relation. Existing solvers take hours to process SPQs on large relations containing stochastic attributes with high variance. Besides enriching the SPaQL language to capture a broader class of risk specifications, we make two fundamental contributions toward scalable SPQ processing. First, we propose risk-constraint linearization (RCL), which converts SPQs into Integer Linear Programs (ILPs) whose size is independent of the number of scenarios used. Solving these ILPs gives us feasible and near-optimal packages. Second, we propose Stochastic SketchRefine, a divide-and-conquer framework that breaks down a large stochastic optimization problem into subproblems involving smaller subsets of tuples. Our experiments show that, together, RCL and Stochastic SketchRefine produce high-quality packages with runtimes orders of magnitude lower than the state of the art.
Free, publicly-accessible full text available September 1, 2026
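
Illustrative note on the abstract above: the Monte Carlo idea it mentions (sampling scenarios of the stochastic attributes and scoring a package across them) can be sketched roughly as follows. This is a minimal sketch, not the paper's method; the function names `make_scenarios` and `evaluate_package`, the Gaussian attribute model, and the "total at least `threshold`" risk check are all assumptions made for illustration.

```python
import random


def make_scenarios(means, stddevs, num_scenarios, seed=0):
    """Sample scenarios: each scenario holds one realization per tuple.

    Assumption: every tuple's stochastic attribute is an independent Gaussian.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(m, s) for m, s in zip(means, stddevs)]
        for _ in range(num_scenarios)
    ]


def evaluate_package(counts, scenarios, threshold):
    """Score a package (counts[i] = multiplicity of tuple i) across scenarios.

    Returns the mean package total and the fraction of scenarios in which
    the total reaches the (hypothetical) risk threshold.
    """
    totals = [
        sum(c * x for c, x in zip(counts, scenario)) for scenario in scenarios
    ]
    expected = sum(totals) / len(totals)
    prob_ok = sum(t >= threshold for t in totals) / len(totals)
    return expected, prob_ok


if __name__ == "__main__":
    scenarios = make_scenarios([5.0, 3.0, 4.0], [2.0, 0.5, 1.0], 1000)
    print(evaluate_package((1, 1, 0), scenarios, threshold=6.0))
```

Attributes with higher variance need more scenarios before the two estimates stabilize, which is the scaling pressure the abstract describes.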
Data Management Perspectives on Prescriptive Analytics (Invited Talk)

https://doi.org/10.4230/LIPIcs.ICDT.2025.2

Meliou, Alexandra; Abouzied, Azza; Haas, Peter J; Haque, Riddho R; Mai, Anh; Vittis, Vasileios (January 2025, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Roy, Sudeepa; Kara, Ahmet (Ed.)
Decision makers in a broad range of domains, such as finance, transportation, manufacturing, and healthcare, often need to derive optimal decisions given a set of constraints and objectives. Traditional solutions to such constrained optimization problems are typically application-specific, complex, and do not generalize. Further, the usual workflow requires slow, cumbersome, and error-prone data movement between a database and predictive-modeling and optimization packages. All of these problems are exacerbated by the unprecedented size of modern data-intensive optimization problems. The emerging research area of in-database prescriptive analytics aims to provide seamless, domain-independent, declarative, and scalable approaches powered by the system where the data typically resides: the database. Integrating optimization with database technology opens up prescriptive analytics to a much broader community, amplifying its benefits. We discuss how deep integration between the DBMS, predictive models, and optimization software creates opportunities for rich prescriptive-query functionality with good scalability and performance. Summarizing some of our main results and ongoing work in this area, we highlight challenges related to usability, scalability, data uncertainty, and dynamic environments, and argue that perspectives from data management research can drive novel strategies and solutions.
Free, publicly-accessible full text available January 1, 2026
Non-Invasive Fairness in Learning Through the Lens of Data Drift

https://doi.org/10.1109/ICDE60146.2024.00172

Yang, Ke; Meliou, Alexandra (May 2024, IEEE)

Full Text Available
Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization

https://doi.org/10.14778/3641204.3641222

Mai, Anh L; Wang, Pengyu; Abouzied, Azza; Brucato, Matteo; Haas, Peter J; Meliou, Alexandra (January 2024, Proceedings of the VLDB Endowment)

A package query returns a package (a multiset of tuples) that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward supporting prescriptive analytics scalably inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that can scale efficiently to billions of tuples and gracefully handle tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each resulting from an ever-finer partitioning of the original tuples into homogeneous groups until the original relation is obtained. This strategy avoids the premature discarding of high-quality tuples that can occur with SketchRefine. Our novel partitioning scheme, Dynamic Low Variance, can handle very large relations with multiple attributes and can dynamically adapt to both concentrated and spread-out sets of attribute values, provably outperforming traditional partitioning schemes such as kd-trees. We further optimize our system by replacing our off-the-shelf optimization software with customized ILP and LP solvers, called Dual Reducer and Parallel Dual Simplex respectively, that are highly accurate and orders of magnitude faster.
Full Text Available
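
Illustrative note on the abstract above: a package query, as defined there, picks a multiset of tuples that optimizes a linear objective under linear constraints. The sketch below brute-forces a toy instance to make that definition concrete. It is not SketchRefine or Progressive Shading; the function name `best_package`, the `value` and `weight` attributes, and the `max_copies` repetition bound are assumptions made for illustration, and exhaustive enumeration is only viable for a handful of tuples.

```python
from itertools import product


def best_package(tuples, max_weight, max_copies=1):
    """Brute-force a tiny package query.

    Maximize SUM(value) subject to SUM(weight) <= max_weight, where each
    tuple may appear 0..max_copies times in the package (a multiset).
    Returns (best objective value, multiplicity of each tuple).
    """
    best_value, best_counts = None, None
    # Enumerate every multiplicity vector: exponential in the tuple count.
    for counts in product(range(max_copies + 1), repeat=len(tuples)):
        weight = sum(c * t["weight"] for c, t in zip(counts, tuples))
        if weight > max_weight:
            continue  # violates the linear constraint
        value = sum(c * t["value"] for c, t in zip(counts, tuples))
        if best_value is None or value > best_value:
            best_value, best_counts = value, counts
    return best_value, best_counts


# Hypothetical three-tuple relation used only for this example.
relation = [
    {"id": 1, "value": 10, "weight": 4},
    {"id": 2, "value": 7, "weight": 3},
    {"id": 3, "value": 5, "weight": 2},
]

if __name__ == "__main__":
    print(best_package(relation, max_weight=5))
```

The multiplicity vector plays the role of the integer decision variables in the equivalent ILP, and the exponential enumeration is what ILP solvers and partitioning schemes are meant to avoid.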
Diversity, Equity and Inclusion Activities in Database Conferences: A 2023 Report

https://doi.org/10.1145/3685980.3685996

Amer-Yahia, Sihem; Agrawal, Divyakant; Amsterdamer, Yael; Bhowmick, Sourav S; Borovica-Gajic, Renata; Camacho-Rodríguez, Jesús; Cao, Jinli; Catania, Barbara; Chrysanthis, Panos K; Curino, Carlo; et al (July 2024, ACM SIGMOD Record)

The Diversity, Equity and Inclusion (DEI) initiative started as the Diversity/Inclusion initiative in 2020 [4]. The current report summarizes our activities in 2023.
Full Text Available
Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification

https://doi.org/10.1145/3514221.3517841

Islam, Maliha Tashfia; Fariha, Anna; Meliou, Alexandra; Salimi, Babak (June 2022, Proceedings of the 2022 International Conference on Management of Data (SIGMOD))

Full Text Available
Improved Approximation and Scalability for Fair Max-Min Diversification

https://doi.org/10.4230/LIPIcs.ICDT.2022.7

Addanki, Raghavendra; McGregor, Andrew; Meliou, Alexandra; Moumoulidou, Zafeiria (March 2022, 25th International Conference on Database Theory (ICDT))

Full Text Available
DataPrism: Exposing Disconnect between Data and Systems

https://doi.org/10.1145/3514221.3517864

Galhotra, Sainyam; Fariha, Anna; Lourenço, Raoni; Freire, Juliana; Meliou, Alexandra; Srivastava, Divesh (June 2022, Proceedings of the 2022 International Conference on Management of Data (SIGMOD))

Full Text Available
Improved Approximation and Scalability for Fair Max-Min Diversification

Addanki, Raghavendra; McGregor, Andrew; Meliou, Alexandra; Moumoulidou, Zafeiria (January 2022, ICDT 2022)

Full Text Available
CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning

https://doi.org/10.1145/3448016.3452750

Fariha, Anna; Tiwari, Ashish; Meliou, Alexandra; Radhakrishna, Arjun; Gulwani, Sumit (June 2021, International Conference on Management of Data (SIGMOD))
Full Text Available