skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, January 16 until 2:00 AM ET on Friday, January 17 due to maintenance. We apologize for the inconvenience.


Title: SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns
The emergence of database-as-a-service platforms has made deploying database applications easier than before. Now, developers can quickly create scalable applications. However, designing performant, maintainable, and accurate applications is challenging. Developers may unknowingly introduce anti-patterns in the application's SQL statements. These anti-patterns are design decisions that are intended to solve a problem, but often lead to other problems by violating fundamental design principles. In this paper, we present SQLCheck, a holistic toolchain for automatically finding and fixing anti-patterns in database applications. We introduce techniques for automatically (1) detecting anti-patterns with high precision and recall, (2) ranking the anti-patterns based on their impact on performance, maintainability, and accuracy of applications, and (3) suggesting alternative queries and changes to the database design to fix these anti-patterns. We demonstrate the prevalence of these anti-patterns in a large collection of queries and databases collected from open-source repositories. We introduce an anti-pattern detection algorithm that augments query analysis with data analysis. We present a ranking model for characterizing the impact of frequently occurring anti-patterns. We discuss how SQLCheck suggests fixes for high-impact anti-patterns using rule-based query refactoring techniques. Our experiments demonstrate that SQLCheck enables developers to create more performant, maintainable, and accurate applications.  more » « less
Award ID(s):
1908984
PAR ID:
10147833
Author(s) / Creator(s):
Date Published:
Journal Name:
ACM International Conference on the Management of Data (SIGMOD)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We will demonstrate a prototype of sqlcheck, a holistic toolchain for automatically finding and fixing anti-patterns in database appli- cations. The advent of modern database-as-a-service platforms has made it easy for developers to quickly create scalable applications. However, it is still challenging for developers to design performant, maintainable, and accurate applications. This is because develop- ers may unknowingly introduce anti-patterns in the application’s SQL statements. These anti-patterns are design decisions that are intended to solve a problem, but often lead to other problems by violating fundamental design principles. sqlcheck leverages techniques for automatically: (1) detecting anti-patterns with high accuracy, (2) ranking them based on their impact on performance, maintainability, and accuracy of applica- tions, and (3) suggesting alternative queries and changes to the database design to fix these anti-patterns. We will demonstrate that sqlcheck enables developers to create more performant, maintain- able, and accurate applications. We will show the prevalence of these anti-patterns in a large collection of queries and databases collected from open-source repositories. 
    more » « less
  2. null (Ed.)
    Systems for ML inference are widely deployed today, but they typically optimize ML inference workloads using techniques designed for conventional data serving workloads and miss critical opportunities to leverage the statistical nature of ML. In this paper, we present WILLUMP, an optimizer for ML inference that introduces two statistically-motivated optimizations targeting ML applications whose performance bottleneck is feature computation. First, WILLUMP automatically cascades feature computation for classification queries: WILLUMP classifies most data inputs using only high-value, low-cost features selected through empirical observations of ML model performance, improving query performance by up to 5× without statistically significant accuracy loss. Second, WILLUMP accurately approximates ML top-K queries, discarding low-scoring inputs with an automatically constructed approximate model and then ranking the remainder with a more powerful model, improving query performance by up to 10× with minimal accuracy loss. WILLUMP automatically tunes these optimizations’ parameters to maximize query performance while meeting an accuracy target. Moreover, WILLUMP complements these statistical optimizations with compiler optimizations to automatically generate fast inference code for ML applications. We show that WILLUMP improves the end-to-end performance of real-world ML inference pipelines curated from major data science competitions by up to 16× without statistically significant loss of accuracy. 
    more » « less
  3. Relational databases play an important role in business, science, and more. However, many users cannot fully unleash the analytical power of relational databases, because they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries. To address these issues, we introduce a new interaction mechanism that allows users to directly edit a step-by-step explanation of a query to fix errors. Our experiments on multiple datasets, as well as a user study with 24 participants, demonstrate that our approach can achieve better performance than multiple SOTA approaches. 
    more » « less
  4. We introduce EQUI-VOCAL: a new system that automatically synthesizes queries over videos from limited user interactions. The user only provides a handful of positive and negative examples of what they are looking for. EQUI-VOCAL utilizes these initial examples and additional ones collected through active learning to efficiently synthesize complex user queries. Our approach enables users to find events without database expertise, with limited labeling effort, and without declarative specifications or sketches. Core to EQUI-VOCAL's design is the use of spatio-temporal scene graphs in its data model and query language and a novel query synthesis approach that works on large and noisy video data. Our system outperforms two baseline systems---in terms of F1 score, synthesis time, and robustness to noise---and can flexibly synthesize complex queries that the baselines do not support.

     
    more » « less
  5. Modern web applications have stringent latency requirements while processing an ever-increasing amount of user data. To address these challenges and improve programmer productivity, Object Relational Mapping (ORM) frameworks have been developed to allow developers writing database processing code in an object-oriented manner. Despite such frameworks, prior work found that developers still struggle in developing performant ORM-based web applications. This paper presents PowerStation, a RubyMine IDE plugin for optimizing web applications developed using the Ruby on Rails ORM. Using automated static analysis, PowerStation detects ORMrelated inefficiency problems and suggests fixes to developers. Our evaluation on 12 real-world applications shows that PowerStation can automatically detects 1221 performance issues across all of them. We have uploaded a tutorial on using PowerStation plugin to https://youtu.be/v_uY5bjGuK0. 
    more » « less