Machine learning now drives the digital economy, yet most toolkits still demand low-level statistical and algorithmic expertise that excludes non-specialists. To remove this barrier, we present the Machine-learning Query Language (MQL), a fully declarative interface that lets users express analytic intent as succinctly as SQL expresses data retrieval. An MQL compiler translates each statement into an executable pipeline on mainstream frameworks such as Scikit-Learn, PyCaret, TPOT, TensorFlow or PyTorch, hiding all procedural detail. In our experiments, MQL reduced development effort relative to hand-coded scripts by a factor of 70–85 for classification, 100–140 for regression, and 65–80 for clustering. In 95% of trials the auto-generated pipelines matched or outperformed the most accurate manually tuned models, and MQL’s framework-selection logic chose the best backend 90% of the time. By coupling SQL-style abstraction with robust code generation, MQL delivers a decisive leap toward true mass-market, self-service machine learning.
This content will become publicly available on April 8, 2026
Implementing a Declarative Query Language for High Level Machine Learning Application Design
The rising popularity of data science and machine learning (ML) across diverse domains, often driven by users with limited computational expertise, reflects the growing commoditization of ML tools. However, the advanced technical and mathematical knowledge demanded by current ML frameworks poses a formidable barrier for non-experts, preventing them from fully exploiting these powerful platforms. In response, we introduce MQL, a novel declarative query language for ML application design, alongside its corresponding query processing engine. We demonstrate that abstracting ML concepts, similarly to SQL, can preserve both processing efficiency and analytical fidelity. Our implementation defines MQL semantics through a semantics-preserving mapping to widely understood ML code fragments. By leveraging task-specific meta-features, heuristic knowledge, and standard assessment methods, our system ranks candidate ML libraries, selects optimal algorithms, and frees users from these choices. We introduce mapping algorithms to ensure that each MQL program retains its intended semantics and present experimental evaluations demonstrating that MQL’s algorithmic selections not only match but surpass human-engineered solutions in terms of performance and model accuracy. By offering declarative queries as a high-level alternative to traditional coding, MQL significantly reduces the complexity of data analysis pipeline construction, thereby democratizing machine learning application design. To foster shared community development, this work is maintained as an open-source project at https://github.com/hmjamil/mql.
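The library-ranking step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the candidate table, the row-count thresholds, and the names `MetaFeatures`, `CANDIDATES`, and `rank_backends` are hypothetical and are not taken from MQL or its repository. It covers only the meta-feature and heuristic part of the ranking, not the assessment methods the abstract also mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaFeatures:
    """Hypothetical task-level meta-features extracted from a query."""
    task: str    # "classification", "regression", or "clustering"
    n_rows: int  # number of training rows


# Hypothetical heuristic table (illustrative numbers, not from the paper):
# backend name, supported tasks, and a "comfortable" row-count range.
CANDIDATES = [
    ("scikit-learn", {"classification", "regression", "clustering"}, 0, 100_000),
    ("pytorch", {"classification", "regression"}, 50_000, 10_000_000),
]


def score(tasks, min_rows, max_rows, mf):
    """Return a score in [0, 1]; 0 means the backend cannot handle the task."""
    if mf.task not in tasks:
        return 0.0
    if mf.n_rows < min_rows:   # too little data for this backend
        return mf.n_rows / min_rows
    if mf.n_rows > max_rows:   # more data than the backend handles comfortably
        return max_rows / mf.n_rows
    return 1.0


def rank_backends(mf):
    """Rank usable backends from best to worst as (name, score) pairs."""
    scored = [(name, score(tasks, lo, hi, mf)) for name, tasks, lo, hi in CANDIDATES]
    usable = [pair for pair in scored if pair[1] > 0.0]
    return sorted(usable, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank_backends(MetaFeatures("classification", 1_000)))
    print(rank_backends(MetaFeatures("regression", 1_000_000)))
```

Under these assumed thresholds, a small classification task ranks the lightweight library first, while a million-row regression task ranks the deep learning backend first. A real selector would presumably combine many more meta-features with measured validation performance.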
- Award ID(s): 2410668
- PAR ID: 10631981
- Publisher / Repository: SSRN
- Date Published:
- Format(s): Medium: X
- Institution: University of Idaho
- Sponsoring Org: National Science Foundation
More Like this
- Emerging domains, such as sensor-driven smart spaces and social media analytics, require incoming data to be enriched prior to its use. Enrichment often consists of machine learning (ML) functions that are too expensive or infeasible to execute at ingestion. We develop a strategy entitled Just-in-time ENrichmeNt in quERy Processing (JENNER) to support interactive analytics over data as soon as it arrives in such application contexts. JENNER exploits the inherent tradeoffs between cost and quality often displayed by ML functions to progressively improve query answers during query execution. We describe how JENNER works for a large class of SPJ and aggregation queries that form the bulk of data analytics workloads. Our experimental results on real datasets (IoT and Tweet) show that JENNER's progressive answers are significantly better than those of naive strategies for progressive computation.
- We introduce EQUI-VOCAL: a new system that automatically synthesizes queries over videos from limited user interactions. The user only provides a handful of positive and negative examples of what they are looking for. EQUI-VOCAL utilizes these initial examples and additional ones collected through active learning to efficiently synthesize complex user queries. Our approach enables users to find events without database expertise, with limited labeling effort, and without declarative specifications or sketches. Core to EQUI-VOCAL's design is the use of spatio-temporal scene graphs in its data model and query language and a novel query synthesis approach that works on large and noisy video data. Our system outperforms two baseline systems (in terms of F1 score, synthesis time, and robustness to noise) and can flexibly synthesize complex queries that the baselines do not support.
- When transferring sensitive data to a non-trusted party, end-users require that the data be kept private. Mobile and IoT application developers want to leverage the sensitive data to provide better user experience and intelligent services. Unfortunately, existing programming abstractions make it impossible to reconcile these two seemingly conflicting objectives. In this paper, we present a novel programming mechanism for distributed managed execution environments that hides sensitive user data, while enabling developers to build powerful and intelligent applications, driven by the properties of the sensitive data. Specifically, the sensitive data is never revealed to clients, being protected by the runtime system. Our abstractions provide declarative and configurable data query interfaces, enforced by a lightweight distributed runtime system. Developers define when and how clients can query the sensitive data’s properties (e.g., how long the data remains accessible, how many times its properties can be queried, and which data query methods apply). Based on our evaluation, we argue that integrating our novel mechanism with the Java Virtual Machine (JVM) can address some of the most pertinent privacy problems of IoT and mobile applications.
- Datalog is a declarative programming language that has gained popularity in various domains due to its simplicity, expressiveness, and efficiency. But pure Datalog is limited to monotone queries, and cannot be used in most practical applications. For that reason, newer systems are relaxing the language by allowing non-monotone queries to be freely combined with recursion. But by departing from the elegant fixpoint semantics of pure Datalog, these systems often execute queries inefficiently, for example by performing redundant computations or using redundant storage. In this paper, we propose Temporel, a system that allows recursion to be freely combined with non-monotone operators. Temporel optimizes the program by compiling it into a novel intermediate representation that we call TempoDL. Our experimental results show that our system outperforms a state-of-the-art Datalog engine as well as a vectorized and a compiled in-memory database system for a wide range of applications from machine learning to graph processing.
