This content will become publicly available on May 3, 2026

Title: A Language Mapping Strategy for the Implementation of a Declarative Machine Learning Query Language
Machine learning now drives the digital economy, yet most toolkits still demand low-level statistical and algorithmic expertise that excludes non-specialists. To remove this barrier, we present the Machine-learning Query Language (MQL), a fully declarative interface that lets users express analytic intent as succinctly as SQL expresses data retrieval. An MQL compiler faithfully translates each statement into an executable pipeline on mainstream frameworks such as Scikit-Learn, PyCaret, TPOT, TensorFlow, or PyTorch, hiding all procedural detail. Experiments underscore its impact. Compared with hand-coded scripts, MQL reduced development effort by a factor of 70–85 for classification, 100–140 for regression, and 65–80 for clustering. In 95% of trials the auto-generated pipelines matched or outperformed the most accurate manually tuned models, and MQL's framework-selection logic chose the best backend 90% of the time. By coupling SQL-style abstraction with robust code generation, MQL delivers a decisive leap toward true mass-market, self-service machine learning.
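As an illustration only of the kind of statement-to-pipeline translation the abstract describes, the sketch below maps one declarative statement to scikit-learn source text. The statement syntax (`PREDICT ... FROM ... USING ...`), the names `STATEMENT`, `TEMPLATES`, and `compile_statement`, and the convention that a table is a CSV file are all assumptions invented for this example; the record does not give MQL's actual grammar or compiler design, and the framework-selection step is omitted.

```python
import re

# Hypothetical statement shape, invented for this sketch; not MQL's real grammar.
STATEMENT = re.compile(
    r"^\s*PREDICT\s+(?P<target>\w+)\s+FROM\s+(?P<table>\w+)"
    r"\s+USING\s+(?P<task>CLASSIFICATION|REGRESSION)\s*;?\s*$",
    re.IGNORECASE,
)

# Each task maps to a fixed scikit-learn code fragment (import line, constructor).
TEMPLATES = {
    "CLASSIFICATION": (
        "from sklearn.linear_model import LogisticRegression",
        "LogisticRegression(max_iter=1000)",
    ),
    "REGRESSION": (
        "from sklearn.linear_model import LinearRegression",
        "LinearRegression()",
    ),
}


def compile_statement(statement: str) -> str:
    """Translate one declarative statement into Python source for a pipeline."""
    match = STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    target = match["target"]
    csv_path = match["table"] + ".csv"  # assumption: tables are CSV files
    import_line, estimator = TEMPLATES[match["task"].upper()]
    return "\n".join([
        "import pandas as pd",
        import_line,
        "from sklearn.model_selection import train_test_split",
        "",
        f"data = pd.read_csv({csv_path!r})",
        f"X = data.drop(columns=[{target!r}])",
        f"y = data[{target!r}]",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)",
        f"model = {estimator}.fit(X_train, y_train)",
        "print(model.score(X_test, y_test))",
    ])


if __name__ == "__main__":
    print(compile_statement("PREDICT species FROM iris USING CLASSIFICATION;"))
```

The one-line statement expands into roughly ten lines of procedural code, which is the effort gap a declarative front end aims to close. The sketch uses a fixed template per task; choosing among several backends or algorithms would need additional logic not shown here.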
Award ID(s):
2410668
PAR ID:
10631979
Author(s) / Creator(s):
Publisher / Repository:
SSRN
Date Published:
Format(s):
Medium: X
Institution:
University of Idaho
Sponsoring Org:
National Science Foundation
More Like this
  1. The rising popularity of data science and machine learning (ML) across diverse domains, often driven by users with limited computational expertise, reflects the growing commoditization of ML tools. However, the advanced technical and mathematical knowledge demanded by current ML frameworks poses a formidable barrier for non-experts, preventing them from fully exploiting these powerful platforms. In response, we introduce MQL, a novel declarative query language for ML application design, alongside its corresponding query processing engine. We demonstrate that abstracting ML concepts, similarly to SQL, can preserve both processing efficiency and analytical fidelity. Our implementation defines MQL semantics through a semantics-preserving mapping to widely understood ML code fragments. By leveraging task-specific meta-features, heuristic knowledge, and standard assessment methods, our system ranks candidate ML libraries, selects optimal algorithms, and frees users from these choices. We introduce mapping algorithms to ensure that each MQL program retains its intended semantics and present experimental evaluations demonstrating that MQL's algorithmic selections not only match but surpass human-engineered solutions in terms of performance and model accuracy. By offering declarative queries as a high-level alternative to traditional coding, MQL significantly reduces the complexity of data analysis pipeline construction, thereby democratizing machine learning application design. To foster shared community development, this work is maintained as an open-source project at https://github.com/hmjamil/mql.
  2. With recent advancements, large language models (LLMs) such as ChatGPT and Bard have shown the potential to disrupt many industries, from customer service to healthcare. Traditionally, humans interact with geospatial data through software (e.g., ArcGIS 10.3) and programming languages (e.g., Python). As a pioneering study, we explore the possibility of using an LLM as an interface to interact with geospatial datasets through natural language. To achieve this, we propose a framework to (1) train an LLM to understand the datasets, (2) generate geospatial SQL queries based on a natural language question, (3) send the SQL query to the backend database, and (4) translate the database response back into natural language. As a proof of concept, a case study was conducted on real-world data to evaluate the framework's performance on various queries. The results show that LLMs can be accurate in generating SQL code for most cases, including spatial joins, although there is still room for improvement. As all geospatial data can be stored in a spatial database, we hope that this framework can serve as a proxy to improve the efficiency of spatial data analyses and unlock the possibility of automated geospatial analytics.
  3. The goal of this thesis is to introduce a new design for building federated query optimizers, based on machine learning. We propose a modular and flexible architecture, allowing a federated query optimizer to integrate with any database system that supports SQL, with close-to-zero engineering effort. By observing the performance of the external systems, our optimizer learns and builds cost models on-the-fly, enabling federated query optimization with negligible communication with the external systems. To demonstrate the potential of this research plan, we present a prototype of our federated query optimizer built on top of Spark SQL. Our implementation effectively accelerates federated queries, achieving up to 7.5x better query execution times compared to the vanilla implementation of Spark SQL. 
  4. This paper evaluates the performance of dry, minimum quantity lubrication (MQL), and MQL with nanofluid conditions in turning of the most common titanium (Ti) alloy, Ti-6Al-4V, in a solution treated and aged (STA) microstructure. In particular, the nanofluid evaluated here is vegetable (rapeseed) oil mixed with small concentrations of exfoliated graphite nanoplatelets (xGnPs). This paper focuses on the turning process, in which the uninterrupted engagement between tool and work material during cutting makes it challenging to apply the oil or nanofluid droplets directly onto the tribological surfaces of a cutting tool. A series of turning experiments was conducted with uncoated carbide inserts, while measuring the cutting forces with a dynamometer under the dry, MQL, and MQL with nanofluid conditions, with oil droplets supplied externally from our MQL device. The inserts were retrieved intermittently to measure the progress of flank and crater wear using confocal microscopy. These preliminary experimental results show that MQL, and in particular MQL with the nanofluid, significantly improves the machinability of Ti alloys even in the turning process. However, to attain the best performance, the MQL conditions such as nozzle orientation and the concentration of xGnP must be optimized.
  5. Though recent advances in machine learning have led to significant improvements in natural language interfaces for databases, the accuracy and reliability of these systems remain limited, especially in high-stakes domains. This paper introduces SQLucid, a novel user interface that bridges the gap between non-expert users and complex database querying processes. SQLucid addresses existing limitations by integrating visual correspondence, intermediate query results, and editable step-by-step SQL explanations in natural language to facilitate user understanding and engagement. This unique blend of features empowers users to understand and refine SQL queries easily and precisely. Two user studies and one quantitative experiment were conducted to validate SQLucid’s effectiveness, showing significant improvement in task completion accuracy and user confidence compared to existing interfaces. Our code is available at https://github.com/magic-YuanTian/SQLucid. 