This content will become publicly available on August 26, 2025

Title: QPV: An Input Control Component For Progressive Visualization Analytics [Work-in-progress]
Progressive visual analytics enables data scientists to efficiently explore large datasets and examine progressive results with low latency. Most progressive visualization frameworks use a progressive query processing module that controls the quality of the results and then feeds these results into a visualization module. The goal is to avoid poor-quality progressive results, which could mislead data scientists. This approach misses some optimization opportunities because it improves the quality of the intermediate result while ignoring how this result affects the final visualization. This work presents a work-in-progress, quality-aware input control component for progressive visualization, named QPV. The key idea of the proposed framework is to integrate the visualization module into the progressive query results so that quality control takes the final visualization into account. With limited computational resources, QPV solves an optimization problem to allocate resources and alleviate misleading effects in the progressive plots.
Award ID(s):
1954644 1924694
PAR ID:
10546047
Author(s) / Creator(s):
;
Publisher / Repository:
The VLDB Endowment
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Progressive query processing enables data scientists to efficiently analyze and explore large datasets. Data scientists can start further analyses earlier if the progressive result represents the complete result well. Most progressive processing frameworks carefully control which parts of the input to process in order to improve the quality of progressive results. These input control strategies work well when the data are processed uniformly. However, the progressive results will be biased towards some join keys if the processed data are not uniform. A recently proposed input&output framework named QPJ corrects the bias by temporarily hiding some results. The framework dynamically estimates the distribution of the complete result and outputs progressive results with a similar distribution to the estimated complete result. This demo presents QPJVis, a progressive query processing system designed to inherently process progressive queries using the QPJ framework. We also implement an input control framework, Prism, in QPJVis so that users can compare the input&output framework with a purely input-based framework.
  2. With the requirement to enable interactive and efficient data analytics and exploration, progressive data processing, especially progressive join, has become essential to data science. Join queries are particularly challenging because the correlation between input datasets causes the results to be biased towards some join keys. Existing methods carefully control which parts of the input to process in order to improve the quality of progressive results. If the quality is not satisfactory, they process more data to improve the result. In this paper, we propose an alternative approach that initially seems counter-intuitive but works surprisingly well. After query processing, we intentionally report fewer results to the user with the goal of improving the quality. The key idea is that if the output deviates from the correct distribution, we temporarily hide some results to correct the bias. As we process more data, the hidden results are inserted back until the full dataset is processed. The main challenge is that we do not know the correct output distribution while the progressive query is running. In this work, we formally define the progressive join problem with quality and progressive result rate constraints. We propose an input&output quality-aware progressive join framework (QPJ) that (1) provides input control that decides which parts of the input to process; (2) estimates the final result distribution progressively; (3) automatically controls the quality of the progressive output rate; and (4) combines input&output control to enable quality control of the progressive results. We compare QPJ with existing methods and show that QPJ provides progressive output that represents the final answer better than existing methods.
  3. We present a novel multi-level representation of time series called OM3 that facilitates efficient interactive progressive visualization of large data stored in a database and supports various interactions such as resizing, panning, zooming, and visual query. Based on our proposed line-segment aggregation, this representation can produce error-free line visualizations that preserve the shape of a time series in windows of arbitrary sizes. To reduce the interaction latency, we develop an incremental tree-based query strategy to support progressive visualizations, allowing finer control over the accuracy-time tradeoff. We quantitatively compare OM3 with state-of-the-art methods, including a method implemented on a leading time-series database, InfluxDB, in two settings with databases residing either in the local area network or on the cloud. Results show that OM3 maintains a low latency within 300 ms in the web browser and a high data reduction ratio regardless of the data size (ranging from millions to billions of records), running around 1,000 times faster than the state-of-the-art methods on the largest dataset experimented with.
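
The output-hiding idea described in the QPJ abstracts above (temporarily withholding results so that the visible output follows the estimated final distribution) can be illustrated with a small sketch. This is not QPJ's actual algorithm, which the abstracts do not specify. It assumes, purely for illustration, that an estimate of the complete result is already available as per-key counts, and the function name `visible_counts` and both of its arguments are hypothetical. It shows the largest number of results per join key that keeps the visible output proportional to that estimate.

```python
from fractions import Fraction
from math import floor


def visible_counts(produced, estimated):
    """Return how many produced results per join key to show right now.

    produced:  {key: number of join results generated so far}
    estimated: {key: estimated number of results in the complete answer}
    Both arguments are hypothetical inputs for this sketch.
    """
    # Only keys expected in the final answer constrain the scale factor.
    expected = {k: w for k, w in estimated.items() if w > 0}
    if not expected:
        return {k: 0 for k in produced}

    # Largest scale s such that s * estimated[k] <= produced[k] for every key.
    # Fractions keep the arithmetic exact, so floor() never rounds wrongly.
    scale = min(Fraction(produced.get(k, 0), w) for k, w in expected.items())

    # Show results in proportion to the estimate; the remainder stays hidden.
    return {k: floor(scale * expected.get(k, 0)) for k in produced}


produced = {"a": 90, "b": 10}    # key "a" is over-represented so far
estimated = {"a": 50, "b": 50}   # the complete result is expected to be balanced
shown = visible_counts(produced, estimated)
hidden = {k: produced[k] - shown[k] for k in produced}
print(shown)   # {'a': 10, 'b': 10}
print(hidden)  # {'a': 80, 'b': 0}
```

Calling the function again after more input has been processed releases previously hidden results as the scarcer keys catch up, which mirrors the abstracts' description of hidden results being inserted back over time. The sketch is deliberately strict: a key that is expected but has produced nothing yet forces every visible count to zero, a trade-off a real system would balance against an output-rate constraint.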