skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


Title: STULL: Unbiased Online Sampling for Visual Exploration of Large Spatiotemporal Data
Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are often biased, as most researchers have primarily focused on reducing computational latency. Biased sampling approaches select data with unequal probabilities and produce results that do not match the exact data distribution, leading end users to incorrect interpretations. In this paper, we propose a novel approach to perform unbiased online sampling of large spatiotemporal data. The proposed approach ensures the same probability of selection to every point that qualifies the specifications of a user's multidimensional query. To achieve unbiased sampling for accurate representative interactive visualizations, we design a novel data index and an associated sample retrieval plan. Our proposed sampling approach is suitable for a wide variety of visual analytics tasks, e.g., tasks that run aggregate queries of spatiotemporal data. Extensive experiments confirm the superiority of our approach over a state-of-the-art spatial online sampling technique, demonstrating that within the same computational time, data samples generated in our approach are at least 50% more accurate in representing the actual spatial distribution of the data and enable approximate visualizations to present closer visual appearances to the exact ones.  more » « less
Award ID(s):
1815796 1910216
NSF-PAR ID:
10301810
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
2020 IEEE Conference on Visual Analytics Science and Technology (VAST)
Page Range / eLocation ID:
72 to 83
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Visual analytics systems enable highly interactive exploratory data analysis. Across a range of fields, these technologies have been successfully employed to help users learn from complex data. However, these same exploratory visualization techniques make it easy for users to discover spurious findings. This paper proposes new methods to monitor a user’s analytic focus during visual analysis of structured datasets and use it to surface relevant articles that contextualize the visualized findings. Motivated by interactive analyses of electronic health data, this paper introduces a formal model of analytic focus, a computational approach to dynamically update the focus model at the time of user interaction, and a prototype application that leverages this model to surface relevant medical publications to users during visual analysis of a large corpus of medical records. Evaluation results with 24 users show that the modeling approach has high levels of accuracy and is able to surface highly relevant medical abstracts. 
    more » « less
  2. Abstract

    Disease surveillance systems worldwide face increasing pressure to maintain and distribute data in usable formats supplemented with effective visualizations to enable actionable policy and programming responses. Annual reports and interactive portals provide access to surveillance data and visualizations depicting temporal trends and seasonal patterns of diseases. Analyses and visuals are typically limited to reporting the annual time series and the month with the highest number of cases per year. Yet, detecting potential disease outbreaks and supporting public health interventions requires detailed spatiotemporal comparisons to characterize spatiotemporal patterns of illness across diseases and locations. The Centers for Disease Control and Prevention’s (CDC) FoodNet Fast provides population-based foodborne-disease surveillance records and visualizations for select counties across the US. We offer suggestions on how current FoodNet Fast data organization and visual analytics can be improved to facilitate data interpretation, decision-making, and communication of features related to trend and seasonality. The resulting compilation, or analecta, of 436 visualizations of records and codes are openly available online.

     
    more » « less
  3. Interaction is the cornerstone of how people perform tasks and gain insight in visual analytics. However, people’s inherent cognitive biases impact their behavior and decision making during their interactive visual analytic process. Understanding how bias impacts the visual analytic process, how it can be measured, and how its negative effects can be mitigated is a complex problem space. Nonetheless, recent work has begun to approach this problem by proposing theoretical computational metrics that are applied to user interaction sequences to measure bias in real-time. In this paper, we implement and apply these computational metrics in the context of anchoring bias. We present the results of a formative study examining how the metrics can capture anchoring bias in real-time during a visual analytic task. We present lessons learned in the form of considerations for applying the metrics in a visual analytic tool. Our findings suggest that these computational metrics are a promising approach for characterizing bias in users’ interactive behaviors. 
    more » « less
  4. null (Ed.)
    Most visual analytics systems assume that all foraging for data happens before the analytics process; once analysis begins, the set of data attributes considered is fixed. Such separation of data construction from analysis precludes iteration that can enable foraging informed by the needs that arise in-situ during the analysis. The separation of the foraging loop from the data analysis tasks can limit the pace and scope of analysis. In this paper, we present CAVA, a system that integrates data curation and data augmentation with the traditional data exploration and analysis tasks, enabling information foraging in-situ during analysis. Identifying attributes to add to the dataset is difficult because it requires human knowledge to determine which available attributes will be helpful for the ensuing analytical tasks. CAVA crawls knowledge graphs to provide users with a a broad set of attributes drawn from external data to choose from. Users can then specify complex operations on knowledge graphs to construct additional attributes. CAVA shows how visual analytics can help users forage for attributes by letting users visually explore the set of available data, and by serving as an interface for query construction. It also provides visualizations of the knowledge graph itself to help users understand complex joins such as multi-hop aggregations. We assess the ability of our system to enable users to perform complex data combinations without programming in a user study over two datasets. We then demonstrate the generalizability of CAVA through two additional usage scenarios. The results of the evaluation confirm that CAVA is effective in helping the user perform data foraging that leads to improved analysis outcomes, and offer evidence in support of integrating data augmentation as a part of the visual analytics pipeline. 
    more » « less
  5. Abstract

    Calibration weighting has been widely used to correct selection biases in nonprobability sampling, missing data and causal inference. The main idea is to calibrate the biased sample to the benchmark by adjusting the subject weights. However, hard calibration can produce enormous weights when an exact calibration is enforced on a large set of extraneous covariates. This article proposes a soft calibration scheme, where the outcome and the selection indicator follow mixed-effect models. The scheme imposes an exact calibration on the fixed effects and an approximate calibration on the random effects. On the one hand, our soft calibration has an intrinsic connection with best linear unbiased prediction, which results in a more efficient estimation compared to hard calibration. On the other hand, soft calibration weighting estimation can be envisioned as penalized propensity score weight estimation, with the penalty term motivated by the mixed-effect structure. The asymptotic distribution and a valid variance estimator are derived for soft calibration. We demonstrate the superiority of the proposed estimator over other competitors in simulation studies and using a real-world data application on the effect of BMI screening on childhood obesity.

     
    more » « less