Title: Computer-Assisted Heuristic Evaluation of Data Visualization
Heuristic evaluation has long been an important part of data visualization practice. Many heuristic rules and guidelines for evaluating data visualizations have been proposed and reviewed. However, applying heuristic evaluation in practice is not trivial. First, the heuristic rules are scattered across publications in different disciplines; there is no central repository of heuristic rules for data visualization, and no consistent guidelines on how to apply them. Second, it is difficult to find multiple experts who are knowledgeable about the heuristic rules, their pitfalls, and their counterpoints. To address these issues, we present a computer-assisted heuristic evaluation method for data visualization. Based on this method, we developed a Python-based tool for evaluating plots created with the visualization library Plotly. Recent advances in declarative data visualization libraries have made it feasible to create such a tool. By providing advice, critiques, and recommendations, the tool serves as a knowledgeable virtual assistant that helps data visualization developers evaluate their visualizations as they code.
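To make the idea concrete, the following is a minimal, hypothetical sketch of the kind of rule-based check such a tool might run against a Plotly figure as it is being built. The function name and the specific rules are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch: rule-based critiques for a Plotly figure (illustrative only).
import plotly.graph_objects as go


def evaluate_figure(fig):
    """Return a list of heuristic critiques for a Plotly figure object."""
    critiques = []

    # Heuristic: every chart should state what it is about.
    if not fig.layout.title.text:
        critiques.append("Add a title that states the main message of the chart.")

    # Heuristic: axes need labels so readers know what is being plotted.
    if not (fig.layout.xaxis.title.text and fig.layout.yaxis.title.text):
        critiques.append("Label both axes; unlabeled axes obscure the data's meaning.")

    # Heuristic: pie charts become hard to compare beyond a handful of slices.
    for trace in fig.data:
        if trace.type == "pie" and len(trace.values or ()) > 6:
            critiques.append("Consider a bar chart; pies with many slices are hard to read.")

    return critiques


fig = go.Figure(go.Scatter(x=[1, 2, 3], y=[2, 1, 4]))  # deliberately unlabeled
for advice in evaluate_figure(fig):
    print("-", advice)

Run on the unlabeled scatter plot above, this sketch would print critiques about the missing title and axis labels, which is the style of feedback the abstract describes.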
Award ID(s):
1852516
PAR ID:
10315473
Author(s) / Creator(s):
Editor(s):
Bebis, George
Date Published:
Journal Name:
Proceedings of the International Symposium on Visual Computing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Decision diagrams (DDs) are widely used in system verification to compute and store the state space of finite discrete-event dynamic systems (DEDSs). DDs are organized into levels, and it is well known that the size of a DD encoding a given set may be very sensitive to the order in which the variables capturing the state of the system are mapped to levels. Computing optimal orders is NP-hard. Several heuristics for variable order computation have been proposed, and metrics have been introduced to evaluate these orders. However, we know of no published evaluation that compares the actual predictive power of all these metrics. We propose and apply a methodology to carry out such an evaluation, based on the correlation between the metric value of a variable order and the size of the DD generated with that order. We compute correlations for several metrics from the literature, applied to many variable orders built using different approaches, for 40 DEDSs taken from the literature. Our experiments show that these metrics have correlations ranging from "very weak or nonexisting" to "strong." We show the importance of highly correlating metrics for variable order heuristics by defining and evaluating two new heuristics (an improvement of the well-known Force heuristic and a metric-based simulated annealing), as well as a meta-heuristic (that uses a metric to select the "best" variable order among a set of different variable orders).
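As a reader aid only, here is a minimal Python sketch of the evaluation idea in the abstract above: score candidate variable orders with a simple locality metric and check how well that score predicts the size of the DD built with each order. The metric, the toy model, and the DD sizes are illustrative assumptions, not data or code from the paper.

# Hypothetical sketch: correlate a variable-order metric with resulting DD sizes.
from scipy.stats import spearmanr


def span_metric(order, events):
    """Sum, over all events, of the span between the lowest and highest variable touched."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(max(pos[v] for v in ev) - min(pos[v] for v in ev) for ev in events)


# Toy model: three events, each touching a subset of the variables a..d.
events = [{"a", "b"}, {"b", "c"}, {"c", "d"}]

# Toy experiment: candidate orders and the DD node counts they (hypothetically) produced.
orders = [list("abcd"), list("dabc"), list("bdca"), list("bdac")]
dd_sizes = [12, 22, 19, 30]

scores = [span_metric(o, events) for o in orders]
rho, _ = spearmanr(scores, dd_sizes)
print(f"metric values {scores}; Spearman correlation with DD size: {rho:.2f}")

A metric whose scores correlate strongly with DD size, as in this toy run, is exactly the kind of metric the paper uses to drive its new variable order heuristics.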
  2. Defining characteristics of a problem domain continues to challenge developers of visualization software, even though it is essential for designing both tools and resulting visualizations. Additionally, the effectiveness of a visualization software tool often depends on the context of systems and actors within the domain problem. The nested blocks and guidelines model is a useful template for informing design and evaluation criteria for visualization software development because it aligns design to need [1]. Characterizing the outermost block of the nested model—the domain problem—is challenging, mainly due to the nature of contemporary domain problems, which are dynamic and by definition difficult to problematize. We offer here our emerging conceptual model, based on the central question in our research study—what visualization works for whom and in which situation—to characterize the outermost block, the domain problem, of the nested model [1]. We apply examples from a three-year case study of visualization software design and development to demonstrate how the conceptual model might be used to create evaluation criteria affecting the design and development of a visualization tool.
  3. The rapid expansion of food and nutrition information requires new ways of data sharing and dissemination. Interactive platforms integrating data portals and visualization dashboards have been effectively used to describe, monitor, and track information related to food and nutrition; however, a comprehensive evaluation of emerging interactive systems is lacking. We conducted a systematic review of publicly available dashboards using a set of 48 evaluation metrics covering data integrity, completeness, granularity, visualization quality, and interactivity, based on 4 major principles: evidence, efficiency, emphasis, and ethics. We evaluated 13 dashboards, summarized their characteristics, strengths, and limitations, and provided guidelines for developing nutrition dashboards. We applied mixed effects models to summarize evaluation results adjusted for interrater variability. The proposed metrics and evaluation principles help to improve data standardization and harmonization as well as dashboard performance and usability, broaden information and knowledge sharing among researchers, practitioners, and decision makers in the field of food and nutrition, and accelerate data literacy and communication.
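A minimal sketch, using entirely hypothetical scores, of how rater-by-dashboard ratings can be summarized with a mixed effects model that treats raters as a random effect, in the spirit of the interrater adjustment mentioned above; the column names and values are illustrative, not the study's data.

# Hypothetical example: absorb interrater variability with a random intercept per rater.
import pandas as pd
import statsmodels.formula.api as smf

scores = pd.DataFrame({
    "dashboard": ["A", "B", "C", "D"] * 3,
    "rater": ["r1"] * 4 + ["r2"] * 4 + ["r3"] * 4,
    "score": [4, 3, 5, 2, 5, 3, 4, 2, 4, 4, 5, 3],
})

# Fixed effect: dashboard; random intercept: rater.
model = smf.mixedlm("score ~ dashboard", scores, groups=scores["rater"])
result = model.fit()
print(result.summary())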
  4. Evaluating novel contextual bandit policies using logged data is crucial in applications where exploration is costly, such as medicine. However, such evaluation usually relies on the assumption of no unobserved confounders, which is bound to fail in practice. We study the question of policy evaluation when we instead have proxies for the latent confounders, and we develop an importance weighting method that avoids fitting a latent outcome regression model. Surprisingly, we show that there exists no single set of weights that gives unbiased evaluation regardless of the outcome model, unlike the case with no unobserved confounders, where density ratios are sufficient. Instead, we propose an adversarial objective and weights that minimize it, ensuring sufficient balance in the latent confounders regardless of the outcome model. We develop theory characterizing the consistency of our method and tractable algorithms for it. Empirical results validate the power of our method when confounders are latent.
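For context only: a minimal synthetic sketch of the standard density-ratio (importance weighting) off-policy estimate that the abstract contrasts against; the paper replaces these weights with adversarially balanced ones when confounders are only observed through proxies. All data and policies below are illustrative, not from the paper.

# Synthetic illustration of importance-weighted off-policy evaluation (no confounding).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
context = rng.normal(size=n)                          # observed context
action = (rng.random(n) < 0.5).astype(int)            # logging policy: uniform over {0, 1}
reward = context * action + rng.normal(scale=0.1, size=n)

logging_prob = np.full(n, 0.5)                        # P(logged action | context) under logging policy
# Target policy: mostly picks action 1 when context > 0, action 0 otherwise.
target_prob = np.where(context > 0, action, 1 - action) * 0.9 + 0.05

weights = target_prob / logging_prob                  # density-ratio importance weights
value_estimate = np.mean(weights * reward)
print(f"estimated value of the target policy: {value_estimate:.3f}")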
  5. Automatic evaluation metrics are a crucial component of dialog systems research. Standard language evaluation metrics are known to be ineffective for evaluating dialog. As such, recent research has proposed a number of novel, dialog-specific metrics that correlate better with human judgements. Due to the fast pace of research, many of these metrics have been assessed on different datasets, and there has as yet been no systematic comparison between them. To this end, this paper provides a comprehensive assessment of recently proposed dialog evaluation metrics on a number of datasets. In this paper, 23 different automatic evaluation metrics are evaluated on 10 different datasets. Furthermore, the metrics are assessed in different settings to better qualify their respective strengths and weaknesses. Metrics are assessed (1) on both the turn level and the dialog level, (2) for different dialog lengths, (3) for different dialog qualities (e.g., coherence, engaging), (4) for different types of response generation models (i.e., generative, retrieval, simple models, and state-of-the-art models), (5) taking into account the similarity of different metrics, and (6) exploring combinations of different metrics. This comprehensive assessment offers several takeaways pertaining to dialog evaluation metrics in general. It also suggests how best to assess evaluation metrics and indicates promising directions for future work.