skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Ferret: Reviewing Tabular Datasets for Manipulation
Abstract How do we ensure the veracity of science? The act of manipulating or fabricating scientific data has led to many high‐profile fraud cases and retractions. Detecting manipulated data, however, is a challenging and time‐consuming endeavor. Automated detection methods are limited due to the diversity of data types and manipulation techniques. Furthermore, patterns automatically flagged as suspicious can have reasonable explanations. Instead, we propose a nuanced approach where experts analyze tabular datasets, e.g., as part of the peer‐review process, using a guided, interactive visualization approach. In this paper, we present an analysis of how manipulated datasets are created and the artifacts these techniques generate. Based on these findings, we propose a suite of visualization methods to surface potential irregularities. We have implemented these methods in Ferret, a visualization tool for data forensics work. Ferret makes potential data issues salient and provides guidance on spotting signs of tampering and differentiating them from truthful data.  more » « less
Award ID(s):
1751238
PAR ID:
10426691
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Computer Graphics Forum
Volume:
42
Issue:
3
ISSN:
0167-7055
Page Range / eLocation ID:
p. 187-198
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The field of connectomics aims to reconstruct the wiring diagram of Neurons and synapses to enable new insights into the workings of the brain. Reconstructing and analyzing the Neuronal connectivity, however, relies on many individual steps, starting from high‐resolution data acquisition to automated segmentation, proofreading, interactive data exploration, and circuit analysis. All of these steps have to handle large and complex datasets and rely on or benefit from integrated visualization methods. In this state‐of‐the‐art report, we describe visualization methods that can be applied throughout the connectomics pipeline, from data acquisition to circuit analysis. We first define the different steps of the pipeline and focus on how visualization is currently integrated into these steps. We also survey open science initiatives in connectomics, including usable open‐source tools and publicly available datasets. Finally, we discuss open challenges and possible future directions of this exciting research field. 
    more » « less
  2. We present a design-based exploration of the potential to reinterpret glyph-based visualization of scalar fields on 3D surfaces, a traditional scientific visualization technique, as a data physicalization technique. Even with the best virtual reality displays, users often struggle to correctly interpret spatial relationships in 3D datasets; thus, we are motivated to understand the extent to which traditional scientific visualization methods can translate to physical media where users may simultaneously leverage their visual systems and tactile senses to, in theory, better understand and connect with the data of interest. This pictorial traces the process of our design for a specific user study experiment: (1) inspiration, (2) exploring the data physicalization design space, (3) prototyping with 3D printing, (4) applying the techniques to different synthetic datasets. We call our most recent and compelling visual/tactile design boxcars on potatoes, and the next step in the research is to run a user-based evaluation to elucidate how this design compares to several of the others pictured here. 
    more » « less
  3. Image manipulation localization (IML) is a critical technique in media forensics, focusing on identifying tampered regions within manipulated images. Most existing IML methods require extensive training on labeled datasets with both image-level and pixel-level annotations. These methods often struggle with new manipulation types and exhibit low generalizability. In this work, we propose a training-free IML approach using diffusion models. Our method adaptively selects an appropriate number of diffusion timesteps for each input image in the forward process and performs both conditional and unconditional reconstructions in the backward process without relying on external conditions. By comparing these reconstructions, we generate a localization map highlighting regions of manipulation based on inconsistencies. Extensive experiments were conducted using sixteen state-of-the-art (SoTA) methods across six IML datasets. The results demonstrate that our training-free method outperforms SoTA unsupervised and weakly-supervised techniques. Furthermore, our method competes effectively against fully-supervised methods on novel (unseen) manipulation types. 
    more » « less
  4. Trust is fundamental to effective visual data communication between the visualization designer and the reader. Although personal experience and preference influence readers’ trust in visualizations, visualization designers can leverage design techniques to create visualizations that evoke a "calibrated trust," at which readers arrive after critically evaluating the information presented. To systematically understand what drives readers to engage in "calibrated trust," we must first equip ourselves with reliable and valid methods for measuring trust. Computer science and data visualization researchers have not yet reached a consensus on a trust definition or metric, which are essential to building a comprehensive trust model in human-data interaction. On the other hand, social scientists and behavioral economists have developed and perfected metrics that can measure generalized and interpersonal trust, which the visualization community can reference, modify, and adapt for our needs. In this paper, we gather existing methods for evaluating trust from other disciplines and discuss how we might use them to measure, define, and model trust in data visualization research. Specifically, we discuss quantitative surveys from social sciences, trust games from behavioral economics, measuring trust through measuring belief updating, and measuring trust through perceptual methods. We assess the potential issues with these methods and consider how we can systematically apply them to visualization research. 
    more » « less
  5. Exploring coordinated relationships is important for sense making of data in various fields, such as intelligence analysis. To support such investigations, visual analysis tools use biclustering to mine relationships in bipartite graphs and visualize the resulting biclusters with standard graph visualization techniques. Due to overlaps among biclusters, such visualizations can be cluttered (e.g., with many edge crossings), when there are a large number of biclusters. Prior work attempted to resolve this problem by automatically ordering nodes in a bipartite graph. However, visual clutter is still a serious problem, since the number of displayed biclusters remains unchanged. We propose bicluster aggregation as an alternative approach, and have developed two methods of interactively merging biclusters. These interactive bicluster aggregations help organize similar biclusters and reduce the number of displayed biclusters. Initial expert feedback indicates potential usefulness of these techniques in practice. 
    more » « less