Title: RONIN: data lake exploration
Dataset discovery can be performed using search (with a query or keywords) to find relevant data. However, the result of this discovery can be overwhelming to explore. Existing navigation techniques mostly focus on linkage graphs that enable navigation from one data set to another based on similarity or joinability of attributes. However, users often do not know which data set to start the navigation from. RONIN proposes an alternative way to navigate by building a hierarchical structure on a collection of data sets: the user navigates between groups of data sets in a hierarchical manner to narrow down to the data of interest. We demonstrate RONIN, a tool that enables user exploration of a data lake by seamlessly integrating the two common modalities of discovery: data set search and navigation of a hierarchical structure. In RONIN, a user can perform a keyword search or joinability search over a data lake, then navigate the result using a hierarchical structure, called an organization, that is created on the fly. While navigating an organization, the user may switch to the search mode, and back to navigation on an organization that is updated based on search. This integration of search and navigation provides great power in allowing users to find and explore interesting data in a data lake.
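The search-then-navigate workflow described in the abstract can be illustrated with a minimal sketch. This is not RONIN's algorithm or API: the toy data lake, the tag sets, and the function names `keyword_search` and `build_organization` are all hypothetical, and the grouping rule (one tag family per level) is a simple stand-in for however an organization is actually constructed.

```python
from collections import defaultdict

# Hypothetical toy data lake: table name -> descriptive tags.
DATA_LAKE = {
    "city_budget_2020": {"finance", "city"},
    "city_parks": {"environment", "city"},
    "state_budget_2020": {"finance", "state"},
    "air_quality": {"environment", "state"},
}


def keyword_search(lake, keyword):
    """Return sorted names of tables whose name or tags contain the keyword."""
    kw = keyword.lower()
    return sorted(
        name
        for name, tags in lake.items()
        if kw in name.lower() or any(kw in tag for tag in tags)
    )


def build_organization(lake, names, levels):
    """Group tables into a nested dict, one nesting level per tag family.

    `levels` is a list of tag sets; tables with no tag in a family
    fall into an "other" group at that level. (Illustrative rule only.)
    """
    if not levels:
        return sorted(names)
    groups = defaultdict(list)
    for name in names:
        matches = sorted(lake[name] & levels[0])
        groups[matches[0] if matches else "other"].append(name)
    return {
        key: build_organization(lake, members, levels[1:])
        for key, members in sorted(groups.items())
    }


# Search first, then organize the hits on the fly.
hits = keyword_search(DATA_LAKE, "city")
organization = build_organization(
    DATA_LAKE, hits, [{"finance", "environment"}, {"city", "state"}]
)
```

Navigating then amounts to descending the nested dictionary, for example `organization["finance"]["city"]`; running a new search and rebuilding the dictionary mimics switching back from navigation to search.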
Award ID(s):
2107248 2107050
PAR ID:
10358650
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
14
Issue:
12
ISSN:
2150-8097
Page Range / eLocation ID:
2863 to 2866
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Children often interact with search engines within a classroom context to complete assignments or discover new information. To successfully identify relevant resources among those presented on a search engine results page (SERP), users must first be able to comprehend the text included in SERP snippets. While this task may be straightforward for an adult user, children may encounter obstacles in terms of readability and comprehension when attempting to navigate a SERP. Previous research has demonstrated the positive impact of including visual cues on a SERP as relevance signals to guide children toward appropriate resources. In this work, we explore the effect of supplying visual cues related to readability and text difficulty on children’s (ages 6-12) navigation of a SERP. Using quantitative data collected from user-interface interactions and qualitative data gathered from participant interviews, we analyze the impact of these visual cues on children’s selection of results on a SERP when carrying out information discovery tasks. 
  3. Social scientists increasingly share data so others can evaluate, replicate, and extend their research. To understand the process of data discovery as a precursor to data use, we study prospective users’ interactions with archived data. We gathered data for 98,000 user sessions initiated at a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). Our data reflect four years (2012-16) of users’ interactions with archival resources, including a data catalog, study-level metadata, variables, and publications that cite nearly 10,000 datasets. We constructed a network of user interactions linking website landing (e.g., site entrances) to exit pages, from which we identified three types of paths that users take through the research data archive: direct, orienting, and scenic. We also interpreted points of failure (e.g., drop-offs) and recurring behaviors (e.g., sensemaking) that support or impede data discovery along search paths. We articulate strategies that users adopt as they navigate data search and suggest ways to enhance the accessibility of data, metadata, and the systems that organize each. 
  4. Abstract. Aim: Species occurrence data are valuable information that enables one to estimate geographical distributions, characterize niches and their evolution, and guide spatial conservation planning. Rapid increases in species occurrence data stem from increasing digitization and aggregation efforts, and citizen science initiatives. However, persistent quality issues in occurrence data can impact the accuracy of scientific findings, underscoring the importance of filtering erroneous occurrence records in biodiversity analyses. Innovation: We introduce an R package, occTest, that synthesizes a growing open‐source ecosystem of biodiversity cleaning workflows to prepare occurrence data for different modelling applications. It offers a structured set of algorithms to identify potential problems with species occurrence records by employing a hierarchical organization of multiple tests. The workflow has a hierarchical structure organized in testPhases (i.e. cleaning vs. testing) that encompass different testBlocks grouping different testTypes (e.g. environmental outlier detection), which may use different testMethods (e.g. Rosner test, jackknife, etc.). Four different testBlocks characterize potential problems in geographic, environmental, human influence and temporal dimensions. Filtering and plotting functions are incorporated to facilitate the interpretation of tests. We provide examples with different data sources, with default and user‐defined parameters. Compared to other available tools and workflows, occTest offers a comprehensive suite of integrated tests, and allows multiple methods associated with each test to explore consensus among data cleaning methods. It uniquely incorporates both coordinate accuracy analysis and environmental analysis of occurrence records. Furthermore, it provides a hierarchical structure to incorporate future tests yet to be developed.
     Main conclusions: occTest will help users understand the quality and quantity of data available before the start of data analysis, while also enabling users to filter data using either predefined rules or custom‐built rules. As a result, occTest can better assess each record's appropriateness for its intended application.
  5. Abstract: Navigation is a major challenge in exploring data within immersive environments, especially of large omnidirectional spherical images. We propose a method of auto-scaling to allow users to navigate using teleportation within the safe boundary of their physical environment with different levels of focus. Our method combines physical navigation with virtual teleportation. We also propose a “peek then warp” behavior when using a zoom lens and evaluate our system in conjunction with different teleportation transitions, including a proposed transition for exploration of omnidirectional and 360-degree panoramic imagery, termed Envelop, wherein the destination view expands out from the zoom lens to completely envelop the user. In this work, we focus on visualizing and navigating large omnidirectional or panoramic images with application to GIS visualization as an inside-out omnidirectional image of the earth. We conducted two user studies to evaluate our techniques over a search and comparison task. Our results illustrate the advantages of our techniques for navigation and exploration of omnidirectional images in an immersive environment. 