NSF PAR Search | NSF Public Access Repository

Exploratory and directed search strategies at a social science data archive

https://doi.org/10.29173/iq1087

Lafia, Sara; Million, AJ; Hemphill, Libby (March 2024, IASSIST Quarterly)

Researchers need to be able to find, access, and use data to participate in open science. To understand how users search for research data, we analyzed textual queries issued at a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). We collected unique user queries from 988,475 user search sessions over four years (2012-16). Overall, we found that only 30% of site visitors entered search terms into the ICPSR website. We analyzed search strategies within these sessions by extending existing dataset search taxonomies to classify a subset of the 1,554 most popular queries. We identified five categories of commonly issued queries: keyword-based (e.g., date, place, topic); name (e.g., study, series); identifier (e.g., study, series); author (e.g., institutional, individual); and type (e.g., file, format). While the dominant search strategy used short keywords to explore topics, directed searches for known items using study and series names were also common. We further distinguished exploratory browsing from directed search queries based on their page views, refinements, search depth, duration, and length. Directed queries were longer (i.e., they had more words), while sessions with exploratory queries had more refinements and associated page views. By comparing search interactions at ICPSR to other natural language interactions in similar web search contexts, we conclude that dataset search at ICPSR is underutilized. We envision how alternative search paradigms, such as those enabled by recommender systems, can enhance dataset search.
Full Text Available
Transforming Data Discovery Through Behavior Modeling and Recommendation - Google Analytics Trace Data

https://doi.org/10.3886/E209981V4

Lafia, Sara; Million, AJ; Hemphill, Libby (January 2024, ICPSR - Interuniversity Consortium for Political and Social Research)

This dataset contains trace data describing user interactions with the Inter-university Consortium for Political and Social Research website (ICPSR). We gathered site usage data from Google Analytics. We focused our analysis on user sessions, which are groups of interactions with resources (e.g., website pages) and events initiated by users. ICPSR tracks a subset of user interactions (i.e., other than page views) through event triggers. We analyzed sequences of interactions with resources, including the ICPSR data catalog, variable index, data citations collected in the ICPSR Bibliography of Data-related Literature, and topical information about project archives. As part of our analysis, we calculated the total number of unique sessions and page views in the study period. Data in our study period fell between September 1, 2012, and 2016. ICPSR's website was updated and relaunched in September 2012 with new search functionality, including a Social Science Variables Database (SSVD) tool. ICPSR then reorganized its website and changed its analytics collection procedures in 2016, marking this as the cutoff date for our analysis. Data are relevant for two reasons. First, updates to the ICPSR website during the study period focused only on front-end design rather than the website's search functionality. Second, the core features of the website over the period we examined (e.g., faceted and variable search, standardized metadata, the use of controlled vocabularies, and restricted data applications) are shared with other major data archives, making it likely that the trends in user behavior we report are generalizable.
Opening doors to physical sample tracking and attribution in Earth and environmental sciences

https://doi.org/10.1038/s41597-025-05295-z

Damerow, Joan_E; Raia, Natalie_H; Stanley, Val; Choe, Saebyul; Borton, Mikayla_A; Byers, Neil; Cassidy, Ellen_R; Cholia, Shreyas; Edmunds, Rorie; Forbes, Brieanne; et al (June 2025, Scientific Data)
DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization

https://doi.org/10.1002/pra2.820

Fan, Lizhou; Lafia, Sara; Li, Lingyao; Yang, Fangyuan; Hemphill, Libby (October 2023, Proceedings of the Association for Information Science and Technology)

Data users need relevant context and research expertise to effectively search for and identify relevant datasets. Leading data providers, such as the Inter‐university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine‐readability of data and its documentation. There are opportunities to enhance dataset search by improving users' ability to learn about, and make sense of, information about data. Prior research has shown that context and expertise are two main barriers users face in effectively searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a novel chatbot‐based search system, DataChat, that leverages a graph database and a large language model to provide novel ways for users to interact with and search for research data. DataChat complements data archives' and institutional repositories' ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data.
Full Text Available
Direct, Orienting, and Scenic Paths: How Users Navigate Search in a Research Data Archive

https://doi.org/10.1145/3576840.3578275

Lafia, Sara; Million, A.J.; Hemphill, Libby (March 2023, Proceedings of the ACM SIGIR Conference On Human Information Interaction and Retrieval)

Social scientists increasingly share data so others can evaluate, replicate, and extend their research. To understand the process of data discovery as a precursor to data use, we study prospective users’ interactions with archived data. We gathered data for 98,000 user sessions initiated at a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). Our data reflect four years (2012-16) of users’ interactions with archival resources, including a data catalog, study-level metadata, variables, and publications that cite nearly 10,000 datasets. We constructed a network of user interactions linking website landing (e.g., site entrances) to exit pages, from which we identified three types of paths that users take through the research data archive: direct, orienting, and scenic. We also interpreted points of failure (e.g., drop-offs) and recurring behaviors (e.g., sensemaking) that support or impede data discovery along search paths. We articulate strategies that users adopt as they navigate data search and suggest ways to enhance the accessibility of data, metadata, and the systems that organize each.
Data, not documents: Moving beyond theories of information‐seeking behavior to advance data discovery

https://doi.org/10.1002/asi.24962

Million, Anthony_J; York, Jeremy; Lafia, Sara; Hemphill, Libby (November 2024, Journal of the Association for Information Science and Technology)

Many theories of human information behavior (HIB) assume that information objects are in text document format. This paper argues four important HIB theories are insufficient for describing users' search strategies for data because of assumptions about the attributes of objects that users seek. We first review and compare four HIB theories: Bates' berrypicking, Marchionni's electronic information search, Dervin's sense‐making, and Meho and Tibbo's social scientist information‐seeking. All four theories assume that information‐seekers search for text documents. Next, we compare these theories to search behavior by analyzing Google Analytics data from the Inter‐university Consortium for Political and Social Research (ICPSR). Users took direct, scenic, and orienting paths when searching for data. We also interviewed ICPSR users (n = 20), and they said they needed dataset documentation and contextual information to find data. However, Dervin's sense‐making alone cannot explain the information‐seeking behaviors that we observed. Instead, what mattered most were object attributes determined by the type of information that users sought (i.e., data, not documents). We conclude by suggesting an alternative frame for building user‐centered data discovery tools.
A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature

https://doi.org/10.1002/pra2.614

Lafia, Sara; Fan, Lizhou; Hemphill, Libby (October 2022, Proceedings of the Association for Information Science and Technology)

Full Text Available
How do properties of data, their curation, and their funding relate to reuse?

https://doi.org/10.1002/asi.24646

Hemphill, Libby; Pienta, Amy; Lafia, Sara; Akmon, Dharma; Bleckley, David A. (October 2022, Journal of the Association for Information Science and Technology)

Full Text Available
Opening Doors to Physical Sample Data Discovery, Integration, and Credit

https://doi.org/10.31223/X5ST2K

Damerow, Joan; Raia, Natalie; Stanley, Val; Choe, Saebyul; Borton, Mikayla; Byers, Neil; Cassidy, Ellen; Cholia, Shreyas; Edmunds, Rorie; Forbes, Brieanne; et al (June 2024, Nature Scientific Data)

Physical samples and their associated (meta)data underpin scientific discoveries across disciplines, and can enable new science when appropriately archived. However, there are significant gaps in community practices and infrastructure that currently prevent accurate provenance tracking, reproducibility, and attribution. For the vast majority of samples, descriptive metadata is often sparse, inaccessible, or absent. Samples and associated (meta)data may also be scattered across numerous physical collections, data repositories, laboratories, data files, and papers with no clear linkages or provenance tracking as new information is generated over time. The Physical Samples Curation Cluster has therefore developed ‘A Scientific Author Guide for Publishing Open Research Using Physical Samples.’ This involved synthesizing existing practices and community feedback, and assessing real-world examples to identify community and infrastructure needs. We identified areas of work needed to enable authors to efficiently reference samples and related data, link related samples and data, and track their use. Our goal is to help improve the discoverability, interoperability, and use of physical samples and associated (meta)data into the future.
Full Text Available
The Craft and Coordination of Data Curation: Complicating Workflow Views of Data Science

https://doi.org/10.1145/3555139

Thomer, Andrea K.; Akmon, Dharma; York, Jeremy J.; Tyler, Allison R.; Polasek, Faye; Lafia, Sara; Hemphill, Libby; Yakel, Elizabeth (November 2022, Proceedings of the ACM on Human-Computer Interaction)

Data curation is the process of making a dataset fit-for-use and archivable. It is critical to data-intensive science because it makes complex data pipelines possible, studies reproducible, and data reusable. Yet the complexities of the hands-on, technical, and intellectual work of data curation are frequently overlooked or downplayed. Obscuring the work of data curation not only renders the labor and contributions of data curators invisible but also hides the impact that curators' work has on the later usability, reliability, and reproducibility of data. To better understand the work and impact of data curation, we conducted a close examination of data curation at a large social science data repository, the Inter-university Consortium for Political and Social Research (ICPSR). We asked: What does curatorial work entail at ICPSR, and what work is more or less visible to different stakeholders and in different contexts? And how is that curatorial work coordinated across the organization? We triangulated accounts of data curation from interviews and records of curation in Jira tickets to develop a rich and detailed account of curatorial work. While we identified numerous curatorial actions performed by ICPSR curators, we also found that curators rely on a number of craft practices to perform their jobs. The reality of their work practices defies the rote sequence of events implied by many life cycle or workflow models. Further, we show that craft practices are needed to enact data curation best practices and standards. The craft that goes into data curation is often invisible to end users, but it is well recognized by ICPSR curators and their supervisors. Explicitly acknowledging and supporting data curators as craftspeople is important in creating sustainable and successful curatorial infrastructures.
Full Text Available
