Title: Pearls and pitfalls: a story of a programmatic data pull
As requirements for swift and sustainable data sharing are growing, questions of where and how researchers are sharing data are becoming increasingly important for institutions to answer. One of the goals of the Realities of Academic Data Sharing (RADS) Initiative, composed of six academic institutions from the Data Curation Network (DCN), was to answer this question. This presentation will discuss the process of how RADS determined where data from our researchers are shared. To do this, we programmatically pulled DOIs from DataCite, making the naive assumption that the information we were collecting, the metadata fields we were utilizing, and the platforms we were using would present us with a neutral and unbiased view of where data from our affiliated researchers were shared. However, as we dug into the data, we found inconsistencies in the use and completeness of the necessary metadata fields for our questions, as well as differences in how DOIs were assigned across repositories. While we expected some differences, we did not anticipate these subtle differences would dramatically affect how we interpret the answer to the question of where data are shared. Our presentation will highlight examples in our work that show how these subtleties in the data are systematic and challenge our assumptions of neutrality of not just the data, but of our platforms and practices as well. By examining these biases, we are forced to reexamine the decisions behind how we practice and, as we move forward as information and repository managers, how to reduce bias or the assumption of neutrality. As a community, we often rely on data-driven decisions, and decision makers need to be aware of these biases, especially as we are likely to see increased investments due to the evolving data policies and practices.
Award ID(s):
2135874
PAR ID:
10420972
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Research Data Access and Preservation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This is a story about the challenges and opportunities that surfaced while answering a deceptively complex question: where's the data? As faculty and researchers publish articles, datasets, and other research outputs to meet promotion and tenure requirements and to address federal funding policies and institutional open access and data sharing policies, many online locations for publishing these materials have developed over time. How can we capture where all of the research generated on an academic campus is shared and preserved? This presentation will discuss how our multi-institution collaboration, the Realities of Academic Data Sharing (RADS) Initiative, sought to answer this question. We programmatically pulled DOIs from DataCite and Crossref, making the naive assumption that these platforms, the two predominant DOI registration agencies for US data, would present us with a neutral and unbiased view of where data from our affiliated researchers were shared. However, as we dug into the data, we found inconsistencies in the use and completeness of the necessary metadata fields for our questions, as well as differences in how DOIs were assigned across repositories. Additionally, we recognized the systematic and privileged bias introduced by our choice of data sources. Specifically, while DataCite and Crossref provide easy discovery of research outputs because they aggregate DOIs, they are also costly commercial services. Many repositories that cannot afford such services or lack the local staffing and knowledge required to use these services are left out of the technology that has recently been labeled “global research infrastructure”. Our presentation will identify the challenges we encountered in conducting this research, specifically around finding, cleaning, and interpreting the data. We will further engage the audience in a discussion around increasing representation in the global research infrastructure to discover and account for more research outputs.
  2. Incomplete and inconsistent connections between institutional repository holdings and the global data infrastructure inhibit research data discovery and reusability. Preventing metadata loss on the path from institutional repositories to the global research infrastructure can substantially improve research data reusability. The Realities of Academic Data Sharing (RADS) Initiative, funded by the National Science Foundation, is investigating institutional processes for improving research data FAIRness. Focal points of the RADS inquiry are to understand where researchers are sharing their data and to assess metadata quality, i.e., completeness, at six Data Curation Network (DCN) academic institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Washington University in St. Louis, and Virginia Tech. RADS is examining where researchers are storing their data, considering local institutional repositories and other popular repositories, and analyzing the completeness of the research data metadata stored in these institutional and other repositories. Metadata FAIRness (Findable, Accessible, Interoperable, Reusable) is used as the metric to assess metadata quality as FAIR complete. Research findings show significant content loss when metadata from local institutional repositories are compared to metadata found in DataCite. After examining the factors contributing to this metadata loss, RADS investigators are developing a set of recommended best practices for institutions to increase the quality of their scholarly metadata. Further, documentation such as README files is of particular importance not only for data reuse, but also as a source of valuable metadata such as Persistent Identifiers (PIDs). DOIs and related PIDs such as ORCID and ROR are still rarely used in institutional repositories. More frequent use would have a positive effect on discoverability, interoperability, and reusability, especially when transferring to global infrastructure.
  3. The last 15 years have seen a marked growth of data management and sharing policies among federal agencies in the US and Canada. While these policies have an undeniable impact in terms of increased publicly available datasets, they have also impacted the research practices of funded researchers and the services and infrastructure provided by institutions. Researchers and institutions alike share the responsibility to align practices with funding agency requirements concerning data management and sharing, but each stakeholder group has responded in ways that may not align with one another. This presentation delves into research resulting from the National Science Foundation-funded Realities of Academic Data Sharing (RADS) Initiative and provides a comprehensive comparative analysis of services and infrastructure of six academic institutions, as well as an overview of the overall impact of these policies for researchers and institutions. Insights into services, infrastructure, and impact can lead to the creation of streamlined pathways for enhancing institutional efficiencies in data management and sharing. 
  4. This dataset is the result of studies conducted during phase one (NSF-funded) of the Realities of Academic Data Sharing (RADS) Initiative, based out of the Association of Research Libraries. Studies were conducted with federally-funded researchers and institutional administrators who support data sharing practices within their department or unit at the following institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Virginia Tech, and Washington University in St. Louis. The 2022 RADS studies were retrospective, investigating data sharing and management activities and support services from 2013 to 2022. Two surveys were utilized to collect data, the Institutional Infrastructure Survey for administrators and the Researcher Survey for federally-funded researchers. This dataset presents data from both of these surveys. Project website: https://www.arl.org/realities-of-academic-data-sharing-rads-initiative/ 
  5. Inconsistent and incomplete applications of metadata standards and unsatisfactory approaches to connecting repository holdings across the global research infrastructure inhibit data discovery and reusability. The Realities of Academic Data Sharing (RADS) Initiative has found that institutions and researchers create and have access to the most complete metadata, but that valuable metadata found in these local institutional repositories (IRs) are not making their way into global data infrastructure such as DataCite or Crossref. This panel examines the local to global spectrum of metadata completeness, including the challenges of obtaining quality metadata at a local level, specifically at Cornell University, and the loss of metadata during the transfer processes from IRs into global data infrastructure. The metadata completeness increases over time, as users reuse data and contribute to the metadata. As metadata improves and grows, users find and develop connections within data not previously visible to them. By feeding local IR metadata into the global data infrastructure, the global infrastructure starts giving back in the form of these connections. We believe that this information will be helpful in coordinating metadata better and more effectively across data repositories and creating more robust interoperability and reusability between and among IRs. 