Spatial data, under the broader umbrella of digital data, is becoming increasingly integral to all stages of archaeological research design and dissemination. As archaeologists lean toward reuse and interoperability, with ethics on their minds, how to treat spatial data is of particular importance. This is because of the complexities involved at every life-cycle stage, from collection to publication, including black box issues that may be taken for granted, and because the size of spatial data can lead to archiving difficulties. Here, the “DIY” momentum of increasingly accessible spatial methods such as photogrammetry and handheld lidar is examined alongside forthcoming changes in publication policies that will impact the United States in particular, framed around a conversation about best practices and a call for more comprehensive training for the archaeological community. At its heart, this special issue seeks to realize the potential of increasingly digitized—and increasingly large amounts of—archaeological data. Within cultural resource management, this means anticipating utilization of data through widespread standardization, among many interrelated activities. A desire to enhance the utility of archaeological data has distinct resonances with the use of spatial data in archaeology, as do some wider challenges that the archaeological community faces moving forward.
Promoting data quality and reuse in archaeology through collaborative identifier practices
Investments in data management infrastructure often seek to catalyze new research outcomes based on the reuse of research data. To achieve the goals of these investments, we need to better understand how data creation and data quality concerns shape the potential reuse of data. The primary audience for this paper is scientific domain specialists who create and (re)use datasets documenting archaeological materials. This paper discusses practices that promote data quality in support of more open-ended reuse of data beyond the immediate needs of the creators. We argue that identifier practices play a key, but poorly recognized, role in promoting data quality and reusability. We use specific archaeological examples to demonstrate how the use of globally unique and persistent identifiers can communicate aspects of context, avoid errors and misinterpretations, and facilitate integration and reuse. We then discuss the responsibility of data creators and data reusers to employ identifiers to better maintain the contextual integrity of data, including professional, social, and ethical dimensions.
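As a rough illustration of the identifier practice the abstract argues for (the records, field names, and vocabulary URI below are invented for this sketch and are not drawn from the paper), compare integrating records by free-text labels with integrating them by a shared, globally unique and persistent identifier:

```python
# Hypothetical records from two projects that describe the same material.
# The free-text labels disagree ("Bone" vs. "bone, worked"), but a shared
# persistent concept URI (a stand-in here for a real vocabulary identifier,
# e.g. a Getty AAT concept) removes the ambiguity.
record_a = {
    "project": "Survey A",
    "label": "Bone",
    "concept_uri": "https://vocab.example.org/concept/bone-material",
}
record_b = {
    "project": "Excavation B",
    "label": "bone, worked",
    "concept_uri": "https://vocab.example.org/concept/bone-material",
}

def same_concept(r1: dict, r2: dict) -> bool:
    """Integrate on the persistent identifier, not on the local label."""
    return r1["concept_uri"] == r2["concept_uri"]

print(same_concept(record_a, record_b))  # True, despite the differing labels
```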
- Award ID(s): 2129268
- PAR ID: 10410505
- Date Published:
- Journal Name: Proceedings of the National Academy of Sciences
- Volume: 119
- Issue: 43
- ISSN: 0027-8424
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Over the last decade, significant changes have affected the work that data repositories of all kinds do. First, the emergence of globally unique and persistent identifiers (PIDs) has created new opportunities for repositories to engage with the global research community by connecting existing repository resources to the global research infrastructure. Second, repository use cases have evolved from data discovery to data discovery and reuse, significantly increasing metadata requirements. To respond to these evolving requirements, we need retrospective and ongoing curation, i.e., re-curation: processes that 1) find identifiers and add them to existing metadata to connect datasets to a wider range of communities, and 2) add elements that support reuse to globally connected metadata. The goal of this work is to introduce the concept of re-curation with representative examples that are generally applicable to many repositories: 1) increasing completeness of affiliations and identifiers for organizations and funders in the Dryad Repository and 2) measuring and increasing FAIRness of DataCite metadata beyond required fields for institutional repositories. These re-curation efforts are a critical part of reshaping existing metadata and repository processes so they can take advantage of new connections, engage with global research communities, and facilitate data reuse.
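A minimal sketch of one re-curation step of the kind described above, assuming ROR's public affiliation-matching endpoint and its documented response shape (the metadata field names and the decision to trust only matches the service flags as "chosen" are our assumptions, not part of the study):

```python
import requests

def suggest_ror_id(affiliation: str) -> str | None:
    """Ask the ROR API for an organization identifier matching a free-text affiliation."""
    resp = requests.get(
        "https://api.ror.org/organizations",
        params={"affiliation": affiliation},
        timeout=10,
    )
    resp.raise_for_status()
    # Accept only matches the service itself flags as confident ("chosen").
    for item in resp.json().get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]  # e.g. "https://ror.org/..."
    return None

# Re-curation pass: enrich existing metadata that only holds an affiliation string.
metadata = {"creator_affiliation": "University of Minnesota"}
ror_id = suggest_ror_id(metadata["creator_affiliation"])
if ror_id:
    metadata["creator_affiliation_ror"] = ror_id
print(metadata)
```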
Drawing on previous research into the value of developing and sharing data stories on social media, we use this paper to examine how practitioners address a spectrum of interests and concerns in relation to their own data literacies within this media form. To do so, we analyzed 107 data story videos from TikTok and Instagram to explore what practices and communication techniques are apparent in social media data stories that exhibit features of data literacy. Through our analysis, we uncovered a series of digital storytelling techniques (e.g., speaking to the camera, using a green screen) that supported the creators’ data science practices and communicative goals. This study contributes to the discourse on social media’s role in data storytelling and literacy, providing guidance for future research and implications for the design of new data literacy learning experiences.
Incomplete and inconsistent connections between institutional repository holdings and the global data infrastructure inhibit research data discovery and reusability. Preventing metadata loss on the path from institutional repositories to the global research infrastructure can substantially improve research data reusability. The Realities of Academic Data Sharing (RADS) Initiative, funded by the National Science Foundation, is investigating institutional processes for improving research data FAIRness. Focal points of the RADS inquiry are to understand where researchers are sharing their data and to assess metadata quality, i.e., completeness, at six Data Curation Network (DCN) academic institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Washington University in St. Louis, and Virginia Tech. RADS is examining where researchers are storing their data, considering local institutional repositories and other popular repositories, and analyzing the completeness of the research data metadata stored in these institutional and other repositories. Metadata FAIRness (Findable, Accessible, Interoperable, Reusable) completeness is used as the metric to assess metadata quality. Research findings show significant content loss when metadata from local institutional repositories are compared to metadata found in DataCite. After examining the factors contributing to this metadata loss, RADS investigators are developing a set of recommended best practices for institutions to increase the quality of their scholarly metadata. Further, documentation such as README files is of particular importance, not only for data reuse but also as a source of valuable metadata such as persistent identifiers (PIDs). DOIs and related PIDs such as ORCID and ROR are still rarely used in institutional repositories. More frequent use would have a positive effect on discoverability, interoperability, and reusability, especially when transferring metadata to the global infrastructure.
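The sort of completeness measurement described here can be sketched against DataCite's public REST API; the endpoint path and attribute names below follow DataCite's documentation as we understand it, and the chosen field list is illustrative rather than the RADS instrument:

```python
import requests

# Optional DataCite fields that support reuse; illustrative, not the RADS checklist.
OPTIONAL_FIELDS = [
    "descriptions", "subjects", "contributors",
    "fundingReferences", "relatedIdentifiers", "rightsList",
]

def metadata_completeness(doi: str) -> float:
    """Fetch a DOI's DataCite metadata and report the share of optional fields populated."""
    resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=10)
    resp.raise_for_status()
    attrs = resp.json()["data"]["attributes"]
    filled = sum(1 for field in OPTIONAL_FIELDS if attrs.get(field))
    return filled / len(OPTIONAL_FIELDS)

# Example usage with a placeholder DOI:
# print(metadata_completeness("10.5061/dryad.example"))
```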
DOI: 10.17605/OSF.IO/AT4XE
Despite increased use of digital biodiversity data in research, reliable methods to identify datasets are not widely adopted. While commonly used location-based dataset identifiers such as URLs help to easily download data today, additional identification schemes are needed to ensure long-term access to datasets. We propose to augment existing location- and DOI-based identification schemes with cryptographic content-based identifiers. These content-based identifiers can be calculated from the datasets themselves using available cryptographic hashing algorithms (e.g., sha256). These algorithms take only the digital content as input to generate a unique identifier without needing a centralized identification administration. The use of content-based identifiers is not new, but a re-application of change management techniques used in the popular version control system "git". We show how content-based identifiers can be used to version datasets, to track dataset locations, to monitor their reliability, and to efficiently detect dataset changes. We discuss the results of using our approach on datasets registered in GBIF and iDigBio from September 2018 to May 2020. We also propose how reliable, decentralized dataset indexing and archiving systems can be devised. Lastly, we outline a modification to existing data citation practices to help work towards more reproducible and reusable research workflows.
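A minimal sketch of the content-based identifier idea, using nothing beyond Python's standard library; the sha256-over-dataset-bytes approach is as the abstract describes, while the "hash://sha256/" prefix and the helper name are just one naming choice assumed for this example:

```python
import hashlib

def content_id(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a sha256 content identifier from the dataset bytes alone.
    No central registry is needed: identical bytes always yield the same identifier."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "hash://sha256/" + digest.hexdigest()

# Change detection: any modification to the dataset yields a different identifier.
# before = content_id("occurrences_2018.zip")
# after = content_id("occurrences_2020.zip")
# print(before == after)  # False if the dataset content changed
```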