Title: Collections are truly priceless
Last month, Duke University in North Carolina announced that it was shuttering its herbarium. The collection consists of nearly 1 million specimens representing the most comprehensive and historic set of plants from the southeastern United States. It also includes extensive holdings from other regions of the world, especially Mexico, Central America, and the West Indies. Duke plans to disperse these samples to other institutions for use or storage over the next 2 to 3 years, but this decision reflects a lack of awareness by academia that such collections are being leveraged as never before. With modern technologies spanning multiple fields of study, the holdings in herbaria and other natural history collections are not only facilitating a deeper and broader understanding of the past and present world but are also providing tools to meet both known and unforeseen challenges facing humanity. Science and society can hardly risk the loss of such an important resource.
Award ID(s):
2105903 2101884 1754584 1802209
PAR ID:
10516281
Author(s) / Creator(s):
Publisher / Repository:
AAAS
Date Published:
Journal Name:
Science
Volume:
383
Issue:
6687
ISSN:
0036-8075
Page Range / eLocation ID:
1035 to 1035
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Over 300 million arthropod specimens are housed in North American natural history collections. These collections represent a “vast hidden treasure trove” of biodiversity: 95% of the specimen label data have yet to be transcribed for research, and less than 2% of the specimens have been imaged. Specimen labels contain crucial information to determine species distributions over time and are essential for understanding patterns of ecology and evolution, which will help assess the growing biodiversity crisis driven by global change impacts. Specimen images offer indispensable insight and data for analyses of traits, and ecological and phylogenetic patterns of biodiversity. Here, we review North American arthropod collections using two key metrics, specimen holdings and digitization efforts, to assess the potential for collections to provide needed biodiversity data. We include data from 223 arthropod collections in North America, with an emphasis on the United States. Our specific findings are as follows: (1) The majority of North American natural history collections (88%) and specimens (89%) are located in the United States. Canada has comparable holdings to the United States relative to its estimated biodiversity. Mexico has made the furthest progress in terms of digitization, but its specimen holdings should be increased to reflect the estimated higher Mexican arthropod diversity. The proportion of North American collections that has been digitized, and the number of digital records available per species, are both much lower for arthropods when compared to chordates and plants. (2) The National Science Foundation’s decade-long ADBC program (Advancing Digitization of Biological Collections) has been transformational in promoting arthropod digitization. However, even if this program became permanent, at current rates, by the year 2050 only 38% of the existing arthropod specimens would be digitized, and less than 1% would have associated digital images. 
(3) The number of specimens in collections has increased by approximately 1% per year over the past 30 years. We propose that this rate of increase is insufficient to provide enough data to address biodiversity research needs, and that arthropod collections should aim to triple their rate of new specimen acquisition. (4) The collections we surveyed in the United States vary broadly in a number of indicators. Collectively, there is depth and breadth, with smaller collections providing regional depth and larger collections providing greater global coverage. (5) Increased coordination across museums is needed for digitization efforts to target taxa for research and conservation goals and address long-term data needs. Two key recommendations emerge: collections should significantly increase both their specimen holdings and their digitization efforts to empower continental and global biodiversity data pipelines, and stimulate downstream research. 
  2. Incomplete and inconsistent connections between institutional repository holdings and the global data infrastructure inhibit research data discovery and reusability. Preventing metadata loss on the path from institutional repositories to the global research infrastructure can substantially improve research data reusability. The Realities of Academic Data Sharing (RADS) Initiative, funded by the National Science Foundation, is investigating institutional processes for improving research data FAIRness. Focal points of the RADS inquiry are to understand where researchers are sharing their data and to assess metadata quality, i.e., completeness, at six Data Curation Network (DCN) academic institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Washington University in St. Louis, and Virginia Tech. RADS is examining where researchers are storing their data, considering local institutional repositories and other popular repositories, and analyzing the completeness of the research data metadata stored in these institutional and other repositories. Metadata FAIRness (Findable, Accessible, Interoperable, Reusable) is used as the metric to assess metadata quality as FAIR complete. Research findings show significant content loss when metadata from local institutional repositories are compared to metadata found in DataCite. After examining the factors contributing to this metadata loss, RADS investigators are developing a set of recommended best practices for institutions to increase the quality of their scholarly metadata. Further, documentation such as README files is of particular importance, not only for data reuse but also as a source of valuable metadata such as Persistent Identifiers (PIDs). DOIs and related PIDs such as ORCID and ROR are still rarely used in institutional repositories. 
More frequent use would have a positive effect on discoverability, interoperability and reusability, especially when transferring to global infrastructure. 
  3. Material samples are indispensable data sources in many natural science, social science, and humanities disciplines. More and more researchers recognize that samples collected in one discipline can be of great value for another. This has motivated organizations that manage a large number of samples to make their holdings accessible to the world. Currently, multiple projects are working to connect natural history and other samples managed by individual institutions or individuals into a universe of samples that follow FAIR principles. This poster reports the progress of the US NSF‐funded iSamples project, in the context of other efforts initiated by US DOE, DiSSCo, BCoN, and GBIF. By October 2021, we will also be able to present an iSamples prototype. We encourage individual organizations that hold material samples to get to know these projects and help shape these projects to realize the goal of a global linked sample cloud that connects all material samples and is accessible to all. 
  4. Natural language processing techniques can be used to analyze the linguistic content of a document to extract missing pieces of metadata. However, accurate metadata extraction may not depend solely on the linguistics, but also on structural problems such as extremely large documents, unordered multi‐file documents, and inconsistency in manually labeled metadata. In this work, we start from two standard machine learning solutions to extract pieces of metadata from Environmental Impact Statements, environmental policy documents that are regularly produced under the US National Environmental Policy Act of 1969. We present a series of experiments where we evaluate how these standard approaches are affected by different issues derived from real‐world data. We find that metadata extraction can be strongly influenced by nonlinguistic factors such as document length and volume ordering and that the standard machine learning solutions often do not scale well to long documents. We demonstrate how such solutions can be better adapted to these scenarios, and conclude with suggestions for other NLP practitioners cataloging large document collections. 
  5. Leal, José H (Ed.)
    "Mobilizing Millions of Mollusks of the Eastern Seaboard" (ESB) is a project sponsored by the National Science Foundation that improves our knowledge of mollusks from the East and Gulf coasts of the US. The four-year project is making taxonomically vetted and completely georeferenced occurrence data for 535,000 specimen lots, representing 4.5 million specimens, available online on the iDigBio, GBIF, and OBIS data aggregators. The ESB region includes 18 states, spanning nearly 6,000 km from Maine to Texas. Seventeen major US collections, which together contain 85% of the molluscan holdings from the ESB in all US molluscan collections, are collaborating in the project. The ESB project improves the reliability of, and access to, molluscan collection data for examining changes in distribution, morphology, population size, and genetic variation within and across species. The Museum collection had been digitized (cataloged electronically) at the start of the project (including 21,283 ESB lots); accordingly, the main goals of the project were cleaning data (improving the taxonomy, locality, dates, collecting data) and adding geolocation (geographic coordinates) to these lots. In addition, since the beginning of the project, we digitized an additional 3,897 newly acquired ESB lots consisting of 14,500 specimens. Other achievements are cleaning and standardizing collection metadata for 12,730 lots, adding geolocation data for 23,952 lots, and photographing 320 lots. Currently, the total number of ESB lots is 25,180, of which 24,201 have geolocation data. 