

Title: Arctic Unmanned Aerial Video (UAV) 4K footage demonstrating topographical factors in Huslia, Alaska, summer 2019
This aerial remote sensing data was captured in Huslia, Alaska, in June 2019. The 2017 Huslia Community Plan states that "erosion poses a considerable threat to Huslia, particularly structures located in Old Huslia. Several homes and structures have been relocated as a result of erosion. There had been some unsuccessful (early) attempts to slow erosion in Huslia, but nothing in recent years." Residents of Huslia indicated, both in the Community Plan (1) and in personal discussions with the principal investigators, that erosion is one of the most serious problems affecting the village, impeding community development, health and wellness, and small business entrepreneurship. The problem has grown serious enough that a Hazard Mitigation Plan (2) was created. While scheduling logistics for a National Science Foundation workshop, Carl Burgett, First Chief of Huslia, requested this high-resolution aerial video capture to better convey the significant impact of erosion on the region to the broader research community, in the hope of justifying granting-agency funding for restoration efforts that would enhance safety, economic development, reconstruction, transportation, and shipping.
1) Huslia Community Plan: https://www.commerce.alaska.gov/dcra/DCRARepoExt/RepoPubs/Plans/HusliaCommunityPlan2017.pdf
2) Huslia Hazard Mitigation Plan: https://www.commerce.alaska.gov/dcra/DCRARepoExt/RepoPubs/Plans/Huslia_HMP_2018_Update_8-22-18_Final_Plan.pdf
Award ID(s):
1758781
NSF-PAR ID:
10385323
Author(s) / Creator(s):
Publisher / Repository:
Arctic Data Center
Date Published:
Subject(s) / Keyword(s):
["Arctic UAV Dataset","Digital Communication","Alaska","Remote Sensing","Huslia","Erosion","Topographic Data","Arctic Entrepreneurship","Arctic Small Business Development"]
Format(s):
Medium: Other (text/xml)
Sponsoring Org:
National Science Foundation
More Like this
  1. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. 
Not every pathological feature is annotated; excluded areas may contain foci relevant to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest-performing labels were background (97% correct identification) and artifact (76% correct identification). A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”), and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose from how non-background labels were converted into patches: large areas of background within other labels were isolated within a patch, so that connective tissue misrepresented a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue from non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. 
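The balanced-training-set step described above (1,000 patches per label) can be sketched as follows; the label names and patch counts here are toy stand-ins, not the actual corpus inventory:

```python
# Toy sketch of balanced patch sampling: draw at most `per_label` patches
# from each label so no class dominates training. Labels and counts below
# are illustrative, not corpus values.
import random

def balance(patches_by_label, per_label=1000, seed=0):
    rng = random.Random(seed)
    balanced = []
    for label, patches in patches_by_label.items():
        k = min(per_label, len(patches))  # cannot draw more than exist
        balanced.extend((label, p) for p in rng.sample(patches, k))
    return balanced

patches = {"bckg": list(range(5000)), "art": list(range(1500)), "indc": list(range(800))}
train = balance(patches, per_label=1000)
counts = {lbl: sum(1 for l, _ in train if l == lbl) for lbl in patches}
print(counts)  # {'bckg': 1000, 'art': 1000, 'indc': 800}
```

Labels with fewer than 1,000 annotated patches simply contribute everything they have, which mirrors why under-annotated labels remain harder to learn.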
The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only where the microscopic diagnosis supported them. Further differentiation of cancerous and precancerous labels, as well as the location of their foci on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature or a cancerous growth, pathologists apply antigen-targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The information that IHC slides provide could play an integral role in machine-learning pathology diagnostics. Following the revisions made to all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase in model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to a 57% decrease in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. 
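The CK5/ER staining pattern cited from [9] amounts to a small decision rule; a toy encoding is below. The function name and category strings are hypothetical illustrations, not part of the corpus tooling, and a real focus would of course go to a pathologist.

```python
# Toy encoding of the staining rule cited from [9]: diffuse CK5 without
# diffuse ER suggests usual ductal hyperplasia, while the reverse pattern
# suggests ductal carcinoma in situ. Names and strings are illustrative.
def classify_focus(ck5_diffuse: bool, er_diffuse: bool) -> str:
    if ck5_diffuse and not er_diffuse:
        return "usual ductal hyperplasia (nonneoplastic)"
    if not ck5_diffuse and er_diffuse:
        return "ductal carcinoma in situ (cancerous)"
    return "indeterminate: needs pathologist review"

print(classify_focus(ck5_diffuse=True, er_diffuse=False))
# usual ductal hyperplasia (nonneoplastic)
```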
A CSV version of the annotation file is also available, providing a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 annotated regions, with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic, and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma). The individual patients are split into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted to each of the development and evaluation sets, while the remaining 34 were allotted to train. The remaining 222 patients were split to preserve the overall distribution of labels within the corpus. This was done in the hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue, including approximately 38.5% urinary tissue and 16.5% gynecological tissue. 
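The patient-level split described above (20/20/34 cancerous patients to development/evaluation/train, with the remaining 222 patients divided to reach 80/80/136 overall) might be sketched as follows. The random assignment stands in for the label-distribution-preserving allocation the authors describe:

```python
# Hypothetical sketch of the patient-level split: splitting by patient (not
# by slide) keeps all of a patient's slides in a single set. The random
# shuffle stands in for the distribution-preserving allocation.
import random

def split_patients(cancerous, others, seed=0):
    rng = random.Random(seed)
    cancerous, others = list(cancerous), list(others)
    rng.shuffle(cancerous)
    rng.shuffle(others)
    dev, ev = cancerous[:20], cancerous[20:40]  # 20 cancerous patients each
    train = cancerous[40:]                      # remaining 34 cancerous
    train += others[:102]                       # 222 others: 102 to train,
    dev += others[102:162]                      # 60 to development,
    ev += others[162:]                          # 60 to evaluation
    return {"train": train, "dev": dev, "eval": ev}

sets = split_patients([f"c{i}" for i in range(74)], [f"n{i}" for i in range(222)])
print({k: len(v) for k, v in sets.items()})  # {'train': 136, 'dev': 80, 'eval': 80}
```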
These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnosis model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grant nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/. 
[3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA. 
  2. A biodiversity dataset graph: BHL

    The intended use of this archive is to facilitate (meta-)analysis of the Biodiversity Heritage Library (BHL). The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.

    This dataset provides versioned snapshots of the BHL network as tracked by Preston [2] between 2019-05-19 and 2020-05-09 using "preston update -u https://biodiversitylibrary.org".

    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs, and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files link provenance files in time, establishing a versioning mechanism. Provenance files describe how, when, what, and where the BHL content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543.

    To retrieve and verify the downloaded BHL biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar clone --remote https://zenodo.org/record/3849560/files
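Concatenation restores the archive because the parts appear to behave like byte-level slices of a single tar.gz. A self-contained sketch, with a small in-memory archive standing in for the real preston-*.tar.gz parts:

```python
# Sketch: split one tar.gz into fixed-size byte slices, then rejoin them,
# mimicking "cat preston-*.tar.gz > preston.tar.gz". An in-memory archive
# stands in for the real downloaded parts.
import io
import tarfile

# build a small tar.gz in memory
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    payload = b"example record"
    info = tarfile.TarInfo(name="data/example.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
whole = buf.getvalue()

# split into fixed-size parts, like preston-00.tar.gz, preston-01.tar.gz, ...
parts = [whole[i:i + 64] for i in range(0, len(whole), 64)]

# reassemble and extract
rejoined = b"".join(parts)
with tarfile.open(fileobj=io.BytesIO(rejoined), mode="r:gz") as tar:
    content = tar.extractfile("data/example.txt").read()
print(content)  # b'example record'
```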

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history
    <0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/89926f33157c0ef057b6de73f6c8be0060353887b47db251bfd28222f2fd801a> .
    <hash://sha256/41b19aa9456fc709de1d09d7a59c87253bc1f86b68289024b7320cef78b3e3a4> <http://purl.org/pav/previousVersion> <hash://sha256/89926f33157c0ef057b6de73f6c8be0060353887b47db251bfd28222f2fd801a> .
    <hash://sha256/7582d5ba23e0d498ca4f55c29408c477d0d92b4fdcea139e8666f4d78c78a525> <http://purl.org/pav/previousVersion> <hash://sha256/41b19aa9456fc709de1d09d7a59c87253bc1f86b68289024b7320cef78b3e3a4> .
    <hash://sha256/a70774061ccded1a45389b9e6063eb3abab3d42813aa812391f98594e7e26687> <http://purl.org/pav/previousVersion> <hash://sha256/7582d5ba23e0d498ca4f55c29408c477d0d92b4fdcea139e8666f4d78c78a525> .
    <hash://sha256/007e065ba4b99867751d688754aa3d33fa96e6e03133a2097e8a368d613cd93a> <http://purl.org/pav/previousVersion> <hash://sha256/a70774061ccded1a45389b9e6063eb3abab3d42813aa812391f98594e7e26687> .
    <hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1> <http://purl.org/pav/previousVersion> <hash://sha256/007e065ba4b99867751d688754aa3d33fa96e6e03133a2097e8a368d613cd93a> .
    <hash://sha256/67cc329e74fd669945f503917fbb942784915ab7810ddc41105a82ebe6af5482> <http://purl.org/pav/previousVersion> <hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1> .
    <hash://sha256/e46cd4b0d7fdb51ea789fa3c5f7b73591aca62d2d8f913346d71aa6cf0745c9f> <http://purl.org/pav/previousVersion> <hash://sha256/67cc329e74fd669945f503917fbb942784915ab7810ddc41105a82ebe6af5482> .
    <hash://sha256/9215d543418a80510e78d35a0cfd7939cc59f0143d81893ac455034b5e96150a> <http://purl.org/pav/previousVersion> <hash://sha256/e46cd4b0d7fdb51ea789fa3c5f7b73591aca62d2d8f913346d71aa6cf0745c9f> .
    <hash://sha256/1448656cc9f339b4911243d7c12f3ba5366b54fff3513640306682c50f13223d> <http://purl.org/pav/previousVersion> <hash://sha256/9215d543418a80510e78d35a0cfd7939cc59f0143d81893ac455034b5e96150a> .
    <hash://sha256/7ee6b16b7a5e9b364776427d740332d8552adf5041d48018eeb3c0e13ccebf27> <http://purl.org/pav/previousVersion> <hash://sha256/1448656cc9f339b4911243d7c12f3ba5366b54fff3513640306682c50f13223d> .
    <hash://sha256/34ccd7cf7f4a1ea35ac6ae26a458bb603b2f6ee8ad36e1a58aa0261105d630b1> <http://purl.org/pav/previousVersion> <hash://sha256/7ee6b16b7a5e9b364776427d740332d8552adf5041d48018eeb3c0e13ccebf27> .
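The history output above is a chain of hasVersion/previousVersion statements. A hypothetical parser that reconstructs the snapshot order from such N-Triples-style lines (shortened hashes are used here for readability):

```python
# Hypothetical parser for "preston history" output: follow previousVersion
# links forward to recover the snapshot chain from oldest to newest.
import re

log = """\
<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/aaa> .
<hash://sha256/bbb> <http://purl.org/pav/previousVersion> <hash://sha256/aaa> .
<hash://sha256/ccc> <http://purl.org/pav/previousVersion> <hash://sha256/bbb> .
"""

first = None
newer_of = {}  # maps each version to the version that superseded it
for subj, pred, obj in re.findall(r"<([^>]+)> <([^>]+)> <([^>]+)> \.", log):
    if pred.endswith("hasVersion"):
        first = obj  # the first recorded version
    elif pred.endswith("previousVersion"):
        newer_of[obj] = subj

chain, version = [], first
while version is not None:  # walk forward in time
    chain.append(version)
    version = newer_of.get(version)
print(chain)  # ['hash://sha256/aaa', 'hash://sha256/bbb', 'hash://sha256/ccc']
```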

    To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" includes "CONTENT_PRESENT_VALID_HASH", as shown below. Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca    file:/home/preston/preston-bhl/data/e0/c1/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca    OK    CONTENT_PRESENT_VALID_HASH    49458087    hash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca
    hash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99    file:/home/preston/preston-bhl/data/1a/57/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99    OK    CONTENT_PRESENT_VALID_HASH    25745    hash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99
    hash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c    file:/home/preston/preston-bhl/data/85/ef/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c    OK    CONTENT_PRESENT_VALID_HASH    519892    hash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c
    hash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743    file:/home/preston/preston-bhl/data/25/1e/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743    OK    CONTENT_PRESENT_VALID_HASH    787414    hash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743
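The verify output above reflects Preston's content-addressed layout: each file is stored at data/<first two hex chars>/<next two>/<full sha256>, and its bytes must hash back to its own name. A minimal sketch of that check (not the actual preston implementation):

```python
# Minimal sketch of a content-addressed store and the check "preston verify"
# performs: a file's path is derived from its sha256, so re-hashing the
# bytes must reproduce the name. Not the actual preston implementation.
import hashlib
import os
import tempfile

def store(root: str, content: bytes) -> str:
    digest = hashlib.sha256(content).hexdigest()
    d = os.path.join(root, "data", digest[:2], digest[2:4])
    os.makedirs(d, exist_ok=True)
    with open(os.path.join(d, digest), "wb") as f:
        f.write(content)
    return digest

def verify(root: str, digest: str) -> bool:
    path = os.path.join(root, "data", digest[:2], digest[2:4], digest)
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == digest

root = tempfile.mkdtemp()
h = store(root, b"example content")
print(verify(root, h))  # True for intact content
```

Any corruption of a stored file changes its hash and fails the check, which is what a missing "CONTENT_PRESENT_VALID_HASH" would signal.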

    Note that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or "preston" for short.

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README

    -- executable java jar containing preston[2] v0.1.15. --
    preston.jar

    -- preston archives containing BHL data files, associated provenance logs and a provenance index --
    preston-[00-ff].tar.gz

    -- individual provenance index files --
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
    2b1104cb7749e818c9afca78391b2d0099bbb0a32f2b348860a335cd2f8f6800
    4081bc59dff58d63f6a86c623cb770f01e9a355a42495b205bcb538cd526190f
    47a2816f8b5600b24487093adcddfea12434cc4f270f3ab09d9215fbdd546cd2
    6f99a1388823fca745c9e22ac21e2da909a219aa1ace55170fa9248c0276903c
    7ae46d7cd9b5a0f5889ba38bac53c82e591b0bdf8b605f5e48c0dce8fb7b717f
    82903464889fea7c53f53daedf4e41fa31092f82619edeb3415eb2b473f74af3
    9e8c86243df39dd4fe82a3f814710eccf73aa9291d050415408e346fa2b09e70
    a8308fbf4530e287927c471d881ce0fc852f16543d46e1ee26f1caba48815f3a
    bcec6df2ea7f74e9a6e2830d0072e6b2fbe65323d9ddb022dd6e1349c23996e2
    cfe47c25ec0210ac73c06b407beb20d9c58355cb15bae427fdc7541870ca2e4e
    f73fc9e70bce8f21f0c96b8ef0903749d8f223f71343ab5a8910968f99c9b8b6

    --- end of file descriptions ---


    References

    [1] Biodiversity Heritage Library (BHL, https://biodiversitylibrary.org) accessed from 2019-05-19 to 2020-05-09 with provenance hash://sha256/34ccd7cf7f4a1ea35ac6ae26a458bb603b2f6ee8ad36e1a58aa0261105d630b1.
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .


    This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.

     
  3. This study systematically reviews the diverse body of research on community flood risk management in the USA to identify knowledge gaps and develop innovative and practical lessons to aid flood management decision-makers in their efforts to reduce flood losses. The authors discovered and reviewed 60 studies that met the selection criteria (e.g., study is written in English, is empirical, focuses on flood risk management at the community level in the USA, etc.). Upon reviewing the major findings from each study, the authors identified seven practical lessons that, if implemented, could not only help flood management decision-makers better understand communities’ flood risks, but could also reduce the impacts of flood disasters and improve communities’ resilience to future flood disasters. These seven lessons include: (1) recognizing that acquiring open space and conserving wetlands are some of the most effective approaches to reducing flood losses; (2) recognizing that, depending on a community’s flood risks, different development patterns are more effective at reducing flood losses; (3) considering the costs and benefits of participating in FEMA’s Community Rating System program; (4) engaging community members in the flood planning and recovery processes; (5) considering socially vulnerable populations in flood risk management programs; (6) relying on a variety of floodplain management tools to delineate flood risk; and (7) ensuring that flood mitigation plans are fully implemented and continually revised. 
  4. Currently, substantial efforts are underway to improve the engagement and retention of engineering and computer science (E/CS) students in their academic programs. Student participation in specific activities known as High Impact Educational Practices (HIP) has been shown to improve student outcomes across a variety of degree fields. Thus, we suggest that understanding how and why E/CS students, especially those from historically underrepresented groups, participate in HIP is vital for supporting efforts aimed at improving E/CS student engagement and retention. The aim of the current study is to examine the participation of E/CS undergraduates enrolled at two western land-grant institutions (both institutions are predominantly white; one is an emerging Hispanic-serving institution) across five HIP (i.e., global learning and study abroad, internships, learning communities, service and community-based learning, and undergraduate research) that are offered outside of required E/CS curricula and are widely documented in the research literature. As part of a larger study, researchers developed an online questionnaire to explore student HIP participation and then surveyed E/CS students (n = 576) across both land-grant institutions. Subsequently, researchers will use survey results to inform the development of focus group interview protocols. Focus group interviews will be conducted with purposefully selected E/CS students who participated in the survey. Combined survey and focus group data will then be analyzed to more deeply understand why and how E/CS students participate in the HIP at their university. This research paper reports on the frequency distribution analysis of the survey data generated with E/CS undergraduates enrolled at one of the two land-grant institutions. 
The combined sample included E/CS undergraduates from the following demographic groups: female (34%), Asian (10%), Black or African American (2%), Hispanic or Latinx (6%), Native American or Alaskan Native (1%), Native Hawaiian or Other Pacific Islander (1%), White (81%), and multiracial (4%). Results show that internships drew the largest share of E/CS student participation (38%), while study abroad programs garnered the smallest (5%) across all five HIP. Internships were found most likely to engage diverse students: female (42%), Hispanic or Latinx (24%), multiracial (44%), Asian (31%), first-generation (29%), and nontraditional students (other than those categorized as highly nontraditional) all reported participating in internships more than any other HIP. Notable differences in participation across E/CS and demographic groups were found for other HIP. Results further revealed that 43% of respondents did not participate in any extracurricular HIP and only 19% participated in two or more HIP. Insights derived from the survey and used to inform ongoing quantitative and qualitative analyses are discussed. Keywords: community-based learning, high impact educational practices, HIP, internships, learning communities, service learning, study abroad, undergraduate research 
  5. International collaboration between collections, aggregators, and researchers within the biodiversity community and beyond is becoming increasingly important in our efforts to support biodiversity, conservation and the life of the planet. The social, technical, logistical and financial aspects of an equitable biodiversity data landscape – from workforce training and mobilization of linked specimen data, to data integration, use and publication – must be considered globally and within the context of a growing biodiversity crisis. In recent years, several initiatives have outlined paths forward that describe how digital versions of natural history specimens can be extended and linked with associated data. In the United States, Webster (2017) presented the “extended specimen”, which was expanded upon by Lendemer et al. (2019) through the work of the Biodiversity Collections Network (BCoN). At the same time, a “digital specimen” concept was developed by DiSSCo in Europe (Hardisty 2020). Both the extended and digital specimen concepts depict a digital proxy of an analog natural history specimen. The digital form provides greater capabilities: it is machine-processable; it can be linked with associated data; it makes information-rich biodiversity data globally accessible; it improves tracking, attribution, and annotation; and it opens additional opportunities for data use and cross-disciplinary collaborations, forming the basis for FAIR (Findable, Accessible, Interoperable, Reusable) and equitable sharing of benefits worldwide, with slight variation in how an extended or digital specimen model would be executed. Recognizing the need to align the two closely related concepts, and to provide a place for open discussion around various topics of the Digital Extended Specimen (DES; the current working name for the joined concepts), we initiated a virtual consultation on the discourse platform hosted by the Alliance for Biodiversity Knowledge through GBIF. 
This platform provided a forum for threaded discussions around topics related and relevant to the DES. The goals of the consultation align with the goals of the Alliance for Biodiversity Knowledge: expand participation in the process, build support for further collaboration, identify use cases, identify significant challenges and obstacles, and develop a comprehensive roadmap towards achieving the vision for a global specification for data integration. In early 2021, Phase 1 launched with five topics: Making FAIR data for specimens accessible; Extending, enriching and integrating data; Annotating specimens and other data; Data attribution; and Analyzing/mining specimen data for novel applications. This round of full discussion was productive and engaged dozens of contributors, with hundreds of posts and thousands of views. During Phase 1, several deeper, more technical, or additional topics of relevance were identified and formed the foundation for Phase 2 which began in May 2021 with the following topics: Robust access points and data infrastructure alignment; Persistent identifier (PID) scheme(s); Meeting legal/regulatory, ethical and sensitive data obligations; Workforce capacity development and inclusivity; Transactional mechanisms and provenance; and Partnerships to collaborate more effectively. In Phase 2 fruitful progress was made towards solutions to some of these complex functional and technical long-term goals. Simultaneously, our commitment to open participation was reinforced, through increased efforts to involve new voices from allied and complementary fields. 
Among the wealth of ideas expressed, the community highlighted the need for unambiguous persistent identifiers and a dedicated agent to assign them; support for a fully linked system that includes robust publishing mechanisms; strong social structures that build trustworthiness of the system; appropriate attribution of legacy and new work; a system that is inclusive, removed from colonial practices, and supportive of creative use of biodiversity data; a truly global data infrastructure; a balance between open access and legal obligations and ethical responsibilities; and the partnerships necessary for success. These two consultation periods, and the myriad activities surrounding the online discussion, produced a wide variety of perspectives, strategies, and approaches to converging the digital and extended specimen concepts and progressing plans for the DES, steps necessary to improve access to research-ready data and to advance our understanding of the diversity and distribution of life. Discussions continue, and we hope to include your contributions to the DES in future implementation plans. 