Title: Bringing microfossil specimens into the light: Using semi-automated digitization techniques to improve collection accessibility
Natural history collections are often considered remote and inaccessible without special permission from curators. Digitization of these collections can make them much more accessible to researchers, educators, and general enthusiasts alike, thereby removing the stigma of a lonely specimen on a dusty shelf in the back room of a museum that will never again see the light of day. We are in the process of digitizing the microfossils of the Indiana University Paleontology Collection using the GIGAmacro Magnify2 Robotic Imaging System. This suite of software and hardware allows us to automate photography and post-production of high-resolution images, greatly reducing the amount of time and labor needed to serve the data. Our hardware includes a Canon T6i 24-megapixel DSLR, a Canon MP-E 65mm 1x to 5x macro lens, and a Canon MT-26EX dual flash, all mounted on a linear motion system made with high-performance precision IGUS DryLin anodized aluminum components. The camera and its mount move over the tray of microfossil slides on bearings and rails. The software includes the GIGAmacro Capture Software (photography), GIGAmacro Viewer Software (display and annotation), Zerene Stacker (focus stacking), and Autopano GIGA (stitching). All of the metadata is kept in association with the images; the images are uploaded to Notes from Nature and transcribed by community scientists, and everything is then stored in the image archive, Imago. In ~460 hours we have photographed ~10,500 slides and have completed ~65% of our microfossil collection. Using the GIGAmacro system we are able to update and store collection information in a more secure and longer-lasting digital form. The advantages of this system are numerous, and we highly recommend it to museums looking to bring their collections out of the shadows and back into the light.
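As a rough sense of the throughput these figures imply, the arithmetic can be sketched in a few lines of Python. The constants below simply restate the approximate values quoted in the abstract; the projected totals are straightforward extrapolations, not collection records.

# Back-of-the-envelope throughput estimate based on the figures quoted above
# (~10,500 slides in ~460 hours, ~65% of the collection complete). The totals
# derived below are extrapolations from those approximations.

SLIDES_IMAGED = 10_500        # slides photographed so far (approximate)
HOURS_SPENT = 460             # imaging hours so far (approximate)
FRACTION_COMPLETE = 0.65      # share of the microfossil collection finished

slides_per_hour = SLIDES_IMAGED / HOURS_SPENT
total_slides_est = SLIDES_IMAGED / FRACTION_COMPLETE
remaining_slides = total_slides_est - SLIDES_IMAGED
remaining_hours = remaining_slides / slides_per_hour

print(f"Throughput: {slides_per_hour:.1f} slides/hour "
      f"(~{60 / slides_per_hour:.1f} minutes per slide)")
print(f"Estimated collection size: {total_slides_est:,.0f} slides")
print(f"Estimated imaging time remaining: {remaining_hours:,.0f} hours")

Under these assumptions the system averages roughly 23 slides per hour (about 2.6 minutes per slide), so the remaining ~35% of the collection would take on the order of 250 additional imaging hours.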
Leal, José H (The Northeast Natural History Conference). Goldfarb, Keith (Ed.)
Natural history collections are important depositories of biodiversity data. Digital photography of natural history collection specimens and subsequent dissemination of the resulting images on the web allow for the virtual discovery of these specimens, enhancing their accessibility to the target audience and the public in general. This presentation discusses digital photography of marine mollusks in collections, including some of the latest techniques for imaging of very small specimens, photography of specimens preserved in liquid, haptobionts, problems of color retention, transparency, 3-D photography, equipment, and other current areas of interest. Despite the focus on mollusks, the discussions can be extrapolated as generalities applicable to invertebrates from other phyla. The presentation also includes a discussion on equipment and the ideal digital parameters for imaging of natural history collection specimens, including image policies on acceptable file-format requirements for data hosts and aggregators such as iDigBio and others. (The presentation includes work funded in part by the NSF Thematic Collections Network grant award 2001528 “Mobilizing Millions of Mollusks from the Eastern Seaboard”).
Hunt, I.; Husain, S.; Simon, J.; Obeid, I.; Picone, J. (IEEE Signal Processing in Medicine and Biology Symposium (SPMB)). Obeid, Iyad; Picone, Joseph; Selesnick, Ivan (Eds.)
The Neural Engineering Data Consortium (NEDC) is developing a large open source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000 image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a National Science Foundation (NSF) planning grant [4] to explore how best the community can leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public. The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio's eSlide Manager (eSM) software [6]. We currently have digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case with one report per case. The data is organized by tissue type as shown below:

Filenames:
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx

Explanation:
tudp: root directory of the corpus
v1.0.0: version number of the release
svs: the image data type
gastro: the type of tissue
000001: six-digit sequence number used to control directory complexity
00123456: 8-digit patient MRN
2015_03_05: the date the specimen was captured
0s15_12345: the clinical case name
0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001), and a token number (e.g., s000)
0s15_12345_00123456.docx: the filename for the corresponding case report

We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio's ".svs" format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference. Another goal of this poster presentation is to share our experiences with the larger community since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning there are a lot that need to be sifted through and discarded for peeling or cracking. Additionally, during scanning a slide can get stuck, stalling a scan session for hours, resulting in a significant loss of productivity. Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes.
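For readers working with the corpus programmatically, the naming convention above maps directly onto a small parser. The sketch below is illustrative only; the function name and returned field names are assumptions for demonstration, not part of the NEDC tooling.

# Illustrative parser for the TUDP path layout described above.
from pathlib import PurePosixPath

def parse_tudp_image_path(path: str) -> dict:
    """Split a TUDP .svs path into the components described in the abstract."""
    parts = PurePosixPath(path).parts
    corpus, version, dtype, tissue, seq, mrn, date, case, filename = parts[-9:]
    # The image filename repeats the case name, then site code, patient MRN,
    # cut level, and token number, e.g. 0s15_12345_0a001_00123456_lvl0001_s000.svs
    tokens = PurePosixPath(filename).stem.split("_")
    return {
        "corpus": corpus,
        "version": version,
        "data_type": dtype,
        "tissue_type": tissue,
        "sequence": seq,
        "patient_mrn": mrn,
        "capture_date": date,
        "case_name": case,
        "site_code": tokens[-4],
        "cut_level": tokens[-2],
        "token_number": tokens[-1],
    }

example = ("tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/"
           "0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs")
print(parse_tudp_image_path(example))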
This scanning project began in January of 2018 when the scanner was first installed. The scanning process was slow at first since there was a learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019, ~20,000 slides were scanned. In the six months from May to November we tripled that number and now hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow. The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking the software, such as the tissue finder parameter settings, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process; it takes a couple of hours to sort, clean, and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks. To bridge the gap between hospital operations and research, we are using Aperio's eSM software. Our goal is to provide pathologists access to high-quality digital images of their patients' slides. eSM is a secure website that holds the images with their metadata labels, patient report, and path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information. Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma in situ (DCIS) or invasive breast cancer tissue.
There will be approximately 1,000 images per class in this corpus. Based on the number of features annotated, we can train on a two-class problem of DCIS versus benign, or increase the difficulty by expanding the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic, etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv, nedc_tuh_dpath@googlegroups.com, to be kept informed of the latest developments in this project. You can learn more from our project website: https://www.isip.piconepress.com/projects/nsf_dpath.
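The scanning-throughput figures quoted in this abstract can be worked through explicitly. The sketch below restates them (nominal nightly capacity, scan nights, and focal-point failure rates) and computes the weekly numbers they imply; it is a back-of-the-envelope illustration, not part of the NEDC workflow.

# Worked version of the scanning-throughput arithmetic quoted above. The
# constants restate figures from the abstract; nothing here is measured anew.

SLIDES_PER_NIGHT = 400       # Aperio AT2 capacity per ~8-hour scan session
NIGHTS_PER_WEEK = 5
FAILURE_RATE_INITIAL = 0.05  # focal-point failures when the project began
FAILURE_RATE_CURRENT = 0.01  # after recalibration, software tuning, slide prep

weekly_capacity = SLIDES_PER_NIGHT * NIGHTS_PER_WEEK
print(f"Nominal weekly capacity: {weekly_capacity} slides")

for label, rate in [("initial", FAILURE_RATE_INITIAL),
                    ("current", FAILURE_RATE_CURRENT)]:
    print(f"Expected failed scans per week at the {label} rate: "
          f"{weekly_capacity * rate:.0f}")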
Conventional continuous-wave amplitude-modulated time-of-flight (CWAM ToF) cameras suffer from a fundamental trade-off between light throughput and depth of field (DoF): a larger lens aperture allows more light collection but suffers from significantly lower DoF. However, both high light throughput, which increases signal-to-noise ratio, and a wide DoF, which enlarges the system’s applicable depth range, are valuable for CWAM ToF applications. In this work, we propose EDoF-ToF, an algorithmic method to extend the DoF of large-aperture CWAM ToF cameras by using a neural network to deblur objects outside of the lens’s narrow focal region and thus produce an all-in-focus measurement. A key component of our work is the proposed large-aperture ToF training data simulator, which models the depth-dependent blurs and partial occlusions caused by such apertures. Contrary to conventional image deblurring where the blur model is typically linear, ToF depth maps are nonlinear functions of scene intensities, resulting in a nonlinear blur model that we also derive for our simulator. Unlike extended DoF for conventional photography where depth information needs to be encoded (or made depth-invariant) using additional hardware (phase masks, focal sweeping, etc.), ToF sensor measurements naturally encode depth information, allowing a completely software solution to extended DoF. We experimentally demonstrate EDoF-ToF increasing the DoF of a conventional ToF system by 3.6 ×, effectively achieving the DoF of a smaller lens aperture that allows 22.1 × less light. Ultimately, EDoF-ToF enables CWAM ToF cameras to enjoy the benefits of both high light throughput and a wide DoF.
Motz, Gary; Zimmerman, Alexander; Cook, Kimberly; Bancroft, Alyssa (Biodiversity Information Science and Standards)
Collections digitization relies increasingly upon computational and data management resources that occasionally exceed the capacity of natural history collections and their managers and curators. Digitization of many tens of thousands of micropaleontological specimen slides, as evidenced by the effort presented here by the Indiana University Paleontology Collection, has required a concerted effort to adhere to recommended practices across the many facets of collections management for both physical and digital collections resources. This presentation highlights the contributions of distributed cyberinfrastructure from the National Science Foundation-supported Extreme Science and Engineering Discovery Environment (XSEDE) for web-hosting of collections management system resources and distributed processing of millions of digital images and metadata records of specimens from our collections. The Indiana University Center for Biological Research Collections is currently hosting its instance of the Specify collections management system (CMS) on a virtual server hosted on Jetstream, the cloud service for on-demand computational resources as provisioned by XSEDE. This web service allows the CMS to be flexibly hosted on the cloud with additional services that can be provisioned on an as-needed basis for generating and integrating digitized collections objects in both web-friendly and digital preservation contexts. On-demand computing resources can be used for the manipulation of digital images for automated file I/O, scripted renaming of files for adherence to file naming conventions, derivative generation, and backup to our local tape archive for digital disaster preparedness and long-term storage. Here, we will present our strategies for facilitating reproducible workflows for general collections digitization of the IUPC nomenclatorial types and figured specimens in addition to the gigapixel-resolution photographs of our large collection of microfossils using our GIGAmacro system (e.g., this slide of conodonts). We aim to demonstrate the flexibility and nimbleness of cloud computing resources for replicating this, and other, workflows to enhance the findability, accessibility, interoperability, and reproducibility of the data and metadata contained within our collections.
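As an illustration of the scripted renaming and derivative generation described above, a minimal sketch might look like the following. The directory paths, naming pattern, and use of the Pillow imaging library are assumptions for demonstration only, not the IUPC production workflow.

# Hypothetical sketch: rename exported captures to a standardized catalog-style
# name, keep an archival master, and write a downscaled JPEG derivative for the web.
from pathlib import Path
from PIL import Image

SOURCE_DIR = Path("/scratch/gigamacro/exports")   # hypothetical staging area
ARCHIVE_DIR = Path("/archive/iupc/masters")       # hypothetical preservation copies
WEB_DIR = Path("/web/iupc/derivatives")           # hypothetical web-friendly copies

ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
WEB_DIR.mkdir(parents=True, exist_ok=True)

for tiff_path in sorted(SOURCE_DIR.glob("*.tif")):
    catalog_number = tiff_path.stem.upper()        # normalize to a catalog-style name
    # Archival master: byte-for-byte copy under the standardized file name.
    (ARCHIVE_DIR / f"{catalog_number}.tif").write_bytes(tiff_path.read_bytes())
    # Web derivative: downscaled JPEG for the collections portal.
    with Image.open(tiff_path) as img:
        img.thumbnail((2048, 2048))
        img.convert("RGB").save(WEB_DIR / f"{catalog_number}.jpg", quality=90)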
Light transport contains all light information between a light source and an image sensor. As an important application of light transport, dual photography has been a popular research topic, but it is challenged by long acquisition time, low signal-to-noise ratio, and the storage or processing of a large number of measurements. In this Letter, we propose a novel hardware setup that combines a flying-spot micro-electro mechanical system (MEMS) modulated projector with an event camera to implement dual photography for 3D scanning in both line-of-sight (LoS) and non-line-of-sight (NLoS) scenes with a transparent object. In particular, we achieved depth extraction from the LoS scenes and 3D reconstruction of the object in a NLoS scene using event light transport.
Thorpe, Emily D. "Bringing microfossil specimens into the light: Using semi-automated digitization techniques to improve collection accessibility." Making the Case for Natural History Collections: SPNHC 2019. Retrieved from https://par.nsf.gov/biblio/10129035.