NSF PAR Search | NSF Public Access Repository

A FAIR and modular image‐based workflow for knowledge discovery in the emerging field of imageomics

https://doi.org/10.1111/2041-210X.14327

Balk, Meghan A.; Bradley, John; Maruf, M.; Altintaş, Bahadir; Bakiş, Yasin; Bart, Jr, Henry L.; Breen, David; Florian, Christopher R.; Greenberg, Jane; Karpatne, Anuj; et al (April 2024, Methods in Ecology and Evolution)

Abstract Image‐based machine learning tools are an ascendant ‘big data’ research avenue. Citizen science platforms, like iNaturalist, and museum‐led initiatives provide researchers with an abundance of data and knowledge to extract. These include extraction of metadata, species identification, and phenomic data. Ecological and evolutionary biologists are increasingly using complex, multi‐step processes on data. These processes often include machine learning techniques, often built by others, that are difficult to reuse by other members in a collaboration. We present a conceptual workflow model for machine learning applications using image data to extract biological knowledge in the emerging field of imageomics. We derive an implementation of this conceptual workflow for a specific imageomics application that adheres to FAIR principles as a formal workflow definition that allows fully automated and reproducible execution, and consists of reusable workflow components. We outline technologies and best practices for creating an automated, reusable and modular workflow, and we show how they promote the reuse of machine learning models and their adaptation for new research questions. This conceptual workflow can be adapted: it can be semi‐automated, contain different components than those presented here, or have parallel components for comparative studies. We encourage researchers, both computer scientists and biologists, to build upon this conceptual workflow that combines machine learning tools on image data to answer novel scientific questions in their respective fields.
Discovering Novel Biological Traits From Images Using Phylogeny-Guided Neural Networks

https://doi.org/10.1145/3580305.3599808

Elhamod, Mohannad; Khurana, Mridul; Manogaran, Harish Babu; Uyeda, Josef C.; Balk, Meghan A.; Dahdul, Wasila; Bakis, Yasin; Bart, Henry L.; Mabee, Paula M.; Lapp, Hilmar; et al (August 2023, KDD 2023: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Full Text Available
Hierarchy‐guided neural network for species classification

https://doi.org/10.1111/2041-210X.13768

Elhamod, Mohannad; Diamond, Kelly M.; Maga, A. Murat; Bakis, Yasin; Bart, Henry L.; Mabee, Paula; Dahdul, Wasila; Leipzig, Jeremy; Greenberg, Jane; Avants, Brian; et al (March 2022, Methods in Ecology and Evolution)

Full Text Available
Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species

https://doi.org/10.1007/978-3-030-71903-6_1

Leipzig, J (March 2021, MTSR 2020. Communications in Computer and Information Science)
Garoufallou E., Ovalle-Perandones MA. (Eds.)
Biodiversity image repositories are crucial sources for training machine learning approaches to support biological research. Metadata about object (e.g. image) quality is a putatively important prerequisite to selecting samples for these experiments. This paper reports on a study demonstrating the importance of image quality metadata for a species classification experiment involving a corpus of 1935 fish specimen images which were annotated with 22 metadata quality properties. A small subset of high quality images produced an F1 accuracy of 0.41 compared to 0.35 for a taxonomically matched subset of low quality images when used by a convolutional neural network approach to species identification. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found anatomical feature visibility was the most important quality feature for classification accuracy. We suggest biodiversity image repositories consider adopting a minimal set of image quality metadata to support machine learning.
Full Text Available
