skip to main content

Search for: All records

Award ID contains: 2118240

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Inexpensive and accessible sensors are accelerating data acquisition in animal ecology. These technologies hold great potential for large-scale ecological understanding, but are limited by current processing approaches which inefficiently distill data into relevant information. We argue that animal ecologists can capitalize on large datasets generated by modern sensors by combining machine learning approaches with domain knowledge. Incorporating machine learning into ecological workflows could improve inputs for ecological models and lead to integrated hybrid modeling tools. This approach will require close interdisciplinary collaboration to ensure the quality of novel approaches and train a new generation of data scientists in ecology and conservation.

  2. ABSTRACT The International Mouse Phenotyping Consortium (IMPC) has generated a large repository of three-dimensional (3D) imaging data from mouse embryos, providing a rich resource for investigating phenotype/genotype interactions. While the data is freely available, the computing resources and human effort required to segment these images for analysis of individual structures can create a significant hurdle for research. In this paper, we present an open source, deep learning-enabled tool, Mouse Embryo Multi-Organ Segmentation (MEMOS), that estimates a segmentation of 50 anatomical structures with a support for manually reviewing, editing, and analyzing the estimated segmentation in a single application. MEMOS is implemented as an extension on the 3D Slicer platform and is designed to be accessible to researchers without coding experience. We validate the performance of MEMOS-generated segmentations through comparison to state-of-the-art atlas-based segmentation and quantification of previously reported anatomical abnormalities in a Cbx4 knockout strain. This article has an associated First Person interview with the first author of the paper.
    Free, publicly-accessible full text available February 15, 2024
  3. Staples, Anne Elizabeth (Ed.)
    Vocalizations in animals, particularly birds, are critically important behaviors that influence their reproductive fitness. While recordings of bioacoustic data have been captured and stored in collections for decades, the automated extraction of data from these recordings has only recently been facilitated by artificial intelligence methods. These have yet to be evaluated with respect to accuracy of different automation strategies and features. Here, we use a recently published machine learning framework to extract syllables from ten bird species ranging in their phylogenetic relatedness from 1 to 85 million years, to compare how phylogenetic relatedness influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance increases between taxa. However, we also find that the application of models trained on multiple distantly related species can improve the overall accuracy to levels near that of training and analyzing a model on the same species. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.
    Free, publicly-accessible full text available December 7, 2023
  4. Charles, Cyril (Ed.)
    Manually collecting landmarks for quantifying complex morphological phenotypes can be laborious and subject to intra and interobserver errors. However, most automated landmarking methods for efficiency and consistency fall short of landmarking highly variable samples due to the bias introduced by the use of a single template. We introduce a fast and open source automated landmarking pipeline (MALPACA) that utilizes multiple templates for accommodating large-scale variations. We also introduce a K-means method of choosing the templates that can be used in conjunction with MALPACA, when no prior information for selecting templates is available. Our results confirm that MALPACA significantly outperforms single-template methods in landmarking both single and multi-species samples. K-means based template selection can also avoid choosing the worst set of templates when compared to random template selection. We further offer an example of post-hoc quality check for each individual template for further refinement. In summary, MALPACA is an efficient and reproducible method that can accommodate large morphological variability, such as those commonly found in evolutionary studies. To support the research community, we have developed open-source and user-friendly software tools for performing K-means multi-templates selection and MALPACA.
    Free, publicly-accessible full text available December 1, 2023
  5. ABSTRACT Due to the complexity of fish skulls, previous attempts to classify craniofacial phenotypes have relied on qualitative features or sparce 2D landmarks. In this work we aim to identify previously unknown 3D craniofacial phenotypes with a semiautomated pipeline in adult zebrafish mutants. We first estimate a synthetic ‘normative’ zebrafish template using MicroCT scans from a sample pool of wild-type animals using the Advanced Normalization Tools (ANTs). We apply a computational anatomy (CA) approach to quantify the phenotype of zebrafish with disruptions in bmp1a, a gene implicated in later skeletal development and whose human ortholog when disrupted is associated with Osteogenesis Imperfecta. Compared to controls, the bmp1a fish have larger otoliths, larger normalized centroid sizes, and exhibit shape differences concentrated around the operculum, anterior frontal, and posterior parietal bones. Moreover, bmp1a fish differ in the degree of asymmetry. Our CA approach offers a potential pipeline for high-throughput screening of complex fish craniofacial shape to discover novel phenotypes for which traditional landmarks are too sparce to detect. The current pipeline successfully identifies areas of variation in zebrafish mutants, which are an important model system for testing genome to phenome relationships in the study of development, evolution, and human diseases. This articlemore »has an associated First Person interview with the first author of the paper.« less
  6. We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker(M-TRACK) to guide the agent and monitor its progress. Specifcally, we propose a milestone builder that tags the instructions with navigation and interaction milestones which the agent needs to complete step by step, and a milestone checker that systemically checks the agent’s progress in its current milestone and determines when to proceed to the next. On the challenging ALFRED dataset, our M-TRACK leads to a notable 33% and 52% relative improvement in unseen success rate over two competitive base models.
  7. Current 3D object detectors for autonomous driving are almost entirely trained on human-annotated data. Although of high quality, the generation of such data is laborious and costly, restricting them to a few specific locations and object types. This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth. Our approach leverages several simple common sense heuristics to create an initial set of approximate seed labels. For example, relevant traffic participants are generally not persistent across multiple traversals of the same route, do not fly, and are never under ground. We demonstrate that these seed labels are highly effective to bootstrap a surprisingly accurate detector through repeated self-training without a single human annotated label. Code is available at
  8. In this paper, we explore the possibility to increase the training examples without laborious data collection and annotation for long-tailed instance segmentation. We find that an abundance of instance segments can potentially be obtained freely from object-centric images, according to two insights: (i) an object-centric image usually contains one salient object in a simple background; (ii) objects from the same class often share similar appearances or similar contrasts to the background. Motivated by these insights, we propose a simple and scalable framework FREESEG for extracting and leveraging these “free” object segments to facilitate model training. Concretely, we investigate the similarity among object-centric images of the same class to propose candidate segments of foreground instances, followed by a novel ranking of segment quality. The resulting high quality object segments can then be used to augment the existing long-tailed datasets, e.g., by copying and pasting the segments onto the original training images. Extensive experiments show that FREESEG yields substantial improvements on top of strong baselines and achieves state-of-the-art accuracy for segmenting rare object categories.
  9. Garoufallou E., Ovalle-Perandones MA. (Ed.)
    Biodiversity image repositories are crucial sources for training machine learning approaches to support biological research. Metadata about object (e.g. image) quality is a putatively important prerequisite to selecting samples for these experiments. This paper reports on a study demonstrating the importance of image quality metadata for a species classification experiment involving a corpus of 1935 fish specimen images which were annotated with 22 metadata quality properties. A small subset of high quality images produced an F1 accuracy of 0.41 compared to 0.35 for a taxonomically matched subset low quality images when used by a convolutional neural network approach to species identification. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found anatomical feature visibility was the most important quality feature for classification accuracy. We suggest biodiversity image repositories consider adopting a minimal set of image quality metadata to support machine learning.