Title: Birds of a feather: capturing avian shape models from images
Animals are diverse in shape, but building a deformable shape model for a new species is not always possible due to the lack of 3D data. We present a method to capture new species using an articulated template and images of that species. In this work, we focus mainly on birds. Although there are almost twice as many bird species as mammal species, no accurate avian shape model is available. To capture a novel species, we first fit the articulated template to each training sample. By disentangling pose and shape, we learn a shape space that captures variation both among species and within each species from image evidence. We learn models of multiple species from the CUB dataset, and contribute new species-specific and multi-species shape models that are useful for downstream reconstruction tasks. Using a low-dimensional embedding, we show that our learned 3D shape space better reflects the phylogenetic relationships among birds than learned perceptual features.
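The idea of learning a low-dimensional shape space from fitted templates can be illustrated with a minimal sketch. The abstract does not say how the shape space is built, so this example assumes plain PCA over pose-normalized template vertices as a stand-in technique. The function names (learn_shape_space, encode, decode) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def learn_shape_space(rest_shapes, n_components):
    """Fit a linear (PCA) shape space.

    rest_shapes: array of shape (N, V, 3), assumed to hold N fitted templates
    with V vertices each, already brought into a common rest pose so that the
    remaining variation is shape rather than pose (an assumption of this sketch).
    """
    n = rest_shapes.shape[0]
    flat = rest_shapes.reshape(n, -1)           # (N, 3V)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are orthonormal principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                   # (K, 3V)
    variances = s[:n_components] ** 2 / max(n - 1, 1)
    return mean, basis, variances


def encode(shape, mean, basis):
    """Project one (V, 3) shape onto the basis to get K shape coefficients."""
    return basis @ (shape.reshape(-1) - mean)


def decode(coeffs, mean, basis):
    """Rebuild a (V, 3) shape from K shape coefficients."""
    return (mean + coeffs @ basis).reshape(-1, 3)


# Synthetic stand-in data: 20 "specimens" with 50 vertices each, generated
# from 3 latent factors so that 3 components suffice to describe them.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 3))
directions = rng.normal(size=(3, 150))
template = rng.normal(size=150)
shapes = (template + latent @ directions).reshape(20, 50, 3)

mean, basis, variances = learn_shape_space(shapes, n_components=3)
coeffs = encode(shapes[0], mean, basis)
recon = decode(coeffs, mean, basis)
print("coefficients:", coeffs)
print("max reconstruction error:", np.abs(recon - shapes[0]).max())
```

In this sketch the coefficient vectors play the role of the low-dimensional embedding: specimens with similar coefficients have similar shapes, which is the kind of space one could then compare against a phylogeny.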
Award ID(s):
2124355
PAR ID:
10344546
Date Published:
Journal Name:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Page Range / eLocation ID:
14739-14749
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. D'Andrea, Rafael (Ed.)
    Data on the three-dimensional shape of organismal morphology is becoming increasingly available, and forms part of a new revolution in high-throughput phenomics that promises to help understand ecological and evolutionary processes that influence phenotypes at unprecedented scales. However, in order to meet the potential of this revolution we need new data analysis tools to deal with the complexity and heterogeneity of large-scale phenotypic data such as 3D shapes. In this study we explore the potential of generative Artificial Intelligence to help organize and extract meaning from complex 3D data. Specifically, we train a deep representational learning method known as DeepSDF on a dataset of 3D scans of the bills of 2,020 bird species. The model is designed to learn a continuous vector representation of 3D shapes, along with a 'decoder' function that allows the transformation from this vector space to the original 3D morphological space. We find that this approach successfully learns coherent representations: particular directions in latent space are associated with discernible morphological meaning (such as elongation, flattening, etc.). More importantly, learned latent vectors have ecological meaning, as shown by their ability to predict the trophic niche of the bird each bill belongs to with a high degree of accuracy. Unlike existing 3D morphometric techniques, this method requires very little human supervision for tasks such as landmark placement, increasing its accessibility to labs with fewer labour resources. It also makes fewer strong assumptions than alternative dimension reduction techniques such as PCA. Once trained, 3D morphology predictions can be made from latent vectors at very low computational cost. The trained model has been made publicly available and can be used by the community, including for finetuning on new data, representing an early step toward developing shared, reusable AI models for analyzing organismal morphology.
  2. Perceiving and manipulating 3D articulated objects (e.g., cabinets, doors) in human environments is an important yet challenging task for future home-assistant robots. The space of 3D articulated objects is exceptionally rich in its myriad semantic categories, diverse shape geometry, and complicated part functionality. Previous works mostly abstract kinematic structure with estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects. In this paper, we propose object-centric actionable visual priors as a novel perception-interaction handshaking point, in which the perception system outputs more actionable guidance than kinematic structure estimation by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals. We design an interaction-for-perception framework, VAT-Mart, to learn such actionable visual representations by simultaneously training a curiosity-driven reinforcement learning policy that explores diverse interaction trajectories and a perception module that summarizes and generalizes the explored knowledge for pointwise predictions among diverse shapes. Experiments demonstrate the effectiveness of the proposed approach using the large-scale PartNet-Mobility dataset in the SAPIEN environment and show promising generalization capabilities to novel test shapes, unseen object categories, and real-world data.
  3. Learning multi-agent system dynamics has been extensively studied for various real-world applications, such as molecular dynamics in biology, multi-body systems in physics, and particle dynamics in materials science. Most existing models are built to learn the dynamics of a single system: they learn the dynamics from observed historical data and predict the future trajectory. In practice, however, we might observe multiple systems that are generated across different environments, which differ in latent exogenous factors such as temperature and gravity. One simple solution is to learn multiple environment-specific models, but this fails to exploit the potential commonalities among the dynamics across environments and offers poor prediction results where per-environment data is sparse or limited. Here, we present GG-ODE (Generalized Graph Ordinary Differential Equations), a machine learning framework for learning continuous multi-agent system dynamics across environments. Our model learns system dynamics using neural ordinary differential equations (ODE) parameterized by Graph Neural Networks (GNNs) to capture the continuous interaction among agents. We achieve model generalization by assuming the dynamics across different environments are governed by common physics laws that can be captured via learning a shared ODE function. The distinct latent exogenous factors learned for each environment are incorporated into the ODE function to account for their differences. To improve model performance, we additionally design two regularization losses to (1) enforce the orthogonality between the learned initial states and exogenous factors via mutual information minimization; and (2) reduce the temporal variance of learned exogenous factors within the same system via contrastive learning. Experiments over various physical simulations show that our model can accurately predict system dynamics, especially in the long range, and can generalize well to new systems with few observations.
  4. Sum-product networks (SPN) are knowledge compilation models and are related to other graphical models for efficient probabilistic inference such as arithmetic circuits and AND/OR graphs. Recent investigations into generalizing SPNs have yielded sum-product-max networks (SPMN) which offer a data-driven alternative for decision making that has predominantly relied on handcrafted models. However, SPMNs are not suited for decision-theoretic planning which involves sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMN) that learn from and model decision-making data over time. RSPMNs utilize a template network that is unfolded as needed depending on the length of the data sequence. This is significant as RSPMNs not only inherit the benefits of SPNs in being data driven and mostly tractable, but are also well suited for planning problems. We establish soundness conditions on the template network, which guarantee that the resulting SPMN is valid, and present a structure learning algorithm to learn a sound template. RSPMNs learned on a testbed of data sets, some generated using RDDLSim, yield MEUs and policies that are close to the optimal on perfectly-observed domains and easily improve on a recent batch-constrained RL method, which is important because RSPMNs offer a new model-based approach to offline RL.
  5. As computed tomography and related technologies have become mainstream tools across a broad range of scientific applications, each new generation of instrumentation produces larger volumes of more-complex 3D data. Lagging behind are step-wise improvements in computational methods to rapidly analyze these new large, complex datasets. Here we describe novel computational methods to capture and quantify volumetric information, and to efficiently characterize and compare shape volumes. The approach is based on an innovative theoretical and computational reformulation of volumetric computing. It consists of two theoretical constructs and their numerical implementation: the spherical wave decomposition (SWD), which provides fast, accurate automated characterization of shapes embedded within complex 3D datasets; and symplectomorphic registration with phase space regularization by entropy spectrum pathways (SYMREG), a non-linear volumetric registration method that allows homologous structures to be correctly warped to each other or to a common template for comparison. Together, these constitute the Shape Analysis for Phenomics from Imaging Data (SAPID) method. We demonstrate its ability to automatically provide rapid quantitative segmentation and characterization of single unique datasets, and both inter- and intra-specific comparative analyses. We go beyond pairwise comparisons and analyze collections of samples from 3D data repositories, highlighting the magnified potential our method has when applied to data collections. We discuss the potential of SAPID in the broader context of generating normative morphologies required for meaningfully quantifying and comparing variations in complex 3D anatomical structures and systems.