Title: 3D Volumetric Modeling with Introspective Neural Networks
In this paper, we study the 3D volumetric modeling problem by adopting the Wasserstein introspective neural networks method (WINN) that was previously applied to 2D static images. We name our algorithm 3DWINN, which enjoys the same properties as WINN in the 2D case: being simultaneously generative and discriminative. Compared to the existing 3D volumetric modeling approaches, 3DWINN demonstrates competitive results on several benchmarks in both the generation and the classification tasks. In addition to the standard inception score, the Fréchet Inception Distance (FID) metric is also adopted to measure the quality of 3D volumetric generations. We also study adversarial attacks for volumetric data and demonstrate the robustness of 3DWINN against adversarial examples while achieving appealing results in both classification and generation within a single model. 3DWINN is a general framework and it can be applied to the emerging tasks for 3D object and scene modeling.
Award ID(s):
1717431
PAR ID:
10107777
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
THE THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE
Volume:
33
Issue:
1
Page Range / eLocation ID:
8481-8488
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reconstructing 3D faces with facial geometry from single images has allowed for major advances in animation, generative models, and virtual reality. However, this ability to represent faces with their 3D features is not as fully explored by the facial expression inference (FEI) community. This study therefore aims to investigate the impacts of integrating such 3D representations into the FEI task, specifically for facial expression classification and face-based valence-arousal (VA) estimation. To accomplish this, we first assess the performance of two 3D face representations (both based on the 3D morphable model, FLAME) for the FEI tasks. We further explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks. To evaluate our proposed architecture, we extract the corresponding 3D representations and perform extensive tests on the AffectNet and RAF-DB datasets. Our experimental results demonstrate that our proposed method outperforms the state of the art on the AffectNet VA estimation and RAF-DB classification tasks. Moreover, our method can act as a complement to other existing methods to boost performance in many emotion inference tasks.
  2. Hambleton, J. P. (Ed.)
    Soil particles that have been deposited through water or air generally align their largest projected surface area normal to the depositional direction, which generates a cross-anisotropic fabric of granular soils. Researchers have used both two-dimensional (2D) and three-dimensional (3D) images to determine scalar fabric parameters of granular soils, including void ratio, coordination number, and average branch vector length. This study aims to evaluate the accuracy and effectiveness of 2D images to characterize fabric in 3D soils based on scalar parameters. X-ray computed tomography (X-ray CT) is used to reconstruct the 3D volumetric images of three air-pluviated sand specimens: crushed limestone, Griffin sand, and glass beads. Then, six slices are obtained by vertically cutting the 3D volumetric image at angular increments of 30 degrees. The 3D and 2D images are analyzed to determine scalar fabric parameters. The results show that coordination numbers and average branch vector lengths computed from 2D images underestimate these values in 3D granular soils. The void ratios computed from 2D images vary over a large range depending on slicing direction, so they cannot provide reliable fabric characterizations for 3D granular soils.
  3. In recent years, semi-supervised learning has been widely explored and shows excellent data efficiency for 2D data. There is an emerging need to improve data efficiency for 3D tasks due to the scarcity of labeled 3D data. This paper explores how the coherence of different modalities of 3D data (e.g. point cloud, image, and mesh) can be used to improve data efficiency for both 3D classification and retrieval tasks. We propose a novel multimodal semi-supervised learning framework by introducing an instance-level consistency constraint and a novel multimodal contrastive prototype (M2CP) loss. The instance-level consistency constraint forces the network to generate consistent representations for multimodal data of the same object regardless of its modality. The M2CP loss maintains a multimodal prototype for each class and learns features with small intra-class variations by minimizing the feature distance of each object to its prototype while maximizing the distance to the others. Our proposed framework outperforms all the state-of-the-art counterparts for both classification and retrieval tasks by a large margin on the ModelNet10 and ModelNet40 datasets.
  4. Osteoarthritis (OA) is the most common form of arthritis and can often occur in the knee. While convolutional neural networks (CNNs) have been widely used to study medical images, the application of a 3-dimensional (3D) CNN in knee OA diagnosis is limited. This study utilizes a 3D CNN model to analyze sequences of knee magnetic resonance (MR) images to perform knee OA classification. An advantage of using 3D CNNs is the ability to analyze the whole sequence of 3D MR images as a single unit as opposed to a traditional 2D CNN, which examines one image at a time. Therefore, 3D features could be extracted from adjacent slices, which may not be detectable from a single 2D image. The input data for each knee were a sequence of double-echo steady-state (DESS) MR images, and each knee was labeled by the Kellgren and Lawrence (KL) grade of severity at levels 0–4. In addition to the 5-category KL grade classification, we further examined a 2-category classification that distinguishes non-OA (KL ≤ 1) from OA (KL ≥ 2) knees. Clinically, diagnosing a patient with knee OA is the ultimate goal of assigning a KL grade. On a dataset with 1100 knees, the 3D CNN model that classifies knees with and without OA achieved an accuracy of 86.5% on the validation set and 83.0% on the testing set. We further conducted a comparative study between MRI and X-ray. Compared with a CNN model using X-ray images trained from the same group of patients, the proposed 3D model with MR images achieved higher accuracy in both the 5-category classification (54.0% vs. 50.0%) and the 2-category classification (83.0% vs. 77.0%). The result indicates that MRI, with the application of a 3D CNN model, has greater potential to improve diagnosis accuracy for knee OA clinically than the currently used X-ray methods.
  5. Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior. 