IS-GGT: Iterative Scene Graph Generation with Generative Transformers
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering, captioning, and even object detection, to name a few. Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene, which adds computational overhead to the approach. This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction. Using two transformer-based components, we first sample a possible scene graph structure from detected objects and their visual features. We then perform predicate classification on the sampled edges to generate the final scene graph. This approach allows us to efficiently generate scene graphs from images with minimal inference overhead. Extensive experiments on the Visual Genome dataset demonstrate the efficiency of the proposed approach. Without bells and whistles, we obtain, on average, 20.7% mean recall (mR@100) across different settings for scene graph generation (SGG), outperforming state-of-the-art SGG approaches while offering competitive performance to unbiased SGG approaches.
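The mean recall figure quoted above (mR@100) can be illustrated with a small sketch. It assumes the common definition of mean recall@K, in which recall is computed separately for each predicate class and then averaged, so that frequent predicates do not dominate the score. The function name `mean_recall_at_k`, the triple format, and the toy data are hypothetical and are not taken from the paper; standard evaluation protocols also accumulate per-predicate recall across images, which this single-image simplification omits.

```python
from collections import defaultdict


def mean_recall_at_k(gt_triples, ranked_predictions, k):
    """Average per-predicate recall over the top-k predicted triples.

    gt_triples: list of (subject, predicate, object) ground-truth triples.
    ranked_predictions: predicted triples sorted by descending confidence.
    Simplified, single-image illustration (hypothetical helper).
    """
    top_k = set(ranked_predictions[:k])
    hits = defaultdict(int)    # correctly recovered triples per predicate
    totals = defaultdict(int)  # ground-truth triples per predicate
    for subj, pred, obj in gt_triples:
        totals[pred] += 1
        if (subj, pred, obj) in top_k:
            hits[pred] += 1
    if not totals:
        return 0.0
    # Recall is computed per predicate first, then averaged across predicates.
    return sum(hits[p] / totals[p] for p in totals) / len(totals)


# Toy data, invented purely for illustration.
gt = [
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
    ("hat", "on", "man"),
]
preds = [
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
    ("man", "on", "horse"),
    ("man", "riding", "horse"),
]

print(mean_recall_at_k(gt, preds, k=2))
print(mean_recall_at_k(gt, preds, k=4))
```

With `k=2` the predicate "riding" is missed entirely, which lowers the mean even though half of all ground-truth triples are recovered; this sensitivity to rare predicates is why the metric is commonly reported alongside plain recall.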
- PAR ID: 10491046
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- ISSN: 2575-7075
- ISBN: 979-8-3503-0129-8
- Page Range / eLocation ID: 6292 to 6301
- Format(s): Medium: X
- Location: Vancouver, BC, Canada
- Sponsoring Org: National Science Foundation
More Like this
- In this paper, we propose and compare two novel deep generative model-based approaches for the design representation, reconstruction, and generation of porous metamaterials characterized by complex and fully connected solid and pore networks. A highly diverse porous metamaterial database is curated, with each sample represented by solid and pore phase graphs and a voxel image. All metamaterial samples adhere to the requirement of complete connectivity in both pore and solid phases. The first approach employs a Dual Decoder Variational Graph Autoencoder to generate both solid phase and pore phase graphs. The second approach employs a Variational Graph Autoencoder for reconstructing/generating the nodes in the solid phase and pore phase graphs and a Transformer-based Large Language Model (LLM) for reconstructing/generating the connections, i.e., the edges among the nodes. A comparative study was conducted, and we found that both approaches achieved high accuracy in reconstructing node features, while the LLM exhibited superior performance in reconstructing edge features. Reconstruction accuracy is also validated by voxel-to-voxel comparison between the reconstructions and the original images in the test set. Additionally, discussions on the advantages and limitations of using LLMs in metamaterial design generation, along with the rationale behind their utilization, are provided.
- Scene graph generation refers to the task of automatically mapping an image into a semantic structural graph, which requires correctly labeling each extracted object and their interaction relationships. Despite the recent success in object detection using deep learning techniques, inferring complex contextual relationships and structured graph representations from visual data remains a challenging topic. In this study, we propose a novel Attentive Relational Network that consists of two key modules with an object detection backbone to approach this problem. The first module is a semantic transformation module utilized to capture semantic embedded relation features, by translating visual features and linguistic features into a common semantic space. The other module is a graph self-attention module introduced to embed a joint graph representation through assigning various importance weights to neighboring nodes. Finally, accurate scene graphs are produced by the relation inference module to recognize all entities and the corresponding relations. We evaluate our proposed method on the widely-adopted Visual Genome dataset, and the results demonstrate the effectiveness and superiority of our model.
- Avidan, S.; Brostow, G.; Cissé, M.; Farinella, G.M.; Hassner, T. (Ed.) Graph-based representations are becoming increasingly popular for representing and analyzing video data, especially in object tracking and scene understanding applications. Accordingly, an essential tool in this approach is to generate statistical inferences for graphical time series associated with videos. This paper develops a Kalman-smoothing method for estimating graphs from noisy, cluttered, and incomplete data. The main challenge here is to find and preserve the registration of nodes (salient detected objects) across time frames when the data has noise and clutter due to false and missing nodes. First, we introduce a quotient-space representation of graphs that incorporates temporal registration of nodes, and we use that metric structure to impose a dynamical model on graph evolution. Then, we derive a Kalman smoother, adapted to the quotient space geometry, to estimate dense, smooth trajectories of graphs. We demonstrate this framework using simulated data and actual video graphs extracted from the Multiview Extended Video with Activities (MEVA) dataset. This framework successfully estimates graphs despite the noise, clutter, and missed detections.