We present iSeg, a new interactive technique for segmenting 3D shapes. Previous works have focused mainly on leveraging pre-trained 2D foundation models for 3D segmentation based on text. However, text may be insufficient for accurately describing fine-grained spatial segmentations. Moreover, achieving a consistent 3D segmentation using a 2D model is highly challenging, since occluded areas of the same semantic region may not be visible together from any 2D view. Thus, we design a segmentation method conditioned on fine user clicks, which operates entirely in 3D. Our system accepts user clicks directly on the shape's surface, indicating the inclusion or exclusion of regions from the desired shape partition. To accommodate various click settings, we propose a novel interactive attention module capable of processing different numbers and types of clicks, enabling the training of a single unified interactive segmentation model. We apply iSeg to a myriad of shapes from different domains, demonstrating its versatility and faithfulness to the user's specifications. Our project page is at https://threedle.github.io/iSeg/. 
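The abstract leaves the interactive attention module unspecified, but one plausible reading is cross-attention from per-point shape features to embedded clicks, with the click type (include/exclude) folded in as a learned embedding so a single model can handle any number and mix of clicks. The sketch below is a hypothetical stand-in under those assumptions; all names and dimensions are invented, and it is not the authors' implementation.

```python
# Hypothetical click-conditioned cross-attention head; a sketch only,
# not iSeg's actual module.
import torch
import torch.nn as nn

class ClickAttention(nn.Module):
    def __init__(self, feat_dim=256, num_heads=4):
        super().__init__()
        self.click_type = nn.Embedding(2, feat_dim)   # 0 = include, 1 = exclude
        self.click_proj = nn.Linear(3, feat_dim)      # embed 3D click position
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.head = nn.Linear(feat_dim, 1)            # per-point inclusion logit

    def forward(self, point_feats, click_xyz, click_labels):
        # point_feats: (B, N, D) features for N surface points
        # click_xyz:   (B, K, 3) click positions; K can vary between calls
        # click_labels:(B, K)    0/1 include/exclude flags
        clicks = self.click_proj(click_xyz) + self.click_type(click_labels)
        out, _ = self.attn(query=point_feats, key=clicks, value=clicks)
        return self.head(out).squeeze(-1)             # (B, N) segmentation logits

feats = torch.randn(1, 1024, 256)        # features for 1024 surface points
xyz = torch.rand(1, 3, 3)                # three user clicks on the surface
labels = torch.tensor([[0, 0, 1]])       # two include clicks, one exclude
logits = ClickAttention()(feats, xyz, labels)
```

Because attention is permutation-invariant over its keys, a head of this form accepts a varying number of clicks without architectural changes, which is the property the abstract emphasizes.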
SHRED: 3D Shape Region Decomposition with Learned Local Operations
We present SHRED, a method for 3D SHape REgion Decomposition. SHRED takes a 3D point cloud as input and uses learned local operations to produce a segmentation that approximates fine-grained part instances. We endow SHRED with three decomposition operations: splitting regions, fixing the boundaries between regions, and merging regions together. Modules are trained independently and locally, allowing SHRED to generate high-quality segmentations for categories not seen during training. We train and evaluate SHRED with fine-grained segmentations from PartNet; using its merge-threshold hyperparameter, we show that SHRED produces segmentations that better respect ground-truth annotations compared with baseline methods, at any desired decomposition granularity. Finally, we demonstrate that SHRED is useful for downstream applications, outperforming all baselines on zero-shot fine-grained part instance segmentation and few-shot fine-grained semantic segmentation when combined with methods that learn to label shape regions.
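The merge operation and its threshold hyperparameter can be pictured as a greedy loop over region pairs, as in the minimal sketch below. In SHRED the pairwise score comes from a learned, locally-trained module, so the hand-coded centroid-distance score here is only a stand-in, and all names are assumptions.

```python
# Greedy region merging controlled by a threshold; illustrates how a
# merge-threshold hyperparameter sets decomposition granularity.
import numpy as np

def merge_regions(points, labels, score_fn, threshold):
    """Repeatedly merge the highest-scoring region pair until no pair
    scores above `threshold`."""
    labels = labels.copy()
    while True:
        ids = np.unique(labels)
        best, best_pair = threshold, None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                s = score_fn(points[labels == a], points[labels == b])
                if s > best:
                    best, best_pair = s, (a, b)
        if best_pair is None:
            return labels            # raising `threshold` yields finer parts
        a, b = best_pair
        labels[labels == b] = a      # merge region b into region a

# Stand-in score: prefer merging regions whose centroids are close.
score = lambda pa, pb: -np.linalg.norm(pa.mean(0) - pb.mean(0))
pts = np.random.rand(200, 3)
init = np.random.randint(0, 8, size=200)   # oversegmented starting labels
final = merge_regions(pts, init, score, threshold=-0.3)
```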
- Award ID(s): 1941808
- PAR ID: 10403485
- Date Published:
- Journal Name: ACM Transactions on Graphics
- Volume: 41
- Issue: 6
- ISSN: 0730-0301
- Page Range / eLocation ID: 1 to 11
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Enabling the vision of on-demand cyber manufacturing-as-a-service requires a new set of cloud-based computational tools for design manufacturability feedback and process selection to connect designers with manufacturers. In our prior work, we demonstrated a generative modeling approach in voxel space to model the shape transformation capabilities of machining operations using unsupervised deep learning. Combining this with a deep metric learning model enabled quantitative assessment of the manufacturability of a query part. In this paper, we extend our prior work by developing a semantic segmentation approach for machinable volume decomposition using pre-trained generative process capability models, which output per-voxel manufacturability feedback and labels of candidate machining operations for a query 3D part. Using three types of complex parts as case studies, we show that the proposed method accurately identifies machinable and non-machinable volumes with an average intersection-over-union (IoU) of 0.968 for axisymmetric machining operations, and a class-average F1 score of 0.834 for volume segmentation by machining operation.
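For concreteness, the quoted intersection-over-union is the standard overlap metric between binary voxel masks; a generic computation (not the authors' evaluation code) looks like this:

```python
# Generic IoU between binary voxel masks, as used to score predicted
# machinable volumes against ground truth.
import numpy as np

def voxel_iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0   # two empty masks agree trivially

pred = np.random.rand(64, 64, 64) > 0.5      # predicted machinable voxels
gt = np.random.rand(64, 64, 64) > 0.5        # ground-truth machinable voxels
print(voxel_iou(pred, gt))
```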
- Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In more constrained 3D domains, recent methods have leveraged modern vision-and-language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain and fail to exploit the geometric consistency of images capturing multiple views of such scenes. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by harnessing the power of state-of-the-art vision-and-language models with adaptations for understanding landmark scene semantics. To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly-related textual information. Our approach is built upon the premise that images physically grounded in space can provide a powerful supervision signal for localizing new concepts, whose semantics may be unlocked from Internet textual metadata with large language models. We use correspondences between views of scenes to bootstrap spatial understanding of these semantics, providing guidance for 3D-compatible segmentation that ultimately lifts to a volumetric scene representation. To evaluate our method, we present a new benchmark dataset containing large-scale scenes with ground-truth segmentations for multiple semantic concepts. Our results show that HaLo-NeRF can accurately localize a variety of semantic concepts related to architectural landmarks, surpassing the results of other 3D models as well as strong 2D segmentation baselines. Our code and data are publicly available at https://tau-vailab.github.io/HaLo-NeRF/.
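As a toy illustration of the underlying 2D scoring step, an off-the-shelf vision-and-language model can rank views of a landmark against a text concept. HaLo-NeRF goes further, adapting such models to the architectural domain and lifting their outputs into a volumetric representation; none of that is covered by this sketch, and the model choice, query, and file names are assumptions.

```python
# Rank scene views by similarity to a text concept with a stock CLIP
# model; only the generic 2D image-text scoring step, not HaLo-NeRF.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

views = [Image.open(p) for p in ["view0.jpg", "view1.jpg"]]  # scene photos
inputs = processor(text=["the cathedral's rose window"],
                   images=views, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
scores = out.logits_per_image.squeeze(-1)  # one similarity score per view
print(scores)                              # higher = view shows the concept
```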
- Two segmentation methods, one atlas-based and one neural-network-based, were compared to see how well each can automatically segment the brain stem and cerebellum in Displacement Encoding with Stimulated Echoes Magnetic Resonance Imaging (DENSE-MRI) data. The segmentation is a prerequisite for estimating the average displacements in these regions, which have recently been proposed as biomarkers in the diagnosis of Chiari Malformation type I (CMI). In numerical experiments, the segmentations of both methods were similar to manual segmentations provided by trained experts. Overall, the neural-network-based method alone produced more accurate segmentations than the atlas-based method alone, but a combination of the two, in which the atlas-based method segments the brain stem and the neural network segments the cerebellum, may be the most successful.
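The suggested combination reduces to taking each structure's mask from the method that segments it best; schematically (array names are hypothetical, assuming binary masks on a common voxel grid):

```python
# Combine the two automatic methods: atlas-based mask for the brain
# stem, neural-network mask for the cerebellum.
import numpy as np

def combine(atlas_brainstem, network_cerebellum):
    """Merge per-structure binary masks into one label volume:
    0 = background, 1 = brain stem, 2 = cerebellum."""
    combined = np.zeros(atlas_brainstem.shape, dtype=np.uint8)
    combined[atlas_brainstem > 0] = 1
    combined[network_cerebellum > 0] = 2   # cerebellum mask takes precedence
    return combined
```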
- Contemporary approaches to instance segmentation in cell science use 2D or 3D convolutional networks depending on the experiment and data structures. However, limitations in microscopy systems or efforts to prevent phototoxicity commonly require recording sub-optimally sampled data that greatly reduces the utility of such 3D data, especially in crowded sample space with significant axial overlap between objects. In such regimes, 2D segmentations are both more reliable for cell morphology and easier to annotate. In this work, we propose the projection enhancement network (PEN), a novel convolutional module that processes the sub-sampled 3D data and produces a 2D RGB semantic compression, and is trained in conjunction with an instance segmentation network of choice to produce 2D segmentations. To train PEN, we augment a low-density cell image dataset to increase cell density, and we evaluate PEN on curated datasets. We show that with PEN, the learned semantic representation in CellPose encodes depth and greatly improves segmentation performance in comparison to maximum intensity projection images as input, but does not similarly aid segmentation in region-based networks like Mask R-CNN. Finally, we dissect the segmentation strength of PEN with CellPose against cell density on disseminated cells from side-by-side spheroids. We present PEN as a data-driven solution to form compressed representations of 3D data that improve 2D segmentations from instance segmentation networks.
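In spirit, PEN swaps a fixed maximum-intensity projection for a learned 3D-to-2D compression that a 2D instance segmenter then consumes. A minimal stand-in might look like the following; the layer sizes are hypothetical and this is not the published architecture.

```python
# Minimal stand-in for a learned 3D-to-2D "semantic compression":
# convolve the z-stack, then collapse depth into a 3-channel image.
import torch
import torch.nn as nn

class TinyPEN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv3d = nn.Conv3d(1, 8, kernel_size=3, padding=1)
        self.to_rgb = nn.Conv2d(8, 3, kernel_size=1)

    def forward(self, stack):            # stack: (B, 1, Z, H, W)
        x = torch.relu(self.conv3d(stack))
        x = x.max(dim=2).values          # learned features, depth collapsed
        return self.to_rgb(x)            # (B, 3, H, W) pseudo-RGB image

rgb = TinyPEN()(torch.randn(1, 1, 12, 128, 128))  # 12 z-slices in, RGB out
```

Unlike a plain maximum-intensity projection, the depth-wise maximum here is taken over learned feature channels, so the 2D output can retain depth cues for the downstream segmenter.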