Abstract
Automated manufacturing feature recognition is a crucial link between computer-aided design and manufacturing, facilitating process selection and other downstream tasks in computer-aided process planning. While various methods such as graph-based, rule-based, and neural networks have been proposed for automatic feature recognition, they suffer from poor scalability or computational inefficiency. Recently, voxel-based convolutional neural networks have shown promise in solving these challenges but incur a tradeoff between computational cost and feature resolution. This paper investigates a computationally efficient sparse voxel-based convolutional neural network for manufacturing feature recognition, specifically, an octree-based sparse voxel convolutional neural network. This model is trained on a large-scale manufacturing feature dataset, and its performance is compared to a voxel-based feature recognition model (FeatureNet). The results indicate that the octree-based model yields higher feature recognition accuracy (99.5% on the test dataset) with 44% lower graphics processing unit (GPU) memory consumption than a voxel-based model of comparable resolution. In addition, increasing the resolution of the octree-based model enables recognition of finer manufacturing features. These results indicate that a sparse voxel-based convolutional neural network is a computationally efficient deep learning model for manufacturing feature recognition to enable process planning automation. Moreover, the sparse voxel-based neural network demonstrated comparable performance to a boundary representation-based feature recognition neural network, achieving similar accuracy in single-feature recognition without having access to the exact 3D shape descriptors.
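The memory argument behind the sparse approach can be illustrated with a toy occupancy example (a minimal sketch, not the octree CNN from the paper; the resolution and geometry below are made up for illustration):

```python
import numpy as np

def dense_grid(resolution, occupied):
    """Dense occupancy grid: memory grows as resolution**3 no matter
    how empty the part's bounding box is."""
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    for x, y, z in occupied:
        grid[x, y, z] = 1.0
    return grid

def sparse_grid(occupied):
    """Sparse (octree-style) storage keeps only occupied cells, so
    memory scales with the part's surface rather than its volume."""
    return np.asarray(occupied, dtype=np.int32)

# A planar face of a part touches only resolution**2 of the
# resolution**3 cells in the bounding box.
res = 64
face = [(x, y, 0) for x in range(res) for y in range(res)]
dense = dense_grid(res, face)
sparse = sparse_grid(face)
```

Since most CAD parts are thin-walled relative to their bounding box, the sparse representation leaves memory headroom that can instead be spent on higher resolution, which is the tradeoff the abstract describes.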
Condensing CNNs with Partial Differential Equations
Convolutional neural networks (CNNs) rely on the depth of the architecture to obtain complex features. This results in computationally expensive models for low-resource IoT devices. Convolutional operators are local and restricted in the receptive field, which increases with depth. We explore partial differential equations (PDEs) that offer a global receptive field without the added overhead of maintaining large kernel convolutional filters. We propose a new feature layer, called the Global layer, that enforces PDE constraints on the feature maps, resulting in rich features. These constraints are solved by embedding iterative schemes in the network. The proposed layer can be embedded in any deep CNN to transform it into a shallower network, yielding compact and computationally efficient architectures that achieve performance similar to the original network. Our experimental evaluation demonstrates that architectures with Global layers require a 2-5x smaller computational and storage budget without any significant loss in performance.
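A minimal sketch of the mechanism behind such a layer, assuming a simple explicit iterative scheme for the heat equation u_t = lap(u) (the paper's actual PDE constraints and solver are more general):

```python
import numpy as np

def global_layer(feature_map, num_iters=10, dt=0.2):
    """Iterate an explicit scheme for the heat equation on a feature
    map. Each iteration propagates information one cell further, so a
    stack of cheap iterations reaches an effectively global receptive
    field without storing large convolution kernels."""
    u = feature_map.astype(np.float64).copy()
    for _ in range(num_iters):
        # Five-point Laplacian with periodic boundaries via np.roll.
        lap = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
               + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u)
        u = u + dt * lap  # dt <= 0.25 keeps this scheme stable
    return u

impulse = np.zeros((16, 16))
impulse[8, 8] = 1.0
smoothed = global_layer(impulse)
```

After 10 iterations, a unit impulse has influenced cells up to 10 steps away, whereas a single 3x3 convolution only reaches its immediate neighbors; this is the global-receptive-field argument in miniature.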
- PAR ID: 10447712
- Journal Name: CVPR
- Page Range / eLocation ID: 600 to 609
- Sponsoring Org: National Science Foundation
More Like this
-
Despite the advances in the field of Face Recognition (FR), the precision of these methods is not yet sufficient. To improve FR performance, this paper proposes a technique to aggregate the outputs of two state-of-the-art (SOTA) deep FR models, namely ArcFace and AdaFace. In our approach, we leverage the transformer attention mechanism to exploit the relationship between different parts of two feature maps. By doing so, we aim to enhance the overall discriminative power of the FR system. One of the challenges in feature aggregation is the effective modeling of both local and global dependencies. Conventional transformers are known for their ability to capture long-range dependencies, but they often struggle to model local dependencies accurately. To address this limitation, we augment the self-attention mechanism to capture both local and global dependencies effectively. This allows our model to take advantage of the overlapping receptive fields present in corresponding locations of the feature maps. However, fusing two feature maps from different FR models might introduce redundancies into the face embedding. Since these models often share identical backbone architectures, the resulting feature maps may contain overlapping information, which can mislead the training process. To overcome this problem, we leverage the principle of the Information Bottleneck to obtain a maximally informative facial representation. This ensures that the aggregated features retain the most relevant and discriminative information while minimizing redundant or misleading details. To evaluate the effectiveness of our proposed method, we conducted experiments on popular benchmarks and compared our results with SOTA algorithms. The consistent improvement observed across these benchmarks demonstrates the efficacy of our approach in enhancing FR performance. Moreover, our model aggregation framework offers a novel perspective on model fusion and establishes a powerful paradigm for feature aggregation using transformer-based attention mechanisms.
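The building block being augmented here, scaled dot-product attention between two feature maps, can be sketched as follows (shapes and the two-backbone labels are illustrative assumptions; this is the standard mechanism, not the paper's augmented variant):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    """Scaled dot-product attention: each query row becomes a convex
    combination of the value rows, weighted by query-key similarity."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
feats_a = rng.normal(size=(4, 8))  # tokens from one backbone (illustrative)
feats_b = rng.normal(size=(4, 8))  # tokens from the other backbone
fused = cross_attention(feats_a, feats_b, feats_b)
```

Because each fused row is a convex combination of the second map's rows, attention mixes information across the two embeddings without leaving their value range, which is one reason it is a natural fusion operator.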
-
Vision transformers (ViTs) have dominated computer vision in recent years. However, ViTs are computationally expensive and not well suited for mobile devices; this has led to the prevalence of convolutional neural network (CNN) and ViT-based hybrid models for mobile vision applications. Recently, Vision GNN (ViG) and CNN hybrid models have also been proposed for mobile vision tasks. However, all of these methods remain slower than pure CNN-based models. In this work, we propose Multi-Level Dilated Convolutions to devise a purely CNN-based mobile backbone. Using Multi-Level Dilated Convolutions allows for a larger theoretical receptive field than standard convolutions. Different levels of dilation also allow for interactions between the short-range and long-range features in an image. Experiments show that our proposed model outperforms state-of-the-art (SOTA) mobile CNN, ViT, ViG, and hybrid architectures in terms of accuracy and/or speed on image classification, object detection, instance segmentation, and semantic segmentation. Our fastest model, RapidNet-Ti, achieves 76.3% top-1 accuracy on ImageNet-1K with 0.9 ms inference latency on an iPhone 13 mini NPU, which is faster and more accurate than MobileNetV2x1.4 (74.7% top-1 with 1.0 ms latency). Our work shows that pure CNN architectures can beat SOTA hybrid and ViT models in terms of accuracy and speed when designed properly.
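The receptive-field claim for dilated convolutions can be checked with a naive 1-D sketch (cross-correlation form, as in deep learning libraries; the kernel values below are arbitrary):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """Naive 1-D dilated convolution (cross-correlation, 'valid'
    padding). A kernel of size k with dilation d covers a receptive
    field of d*(k-1)+1 inputs while storing only k weights."""
    k = len(kernel)
    span = dilation * (k - 1) + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out

signal = np.arange(10.0)
finite_diff = np.array([1.0, 0.0, -1.0])  # a simple difference kernel
short_range = dilated_conv1d(signal, finite_diff, dilation=1)
long_range = dilated_conv1d(signal, finite_diff, dilation=3)
```

With three weights, dilation 1 compares inputs 2 apart while dilation 3 compares inputs 6 apart; mixing several dilation levels, as the abstract describes, lets short-range and long-range differences interact at the same layer depth.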
-
Graph convolutional neural network architectures combine feature extraction and convolutional layers for hyperspectral image classification. An adaptive neighborhood aggregation method based on statistical variance, integrating the spatial information along with the spectral signature of the pixels, is proposed for improving graph convolutional network classification of hyperspectral images. The spatial-spectral information is integrated into the adjacency matrix and processed by a single-layer graph convolutional network. The algorithm employs an adaptive neighborhood selection criterion conditioned on the class each pixel belongs to. Compared to fixed window-based feature extraction, this method proves effective in capturing the spectral and spatial features with variable pixel neighborhood sizes. The experimental results from the Indian Pines, Houston University, and Botswana Hyperion hyperspectral image datasets show that the proposed adaptive neighborhood GCN (AN-GCN) can significantly improve classification accuracy. For example, the overall accuracy for the Houston University data increases from 81.71% (MiniGCN) to 97.88% (AN-GCN). Furthermore, the AN-GCN can classify hyperspectral images of rice seeds exposed to high day and night temperatures, proving its efficacy in discriminating the seeds under increased ambient temperature treatments.
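A minimal single-layer GCN over a hand-rolled spatial-spectral adjacency gives the flavor of the approach (AN-GCN's variance-based adaptive neighborhoods are replaced here by simple Gaussian weights; all sizes and parameters are assumptions for illustration):

```python
import numpy as np

def build_adjacency(features, coords, sigma_feat=1.0, sigma_xy=2.0):
    """Adjacency weights combining spectral similarity and spatial
    proximity via Gaussian kernels. This is a simplification of the
    paper's variance-based adaptive neighborhood selection."""
    d_feat = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    d_xy = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    return np.exp(-d_feat / sigma_feat**2) * np.exp(-d_xy / sigma_xy**2)

def gcn_layer(adj, x, weight):
    """Single-layer GCN: row-normalize the adjacency, aggregate
    neighbor features, then apply a linear map and ReLU."""
    a_hat = adj / adj.sum(axis=1, keepdims=True)
    return np.maximum(a_hat @ x @ weight, 0.0)

rng = np.random.default_rng(0)
pixel_feats = rng.normal(size=(6, 4))      # 6 pixels, 4 spectral bands
pixel_xy = rng.uniform(0, 8, size=(6, 2))  # their image coordinates
adj = build_adjacency(pixel_feats, pixel_xy)
logits = gcn_layer(adj, pixel_feats, rng.normal(size=(4, 3)))
```

Because both spectral distance and spatial distance enter the adjacency, a pixel aggregates most strongly from neighbors that are close in the image and spectrally similar, which is the spatial-spectral integration the abstract describes.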
-
Theunissen, Frédéric E. (Ed.)
System identification techniques—projection pursuit regression models (PPRs) and convolutional neural networks (CNNs)—provide state-of-the-art performance in predicting visual cortical neurons’ responses to arbitrary input stimuli. However, the constituent kernels recovered by these methods are often noisy and lack coherent structure, making it difficult to understand the underlying component features of a neuron’s receptive field. In this paper, we show that using a dictionary of diverse kernels with complex shapes, learned from natural scenes based on efficient coding theory, as the front-end for PPRs and CNNs can improve their performance in neuronal response prediction as well as algorithmic data efficiency and convergence speed. Extensive experimental results also indicate that these sparse-code kernels provide important information on the component features of a neuron’s receptive field. In addition, we find that models with the complex-shaped sparse-code front-end are significantly better than models with a standard orientation-selective Gabor filter front-end for modeling V1 neurons that have been found to exhibit complex pattern selectivity. We show that the relative performance difference due to these two front-ends can be used to produce a sensitive metric for detecting complex selectivity in V1 neurons.
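The dictionary front-end idea can be sketched as a fixed, half-rectified projection feeding a linear readout (the kernel values below are random stand-ins for a learned sparse-code dictionary, and the readout is an illustrative placeholder for the PPR/CNN stage):

```python
import numpy as np

def dictionary_frontend(stimulus, dictionary):
    """Project a flattened stimulus onto fixed sparse-code kernels and
    half-rectify, producing features for a downstream predictor. The
    dictionary itself is assumed to be learned offline from natural
    scenes, as in efficient coding theory."""
    return np.maximum(dictionary @ stimulus, 0.0)

rng = np.random.default_rng(0)
kernels = rng.normal(size=(32, 16 * 16))  # stand-in for learned kernels
image = rng.normal(size=16 * 16)          # a flattened 16x16 stimulus
features = dictionary_frontend(image, kernels)
# A simple linear readout then maps front-end features to a predicted
# neuronal response; in the paper this stage is a PPR or CNN.
response = features @ rng.normal(size=32)
```

Fixing the front-end means the downstream model only has to learn the readout, which is one intuition for the data-efficiency and convergence-speed gains the abstract reports.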