skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Local Matching Networks for Engineering Diagram Search
Finding diagrams that contain a specific part or a similar part is important in many engineering tasks. In this search task, the query part is expected to match only a small region in a complex image.This paper investigates several local matching networks that explicitly model local region-to-region similarities. Deep convolutional neural networks extract local features and model local matching patterns. Spatial convolution is employed to cross-match local regions at different scale levels, addressing cases where the target part appears at a different scale, position, and/or angle. A gating network automatically learns region importance, removing noise from sparse areas and visual metadata in engineering diagrams. Experimental results show that local matching approaches are more effective for engineering diagram search than global matching approaches. Suppressing unimportant regions via the gating net-work enhances accuracy. Matching across different scales via spatial convolution substantially improves robustness to scale and rotation changes. A pipelined architecture efficiently searches a large collection of diagrams by using a simple local matching network to identify a small set of candidate images and a more sophisticated network with convolutional cross-scale matching to re-rank candidates.  more » « less
Award ID(s):
1815528
PAR ID:
10097100
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
The World Wide Web Conference, WWW 2019
Page Range / eLocation ID:
340 to 350
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Liane Lewin-Eytan, David Carmel (Ed.)
    Graph convolutional networks (GCNs), aiming to obtain node embeddings by integrating high-order neighborhood information through stacked graph convolution layers, have demonstrated great power in many network analysis tasks such as node classification and link prediction. However, a fundamental weakness of GCNs, that is, topological limitations, including over-smoothing and local homophily of topology, limits their ability to represent networks. Existing studies for solving these topological limitations typically focus only on the convolution of features on network topology, which inevitably relies heavily on network structures. Moreover, most networks are text-rich, so it is important to integrate not only document-level information, but also the local text information which is particularly significant while often ignored by the existing methods. To solve these limitations, we propose BiTe-GCN, a novel GCN architecture modeling via bidirectional convolution of topology and features on text-rich networks. Specifically, we first transform the original text-rich network into an augmented bi-typed heterogeneous network, capturing both the global document-level information and the local text-sequence information from texts.We then introduce discriminative convolution mechanisms, which performs convolution on this augmented bi-typed network, realizing the convolutions of topology and features altogether in the same system, and learning different contributions of these two parts (i.e., network part and text part), automatically for the given learning objectives. Extensive experiments on text-rich networks demonstrate that our new architecture outperforms the state-of-the-arts by a breakout improvement. Moreover, this architecture can also be applied to several e-commerce search scenes such as JD searching, and experiments on JD dataset show the superiority of the proposed architecture over the baseline methods. 
    more » « less
  2. Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. A novel reformulation is proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure. 
    more » « less
  3. null (Ed.)
    The success of supervised learning requires large-scale ground truth labels which are very expensive, time- consuming, or may need special skills to annotate. To address this issue, many self- or un-supervised methods are developed. Unlike most existing self-supervised methods to learn only 2D image features or only 3D point cloud features, this paper presents a novel and effective self-supervised learning approach to jointly learn both 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences without using any human annotated labels. Specifically, 2D image features of rendered images from different views are extracted by a 2D convolutional neural network, and 3D point cloud features are extracted by a graph convolution neural network. Two types of features are fed into a two-layer fully connected neural network to estimate the cross-modality correspondence. The three networks are jointly trained (i.e. cross-modality) by verifying whether two sampled data of different modalities belong to the same object, meanwhile, the 2D convolutional neural network is additionally optimized through minimizing intra-object distance while maximizing inter-object distance of rendered images in different views (i.e. cross-view). The effectiveness of the learned 2D and 3D features is evaluated by transferring them on five different tasks including multi-view 2D shape recognition, 3D shape recognition, multi-view 2D shape retrieval, 3D shape retrieval, and 3D part-segmentation. Extensive evaluations on all the five different tasks across different datasets demonstrate strong generalization and effectiveness of the learned 2D and 3D features by the proposed self-supervised method. 
    more » « less
  4. The goal of the crime forecasting problem is to predict different types of crimes for each geographical region (like a neighborhood or censor tract) in the near future. Since nearby regions usually have similar socioeconomic characteristics which indicate similar crime patterns, recent state-of-the-art solutions constructed a distance-based region graph and utilized Graph Neural Network (GNN) techniques for crime forecasting, because the GNN techniques could effectively exploit the latent relationships between neighboring region nodes in the graph if the edges reveal high dependency or correlation. However, this distance-based pre-defined graph can not fully capture crime correlation between regions that are far from each other but share similar crime patterns. Hence, to make a more accurate crime prediction, the main challenge is to learn a better graph that reveals the dependencies between regions in crime occurrences and meanwhile captures the temporal patterns from historical crime records. To address these challenges, we propose an end-to-end graph convolutional recurrent network called HAGEN with several novel designs for crime prediction. Specifically, our framework could jointly capture the crime correlation between regions and the temporal crime dynamics by combining an adaptive region graph learning module with the Diffusion Convolution Gated Recurrent Unit (DCGRU). Based on the homophily assumption of GNN (i.e., graph convolution works better where neighboring nodes share the same label), we propose a homophily-aware constraint to regularize the optimization of the region graph so that neighboring region nodes on the learned graph share similar crime patterns, thus fitting the mechanism of diffusion convolution. Empirical experiments and comprehensive analysis on two real-world datasets showcase the effectiveness of HAGEN. 
    more » « less
  5. Larger networks generally have greater representational power at the cost of increased computational complexity. Sparsifying such networks has been an active area of research but has been generally limited to static regularization or dynamic approaches using reinforcement learning. We explore a mixture of experts (MoE) approach to deep dynamic routing, which activates certain experts in the network on a per-example basis. Our novel DeepMoE architecture increases the representational power of standard convolutional networks by adaptively sparsifying and recalibrating channel-wise features in each convolutional layer. We employ a multi-headed sparse gating network to determine the selection and scaling of channels for each input, leveraging exponential combinations of experts within a single convolutional network. Our proposed architecture is evaluated on four benchmark datasets and tasks, and we show that Deep-MoEs are able to achieve higher accuracy with lower computation than standard convolutional networks. 
    more » « less