skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Delving Deeper into Anti-aliasing in ConvNets
Aliasing refers to the phenomenon that high frequency signals degenerate into com- pletely different ones after sampling. It arises as a problem in the context of deep learning as downsampling layers are widely adopted in deep architectures to reduce parameters and computation. The standard solution is to apply a low-pass filter (e.g., Gaussian blur) before downsampling [37]. However, it can be suboptimal to apply the same filter across the entire content, as the frequency of feature maps can vary across both spatial locations and feature channels. To tackle this, we propose an adaptive content-aware low-pass filtering layer, which predicts separate filter weights for each spatial location and chan- nel group of the input feature maps. We investigate the effectiveness and generalization of the proposed method across multiple tasks including ImageNet classification, COCO instance segmentation, and Cityscapes semantic segmentation. Qualitative and quanti- tative results demonstrate that our approach effectively adapts to the different feature frequencies to avoid aliasing while preserving useful information for recognition. Code is available at https://maureenzou.github.io/ddac/.  more » « less
Award ID(s):
1934568
PAR ID:
10281893
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the British Machine Vision Conference (BMVC), 2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Aliasing refers to the phenomenon that high frequency signals degenerate into completely different ones after sampling. It arises as a problem in the context of deep learning as downsampling layers are widely adopted in deep architectures to reduce parameters and computation. The standard solution is to apply a lowpass filter (e.g., Gaussian blur) before downsampling. However, it can be suboptimal to apply the same filter across the entire content, as the frequency of feature maps can vary across both spatial locations and feature channels. To tackle this, we propose an adaptive content-aware low-pass filtering layer, which predicts separate filter weights for each spatial location and channel group of the input feature maps. We investigate the effectiveness and generalization of the proposed method across multiple tasks, including image classification, semantic segmentation, instance segmentation, video instance segmentation, and image-to-image translation. Both qualitative and quantitative results demonstrate that our approach effectively adapts to the different feature frequencies to avoid aliasing while preserving useful information for recognition. Code is available at https://maureenzou.github.io/ddac/ 
    more » « less
  2. Graph Neural Networks have recently become a prevailing paradigm for various high-impact graph analytical problems. Existing efforts can be mainly categorized as spectral-based and spatial-based methods. The major challenge for the former is to find an appropriate graph filter to distill discriminative information from input signals for learning. Recently, myriads of explorations are made to achieve better graph filters, e.g., Graph Convolutional Network (GCN), which leverages Chebyshev polynomial truncation to seek an approximation of graph filters and bridge these two families of methods. Nevertheless, it has been shown in recent studies that GCN and its variants are essentially employing fixed low-pass filters to perform information denoising. Thus their learning capability is rather limited and may over-smooth node representations at deeper layers. To tackle these problems, we develop a novel graph neural network framework AdaGNN with a well-designed adaptive frequency response filter. At its core, AdaGNN leverages a simple but elegant trainable filter that spans across multiple layers to capture the varying importance of different frequency components for node representation learning. The inherent differences among different feature channels are also well captured by the filter. As such, it empowers AdaGNN with stronger expressiveness and naturally alleviates the over-smoothing problem. We empirically validate the effectiveness of the proposed framework on various benchmark datasets. Theoretical analysis is also provided to show the superiority of the proposed AdaGNN. The open-source implementation of AdaGNN can be found here: https://github.com/yushundong/AdaGNN. 
    more » « less
  3. null (Ed.)
    Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.60% and 0.63% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1.42%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset. 
    more » « less
  4. The hippocampus is a crucial brain structure involved in memory formation, spatial navigation, emotional regulation, and learning. An accurate MRI image segmentation of the human hippocampus plays an important role in multiple neuro-imaging research and clinical practice, such as diagnosing neurological diseases and guiding surgical interventions. While most hippocampus segmentation studies focus on using T1-weighted or T2-weighted MRI scans, we explore the use of diffusion-weighted MRI (dMRI), which offers unique insights into the microstructural properties of the hippocampus. Particularly, we utilize various anisotropy measures derived from diffusion MRI (dMRI), including fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity, for a multi-contrast deep learning approach to hippocampus segmentation. To exploit the unique benefits offered by various contrasts in dMRI images for accurate hippocampus segmentation, we introduce an innovative multimodal deep learning architecture integrating cross-attention mechanisms. Our proposed framework comprises a multi-head encoder designed to transform each contrast of dMRI images into distinct latent spaces, generating separate image feature maps. Subsequently, we employ a gated cross-attention unit following the encoder, which facilitates the creation of attention maps between every pair of image contrasts. These attention maps serve to enrich the feature maps, thereby enhancing their effectiveness for the segmentation task. In the final stage, a decoder is employed to produce segmentation predictions utilizing the attention-enhanced feature maps. The experimental outcomes demonstrate the efficacy of our framework in hippocampus segmentation and highlight the benefits of using multi-contrast images over single-contrast images in diffusion MRI image segmentation. 
    more » « less
  5. null (Ed.)
    Texture-based features computed on eye movement scan paths have recently been proposed for eye movement biometric applications. Feature vectors were extracted within this prior work by computing the mean and standard deviation of the resulting images obtained through application of a Gabor filter bank. This paper describes preliminary work exploring an alternative technique for extracting features from Gabor filtered scan path images. Namely, features vectors are obtained by downsampling the filtered images, thereby retaining structured spatial information within the feature vector. The proposed technique is validated at various downsampling scales for data collected from 94 subjects during free-viewing of a fantasy movie trailer. The approach is demonstrated to reduce EER versus the previously proposed statistical summary technique by 11.7% for the best evaluated downsampling parameter. 
    more » « less