
Title: Brain-inspired weighted normalization for CNN image classification
We studied a local normalization paradigm, weighted normalization, that better reflects the current understanding of the brain: the normalization weights are trainable, and the surround pool is selected in a more biologically realistic way. Weighted normalization outperformed other normalizations in image classification tasks on CIFAR-10, ImageNet and a customized textured MNIST dataset, and its advantage is most prominent when the CNN is shallow. The good performance of weighted normalization may be related to its statistical effect of Gaussianizing the responses.
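The divisive form of weighted normalization described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the uniform weight matrix, the semi-saturation constant sigma, and the function name are all assumptions standing in for parameters that would be learned during training.

```python
import numpy as np

def weighted_normalization(x, w, sigma=1.0):
    """Divisive normalization with trainable cross-channel weights.

    x: (channels, height, width) feature map.
    w: (channels, channels) normalization weights (learned in training).
    Each channel is divided by a weighted pool of squared channel responses:
    y_c = x_c / sqrt(sigma^2 + sum_k w[c, k] * x_k^2).
    """
    energy = np.einsum('ck,khw->chw', w, x ** 2)  # weighted surround energy
    return x / np.sqrt(sigma ** 2 + energy)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w = np.full((4, 4), 0.25)  # uniform stand-in for learned weights
y = weighted_normalization(x, w)
```

With sigma = 1 the denominator is at least 1, so the operation only ever shrinks responses, which is the squashing effect associated with Gaussianizing heavy-tailed filter outputs.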
Journal Name:
International Conference on Learning Representations (ICLR)
Sponsoring Org:
National Science Foundation
More Like this

    Cross-correlations of ambient seismic noise are widely used for seismic velocity imaging, monitoring and ground motion analyses. A typical step in analysing noise cross-correlation functions (NCFs) is stacking short-term NCFs over longer time periods to increase the signal quality. Spurious NCFs could contaminate the stack, degrade its quality and limit its use. Many methods have been developed to improve the stacking of coherent waveforms, including earthquake waveforms, receiver functions and NCFs. This study systematically evaluates and compares the performance of eight stacking methods, including arithmetic mean or linear stacking, robust stacking, selective stacking, cluster stacking, phase-weighted stacking, time–frequency phase-weighted stacking, Nth-root stacking and averaging after applying an adaptive covariance filter. Our results demonstrate that, in most cases, all methods can retrieve clear ballistic or first arrivals. However, they yield significant differences in preserving the phase and amplitude information. This study provides a practical guide for choosing the optimal stacking method for specific research applications in ambient noise seismology. We evaluate the performance using multiple onshore and offshore seismic arrays in the Pacific Northwest region. We compare these stacking methods for NCFs calculated from raw ambient noise (referred to as Raw NCFs) and from ambient noise normalized using a one-bit clipping time normalization method (referred to as One-bit NCFs). We evaluate six metrics, including signal-to-noise ratios, phase dispersion images, convergence rate, temporal changes in the ballistic and coda waves, relative amplitude decays with distance and computational time. We show that robust stacking is the best choice for all applications (velocity tomography, monitoring and attenuation studies) using Raw NCFs. 
For applications using One-bit NCFs, all methods but phase-weighted and Nth-root stacking are good choices for seismic velocity tomography. Linear, robust and selective stacking are equally appropriate choices when using One-bit NCFs for monitoring applications. For applications relying on accurate relative amplitudes, the linear, robust, selective and cluster stacking methods all perform well with One-bit NCFs. The evaluations in this study can be generalized to a broad range of time-series analyses that use data coherence to perform ensemble stacking. Another contribution of this study is the accompanying open-source software package, StackMaster, which can be used for general-purpose time-series stacking.
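Two of the stacking methods compared above can be sketched in a few lines. This is an illustrative sketch, not the StackMaster implementation: the synthetic NCF data, the FFT-based analytic-signal helper, and the phase-weighting power nu = 2 are assumptions.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT (numpy-only stand-in for a Hilbert transform)."""
    n = x.shape[-1]
    X = np.fft.fft(x, axis=-1)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h, axis=-1)

def linear_stack(ncfs):
    """Arithmetic mean of short-term NCFs, shape (n_windows, n_samples)."""
    return ncfs.mean(axis=0)

def phase_weighted_stack(ncfs, nu=2):
    """Linear stack weighted by instantaneous phase coherence across windows."""
    z = analytic_signal(ncfs)
    phasors = z / np.abs(z)                       # unit phasors e^{i*phi(t)}
    coherence = np.abs(phasors.mean(axis=0))      # 1 where phases align, ~0 otherwise
    return linear_stack(ncfs) * coherence ** nu   # down-weight incoherent samples

# Synthetic example: a coherent 5 Hz arrival buried in window-to-window noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 500)
ncfs = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal((20, t.size))
lin = linear_stack(ncfs)
pws = phase_weighted_stack(ncfs)
```

Because the coherence weight lies in [0, 1], phase-weighted stacking can only attenuate the linear stack, which suppresses incoherent noise but also distorts relative amplitudes; this is consistent with the finding above that it is a poor choice for amplitude-sensitive applications.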

  2. Vedaldi, A. (Ed.)
    In state-of-the-art deep neural networks, both feature normalization and feature attention have become ubiquitous, yet they are usually studied as separate modules. In this paper, we propose a light-weight integration of the two schemes and present Attentive Normalization (AN). Instead of learning a single affine transformation, AN learns a mixture of affine transformations and uses their weighted sum as the final affine transformation applied to re-calibrate features in an instance-specific way. The weights are learned by leveraging channel-wise feature attention. In experiments, we test the proposed AN using four representative neural architectures on the ImageNet-1000 classification benchmark and the MS-COCO 2017 object detection and instance segmentation benchmark. AN obtains consistent performance improvements for different neural architectures in both benchmarks, with absolute top-1 accuracy gains between 0.5% and 2.7% on ImageNet-1000, and absolute gains of up to 1.8% and 2.2% for bounding-box and mask AP on MS-COCO, respectively. We observe that the proposed AN provides a strong alternative to the widely used Squeeze-and-Excitation (SE) module. The source code is publicly available at \href{}{the ImageNet Classification Repo} and \href{\_Detection}{the MS-COCO Detection and Segmentation Repo}.
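The core idea of AN as described above can be sketched as follows: standardize the features, compute instance-specific mixture weights from channel-wise attention, and apply the weighted sum of K learned affine transformations. This is a hedged sketch, not the authors' implementation; all parameter values and shapes here are random stand-ins for learned parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attentive_normalization(x, gammas, betas, attn_w):
    """Re-calibrate features with a mixture of K affine transformations.

    x: (C, H, W) feature map; gammas, betas: (K, C) affine components;
    attn_w: (C, K) attention projection producing mixture weights.
    """
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_hat = (x - mu) / (sigma + 1e-5)        # standard feature normalization
    squeeze = x.mean(axis=(1, 2))            # global average pool, shape (C,)
    lam = softmax(squeeze @ attn_w)          # instance-specific mixture weights over K
    gamma = lam @ gammas                     # weighted-sum affine parameters, shape (C,)
    beta = lam @ betas
    return gamma[:, None, None] * x_hat + beta[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5, 5))
K = 2
gammas = rng.standard_normal((K, 3))
betas = rng.standard_normal((K, 3))
attn_w = rng.standard_normal((3, K))
y = attentive_normalization(x, gammas, betas, attn_w)
```

The contrast with standard normalization layers is that gamma and beta here depend on the input instance through the attention weights, rather than being fixed per channel.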
  3. Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation, and such estimators typically do not require assumptions on the properties and representational capabilities of value function or decision process model function classes. In this paper, we identify an important overfitting phenomenon in optimizing the importance weighted return, in which it may be possible for the learned policy to essentially avoid making aligned decisions for part of the initial state space. We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint, and provide a theoretical justification of the proposed algorithm. We also show the limitations of previous attempts at this approach. We test our algorithm in a healthcare-inspired simulator, on a logged dataset collected from real hospitals, and on continuous control tasks. These experiments show the proposed method yields less overfitting and better test performance compared to state-of-the-art batch reinforcement learning algorithms.
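The importance-weighted return estimator the paper builds on can be sketched as follows, in both its ordinary and self-normalized (weighted) variants. This is a generic textbook sketch, not the paper's algorithm; the toy trajectory format and policy tables are assumptions.

```python
import numpy as np

def is_estimates(trajs, pi_e, pi_b):
    """Ordinary and weighted importance-sampling estimates of the return.

    trajs: list of trajectories, each a list of (state, action, reward) tuples
    collected under behavior policy pi_b.
    pi_e, pi_b: nested dicts of action probabilities, pi[state][action].
    """
    ratios, returns = [], []
    for traj in trajs:
        # Per-trajectory importance ratio: product of per-step likelihood ratios.
        rho = np.prod([pi_e[s][a] / pi_b[s][a] for s, a, _ in traj])
        ratios.append(rho)
        returns.append(sum(r for _, _, r in traj))
    ratios, returns = np.array(ratios), np.array(returns)
    ordinary = np.mean(ratios * returns)                  # unbiased, high variance
    weighted = np.sum(ratios * returns) / np.sum(ratios)  # biased, lower variance
    return ordinary, weighted

# Sanity check: with identical policies all ratios are 1 and both
# estimators reduce to the empirical mean return.
pi = {0: {0: 0.5, 1: 0.5}}
trajs = [[(0, 0, 1.0)], [(0, 1, 2.0)]]
ordinary, weighted = is_estimates(trajs, pi, pi)
```

The overfitting phenomenon identified in the paper arises when a policy is optimized against such an estimator: the policy can drive the ratios for some initial states toward zero, effectively dropping those states from the objective.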
  4.
    Abstract. The lower-order moments of the drop size distribution (DSD) have generally been considered difficult to retrieve accurately from polarimetric radar data because these data are related to higher-order moments: for example, the specific differential phase is associated with approximately the 4.6th moment, reflectivity with the 6th moment, and differential reflectivity with a ratio of higher-order moments. Thus, conventionally, the emphasis has been on estimating rain rate (the 3.67th moment) or the parameters of an exponential or gamma DSD. Many double-moment "bulk" microphysical schemes predict the total number concentration (the 0th moment of the DSD, or M0) and the mixing ratio (or, equivalently, the 3rd moment M3). It is therefore difficult to compare model outputs directly with polarimetric radar observations or, given the model outputs, to forward-model the radar observables. This article describes the use of double-moment normalization of DSDs and the resulting stable intrinsic shape that can be fitted by the generalized gamma (G-G) distribution. The two reference moments are M3 and M6, which are shown to be retrievable using the X-band radar reflectivity, differential reflectivity, and specific attenuation (from the iterative correction of measured reflectivity Zh using the total Φdp constraint, i.e., the iterative ZPHI method). Along with the climatological shape parameters of the G-G fit to the scaled/normalized DSDs, the lower-order moments are then retrieved more accurately than previously possible. The importance of measuring the complete DSD from 0.1 mm onwards is emphasized using, in our case, an optical array probe with 50 µm resolution collocated with a two-dimensional video disdrometer with about 170 µm resolution; this avoids small-drop truncation and hence enables accurate calculation of the lower-order moments.
A case study of a complex multi-cell storm which traversed an instrumented site near the CSU-CHILL radar is described for which the moments were retrieved from radar and compared with directly computed moments from the complete spectrum measurements using the aforementioned two disdrometers. Our detailed validation analysis of the radar-retrieved moments showed relative bias of the moments M0 through M2 was <15 % in magnitude, with Pearson’s correlation coefficient >0.9. Both radar measurement and parameterization errors were estimated rigorously. We show that the temporal variation of the radar-retrieved mass-weighted mean diameter with M0 resulted in coherent “time tracks” that can potentially lead to studies of precipitation evolution that have not been possible so far. 
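The moments discussed above are simple weighted sums over the measured spectrum. The following sketch computes M0, M3, M6 and the mass-weighted mean diameter from a discretized DSD; the exponential (Marshall-Palmer-like) DSD and bin layout are illustrative assumptions, not disdrometer data.

```python
import numpy as np

def dsd_moment(n, diameters, concentrations, bin_widths):
    """n-th moment of the DSD: M_n = sum over bins of N(D) * D^n * dD.

    diameters in mm, concentrations N(D) in mm^-1 m^-3, widths in mm.
    """
    return np.sum(concentrations * diameters ** n * bin_widths)

# Illustrative exponential DSD, measured from 0.1 mm onward as emphasized above.
D = np.arange(0.1, 8.0, 0.05)
dD = np.full_like(D, 0.05)
N = 8000.0 * np.exp(-2.0 * D)

M0 = dsd_moment(0, D, N, dD)    # total number concentration
M3 = dsd_moment(3, D, N, dD)    # proportional to liquid water content
M6 = dsd_moment(6, D, N, dD)    # proportional to Rayleigh reflectivity
Dm = dsd_moment(4, D, N, dD) / M3  # mass-weighted mean diameter, M4/M3
```

For an exponential DSD with slope Λ, the mass-weighted mean diameter is analytically 4/Λ (2.0 mm here), so the discrete sum gives a quick sanity check on the moment calculation; truncating small drops would bias M0 low and Dm high, which is why the complete spectrum matters.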
  5. Roy, Sushmita (Ed.)

    Heterogeneity across genomic studies compromises the performance of machine learning models in cross-study phenotype prediction. Overcoming this heterogeneity when incorporating different studies is a challenging and critical step toward machine learning algorithms with reproducible prediction performance on independent datasets. We investigated the best approaches to integrate different studies of the same type of omics data under a variety of heterogeneities. We developed a comprehensive workflow to simulate different types of heterogeneity and to evaluate integration methods together with batch normalization using ComBat. We also demonstrated the results through realistic applications on six colorectal cancer (CRC) metagenomic studies and six tuberculosis (TB) gene expression studies, respectively. We showed that heterogeneity across genomic studies can markedly degrade a machine learning classifier's reproducibility. ComBat normalization improved the classifier's prediction performance when heterogeneous populations were present, and successfully removed batch effects within the same population. We also showed that the classifier's prediction accuracy markedly decreases as the underlying disease model becomes more different between the training and test populations. Comparing merging and integration methods, we found that each can outperform the other in different scenarios. In the realistic applications, prediction accuracy improved when ComBat normalization was applied with either merging or integration in both the CRC and TB studies. We illustrated that batch normalization is essential for mitigating both population differences across studies and batch effects. We also showed that both merging and integration can achieve good performance when combined with batch normalization. In addition, we explored the potential of boosting phenotype prediction performance with rank aggregation methods and showed that they performed similarly to other ensemble learning approaches.
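The role batch normalization plays in this workflow can be illustrated with a simplified location/scale adjustment per batch. ComBat itself additionally shrinks the per-batch estimates with empirical Bayes; that step is omitted here, so this sketch is an assumption-laden stand-in, not the ComBat algorithm.

```python
import numpy as np

def batch_adjust(X, batches):
    """Align each batch's per-feature mean and variance to the pooled data.

    X: (samples, features) expression/abundance matrix.
    batches: (samples,) array of batch labels.
    """
    X_adj = np.empty_like(X, dtype=float)
    grand_mean = X.mean(axis=0)
    grand_std = X.std(axis=0)
    for b in np.unique(batches):
        idx = batches == b
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0) + 1e-8
        # Standardize within the batch, then rescale to the pooled statistics.
        X_adj[idx] = (X[idx] - mu) / sd * grand_std + grand_mean
    return X_adj

# Two simulated studies with a strong mean shift (a batch effect).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)),
               rng.normal(3.0, 1.0, (20, 5))])
batches = np.repeat([0, 1], 20)
X_corrected = batch_adjust(X, batches)
```

After adjustment the two batches share per-feature means, so a classifier trained on one study no longer keys on the offset; the caveat, as the study notes, is that such correction removes population differences along with technical batch effects.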
