Feature spaces in the deep layers of convolutional neural networks (CNNs) are often very high-dimensional and difficult to interpret. However, convolutional layers consist of multiple channels that are activated by different types of inputs, which suggests that more insight can be gained by studying the channels and how they relate to each other. In this paper, we first theoretically analyze channel-wise non-negative kernel (CW-NNK) regression graphs, which allow us to quantify the overlap between channels and, indirectly, the intrinsic dimension of the data representation manifold. We find that redundancy between channels is significant and varies with layer depth and the level of regularization during training. Additionally, we observe a correlation between channel overlap in the last convolutional layer and generalization performance. Our experimental results demonstrate that these techniques can lead to a better understanding of deep representations.
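The core computation behind an NNK regression graph can be sketched as follows: for each query point, the weights over its neighbors are obtained by a non-negative least squares fit in kernel space, and the sparsity pattern of those weights reveals which neighbors (or channels) actually contribute. This is a minimal illustrative sketch, not the paper's exact formulation; the Gaussian kernel, bandwidth, and the final normalization are assumptions made here for concreteness.

```python
import numpy as np
from scipy.optimize import nnls

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian kernel values between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nnk_weights(neighbors, query, sigma=1.0):
    """Non-negative kernel regression weights for one query point.

    Solves min_{theta >= 0} || K_DD theta - k_Dq ||^2, where K_DD is the
    kernel matrix among the neighbors and k_Dq holds kernel similarities
    between each neighbor and the query. The non-negativity constraint
    yields sparse weights: only the neighbors forming the local
    interpolating polytope receive nonzero weight.
    """
    K_DD = gaussian_kernel(neighbors, neighbors, sigma)
    k_Dq = gaussian_kernel(neighbors, query[None, :], sigma).ravel()
    theta, _ = nnls(K_DD, k_Dq)
    # Normalization to a convex combination is an illustrative choice here.
    return theta / theta.sum() if theta.sum() > 0 else theta
```

Channel overlap can then be quantified, for instance, by how often two channels' graphs assign nonzero weight to the same neighbors.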
Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation
State-of-the-art neural network architectures continue to scale in size and deliver impressive generalization results, although this comes at the expense of interpretability. In particular, a key challenge is determining when to stop training the model, as this has a significant impact on generalization. Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels, where analyzing intermediate data representations and the model's evolution can be challenging owing to the curse of dimensionality. We present channel-wise DeepNNK (CW-DeepNNK), a novel channel-wise generalization estimate based on non-negative kernel regression (NNK) graphs with which we perform local polytope interpolation on low-dimensional channels. This method leads to instance-based interpretability of both the learned data representations and the relationships between channels. Motivated by our observations, we use CW-DeepNNK to propose a novel early stopping criterion that (i) does not require a validation set, (ii) is based on a task performance metric, and (iii) allows stopping to be reached at different points for each channel. Our experiments demonstrate that the proposed method has advantages over the standard criterion based on validation set performance.
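The channel-wise early stopping idea above can be sketched in two pieces: a per-channel leave-one-out generalization estimate on the training data, and a tracker that freezes each channel once its estimate stops improving. Both pieces here are illustrative stand-ins: the leave-one-out estimate uses a plain k-NN label vote rather than the actual NNK polytope interpolation, and the patience rule is an assumption, not the paper's criterion.

```python
import numpy as np

def loo_knn_error(features, labels, k=5):
    """Leave-one-out error estimate on one channel's features.

    Simplified stand-in for the CW-DeepNNK estimate: each sample's label is
    predicted by a majority vote of its k nearest neighbors (excluding
    itself), and the misclassification rate is returned.
    """
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude each sample from its own vote
    errors = 0
    for i in range(len(features)):
        nbrs = np.argsort(d2[i])[:k]
        pred = np.bincount(labels[nbrs]).argmax()
        errors += pred != labels[i]
    return errors / len(features)

class ChannelEarlyStopping:
    """Track per-channel estimates; stop a channel once its estimate has
    not improved for `patience` consecutive epochs (illustrative rule)."""

    def __init__(self, n_channels, patience=2):
        self.best = [np.inf] * n_channels
        self.stale = [0] * n_channels
        self.stopped = [False] * n_channels
        self.patience = patience

    def update(self, channel_errors):
        for c, err in enumerate(channel_errors):
            if self.stopped[c]:
                continue
            if err < self.best[c]:
                self.best[c], self.stale[c] = err, 0
            else:
                self.stale[c] += 1
                if self.stale[c] >= self.patience:
                    self.stopped[c] = True
        return self.stopped
```

Note that no validation set appears anywhere: the estimate is computed on the training representations themselves, which is the property (i) claimed in the abstract.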
- Journal Name: Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Sponsoring Org: National Science Foundation
More Like this
Obeid, I.; Selesnick, I. (Eds.) The Neural Engineering Data Consortium at Temple University has been providing key data resources to support the development of deep learning technology for electroencephalography (EEG) applications [1-4] since 2012. We currently have over 1,700 subscribers to our resources and have been providing data, software, and documentation from our web site since 2012. In this poster, we introduce additions to our resources developed within the past year to facilitate software development and big data machine learning research. Major resources released in 2019 include: ● Data: The most current release of our open-source EEG data is v1.2.0 of TUH EEG, which adds 3,874 sessions and 1,960 patients from mid-2015 through 2016. ● Software: We have recently released a package, PyStream, that demonstrates how to correctly read an EDF file and access samples of the signal. This software demonstrates how to properly decode channels based on their labels and how to implement montages. Most existing open-source packages for reading EDF files do not directly address the problem of channel labels. ● Documentation: We have released two documents that describe our file formats and data representations: (1) electrodes and channels: describes how to …
Variable selection plays a fundamental role in high-dimensional data analysis, and various methods have been developed for it in recent years. Well-known examples include forward stepwise regression (FSR) and least angle regression (LARS), among others. These methods typically add variables into the model one by one. For such selection procedures, it is crucial to find a stopping criterion that controls model complexity. One of the most commonly used techniques to this end is cross-validation (CV), which, in spite of its popularity, has two major drawbacks: expensive computational cost and lack of statistical interpretation. To overcome these drawbacks, we introduce a flexible and efficient test-based variable selection approach that can be incorporated into any sequential selection procedure. The test, which assesses the overall signal in the remaining inactive variables, is based on the maximal absolute partial correlation between the inactive variables and the response given the active variables. We develop the asymptotic null distribution of the proposed test statistic as the dimension tends to infinity uniformly in the sample size. We also show that the test is consistent. With this test, at each step of the selection, a new variable is included if and only if the p-value is below some …
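The test statistic described above can be sketched directly: residualize both the response and each inactive variable on the active set, then take the largest absolute correlation among the residual pairs. This is a plain computational sketch of the statistic only (least-squares residualization, no asymptotic calibration); the function and variable names are this sketch's own.

```python
import numpy as np

def max_abs_partial_corr(X, y, active):
    """Maximal absolute partial correlation between each inactive column
    of X and the response y, given the columns in `active`.

    Returns the statistic and the index of the inactive variable that
    attains it (the natural candidate to enter the model next).
    """
    n, p = X.shape
    inactive = [j for j in range(p) if j not in active]
    if active:
        A = X[:, active]
        # Residual after least-squares projection onto the active columns.
        proj = lambda v: v - A @ np.linalg.lstsq(A, v, rcond=None)[0]
    else:
        # Empty active set: just center (partial correlation = correlation).
        proj = lambda v: v - v.mean(axis=0)
    ry = proj(y)
    stats = [abs(np.corrcoef(proj(X[:, j]), ry)[0, 1]) for j in inactive]
    return max(stats), inactive[int(np.argmax(stats))]
```

In a sequential procedure, one would compare a p-value derived from this statistic against a threshold at every step, entering the maximizing variable while the test rejects.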
Obeid, Iyad; Selesnick, Ivan (Eds.) Electroencephalography (EEG) is a popular clinical monitoring tool used for diagnosing brain-related disorders such as epilepsy. As monitoring EEGs in a critical-care setting is an expensive and tedious task, there is great interest in developing real-time EEG monitoring tools to improve patient care quality and efficiency. However, clinicians require automatic seizure detection tools that provide decisions with at least 75% sensitivity and less than 1 false alarm (FA) per 24 hours. Some commercial tools have recently claimed to reach such performance levels, including the Olympic Brainz Monitor and Persyst 14. In this abstract, we describe our efforts to transform a high-performance offline seizure detection system into a low-latency real-time, or online, seizure detection system. An overview of the system is shown in Figure 1. The main difference between an online and an offline system is that an online system must be causal and have minimal latency, which is often defined by domain experts. The offline system, shown in Figure 2, uses two phases of deep learning models with postprocessing. The channel-based long short-term memory (LSTM) model (Phase 1 or P1) processes linear frequency cepstral coefficient (LFCC) features from each EEG …
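The causality constraint mentioned above can be made concrete with a small skeleton: an online detector may buffer only past samples and must emit a decision as soon as one window of context is available, so its latency is bounded by the window length plus inference time. This is a generic streaming sketch, not the described system; the `model` callable standing in for the two-phase LSTM pipeline is hypothetical.

```python
from collections import deque

class CausalStreamDetector:
    """Skeleton of an online (causal) detector.

    `model` is any callable mapping a list of past samples (one window of
    features) to a score in [0, 1]; it stands in for the actual seizure
    detection pipeline, which is not reproduced here.
    """

    def __init__(self, model, window, threshold=0.5):
        self.model = model
        self.buf = deque(maxlen=window)  # holds only past samples: causal
        self.threshold = threshold

    def push(self, sample):
        """Consume one new sample; return a decision once enough context
        has accumulated, else None."""
        self.buf.append(sample)
        if len(self.buf) < self.buf.maxlen:
            return None  # not enough past context yet
        return self.model(list(self.buf)) >= self.threshold
```

An offline system, by contrast, may look at the entire recording (including future samples) before deciding, which is exactly what has to be given up in the online setting.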
Abstract Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between high-dimensional genetic manifestations and clinical presentations while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm using a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to deal with the challenges of studying the heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm was adapted from the classification expectation-maximization algorithm, which offers a novel supervised solution to the clustering problem, with substantial improvements in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces in which subsets of features are explanatory of the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over the others; CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms for different drugs. CSMR represents a big data analysis tool with the potential to resolve the …
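The classification-EM idea underlying CSMR can be illustrated with a toy two-component mixture of linear regressions: the E-step hard-assigns each sample to the component whose linear fit explains it best, and the M-step refits each component by least squares on its assigned samples. This sketch deliberately omits CSMR's sparsity penalty and component selection; the median-split initialization is an illustrative choice, not part of the method.

```python
import numpy as np

def cem_mixture_regression(X, y, n_iter=20):
    """Classification-EM for a two-component mixture of linear regressions.

    Hard-assignment (CEM) variant: each sample belongs to exactly one
    component at every step, which is what makes the procedure fast and
    its clusters easy to interpret.
    """
    Xb = np.column_stack([X, np.ones(len(y))])  # add intercept column
    z = (y > np.median(y)).astype(int)          # crude initial split (toy)
    betas = np.zeros((2, Xb.shape[1]))
    for _ in range(n_iter):
        for k in range(2):                      # M-step: per-component fit
            mask = z == k
            if mask.sum() >= Xb.shape[1]:
                betas[k] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]
        resid = (y[:, None] - Xb @ betas.T) ** 2
        z = resid.argmin(axis=1)                # E-step: best-fitting line
    return betas, z
```

With heterogeneous subjects, each recovered component plays the role of one latent subgroup with its own feature-response relationship.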