NSF PAR Search | NSF Public Access Repository


Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression

Gulati, Aryan; Dong, Xingjian; Hurtado, Carlos; Shekkizhar, Sarath; Swayamdipta, Swabha; Ortega, Antonio (November 2024, ACL)

As language models become more general purpose, increased attention needs to be paid to detecting out-of-distribution (OOD) instances, i.e., those not belonging to any of the distributions seen during training. Existing methods for detecting OOD data are computationally complex and storage-intensive. We propose a novel soft clustering approach for OOD detection based on non-negative kernel regression. Our approach greatly reduces computational and space complexities (up to 11× improvement in inference time and 87% reduction in storage requirements). It outperforms existing approaches by up to 4 AUROC points on four benchmarks. We also introduce an entropy-constrained version of our algorithm, leading to further reductions in storage requirements (up to 97% lower than comparable approaches) while retaining competitive performance. Our soft clustering approach for OOD detection highlights its potential for detecting tail-end phenomena in extreme-scale data settings. Our source code is available on GitHub.
Full Text Available
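The abstract above reports its gains in AUROC points. As a reminder of what that metric measures, here is a minimal sketch, not taken from the paper: AUROC is the probability that a randomly chosen OOD sample receives a higher OOD score than a randomly chosen in-distribution sample, with ties counted as one half. The function name `auroc` and the convention that a higher score means "more OOD" are assumptions made for this illustration.

```python
def auroc(id_scores, ood_scores):
    """Pairwise (rank-based) AUROC for an OOD detector.

    Assumes a higher score means "more likely out-of-distribution".
    Runs in O(len(id_scores) * len(ood_scores)) time, which is fine
    for a small illustration but not for large benchmarks.
    """
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0   # OOD sample correctly ranked above ID sample
            elif o == i:
                wins += 0.5   # ties count as half a win
    return wins / (len(id_scores) * len(ood_scores))
```

AUROC is commonly reported as a percentage, so under that convention a gain of 4 points would correspond to 0.04 on the 0 to 1 scale returned here.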
Towards a Geometric Understanding of Spatiotemporal Graph Convolution Networks

https://doi.org/10.1109/OJSP.2024.3396635

Das, Pratyusha; Shekkizhar, Sarath; Ortega, Antonio (January 2024, IEEE Open Journal of Signal Processing)

Spatiotemporal graph convolutional networks (STGCNs) have emerged as a desirable model for skeleton-based human action recognition. Despite achieving state-of-the-art performance, there is a limited understanding of the representations learned by these models, which hinders their application in critical and real-world settings. While layerwise analysis of CNN models has been studied in the literature, to the best of our knowledge, there exists no study on the layerwise explainability of the embeddings learned on spatiotemporal data using STGCNs. In this paper, we first propose to use a local Dataset Graph (DS-Graph) obtained from the feature representation of input data at each layer to develop an understanding of the layer-wise embedding geometry of the STGCN. To do so, we develop a window-based dynamic time warping (DTW) method to compute the distance between data sequences with varying temporal lengths. To validate our findings, we have developed a layer-specific Spatiotemporal Graph Gradient-weighted Class Activation Mapping (L-STG-GradCAM) technique tailored for spatiotemporal data. This approach enables us to visually analyze and interpret each layer within the STGCN network. We characterize the functions learned by each layer of the STGCN using the label smoothness of the representation and visualize them using our L-STG-GradCAM approach. Our proposed method is generic and can yield valuable insights for STGCN architectures in different applications. However, this paper focuses on the human activity recognition task as a representative application. Our experiments show that STGCN models learn representations that capture general human motion in their initial layers while discriminating different actions only in later layers. This justifies experimental observations showing that fine-tuning deeper layers works well for transfer between related tasks.
We provide experimental evidence for different human activity datasets and advanced spatiotemporal graph networks to validate that the proposed method is general enough to analyze any STGCN model and can be useful for drawing insight into networks in various scenarios. We also show that noise at the input has a limited effect on label smoothness, which can help justify the robustness of STGCNs to noise.
Full Text Available
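The STGCN abstract above relies on a window-based DTW distance between sequences of different lengths. The paper's exact variant is not described in this listing, so the following is only a sketch of the standard band-constrained (Sakoe-Chiba style) DTW, assuming scalar samples and an absolute-difference local cost. The name `dtw_distance` and the `window` parameter are hypothetical.

```python
import math


def dtw_distance(a, b, window=None):
    """Band-constrained DTW between two 1-D sequences of any lengths.

    `window` limits how far the alignment may stray from the diagonal;
    None means no constraint. Local cost is |a[i] - b[j]| (an assumption).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # The band must be at least |n - m| wide, otherwise no warping path
    # can connect the first pair of samples to the last pair.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

For skeleton data each sample would be a feature vector rather than a scalar; swapping the local cost for a vector norm is the only change that would be needed in this sketch.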
Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression

https://doi.org/10.18653/v1/2024.findings-emnlp.758

Gulati, Aryan; Dong, Xingjian; Hurtado, Carlos; Shekkizhar, Sarath; Swayamdipta, Swabha; Ortega, Antonio (January 2024, Association for Computational Linguistics)

Full Text Available
Study of Manifold Geometry Using Multiscale Non-Negative Kernel Graphs

https://doi.org/10.1109/ICASSP49357.2023.10095956

Hurtado, Carlos; Shekkizhar, Sarath; Ruiz-Hidalgo, Javier; Ortega, Antonio (June 2023, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Modern machine learning systems are increasingly trained on large amounts of data embedded in high-dimensional spaces. Often this is done without analyzing the structure of the dataset. In this work, we propose a framework to study the geometric structure of the data. We make use of our recently introduced non-negative kernel (NNK) regression graphs to estimate the point density, intrinsic dimension, and linearity of the data manifold (curvature). We further generalize the graph construction and geometric estimation to multiple scales by iteratively merging neighborhoods in the input data. Our experiments demonstrate the effectiveness of our proposed approach over other baselines in estimating the local geometry of the data manifolds on synthetic and real datasets.
Full Text Available
NNK-Means: Data summarization using dictionary learning with non-negative kernel regression

https://doi.org/10.23919/EUSIPCO55093.2022.9909928

Shekkizhar, Sarath; Ortega, Antonio (August 2022, 2022 30th European Signal Processing Conference (EUSIPCO))

An increasing number of systems are being designed by gathering significant amounts of data and then optimizing the system parameters directly using the obtained data. Often this is done without analyzing the dataset structure. As task complexity, data size, and parameters all increase to millions or even billions, data summarization is becoming a major challenge. In this work, we investigate data summarization via dictionary learning (DL), leveraging the properties of recently introduced non-negative kernel regression (NNK) graphs. Our proposed NNK-Means, unlike previous DL techniques, such as kSVD, learns geometric dictionaries with atoms that are representative of the input data space. Experiments show that summarization using NNK-Means can provide better class separation compared to linear and kernel versions of kMeans and kSVD. Moreover, NNK-Means is scalable, with runtime complexity similar to that of kMeans.
Full Text Available
Channel Redundancy and Overlap in Convolutional Neural Networks with Channel-Wise NNK Graphs

https://doi.org/10.1109/ICASSP43922.2022.9746186

Bonet, David; Ortega, Antonio; Ruiz-Hidalgo, Javier; Shekkizhar, Sarath (May 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Feature spaces in the deep layers of convolutional neural networks (CNNs) are often very high-dimensional and difficult to interpret. However, convolutional layers consist of multiple channels that are activated by different types of inputs, which suggests that more insights may be gained by studying the channels and how they relate to each other. In this paper, we first theoretically analyze channel-wise non-negative kernel (CW-NNK) regression graphs, which allow us to quantify the overlap between channels and, indirectly, the intrinsic dimension of the data representation manifold. We find that redundancy between channels is significant and varies with the layer depth and the level of regularization during training. Additionally, we observe that there is a correlation between channel overlap in the last convolutional layer and generalization performance. Our experimental results demonstrate that these techniques can lead to a better understanding of deep representations.
Full Text Available
Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation

Bonet, David; Ortega, Antonio; Ruiz-Hidalgo, Javier; Shekkizhar, Sarath (December 2021, Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC))

State-of-the-art neural network architectures continue to scale in size and deliver impressive generalization results, although this comes at the expense of limited interpretability. In particular, a key challenge is to determine when to stop training the model, as this has a significant impact on generalization. Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels, where analyzing intermediate data representations and the model's evolution can be challenging owing to the curse of dimensionality. We present channel-wise DeepNNK (CW-DeepNNK), a novel channel-wise generalization estimate based on non-negative kernel regression (NNK) graphs with which we perform local polytope interpolation on low-dimensional channels. This method leads to instance-based interpretability of both the learned data representations and the relationship between channels. Motivated by our observations, we use CW-DeepNNK to propose a novel early stopping criterion that (i) does not require a validation set, (ii) is based on a task performance metric, and (iii) allows stopping to be reached at different points for each channel. Our experiments demonstrate that our proposed method has advantages as compared to the standard criterion based on validation set performance.
Full Text Available
Revisiting Local Neighborhood Methods in Machine Learning

https://doi.org/10.1109/DSLW51110.2021.9523409

Shekkizhar, Sarath; Ortega, Antonio (June 2021, 2021 IEEE Data Science and Learning Workshop (DSLW))

Several machine learning methods leverage the idea of locality by using k-nearest neighbor (KNN) techniques to design better pattern recognition models. However, the choice of KNN parameters such as k is often made experimentally, e.g., via cross-validation, leading to local neighborhoods without a clear geometric interpretation. In this paper, we replace KNN with our recently introduced polytope neighborhood scheme, non-negative kernel regression (NNK). NNK formulates neighborhood selection as a sparse signal approximation problem and is adaptive to the local distribution of samples in the neighborhood of the data point of interest. We analyze the benefits of local neighborhood construction based on NNK. In particular, we study the generalization properties of local interpolation using NNK and present data-dependent bounds in the non-asymptotic setting. We demonstrate the applicability of NNK in the transductive few-shot learning setting and for measuring the distance between two datasets. NNK exhibits robust, superior performance in comparison to standard locally weighted neighborhood methods.
Full Text Available
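The NNK abstract above describes neighborhood selection as a sparse approximation problem. One way to read that is as a non-negative least-squares fit of the query in kernel space: minimize (1/2) θᵀKθ − kᵀθ subject to θ ≥ 0, where K holds kernel values between candidates and k holds kernel values between the query and each candidate. The sketch below is an illustration under those assumptions, not the authors' implementation: it uses a Gaussian kernel, plain projected gradient descent, and hypothetical names (`gaussian_kernel`, `nnk_weights`). In practice the candidate set would usually be a KNN preselection rather than all points.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def nnk_weights(query, candidates, sigma=1.0, iters=500):
    """Non-negative weights approximating `query` from `candidates`.

    Solves min_{theta >= 0} 0.5 * theta^T K theta - k^T theta with
    projected gradient descent. Candidates whose weight ends at zero
    are dropped from the neighborhood.
    """
    n = len(candidates)
    if n == 0:
        return []
    K = [[gaussian_kernel(a, b, sigma) for b in candidates] for a in candidates]
    k = [gaussian_kernel(query, c, sigma) for c in candidates]
    theta = [0.0] * n
    # K has a unit diagonal, so trace(K) = n bounds its largest eigenvalue
    # and 1/n is a safe (if conservative) step size.
    step = 1.0 / n
    for _ in range(iters):
        grad = [sum(K[i][j] * theta[j] for j in range(n)) - k[i]
                for i in range(n)]
        # Gradient step followed by projection onto the non-negative orthant.
        theta = [max(0.0, t - step * g) for t, g in zip(theta, grad)]
    return theta


# Query at the origin; the point (2, 0) sits directly behind (1, 0).
weights = nnk_weights((0.0, 0.0), [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0)])
```

In this toy configuration the candidate at (2, 0) receives zero weight because (1, 0) already covers that direction, while (1, 0) and (0, 1) keep equal positive weights. This is the kind of geometry-dependent sparsity that a fixed k in KNN does not provide.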
