Title: NNK-Means: Data summarization using dictionary learning with non-negative kernel regression
An increasing number of systems are being designed by gathering significant amounts of data and then optimizing the system parameters directly using the obtained data. Often this is done without analyzing the dataset structure. As task complexity, data size, and parameters all increase to millions or even billions, data summarization is becoming a major challenge. In this work, we investigate data summarization via dictionary learning (DL), leveraging the properties of recently introduced non-negative kernel regression (NNK) graphs. Our proposed NNK-Means, unlike previous DL techniques, such as kSVD, learns geometric dictionaries with atoms that are representative of the input data space. Experiments show that summarization using NNK-Means can provide better class separation compared to linear and kernel versions of kMeans and kSVD. Moreover, NNK-Means is scalable, with runtime complexity similar to that of kMeans.
Award ID(s):
2009032
PAR ID:
10433755
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2022 30th European Signal Processing Conference (EUSIPCO)
Page Range / eLocation ID:
2161 to 2165
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Several machine learning methods leverage the idea of locality by using k-nearest neighbor (KNN) techniques to design better pattern recognition models. However, the choice of KNN parameters such as k is often made experimentally, e.g., via cross-validation, leading to local neighborhoods without a clear geometric interpretation. In this paper, we replace KNN with our recently introduced polytope neighborhood scheme, Non-Negative Kernel regression (NNK). NNK formulates neighborhood selection as a sparse signal approximation problem and is adaptive to the local distribution of samples in the neighborhood of the data point of interest. We analyze the benefits of local neighborhood construction based on NNK. In particular, we study the generalization properties of local interpolation using NNK and present data-dependent bounds in the non-asymptotic setting. The applicability of NNK in the transductive few-shot learning setting and for measuring the distance between two datasets is demonstrated. NNK exhibits robust, superior performance in comparison to standard locally weighted neighborhood methods.
  2. Clustering algorithms are an important part of unsupervised machine learning. With Big Data, applying clustering algorithms such as KMeans has become a challenge due to the significantly larger volume of data and the computational complexity of the standard approach, Lloyd's algorithm. This work aims to tackle this challenge by transforming the classic KMeans clustering algorithm to be highly scalable and able to operate on Big Data. We leverage the distributed computing environment of the HPCC Systems platform. The presented KMeans algorithm adopts a hybrid parallelism method to achieve a massively scalable parallel KMeans. Our approach can save significant time for researchers and machine learning practitioners who train hundreds of models on a daily basis. The performance is evaluated with datasets and clusters of different sizes, and the results show significant scalability of the parallel KMeans algorithm.
  3. Deep learning (DL) has been increasingly explored in low-dose CT image denoising. DL products have also been submitted to the FDA for premarket clearance. While DL approaches have the potential to improve image quality over the filtered back projection method (FBP) and produce images quickly, their generalizability is a major concern because the performance of a DL network can depend highly on the training data. In this work we take a residual encoder-decoder convolutional neural network (REDCNN)-based CT denoising method as an example. We investigate the effect of the scan parameters associated with the training data on the performance of this DL-based CT denoising method and identify the scan parameters that may significantly impact its performance generalizability. This abstract particularly examines three parameters: reconstruction kernel, dose level, and slice thickness. Our preliminary results indicate that the DL network may not generalize well between FBP reconstruction kernels, but is insensitive to slice thickness for slice-wise denoising. The results also suggest that training with mixed dose levels improves denoising performance.
  4. We present a novel framework to represent sets of time-varying signals as dynamic graphs using the non-negative kernel (NNK) graph construction. We extend the original NNK framework to allow explicit delays as part of the graph construction, so that, unlike in NNK, two nodes can be connected with an edge corresponding to a non-zero time delay if there is higher similarity between the signals after shifting one of them. We also propose to characterize the similarity between signals at different nodes using the node degree and clustering coefficients of their respective visibility graphs. With graph edges that can represent temporal delays, we provide a new perspective that enables us to see the effect of synchronization in graph construction for time-series signals. For both temperature and EEG datasets, we show that our proposed approach can achieve sparse and interpretable graph representations. Furthermore, the proposed method can be useful in characterizing different EEG experiments using sparsity.
  5. Blanchette, Jasmin; Kovacs, Laura; Pattinson, Dirk (Ed.)
    Definition packages in theorem provers provide users with means of defining and organizing concepts of interest. This system description presents a new definition package for the hybrid systems theorem prover KeYmaera X based on differential dynamic logic (dL). The package adds KeYmaera X support for user-defined smooth functions whose graphs can be implicitly characterized by dL formulas. Notably, this makes it possible to implicitly characterize functions, such as the exponential and trigonometric functions, as solutions of differential equations and then prove properties of those functions using dL's differential equation reasoning principles. Trustworthiness of the package is achieved by minimally extending KeYmaera X's soundness-critical kernel with a single axiom scheme that expands function occurrences with their implicit characterization. Users are provided with a high-level interface for defining functions and non-soundness-critical tactics that automate low-level reasoning over implicit characterizations in hybrid system proofs. 