We propose a sequential algorithm for learning sparse radial basis approximations for streaming data. The initial phase of the algorithm formulates the RBF training as a convex optimization problem with an objective function on the expansion weights while the data fitting problem imposed only as an ℓ∞-norm constraint. Each new data point observed is tested for feasibility, i.e., whether the data fitting constraint is satisfied. If so, that point is discarded and no model update is required. If it is infeasible, a new basic variable is added to the linear program. The result is a primal infeasible-dual feasible solution. The dual simplex algorithm is applied to determine a new optimal solution. A large fraction of the streaming data points does not require updates to the RBF model since they are similar enough to previously observed data and satisfy the data fitting constraints. The structure of the simplex algorithm makes the update to the solution particularly efficient given the inverse of the new basis matrix is easily computed from the old inverse. The second phase of the algorithm involves a non-convex refinement of the convex problem. Given the sparse nature of the LP solution, the computational expense of the non-convex algorithm is greatly reduced. We have also found that a small subset of the training data that includes the novel data identified by the algorithm can be used to train the non-convex optimization problem with substantial computation savings and comparable errors on the test data. We illustrate the method on the Mackey-Glass chaotic time-series, the monthly sunspot data, and a Fort Collins, Colorado weather data set. In each case we compare the results to artificial neural networks (ANN) and standard skew-RBFs.
more »
« less
Beyond the Simplex: Hadamard-Infused Deep Sparse Representations for Enhanced Similarity Measures
Graphical representations are essential for comprehending high-dimensional data across diverse fields, yet their construction often presents challenges due to the limitations of traditional methods. This paper introduces a novel methodology, Beyond Simplex Sparse Representation (BSSR), which addresses critical issues such as parameter dependencies, scale inconsistencies, and biased data interpretation in constructing similarity graphs. BSSR leverages the robustness of sparse representation to noise and outliers, while incorporating deep learning techniques to enhance scalability and accuracy. Furthermore, we tackle the optimization of the standard simplex, a pervasive problem, by introducing a transformative approach that converts the constraint into a smooth manifold using the Hadamard parametrization. Our proposed Tangent Perturbed Riemannian Gradient Descent (T-PRGD) algorithm provides an efficient and scalable solution for optimization problems with standard simplex or L1-norm sphere constraints. These contributions, including the BSSR methodology, robustness and scalability through deep representation, shift-invariant sparse representation, and optimization on the unit sphere, represent major advancements in the field. Our work offers novel perspectives on data representation challenges and sets the stage for more accurate analysis in the era of big data.
more »
« less
- PAR ID:
- 10508020
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- IEEE ICKG 2023
- ISBN:
- 979-8-3503-0709-2
- Page Range / eLocation ID:
- 168 to 175
- Subject(s) / Keyword(s):
- Data Similarity, Sparse Representation, Simplex Constraint, Riemannian Optimization
- Format(s):
- Medium: X
- Location:
- Shanghai, China
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Medical image analysis using deep learning has recently been prevalent, showing great performance for various downstream tasks including medical image segmentation and its sibling, volumetric image segmentation. Particularly, a typical volumetric segmentation network strongly relies on a voxel grid representation which treats volumetric data as a stack of individual voxel `slices', which allows learning to segment a voxel grid to be as straightforward as extending existing image-based segmentation networks to the 3D domain. However, using a voxel grid representation requires a large memory footprint, expensive test-time and limiting the scalability of the solutions. In this paper, we propose Point-Unet, a novel method that incorporates the eciency of deep learning with 3D point clouds into volumetric segmentation. Our key idea is to rst predict the regions of interest in the volume by learning an attentional probability map, which is then used for sampling the volume into a sparse point cloud that is subsequently segmented using a point-based neural network. We have conducted the experiments on the medical volumetric segmentation task with both a small-scale dataset Pancreas and large-scale datasets BraTS18, BraTS19, and BraTS20 challenges. A comprehensive benchmark on dierent metrics has shown that our context-aware Point-Unet robustly outperforms the SOTA voxel-based networks at both accuracies, memory usage during training, and time consumption during testing.more » « less
-
Duarte, Marco (Ed.)We propose K-Deep Simplex (KDS) which, given a set of data points, learns a dictionary comprising synthetic landmarks, along with representation coefficients supported on a simplex. KDS employs a local weighted l1 penalty that encourages each data point to represent itself as a convex combination of nearby landmarks. We solve the proposed optimization program using alternating minimization and design an efficient, interpretable autoencoder using algorithm unrolling. We theoretically analyze the proposed program by relating the weighted l1 penalty in KDS to a weighted ℓ0 program. Assuming that the data are generated from a Delaunay triangulation, we prove the equivalence of the weighted l1 and weighted l0 programs. We further show the stability of the representation coefficients under mild geometrical assumptions. If the representation coefficients are fixed, we prove that the sub-problem of minimizing over the dictionary yields a unique solution. Further, we show that low-dimensional representations can be efficiently obtained from the covariance of the coefficient matrix. Experiments show that the algorithm is highly efficient and performs competitively on synthetic and real data sets.more » « less
-
This work studies the adversarial robustness of parametric functions composed of a linear predic-tor and a nonlinear representation map. Our analysis relies on sparse local Lipschitzness (SLL),an extension of local Lipschitz continuity that better captures the stability and reduced effectivedimensionality of predictors upon local perturbations. SLL functions preserve a certain degree ofstructure, given by the sparsity pattern in the representation map, and include several popular hy-pothesis classes, such as piecewise linear models, Lasso and its variants, and deep feedforward ReLUnetworks. Compared with traditional Lipschitz analysis, we provide a tighter robustness certificateon the minimal energy of an adversarial example, as well as tighter data-dependent nonuniformbounds on the robust generalization error of these predictors. We instantiate these results for the case of deep neural networks and provide numerical evidence that supports our results, shedding new insights into natural regularization strategies to increase the robustness of these models.more » « less
-
High-dimensional data is commonly encountered in various applications, including genomics, as well as image and video processing. Analyzing, computing, and visualizing such data pose significant challenges. Feature extraction methods are crucial in addressing these challenges by obtaining compressed representations that are suitable for analysis and downstream tasks. One effective technique along these lines is sparse coding, which involves representing data as a sparse linear combination of a set of exemplars. In this study, we propose a local sparse coding framework within the context of a classification problem. The objective is to predict the label of a given data point based on labeled training data. The primary optimization problem encourages the representation of each data point using nearby exemplars. We leverage the optimized sparse representation coefficients to predict the label of a test data point by assessing its similarity to the sparse representations of the training data. The proposed framework is computationally efficient and provides interpretable sparse representations. To illustrate the practicality of our proposed framework, we apply it to agriculture for the classification of crop diseases.more » « less
An official website of the United States government

