In this paper, we explore a novel online learning setting, where the online learners are presented with “doubly-streaming” data. Namely, the data instances constantly streaming in are described by feature spaces that evolve over time, with new features emerging and old features fading away. The main challenge of this problem lies in the fact that the newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A seemingly plausible idea to overcome the challenge is to establish a relationship between the old and new feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve its learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional feature spaces that entail very complex feature interplay. Specifically, a tradeoff between onlineness, which biases toward shallow learners, and expressiveness, which requires deep models, is inevitable. Motivated by this, we propose a novel paradigm, named Online Learning Deep models from Data of Double Streams (OLD3S), where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature-mapping relationship. A key trait of OLD3S is to treat the model capacity as learnable semantics, yielding optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams, in an online fashion. To ablate its efficacy and applicability, two variants of OLD3S are proposed: OLD-Linear, which learns the relationship by a linear function, and OLD-FD, which learns the two consecutive feature spaces, pre- and post-evolution, with a fixed model depth. Besides, instead of restarting the entire learning process from scratch, OLD3S learns multiple newly emerging feature spaces in a lifelong manner, retaining the knowledge from the learned and vanished feature spaces to enjoy a jump-start of the new features’ learning process. Both theoretical analysis and empirical studies substantiate the viability and effectiveness of our proposed approach.
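As a concrete illustration of the capacity-learning idea, the following is a minimal sketch in the spirit of OLD3S, not the authors' implementation: a classifier head is attached at every depth of an online network, and a hedge-style multiplicative update shifts weight toward the depths whose streaming losses are lowest, so the effective depth adapts to the stream. All layer sizes and constants (`eta`, `beta`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, L, C = 20, 32, 4, 2            # input dim, hidden width, depth, classes
Ws = [rng.normal(0, 0.1, (D if l == 0 else H, H)) for l in range(L)]
heads = [rng.normal(0, 0.1, (H, C)) for _ in range(L)]
alpha = np.ones(L) / L               # hedge weights, one per depth
beta, eta = 0.99, 0.01               # hedge discount, head learning rate

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hidden_states(x):
    hs, h = [], x
    for W in Ws:
        h = np.tanh(h @ W)           # one more layer of depth
        hs.append(h)
    return hs

def predict(x):
    # ensemble prediction: hedge-weighted vote over all depths
    hs = hidden_states(x)
    return sum(a * softmax(h @ V) for a, h, V in zip(alpha, hs, heads))

def update(x, y):
    global alpha
    hs, losses = hidden_states(x), []
    for l, (h, V) in enumerate(zip(hs, heads)):
        p = softmax(h @ V)
        losses.append(-np.log(p[y] + 1e-12))    # per-depth log loss
        g = p.copy(); g[y] -= 1.0               # grad of log loss w.r.t. logits
        heads[l] -= eta * np.outer(h, g)        # update this depth's head only
    alpha = alpha * beta ** np.array(losses)    # hedge: punish lossy depths
    alpha /= alpha.sum()
```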
Robust Sparse Online Learning for Data Streams with Streaming Features
Sparse online learning has received extensive attention during the past few years. Most existing algorithms that utilize ℓ1-norm regularization or ℓ1-ball projection assume that the feature space is fixed or changes by following explicit constraints. However, this assumption does not always hold in many real applications. Motivated by this observation, we propose a new online learning algorithm tailored for data streams described by open feature spaces, where new features can occur and old features may vanish over various time spans. Our algorithm, named RSOL, provides a strategy to adapt quickly to such feature dynamics by encouraging sparse model representation with a mixed ℓ1- and ℓ2-norm regularizer. We leverage the proximal operator of the mixed ℓ1,2 norm and show that our RSOL algorithm enjoys a closed-form solution at each iteration. A sub-linear regret bound of the proposed algorithm is guaranteed by a solid theoretical analysis. Empirical results benchmarked on nine streaming datasets validate the effectiveness of the proposed RSOL method over three state-of-the-art algorithms.
Keywords: online learning, sparse learning, streaming feature selection, open feature spaces, ℓ1,2 mixed norm
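The closed-form iteration can be illustrated with the standard proximal operator of a sparse-group (mixed ℓ1,2) penalty; the sketch below is an assumed illustration of that operator, not RSOL's actual code, and the names `groups`, `lam1`, `lam2` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_l12(w, groups, lam1, lam2):
    """Closed-form prox of lam1*||w||_1 + lam2*sum_g ||w_g||_2:
    elementwise shrinkage inside each group, then group-level shrinkage
    that can zero out a whole (e.g., vanished) feature group at once."""
    out = np.zeros_like(w)
    for g in groups:                      # g: array of feature indices
        u = soft_threshold(w[g], lam1)    # within-group sparsity
        norm = np.linalg.norm(u)
        if norm > lam2:                   # otherwise the group stays zero
            out[g] = (1.0 - lam2 / norm) * u
    return out

# Hypothetical online step: gradient move, then the closed-form prox.
# w = prox_l12(w - eta * grad, groups, eta * lam1, eta * lam2)
```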
- PAR ID: 10512865
- Publisher / Repository: SIAM
- Date Published:
- Journal Name: Proceedings of the 2024 SIAM International Conference on Data Mining (SDM)
- ISBN: 978-1-61197-803-2
- Format(s): Medium: X
- Location: Houston, TX
- Sponsoring Org: National Science Foundation
More Like this
-
Online Anomaly Detection (OAD) is critical for identifying rare yet important data points in large, dynamic, and complex data streams. A key challenge lies in achieving accurate and consistent detection of anomalies while maintaining computational and memory efficiency. Conventional OAD approaches, which depend on distributional deviations and static thresholds, struggle with model update delays and catastrophic forgetting, leading to missed detections and high false positive rates. To address these limitations, we propose a novel Streaming Anomaly Detection (SAD) method, grounded in a sparse active online learning framework. Our approach uniquely integrates ℓ1,2-norm sparse online learning with CUR decomposition-based active learning, enabling simultaneous fast feature selection and dynamic instance selection. The efficient CUR decomposition further supports real-time residual analysis for anomaly scoring, eliminating the need for manually set thresholds over temporal data distributions. Extensive experiments on diverse streaming datasets demonstrate SAD's superiority, achieving a 14.06% reduction in detection error rates compared to five state-of-the-art competitors.
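A rough sketch of how CUR-based residual scoring can work, under assumptions (norm-based column/row sampling over a fixed window of past instances), not the paper's implementation:

```python
import numpy as np

def cur_sample(X, k_cols, k_rows, rng):
    """Pick columns (features) and rows (instances) with probabilities
    proportional to squared norms, a common lightweight CUR heuristic."""
    pc = np.linalg.norm(X, axis=0) ** 2; pc /= pc.sum()
    pr = np.linalg.norm(X, axis=1) ** 2; pr /= pr.sum()
    cols = rng.choice(X.shape[1], size=k_cols, replace=False, p=pc)
    rows = rng.choice(X.shape[0], size=k_rows, replace=False, p=pr)
    return cols, rows

def anomaly_score(x, R):
    """Residual of projecting instance x onto the row space of R (the
    sampled representative instances); large residuals suggest anomalies."""
    proj = np.linalg.pinv(R) @ R          # projector onto row space of R
    return np.linalg.norm(x - x @ proj)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))            # sliding window of past instances
cols, rows = cur_sample(X, k_cols=20, k_rows=10, rng=rng)
R = X[rows][:, cols]                      # R block on the selected features
x_new = rng.normal(size=30)
print(anomaly_score(x_new[cols], R))      # score on the selected features
```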
-
We develop algorithms for online topology inference from streaming nodal observations and partial connectivity information; i.e., a priori knowledge on the presence or absence of a few edges may be available as in the link prediction problem. The observations are modeled as stationary graph signals generated by local diffusion dynamics on the unknown network. Said stationarity assumption implies the simultaneous diagonalization of the observations' covariance matrix and the so-called graph shift operator (GSO), here the adjacency matrix of the sought graph. When the GSO eigenvectors are perfectly obtained from the ensemble covariance, we examine the structure of the feasible set of adjacency matrices and its dependency on the prior connectivity information available. In practice one can only form an empirical estimate of the covariance matrix, so we develop an alternating algorithm to find a sparse GSO given its imperfectly estimated eigenvectors. Upon sensing new diffused observations in the streaming setting, we efficiently update eigenvectors and perform only one (or a few) online iteration(s) of the proposed algorithm until a new datum is observed. Numerical tests showcase the effectiveness of the novel batch and online algorithms in recovering real-world graphs.
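One way to realize the alternating step described above, sketched under assumptions (estimated eigenvectors `V` treated as orthonormal, prior edge knowledge passed as a dict); this is an illustration, not the authors' algorithm:

```python
import numpy as np

def soft(M, t):
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def sparse_gso(V, S_init, known, mu=0.05, n_iter=100):
    """Alternate between least-squares eigenvalues for the current S and a
    soft-thresholded pull of S toward V @ diag(lam) @ V.T.  `known` is a
    dict {(i, j): value} encoding a priori edge presence/absence; S_init
    could be, e.g., a thresholded empirical covariance."""
    S = S_init.copy()
    for _ in range(n_iter):
        lam = np.einsum('ij,ij->j', V, S @ V)   # lam_k = v_k^T S v_k
        S = soft((V * lam) @ V.T, mu)           # sparse fit to the templates
        S = 0.5 * (S + S.T)                     # symmetric adjacency
        np.fill_diagonal(S, 0.0)                # hollow diagonal
        for (i, j), v in known.items():         # re-impose prior edge info
            S[i, j] = S[j, i] = v
    return S
```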
-
We develop online graph learning algorithms from streaming network data. Our goal is to track the (possibly) time-varying network topology, and effect memory and computational savings by processing the data on-the-fly as they are acquired. The setup entails observations modeled as stationary graph signals generated by local diffusion dynamics on the unknown network. Moreover, we may have a priori information on the presence or absence of a few edges as in the link prediction problem. The stationarity assumption implies that the observations’ covariance matrix and the so-called graph shift operator (GSO—a matrix encoding the graph topology) commute under mild requirements. This motivates formulating the topology inference task as an inverse problem, whereby one searches for a sparse GSO that is structurally admissible and approximately commutes with the observations’ empirical covariance matrix. For streaming data, said covariance can be updated recursively, and we show online proximal gradient iterations can be brought to bear to efficiently track the time-varying solution of the inverse problem with quantifiable guarantees. Specifically, we derive conditions under which the GSO recovery cost is strongly convex and use this property to prove that the online algorithm converges to within a neighborhood of the optimal time-varying batch solution. Numerical tests illustrate the effectiveness of the proposed graph learning approach in adapting to streaming information and tracking changes in the sought dynamic network.
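A minimal sketch of the per-datum streaming update this abstract describes, assuming the commutation misfit ‖SC − CS‖_F² plus an ℓ1 penalty as the recovery cost; the forgetting factor and step sizes are illustrative constants:

```python
import numpy as np

def soft(M, t):
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def online_gso_step(S, C, x, gamma=0.98, eta=1e-3, mu=1e-3):
    """One streaming update: refresh the covariance recursively, take a
    gradient step on the commutation misfit ||S C - C S||_F^2, then
    soft-threshold for sparsity (proximal step)."""
    C = gamma * C + (1 - gamma) * np.outer(x, x)   # recursive covariance
    E = S @ C - C @ S                              # commutator residual
    grad = 2 * (E @ C - C @ E)                     # gradient for symmetric C
    S = soft(S - eta * grad, eta * mu)             # proximal (l1) step
    S = 0.5 * (S + S.T)                            # symmetric adjacency
    np.fill_diagonal(S, 0.0)                       # hollow diagonal
    return S, C
```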
-
We consider the problem of explainable k-medians and k-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into k clusters and minimizes the k-medians or k-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is O(log k) competitive with k-medians with ℓ1 norm and O(k) competitive with k-means. This is an improvement over the previous guarantees of O(k) and O(k^2) by Dasgupta et al (2020). We also provide a new algorithm which is O(log^{3/2} k) competitive for k-medians with ℓ2 norm. Our first algorithm is near-optimal: Dasgupta et al (2020) showed a lower bound of Ω(log k) for k-medians; in this work, we prove a lower bound of Ω(k) for k-means. We also provide a lower bound of Ω(log k) for k-medians with ℓ2 norm.
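For intuition, here is a compact sketch of a greedy threshold tree in the style of the IMM baseline of Dasgupta et al. (2020), not the improved algorithms of this abstract: each node picks the axis-aligned cut between its reference centers that separates the fewest points from their own center, and mistaken points are dropped from the recursion. `centers` and `labels` are assumed to come from any standard k-means solver.

```python
import numpy as np

def threshold_tree(X, centers, labels):
    """Recursively choose a cut (feature j, threshold theta) between the
    centers present at this node, greedily minimizing the number of points
    separated from their own center; wrong-side points are dropped."""
    ks = np.unique(labels)
    if len(ks) <= 1:                     # one center left: this is a cluster
        return {"leaf": int(ks[0]) if len(ks) else -1}
    best = None
    for j in range(X.shape[1]):
        for theta in np.unique(centers[ks, j])[:-1]:  # cuts between centers
            # "mistakes": points sent to the opposite side of their center
            m = np.sum((X[:, j] <= theta) != (centers[labels, j] <= theta))
            if best is None or m < best[0]:
                best = (m, j, theta)
    _, j, theta = best
    left = (X[:, j] <= theta) & (centers[labels, j] <= theta)
    right = (X[:, j] > theta) & (centers[labels, j] > theta)
    return {"feature": int(j), "threshold": float(theta),
            "left": threshold_tree(X[left], centers, labels[left]),
            "right": threshold_tree(X[right], centers, labels[right])}
```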