Sparse online learning has received extensive attention during the past few years. Most of existing algorithms that utilize ℓ1-norm regularization or ℓ1-ball projection assume that the feature space is fixed or changes by following explicit constraints. However, this assumption does not always hold in many real applications. Motivated by this observation, we propose a new online learning algorithm tailored for data streams described by open feature spaces, where new features can be occurred, and old features may be vanished over various time spans. Our algorithm named RSOL provides a strategy to adapt quickly to such feature dynamics by encouraging sparse model representation with an ℓ1- and ℓ2-mixed regularizer. We leverage the proximal operator of the ℓ1,2-mixed norm and show that our RSOL algorithm enjoys a closed-form solution at each iteration. A sub-linear regret bound of our proposed algorithm is guaranteed with a solid theoretical analysis. Empirical results benchmarked on nine streaming datasets validate the effectiveness of the proposed RSOL method over three state-of-the-art algorithms. Keywords: online learning, sparse learning, streaming feature selection, open feature spaces, ℓ1,2 mixed norm 
                        more » 
                        « less   
                    This content will become publicly available on September 1, 2026
                            
                            A Novel Sparse Active Online Learning Framework for Fast and Accurate Streaming Anomaly Detection Over Data Streams
                        
                    
    
            Online Anomaly Detection (OAD) is critical for identifying rare yet important data points in large, dynamic, and complex data streams. A key challenge lies in achieving accurate and consistent detection of anomalies while maintaining computational and memory efficiency. Conventional OAD approaches, which depend on distributional deviations and static thresholds, struggle with model update delays and catastrophic forgetting, leading to missed detections and high false positive rates. To address these limitations, we propose a novel Streaming Anomaly Detection (SAD) method, grounded in a sparse active online learning framework. Our approach uniquely integrates ℓ1,2-norm sparse online learning with CUR decomposition-based active learning, enabling simultaneous fast feature selection and dynamic instance selection. The efficient CUR decomposition further supports real-time residual analysis for anomaly scoring, eliminating the need for manual threshold settings about temporal data distributions. Extensive experiments on diverse streaming datasets demonstrate SAD's superiority, achieving a 14.06% reduction in detection error rates compared to five state-of-the-art competitors. 
        more » 
        « less   
        
    
    
                            - PAR ID:
- 10638987
- Publisher / Repository:
- International Joint Conferences on Artificial Intelligence (IJCAI 2025)
- Date Published:
- Page Range / eLocation ID:
- 2740 to 2748
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prohibits anomaly detection algorithms from learning characteristic rules and patterns due to the lack of large amounts of data. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge based on Gaussian process models that generate features used in a logistic regression model which leads to high prediction accuracy for sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor is purposely downsampled by NSF and NGA in order to simulate missing completely at random, and the missing rates are 99%, 98%, 95%, and 90%. Hence, it is challenging to detect anomalies from the sparse traffic flow data. The proposed scheme makes use of traffic patterns at different times of day and on different days of week to recover the complete data. The proposed anomaly detection scheme is computationally efficient by allowing parallel computation on different sensors. The proposed method is one of the two top performing algorithms in the 2021 ATD challenge.more » « less
- 
            Sparse tensor factorization is a popular tool in multi-way data analysis and is used in applications such as cybersecurity, recommender systems, and social network analysis. In many of these applications, the tensor is not known a priori and instead arrives in a streaming fashion for a potentially unbounded amount of time. Existing approaches for streaming sparse tensors are not practical for unbounded streaming because they rely on maintaining the full factorization of the data, which grows linearly with time. In this work, we present CP-stream, an algorithm for streaming factorization in the model of the canonical polyadic decomposition which does not grow linearly in time or space, and is thus practical for long-term streaming. Additionally, CP-stream incorporates user-specified constraints such as non-negativity which aid in the stability and interpretability of the factorization. An evaluation of CP-stream demonstrates that it converges faster than state-of-the-art streaming algorithms while achieving lower reconstruction error by an order of magnitude. We also evaluate it on real-world sparse datasets and demonstrate its usability in both network traffic analysis and discussion tracking. Our evaluation uses exclusively public datasets and our source code is released to the public as part of SPLATT, an open source high-performance tensor factorization toolkit.more » « less
- 
            Abstract We propose a learning‐based approach for full‐body pose reconstruction from extremely sparse upper body tracking data, obtained from a virtual reality (VR) device. We leverage a conditional variational autoencoder with gated recurrent units to synthesize plausible and temporally coherent motions from 4‐point tracking (head, hands, and waist positions and orientations). To avoid synthesizing implausible poses, we propose a novel sample selection and interpolation strategy along with an anomaly detection algorithm. Specifically, we monitor the quality of our generated poses using the anomaly detection algorithm and smoothly transition to better samples when the quality falls below a statistically defined threshold. Moreover, we demonstrate that our sample selection and interpolation method can be used for other applications, such as target hitting and collision avoidance, where the generated motions should adhere to the constraints of the virtual environment. Our system is lightweight, operates in real‐time, and is able to produce temporally coherent and realistic motions.more » « less
- 
            null (Ed.)We develop online graph learning algorithms from streaming network data. Our goal is to track the (possibly) time-varying network topology, and affect memory and computational savings by processing the data on-the-fly as they are acquired. The setup entails observations modeled as stationary graph signals generated by local diffusion dynamics on the unknown network. Moreover, we may have a priori information on the presence or absence of a few edges as in the link prediction problem. The stationarity assumption implies that the observations’ covariance matrix and the so-called graph shift operator (GSO—a matrix encoding the graph topology) commute under mild requirements. This motivates formulating the topology inference task as an inverse problem, whereby one searches for a sparse GSO that is structurally admissible and approximately commutes with the observations’ empirical covariance matrix. For streaming data, said covariance can be updated recursively, and we show online proximal gradient iterations can be brought to bear to efficiently track the time-varying solution of the inverse problem with quantifiable guarantees. Specifically, we derive conditions under which the GSO recovery cost is strongly convex and use this property to prove that the online algorithm converges to within a neighborhood of the optimal time-varying batch solution. Numerical tests illustrate the effectiveness of the proposed graph learning approach in adapting to streaming information and tracking changes in the sought dynamic network.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
