This paper considers the real-time detection of abrupt and persistent anomalies in high-dimensional data streams. The goal is to detect anomalies quickly and accurately so that the appropriate countermeasures could be taken in time before the system possibly gets harmed. We propose a sequential and multivariate anomaly detection method that scales well to high-dimensional datasets. The proposed method follows a nonparametric, i.e., data-driven, and semi-supervised approach, i.e., trains only on nominal data. Thus, it is applicable to a wide range of applications and data types. Thanks to its multivariate nature, it can quickly and accurately detect challenging anomalies, such as changes in the correlation structure. Its asymptotic optimality and computational complexity are comprehensively analyzed. In conjunction with the detection method, an effective technique for localizing the anomalous data dimensions is also proposed. The practical use of proposed algorithms are demonstrated using synthetic and real data, and in variety of applications including seizure detection, DDoS attack detection, and video surveillance. 
                        more » 
                        « less   
                    
                            
                            Real-Time Detection and Classification of Power Quality Disturbances
                        
                    
    
            This paper considers the problem of real-time detection and classification of power quality disturbances in power delivery systems. We propose a sequential and multivariate disturbance detection method (aiming for quick and accurate detection). Our proposed detector follows a non-parametric and supervised approach, i.e., it learns nominal and anomalous patterns from training data involving clean and disturbance signals. The multivariate nature of the method enables joint processing of data from multiple meters, facilitating quicker detection as a result of the cooperative analysis. We further extend our supervised sequential detection method to a multi-hypothesis setting, which aims to classify the disturbance events as quickly and accurately as possible in a real-time manner. The multi-hypothesis method requires a training dataset per hypothesis, i.e., per each disturbance type as well as the ’no disturbance’ case. The proposed classification method is demonstrated to quickly and accurately detect and classify power disturbances. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2040572
- PAR ID:
- 10418527
- Date Published:
- Journal Name:
- Sensors
- Volume:
- 22
- Issue:
- 20
- ISSN:
- 1424-8220
- Page Range / eLocation ID:
- 7958
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            In this paper, we address the problem of detecting and learning anomalies in high-dimensional data-streams in real-time. Following a data-driven approach, we propose an online and multivariate anomaly detection method that is suitable for the timely and accurate detection of anomalies. We propose our method for both semi-supervised and supervised settings. By combining the semi-supervised and supervised algorithms, we present a self-supervised online learning algorithm in which the semi-supervised algorithm trains the supervised algorithm to improve its detection performance over time. The methods are comprehensively analyzed in terms of computational complexity, asymptotic optimality, and false alarm rate. The performances of the proposed algorithms are also evaluated using real-world cybersecurity datasets, that show a significant improvement over the state-of-the-art results.more » « less
- 
            Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many real-world applications. Although many semisupervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.more » « less
- 
            Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for text classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical text classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical text classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.more » « less
- 
            Observable reading behavior, the act of moving the eyes over lines of text, is highly stereotyped among the users of a language, and this has led to the development of reading detectors–methods that input windows of sequential fixations and output predictions of the fixation behavior during those windows being reading or skimming. The present study introduces a newmethod for reading detection using Region Ranking SVM (RRSVM). An SVM-based classifier learns the local oculomotor features that are important for real-time reading detection while it is optimizing for the global reading/skimming classification, making it unnecessary to hand-label local fixation windows for model training. This RRSVM reading detector was trained and evaluated using eye movement data collected in a laboratory context, where participants viewed modified web news articles and had to either read them carefully for comprehension or skim them quickly for the selection of keywords (separate groups). Ground truth labels were known at the global level (the instructed reading or skimming task), and obtained at the local level in a separate rating task. The RRSVM reading detector accurately predicted 82.5% of the global (article-level) reading/skimming behavior, with accuracy in predicting local window labels ranging from 72-95%, depending on how tuned the RRSVM was for local and global weights. With this RRSVM reading detector, a method now exists for near real-time reading detection without the need for hand-labeling of local fixation windows. With real-time reading detection capability comes the potential for applications ranging from education and training to intelligent interfaces that learn what a user is likely to know based on previous detection of their reading behavior.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    