Title: Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operator error. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating the robustness and generality of our method. Lastly, we report interesting case studies on some outliers that result from atypical events.
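A minimal sketch of one way simultaneous model fitting and outlier detection can be realized is shown below: a mean-shift formulation in which a sparse per-observation term absorbs outliers, fit by alternating least squares and soft-thresholding. The function robust_fit, its parameters, and the synthetic trip-distance/trip-time data are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def robust_fit(X, y, lam=5.0, n_iter=50):
    """Mean-shift robust regression sketch: y ≈ X @ beta + gamma, where the
    sparse vector gamma absorbs outliers. Alternates ordinary least squares
    for beta with soft-thresholding of residuals for gamma."""
    gamma = np.zeros(len(y))
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)   # fit on "cleaned" targets
        r = y - X @ beta                                       # residuals
        gamma = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)  # sparse outlier terms
    return beta, np.nonzero(gamma)[0]                          # coefficients, flagged indices

# Hypothetical correlated pair: trip distance vs. trip time, with injected outliers.
rng = np.random.default_rng(0)
dist = rng.uniform(1, 20, 200)
trip_time = 3.0 * dist + rng.normal(0, 1, 200)
trip_time[:5] += 40                                            # five anomalous trips
beta, outlier_idx = robust_fit(np.c_[np.ones(200), dist], trip_time)
print(outlier_idx)                                             # should include indices 0-4
```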
Award ID(s):
1702760 1544455 1054389
PAR ID:
10080790
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 27th ACM International Conference on Information and Knowledge Management
Page Range / eLocation ID:
287 to 296
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Outlier detection is critical in real-world applications. Because many outlier detection techniques exist and often return different results for the same data set, users have to determine which of these techniques is best suited to their task and how to tune its parameters. This is particularly challenging in the unsupervised setting, where no labels are available for the cross-validation needed for such method and parameter optimization. In this work, we propose AutoOD, which uses existing unsupervised detection techniques to automatically produce high-quality outlier detection results without any human tuning. AutoOD's fundamentally new strategy unifies the merits of unsupervised outlier detection and supervised classification within one integrated solution. It automatically tests a diverse set of unsupervised outlier detectors on a target data set and extracts useful signals from their combined detection results to reliably capture key differences between outliers and inliers. It then uses these signals to produce a "custom outlier classifier" that classifies outliers with accuracy comparable to supervised outlier classification models trained with ground-truth labels, without having access to those much-needed labels. On a diverse set of benchmark outlier detection datasets, AutoOD consistently outperforms the best unsupervised outlier detector selected from hundreds of detectors. It also outperforms other tuning-free approaches by 12 to 97 points (out of 100) in F-1 score.
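The general recipe described above (run several unsupervised detectors, mine their agreement for confident pseudo-labels, then train a supervised classifier) might be sketched as follows. The function pseudo_label_detector, the choice of IsolationForest and LocalOutlierFactor, and the unanimous-agreement rule are illustrative assumptions, not the AutoOD implementation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.linear_model import LogisticRegression

def pseudo_label_detector(X, contamination=0.05, seed=0):
    """Run a few unsupervised detectors, keep points on which they all agree as
    confident pseudo-outliers / pseudo-inliers, then train a classifier on those
    pseudo-labels and apply it to every point. Assumes both agreement groups are
    non-empty; an illustration of the idea only, not AutoOD itself."""
    detectors = [
        IsolationForest(contamination=contamination, random_state=seed),
        LocalOutlierFactor(contamination=contamination),
    ]
    votes = np.column_stack([(d.fit_predict(X) == -1) for d in detectors])  # True = flagged
    agree_out = votes.all(axis=1)          # every detector calls it an outlier
    agree_in = ~votes.any(axis=1)          # no detector calls it an outlier
    mask = agree_out | agree_in
    clf = LogisticRegression(max_iter=1000).fit(X[mask], agree_out[mask])
    return clf.predict(X)                  # final outlier labels for all points
```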
  2. In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints to simultaneously handle spatial non-stationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models that are estimated based on observations exhibiting similar response-predictor relationships. As such, the proposed model not only accounts for non-stationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups. This aids interpretation, as a few stationary sub-processes are identified that capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients and sporadic locations that are pure outliers. A rigorous statistical hypothesis testing procedure is designed to test the significance of this segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method compared with other robust finite mixture regression, spatial regression, and spatial segmentation methods.
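For readers unfamiliar with mixture regression, the following is a bare-bones EM sketch for a plain mixture of K linear regressions; the spatial constraints and robust/outlier-handling components that distinguish the proposed model are deliberately omitted, and all names are illustrative.

```python
import numpy as np

def mixture_regression_em(X, y, K=2, n_iter=100, seed=0):
    """EM for a plain mixture of K linear regressions: each observation is softly
    assigned to a component (E-step), then each component is refit by weighted
    least squares (M-step). No spatial constraints or robustness here."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betas = rng.normal(size=(K, p))
    sigmas = np.full(K, y.std() + 1e-6)
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities proportional to mixing weight times Gaussian density
        dens = np.stack([pis[k] * np.exp(-0.5 * ((y - X @ betas[k]) / sigmas[k]) ** 2) / sigmas[k]
                         for k in range(K)], axis=1) + 1e-300
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted least squares and weighted residual scale per component
        for k in range(K):
            w = resp[:, k]
            Xw = X * w[:, None]
            betas[k] = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(p), Xw.T @ y)
            sigmas[k] = np.sqrt((w * (y - X @ betas[k]) ** 2).sum() / w.sum())
        pis = resp.mean(axis=0)
    return betas, sigmas, pis, resp
```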
  3. Conformal prediction is a flexible framework for calibrating machine learning predictions, providing distribution-free statistical guarantees. In outlier detection, this calibration relies on a reference set of labeled inlier data to control the type-I error rate. However, obtaining a perfectly labeled inlier reference set is often unrealistic, and a more practical scenario involves access to a contaminated reference set containing a small fraction of outliers. This paper analyzes the impact of such contamination on the validity of conformal methods. We prove that under realistic, non-adversarial settings, calibration on contaminated data yields conservative type-I error control, shedding light on the inherent robustness of conformal methods. This conservativeness, however, typically results in a loss of power. To alleviate this limitation, we propose a novel, active data-cleaning framework that leverages a limited labeling budget and an outlier detection model to selectively annotate data points in the contaminated reference set that are suspected to be outliers. By removing only the annotated outliers in this "suspicious" subset, we can effectively enhance power while mitigating the risk of inflating the type-I error rate, as supported by our theoretical analysis. Experiments on real datasets validate the conservative behavior of conformal methods under contamination and show that the proposed data-cleaning strategy improves power without sacrificing validity.
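As background, a minimal sketch of split-conformal outlier detection with an inlier reference set is given below; the scoring function and names are placeholders, and the paper's contamination analysis and active data-cleaning framework are not reproduced here.

```python
import numpy as np

def conformal_outlier_pvalues(score_fn, reference_X, test_X):
    """Split-conformal outlier detection sketch: nonconformity scores from an
    (assumed) inlier reference set calibrate a p-value for each test point;
    flag a point as an outlier when its p-value falls at or below alpha."""
    ref_scores = score_fn(reference_X)      # higher score = more anomalous
    test_scores = score_fn(test_X)
    n = len(ref_scores)
    # p-value: fraction of reference scores at least as extreme as the test score
    return np.array([(1 + np.sum(ref_scores >= s)) / (n + 1) for s in test_scores])
```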
  4. The detection of abnormal moving objects over high-volume trajectory streams is critical for real-time applications ranging from military surveillance to transportation management. Yet this outlier detection problem, especially along both the spatial and temporal dimensions, remains largely unexplored. In this work, we propose a rich taxonomy of novel classes of neighbor-based trajectory outlier definitions that model the anomalous behavior of moving objects for a large range of real-time applications. Our theoretical analysis and empirical study on two real-world datasets (the Beijing Taxi trajectory data and the Ground Moving Target Indicator data stream) and one generated Moving Objects dataset demonstrate the effectiveness of our taxonomy in capturing different types of abnormal moving objects. Furthermore, we propose a general strategy, called the minimal examination (MEX) framework, for efficiently detecting these new outlier classes. The MEX framework features three core optimization principles, which leverage the spatiotemporal and predictability properties of the neighbor evidence to minimize detection costs. Based on this foundation, we design algorithms that detect outliers under these new outlier semantics while successfully leveraging our optimization principles. Our comprehensive experimental study demonstrates that our proposed MEX strategy drives detection costs down 100-fold, into the practical realm for applications that analyze high-volume trajectory streams in near real time.
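A toy illustration of the basic neighbor-evidence idea (not the paper's taxonomy or the MEX optimizations) might look like the following, where an object is flagged at a time step if it has too few neighbors within a radius; the function name and parameters are illustrative.

```python
import numpy as np

def neighbor_based_outliers(positions, radius=50.0, min_neighbors=3):
    """Flag an object at a time step when fewer than `min_neighbors` other
    objects lie within `radius`. positions has shape (objects, time, 2)."""
    n_objects, n_steps, _ = positions.shape
    flags = np.zeros((n_objects, n_steps), dtype=bool)
    for t in range(n_steps):
        pts = positions[:, t, :]
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # pairwise distances
        counts = (d <= radius).sum(axis=1) - 1                          # exclude the object itself
        flags[:, t] = counts < min_neighbors
    return flags
```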
  5. Sensors are used to monitor various parameters in many real-world applications. Sudden changes in the underlying patterns of sensor readings may represent events of interest. Therefore, event detection, an important temporal version of outlier detection, is one of the primary motivating applications in sensor networks. This work describes the implementation of a real-time outlier detector that uses an Autoencoder-LSTM neural-network accelerator implemented on the Xilinx PYNQ-Z1 development board. The implemented accelerator consists of a fine-tuned Autoencoder that extracts the latent features in sensor data, followed by a Long Short-Term Memory (LSTM) network that predicts the next step and detects outliers in real time. The implemented design achieves 2.06 ms minimum latency and 85.9 GOp/s maximum throughput. The low latency and 0.25 W power consumption of the Autoencoder-LSTM outlier detector make it suitable for resource-constrained computing platforms.
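A rough software sketch of the Autoencoder-LSTM idea (compress each reading with an autoencoder, predict the next latent vector with an LSTM, and score outliers by prediction error) is shown below in PyTorch; the layer sizes and module names are assumptions, and the FPGA-accelerated PYNQ-Z1 implementation is of course not reflected here.

```python
import torch
import torch.nn as nn

class AutoencoderLSTMDetector(nn.Module):
    """Compress each sensor reading with an autoencoder, predict the next latent
    vector with an LSTM, and score outliers by the latent prediction error."""
    def __init__(self, n_features, latent_dim=8, hidden_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))
        self.lstm = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):                      # x: (batch, time, n_features)
        z = self.encoder(x)                    # latent sequence
        recon = self.decoder(z)                # autoencoder reconstruction
        h, _ = self.lstm(z[:, :-1, :])         # predict the next latent step
        return recon, self.head(h), z[:, 1:, :]

def outlier_scores(model, x):
    """Per-step outlier score = squared error between predicted and actual latents."""
    with torch.no_grad():
        _, z_pred, z_true = model(x)
        return ((z_pred - z_true) ** 2).mean(dim=-1)   # shape (batch, time-1)
```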