Random Projection Clustering on Streaming Data

Carraher, Lee A.; Wilsey, Philip A.; Moitra, Anindya; Dey, Sayantan

doi:10.1109/ICDMW.2016.0105

Clustering streaming data has gained importance in recent years due to an expanding opportunity to discover knowledge in widely available data streams. Because streams are potentially evolving and unbounded sequences of data objects, clustering algorithms capable of fast, incremental processing of data points are necessary. This paper presents streamingRPHash, an approximate method for clustering high-dimensional data streams. streamingRPHash combines random projections with locality-sensitive hashing to construct a high-performance clustering method. It is amenable to distributed processing frameworks such as Map-Reduce, and its overall complexity growth is constrained. The paper describes the streamingRPHash algorithm and its various configurations, and compares its clustering performance to several alternatives. Experimental results show that streamingRPHash achieves comparable clustering accuracy with substantially lower runtime and memory usage.
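
The general idea of pairing a random projection with locality-sensitive hashing on a stream can be sketched as follows. This is an illustration only and not the streamingRPHash algorithm itself, whose details are not given in this record. The sign-based hash, the per-bucket running sums, and every name here (`make_projection`, `lsh_key`, `StreamingBuckets`, `n_bits`) are assumptions chosen for the example.

```python
import random
from collections import Counter


def make_projection(dim_in, dim_out, seed=0):
    """Build a Gaussian random projection with dim_out rows of dim_in weights."""
    rng = random.Random(seed)
    scale = 1.0 / dim_out ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def lsh_key(point, projection):
    """Project the point and keep one sign bit per projected coordinate."""
    bits = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, point))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


class StreamingBuckets:
    """Hypothetical one-pass summary: a count and a running sum per hash bucket.

    Memory is bounded by the number of buckets (at most 2**n_bits),
    not by the length of the stream.
    """

    def __init__(self, dim_in, n_bits=8, seed=0):
        self.projection = make_projection(dim_in, n_bits, seed)
        self.counts = Counter()
        self.sums = {}

    def update(self, point):
        """Process one stream element and return the bucket it fell into."""
        key = lsh_key(point, self.projection)
        self.counts[key] += 1
        acc = self.sums.setdefault(key, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        return key

    def centroids(self, k):
        """Return the mean point of each of the k most populated buckets."""
        return [[s / n for s in self.sums[key]]
                for key, n in self.counts.most_common(k)]
```

Calling `update` once per arriving point keeps the summary incremental, and `centroids(k)` can be read at any time as candidate cluster centers. Because each bucket holds only a count and a sum, summaries built on separate partitions with the same seed could in principle be merged by adding them, which is one reason hash-based schemes of this kind suit distributed frameworks.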

Award ID(s): 1440420

NSF-PAR ID: 10193707

Author(s) / Creator(s): Carraher, Lee A.; Wilsey, Philip A.; Moitra, Anindya; Dey, Sayantan

Date Published: 2016-12-01

Journal Name: IEEE ICDM Workshop on High Dimensional Data Mining

Page Range / eLocation ID: 708 to 715

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper:
https://doi.org/10.1109/ICDMW.2016.0105