Online k-means clustering on arbitrary data streams

Bhattacharjee, R.; Imola, J.; Moshkovitz, M.; Dasgupta, S.

Citation Details

We consider k-means clustering in an online setting where each new data point is assigned to its closest cluster center and incurs a loss equal to the squared distance to that center, after which the algorithm is allowed to update its centers. The goal over a data stream X is to achieve a total loss that is not too much larger than the best possible loss using k fixed centers in hindsight. Ours is the first algorithm to achieve polynomial space and time complexity in the online setting. We note that our results have implications to the related streaming setting, where one final clustering is outputted, and the no-substitution setting, where center selections are permanent. We show a general reduction between the no-substitution cost of a blackbox algorithm and its online cost. Finally, we translate our algorithm to the no-substitution setting and streaming settings, and it competes with and can outperform existing work in the areas. more »

Award ID(s):: 2211386

PAR ID:: 10465553

Author(s) / Creator(s):: Bhattacharjee, R.; Imola, J.; Moshkovitz, M.; Dasgupta, S.

Date Published:: 2023-01-01

Journal Name:: Proceedings of Machine Learning Research

Volume:: 201

ISSN:: 2640-3498

Page Range / eLocation ID:: 204--236

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this