Label consistency in overfitted generalized k-means

Zhang, Linfan; Amini, Arash A.

We provide theoretical guarantees for label consistency in generalized k-means problems, with an emphasis on the overfitted case, where the number of clusters used by the algorithm exceeds the true number of clusters. We provide conditions under which the estimated labels are close to a refinement of the true cluster labels. We consider both exact and approximate recovery of the labels. Our results hold for any constant-factor approximation to the k-means problem. The results are also model-free and rely only on bounds on the maximum or average distance of the data points to the true cluster centers. These centers are themselves loosely defined: they can be any set of points for which these distances can be controlled. We demonstrate the usefulness of the results with applications to some manifold clustering problems.
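
The refinement notion above can be illustrated numerically. The following is a minimal Python sketch under stated assumptions, not the authors' method: it uses ordinary squared-Euclidean k-means (Lloyd's iterations with k-means++-style seeding, for which no constant-factor guarantee is claimed here) as a stand-in for a generalized k-means solver, and the helper names `lloyd_kmeans`, `is_refinement`, and `refinement_error` are hypothetical. It fits k = 5 clusters to three well-separated synthetic blobs and checks whether every estimated cluster lies inside a single true cluster.

```python
import random
from collections import Counter


def d2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iters=50, seed=0):
    """Hypothetical stand-in solver: k-means++-style seeding + Lloyd iterations.

    Returns one label in range(k) per point. No approximation guarantee is claimed.
    """
    rng = random.Random(seed)
    # Seeding: first center uniform, later centers drawn proportional to the
    # squared distance to the nearest center chosen so far.
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(d2(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])

    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: d2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def is_refinement(est, true):
    """True if every estimated cluster is contained in a single true cluster."""
    owner = {}
    for e, t in zip(est, true):
        if owner.setdefault(e, t) != t:
            return False
    return True


def refinement_error(est, true):
    """Fraction of points whose true label differs from the majority true
    label of their estimated cluster (0.0 exactly when est refines true)."""
    groups = {}
    for e, t in zip(est, true):
        groups.setdefault(e, []).append(t)
    wrong = sum(len(ts) - Counter(ts).most_common(1)[0][1] for ts in groups.values())
    return wrong / len(true)


if __name__ == "__main__":
    rng = random.Random(1)
    true_centers = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]  # synthetic, well separated
    points, truth = [], []
    for t, (cx, cy) in enumerate(true_centers):
        for _ in range(30):
            points.append((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)))
            truth.append(t)
    est = lloyd_kmeans(points, k=5)  # overfitted: 5 > 3 true clusters
    print("refinement:", is_refinement(est, truth))
    print("refinement error:", refinement_error(est, truth))
```

With blobs this far apart, each estimated cluster is expected to stay inside one true blob, so the overfitted labels form a refinement of the true labels. `refinement_error` is one simple, assumed way to quantify approximate recovery: it counts points that disagree with the majority true label of their estimated cluster.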

Award ID(s): 1945667

PAR ID: 10336065

Author(s) / Creator(s): Zhang, Linfan; Amini, Arash A.

Editor(s): Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.S.; Vaughan, J. W.

Date Published: 2021-01-01

Journal Name: Advances in Neural Information Processing Systems

Volume: 34

ISSN: 1049-5258

Page Range / eLocation ID: 7965-7977

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
