Title: k-MEANS REVISITED
Abstract:
In many applications, fairly fast clustering algorithms seem to yield the desired solution. Theoretically, two types of assumptions lead to provably fast algorithms for clustering:
(i) stochastic (mixture) models of data and (ii) uniqueness of optimal solution even under perturbations of data. We show that under an assumption weaker than either of these, Lloyd's (k-means) algorithm converges to the correct solution. We apply the result to the planted clique problem.
Joint work with Amit Kumar.