Algorithms & Randomness Center (ARC)
Ravi Kannan (MSR)
Monday, March 25, 2019
Klaus 1116E - 11:00 am
Title: A General Algorithm for Unsupervised Learning problems
Abstract: The following simply-stated geometric problem includes as special cases the core problems of a number of areas in Unsupervised Learning, including, Topic Modeling, Non-negative Matrix Factorization, Clustering, Stochastic Block Models and Overalapping Communities Detection:
There is an unknown polytope K ∈ Rd with k vertices. We are given n data points, each aperturbation of some point in K. The problem is to find K, i.e., its vertices (approximately). [The perturbations are large; indeed, many data points lie outside K.]
Our main result is an algorithm which solves this general problem under two natural assumptions. Our assumptions are technically different, but similar in spirit to existing models for the special cases. We assume separation between the vertices of K and the existence of “pure” data points whose unperturbed versions are close to the vertices of K. Notably we do not assume any stochastic model of data. Our algorithm has better complexity than known algorithms for the special cases when the input matrix A is sparse and k is relatively small compared to n, d.
The algorithm is simply stated, but the proof of correctness is involved. It draws on tools in Numerical Analysis, especially perturbation of singular spaces of matrices. Here is a description of our algorithm: It has k stages; in each stage, it picks a certain random vector u, finds the m largest u · x over data points x and outputs the average of these data points as an approximation to a new vertex of K.
Joint Work with C. Bhattacharyya
Videos of recent talks are available at: https://smartech.gatech.edu/handle/1853/46836