Many problems in data analytics can be formulated as estimating a low-rank matrix from
noisy data. This task is particularly challenging in the face of unconventional data corruption and the growing size of modern datasets. Addressing these challenges requires joint statistical-computational considerations: developing methods that are both robust and scalable.
In this talk, we consider two examples that exhibit quite different statistical-computational
behaviors. The first example is recovering a low-rank matrix when only a small number of its
entries are observed, and observations from some columns are completely and arbitrarily
corrupted. We propose recovery algorithms based on convex relaxation, which achieve optimal
statistical performance in terms of sample complexity and robustness. The second example
is finding densely connected communities in noisy networks, for which we develop efficient
algorithms based on a convexified maximum likelihood estimator (MLE). These methods have
strong recovery guarantees under classical random graph models, but do not achieve the optimal
statistical performance of the intractable MLE, hence demonstrating a computational barrier.
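To make the convex-relaxation approach of the first example concrete, here is a minimal sketch of singular value thresholding (SVT), a standard iterative scheme for the nuclear-norm relaxation of matrix completion. This is an illustration, not the exact algorithm from the talk: the matrix size, sampling rate, and the parameters `tau` and `delta` are assumed illustrative choices, and the corrupted-column aspect is omitted for simplicity.

```python
import numpy as np

def svt_complete(M_obs, mask, tau, delta, iters=1000):
    """Singular value thresholding (SVT) sketch for nuclear-norm matrix
    completion: approximately solve  min ||X||_*  subject to X matching
    the observed entries, via soft-thresholding of singular values."""
    Y = np.zeros_like(M_obs)  # dual iterate
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        # Proximal step: soft-threshold the singular values of Y by tau.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
        # Dual ascent on the equality constraint over observed entries.
        Y += delta * mask * (M_obs - X)
    return X

# Rank-1 ground truth with about 60% of entries observed at random.
rng = np.random.default_rng(0)
n = 20
M = np.outer(rng.standard_normal(n), rng.standard_normal(n))
mask = (rng.random((n, n)) < 0.6).astype(float)
X_hat = svt_complete(M * mask, mask, tau=5 * n, delta=1.5)
```

With enough observed entries of an incoherent low-rank matrix, the recovered `X_hat` agrees with the ground truth on both observed and unobserved entries, which is the sample-complexity phenomenon the abstract refers to.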
While convex optimization methods run in polynomial time, their computational cost is
often too high for large real-world problems. We develop a general framework for low-rank
estimation using non-convex gradient descent that is highly efficient computationally. We
provide rigorous and global theoretical guarantees for the statistical performance of these
methods. Our general results apply to the two problems above, among others; in both cases we
match the statistical performance achieved by convex relaxation.
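As a minimal illustration of the non-convex approach (again a sketch under simplifying assumptions, not the talk's exact procedure), one can factor the unknown matrix as X = U Vᵀ and run plain gradient descent on the squared residual. The rank-1, fully observed setting, the small random initialization, and the step size below are all assumptions made for the example.

```python
import numpy as np

def factored_gd(M, r=1, step=0.02, iters=500, seed=0):
    """Non-convex gradient descent sketch for low-rank estimation:
    parameterize X = U @ V.T and descend on
    f(U, V) = 0.5 * ||U @ V.T - M||_F^2  (fully observed case)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    # Small random initialization; theory often uses spectral initialization.
    U = 0.1 * rng.standard_normal((m, r))
    V = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        R = U @ V.T - M  # current residual
        # Simultaneous gradient steps on both factors.
        U, V = U - step * (R @ V), V - step * (R.T @ U)
    return U @ V.T

rng = np.random.default_rng(1)
M = np.outer(rng.standard_normal(10), rng.standard_normal(10))  # rank-1 target
X_hat = factored_gd(M, r=1)
```

Each iteration costs only a few matrix-vector-scale products in the factors, rather than a full singular value decomposition, which is the source of the computational savings over convex-relaxation solvers.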