“MLE and the EM algorithm for
Mixtures: Minimax Results”
The
Expectation-Maximization (EM) algorithm is a widely used tool for computing the
maximum likelihood estimator (MLE) in statistical inference with latent
variables. Since the seminal work of Balakrishnan, Wainwright, and Yu, much
recent work has provided non-asymptotic convergence guarantees for the EM
algorithm under various statistical and regularity assumptions. Yet many
important questions remain unresolved, particularly in the low signal-to-noise
ratio (SNR) setting -- the setting in which there is little separation between
the parameters of the different mixture components.
We consider mixtures of
log-concave distributions in this challenging low-SNR setting. We
provide a novel framework for understanding the performance of the EM
algorithm, and with this framework, we prove several important (according to
us, anyway!) new results. For the mixture of $K$ regressions, we show that as
long as the SNR is $\tilde{\Omega}(K)$, well-initialized EM
converges to the true regression parameters. No previous results were available
for this setting. For the mixture of $K$ Gaussians in $d$ dimensions, we show
that EM converges as long as the separation of the means is
$\Omega(\sqrt{\log K})$, improving on the best known requirement of
$\sqrt{K}$. Additionally, we show that in this regime, using EM,
$O(Kd/\epsilon^2)$ samples suffice to learn the mixture parameters, improving
on the best known sample complexity of $O(\mathrm{poly}(K, d))$.
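As a concrete (and much simplified) illustration of the algorithm under discussion, the sketch below runs EM on a two-component, one-dimensional Gaussian mixture with known unit variances and equal weights, estimating only the means. The function name and every simplification here are ours for illustration; the talk's results concern far more general models and the precise convergence guarantees stated above.

```python
import numpy as np

def em_gaussian_means(x, mu0, n_iter=50, sigma=1.0):
    """EM for a 1-D Gaussian mixture with equal weights and known
    variance sigma^2, estimating only the component means.
    A toy sketch, not the paper's algorithm."""
    mu = np.asarray(mu0, dtype=float).copy()
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        logp = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
        logp -= logp.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted mean update
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    return mu

# Synthetic data: two well-separated components (a high-SNR instance),
# with a deliberately rough but "well-initialized" starting point.
rng = np.random.default_rng(0)
true_mu = np.array([-3.0, 3.0])
x = np.concatenate([rng.normal(m, 1.0, 500) for m in true_mu])
est = em_gaussian_means(x, mu0=[-1.0, 1.0])
```

In this well-separated regime the iterates converge quickly to the true means; the interesting (and hard) regime in the talk is precisely the opposite one, where the separation shrinks toward the $\sqrt{\log K}$ threshold.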
This represents joint
work with Jeongyeol Kwon. It appeared in AISTATS '20 and COLT '20.
Constantine
Caramanis is
a professor and holds the William H. Hartwig Fellowship in Electrical
Engineering in the Department of Electrical & Computer Engineering at The
University of Texas at Austin.
Dr. Caramanis
joined the UT Electrical and Computer Engineering department in the Fall of
2006. He received his A.B. in Mathematics from Harvard University, and his
M.S. and Ph.D. in Electrical Engineering and Computer Science from MIT.
His
research interests center on decision-making in large-scale complex systems,
with a focus on learning and computation. Specifically, he is interested
in robust and adaptable optimization, high dimensional statistics and machine
learning, and applications to large-scale networks, including social networks,
wireless networks, transportation networks, and energy networks. He also works
on applications of machine learning and optimization to computer-aided design.