Abstract: In this talk, we introduce Generalized Low Rank Models, an optimization-based framework designed to uncover structure in big, messy datasets. These models generalize many well-known techniques in data analysis, such as (standard or robust) PCA, nonnegative matrix factorization, matrix completion, and k-means. We examine their effectiveness at capturing the structure present in datasets drawn from social science and medical informatics applications. Strikingly, these models can perform more than 30% better than state-of-the-art methods for imputing missing data in social science surveys. The resulting optimization problems are nonconvex, nonsmooth, and often have millions or even billions of parameters. We'll discuss efficient optimization techniques for these problems, and ways of exploiting parallelism to improve runtime, including a fast asynchronous parallel method with provable convergence and a linear speedup.
Biography: Madeleine Udell joined Cornell in Fall 2016 as Assistant Professor of Operations Research and Information Engineering and Richard and Sybil Smith Sesquicentennial Fellow, following a postdoctoral fellowship at Caltech in the Center for the Mathematics of Information. She completed a PhD at Stanford University in Computational & Mathematical Engineering in 2015. At Stanford, she was awarded an NSF Graduate Fellowship, a Gabilan Graduate Fellowship, and a Gerald J. Lieberman Fellowship, and served as a member of Stanford's School of Engineering Future Committee to develop a roadmap for the future of engineering at the university. She received a B.S. degree in Mathematics and Physics, summa cum laude, with honors in mathematics and in physics, from Yale University. Professor Udell has worked with industry, nonprofit, and government organizations to solve problems in finance (Goldman Sachs), healthcare informatics (Apixio), political campaigns (Obama for America 2012), defense (DARPA), cybersecurity (Qadium), and clean energy (Aurora Solar).