Tuesday, September 30, 2014
The last few years have seen increasing interest in using optimization for large-scale data analysis. However, the optimization problems arising from these applications often involve, in addition to expensive smooth components for data fitting, nonsmooth and nonseparable regularization terms that enforce structural properties of the generated solutions (e.g., low rank or group sparsity). It is well known that such nonsmooth components can significantly slow down the convergence of existing first-order optimization algorithms, leading to a large number of passes through the data set. To address this issue, we present a new class of optimization techniques, referred to as gradient sliding and conditional gradient sliding methods, which can skip the computation of gradients from time to time while still maintaining the overall optimal rate of convergence. In particular, the number of gradient evaluations required by these algorithms is the same as if the aforementioned nonsmooth and nonseparable components were not present. When applied to data analysis problems, these algorithms can reduce the number of scans through the data set by orders of magnitude. Numerical experiments have been conducted to illustrate the effectiveness of these techniques.
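To make the problem structure concrete, here is a minimal, hypothetical sketch (not the authors' algorithm) of the kind of composite problem and the "skip the gradient" pattern described above: a smooth data-fitting term f whose gradient requires a full pass over the data, plus a nonsmooth, nonseparable group-sparsity penalty h. The outer loop evaluates the expensive gradient once per iteration, while an inner loop runs cheap subgradient steps on a local model that includes h. All names, parameters, and step sizes (grad_f, subgrad_h, sliding_step, inner_iters, etc.) are illustrative assumptions; the actual gradient sliding methods use accelerated outer steps and carefully chosen parameters to attain the stated optimal rates.

```python
import numpy as np

# Illustrative sketch only: minimize f(x) + h(x),
# f = least squares (expensive gradient), h = group-lasso penalty (nonsmooth).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
b = rng.standard_normal(200)
groups = np.array_split(np.arange(50), 10)   # 10 groups of 5 coordinates
lam = 0.1

def grad_f(x):
    # Expensive: one full pass over the data matrix A.
    return A.T @ (A @ x - b)

def subgrad_h(x):
    # Cheap: subgradient of the group-lasso penalty lam * sum_g ||x_g||.
    g = np.zeros_like(x)
    for idx in groups:
        nrm = np.linalg.norm(x[idx])
        g[idx] = lam * x[idx] / nrm if nrm > 0 else 0.0
    return g

def sliding_step(x, g_f, beta, inner_iters, step):
    # Approximately solve  min_u  <g_f, u> + (beta/2)||u - x||^2 + h(u)
    # with cheap subgradient steps; grad_f is NOT re-evaluated here.
    u = x.copy()
    for _ in range(inner_iters):
        u -= step * (g_f + beta * (u - x) + subgrad_h(u))
    return u

x = np.zeros(50)
L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of grad_f
for k in range(50):                 # only 50 expensive gradient evaluations total
    g = grad_f(x)                   # one data pass per outer iteration
    x = sliding_step(x, g, beta=L, inner_iters=20, step=1.0 / (2 * L))
```

The point of the sketch is the cost structure: the number of calls to grad_f (data passes) is decoupled from the number of cheap subgradient steps handling the nonsmooth term, which is the effect the abstract attributes to the sliding methods.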