Title: Is Deep Learning a Black Box Statistically?
Deep learning has benefited almost every aspect of modern big data applications. Yet its statistical properties remain largely mysterious. So a simple, natural question is: Is deep learning a black box statistically? This talk presents some recent attempts to investigate this question from two different statistical perspectives.
For the first perspective, we try to gain some statistical insights into deep learning. It is commonly believed nowadays that deep neural nets (DNNs) learn through representation learning. To gain insights into this, we design a simple simulation study in which we generate data from a latent subspace structure, with each subspace regarded as a cluster. We empirically demonstrate that the performance of a DNN is very similar to that of the two-step procedure of clustering followed by classification (unsupervised plus supervised). This motivates us to ask: does a DNN indeed mimic the two-step procedure? That is, do the bottom layers of a DNN cluster first, with the top layers then classifying within each cluster? To answer this question, we conduct a series of simulation studies and, to our surprise, none of the hidden layers in the DNN conducts successful clustering. In some sense, our results act as a counterpoint to the common belief in representation learning, suggesting that, at least in some cases, although the performance of a DNN is comparable to that of the ideal two-step procedure that knows the true latent cluster information a priori, the DNN does not really do clustering in any of its layers. We also provide some statistical theory and heuristic arguments to support our empirical discoveries and further demonstrate the utility of our theoretical framework on a real data application, traffic sign recognition.
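To make the "ideal two-step procedure" concrete, here is a minimal toy sketch, not the simulation design used in the talk. It assumes one-dimensional inputs, two known clusters, and a decision stump as the classifier. The data and the names `fit_stump`, `fit_oracle_two_step`, and `accuracy` are all hypothetical. The labeling rule flips between the clusters, so a single classifier that ignores cluster membership cannot beat chance, while a classifier fitted within each known cluster separates the classes.

```python
from collections import defaultdict


def fit_stump(xs, ys):
    """Fit a 1-D decision stump by exhaustive search over thresholds."""
    best = None  # (training accuracy, threshold, label predicted when x > threshold)
    for thr in sorted(set(xs)):
        for label_above in (0, 1):
            preds = [label_above if x > thr else 1 - label_above for x in xs]
            acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
            if best is None or acc > best[0]:
                best = (acc, thr, label_above)
    _, thr, label_above = best
    return lambda x: label_above if x > thr else 1 - label_above


def fit_oracle_two_step(xs, ys, clusters):
    """Step 1 (oracle): group points by their known cluster label.
    Step 2: fit a separate classifier inside each cluster."""
    groups = defaultdict(list)
    for x, y, c in zip(xs, ys, clusters):
        groups[c].append((x, y))
    models = {
        c: fit_stump([x for x, _ in pts], [y for _, y in pts])
        for c, pts in groups.items()
    }
    return lambda x, c: models[c](x)


def accuracy(preds, ys):
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Hypothetical toy data: the label rule is reversed in cluster 1.
xs = [-2, -1, 1, 2, -2, -1, 1, 2]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 1, 1, 1, 1, 0, 0]

single = fit_stump(xs, ys)                          # ignores clusters
two_step = fit_oracle_two_step(xs, ys, clusters)    # uses true clusters

acc_single = accuracy([single(x) for x in xs], ys)
acc_two_step = accuracy([two_step(x, c) for x, c in zip(xs, clusters)], ys)
print(acc_single, acc_two_step)  # 0.5 1.0
```

In a non-oracle version, the known cluster labels would be replaced by the output of an unsupervised clustering step.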
For the second perspective, we try to incorporate statistical inference principles into deep learning. Deep learning has become increasingly popular in both supervised and unsupervised machine learning thanks to its outstanding empirical performance. However, because of their intrinsic complexity, most deep learning methods are largely treated as black box tools with little interpretability. Even though recent attempts have been made to facilitate the interpretability of deep neural networks (DNNs), existing methods are susceptible to noise and lack robustness. Therefore, scientists are justifiably cautious about the reproducibility of the discoveries, which is often related to the interpretability of the underlying statistical models. We describe a method to increase the interpretability and reproducibility of DNNs by incorporating the idea of feature selection with a controlled error rate. By designing a new DNN architecture and integrating it with the recently proposed knockoffs framework, we perform feature selection with a controlled error rate while maintaining high power. This new method, DeepPINK (Deep feature selection using Paired-Input Nonlinear Knockoffs), is applied to both simulated and real data sets to demonstrate its empirical utility.
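As a rough illustration of how the knockoffs framework turns feature importance scores into a selection with a controlled error rate, the sketch below implements the standard knockoff+ thresholding rule, which targets the false discovery rate at a level q. It is an assumption that this is the variant used in the talk. The statistics `w` are made-up numbers rather than output from DeepPINK, and the function names are hypothetical. Each `w[j]` is taken to be large and positive when original feature j looks more important than its knockoff copy.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 among the |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q,
    or infinity if no such t exists (nothing is selected)."""
    candidates = sorted({abs(v) for v in w if v != 0})
    for t in candidates:
        n_neg = sum(1 for v in w if v <= -t)  # used to estimate false discoveries
        n_pos = sum(1 for v in w if v >= t)   # features that would be selected
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    return float("inf")


def select_features(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, v in enumerate(w) if v >= t]


# Hypothetical knockoff statistics for ten features.
w = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.3, -0.2, 2.0, 1.5]
threshold = knockoff_plus_threshold(w, q=0.2)
selected = select_features(w, q=0.2)
print(threshold, selected)  # 1.5 [0, 1, 2, 3, 4, 8, 9]
```

The negative statistics act as a stand-in count of false positives among the positive ones, which is why the rule needs no p-values. How the statistics themselves are computed from the network is outside the scope of this sketch.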
This talk is based on joint work with Yingying Fan, Hao Wu, Yang Lu, and William Noble.