Published: Feb. 19, 2021

Jeffrey Pennington, Research Scientist, Google Brain

Demystifying deep learning through high-dimensional statistics

As deep learning continues to amass ever more practical success, its novelty has slowly faded, but a sense of mystery persists, and we still lack satisfying explanations for how and why these models perform so well. Among the empirical observations that contribute to this sense of mystery is the apparent violation of the so-called bias-variance tradeoff, which holds that a machine learning model's optimal generalization performance should occur at an intermediate model complexity, striking a balance between simpler models that exhibit high bias and more complex models that exhibit high variance in the predictive function. Far from being unique to deep learning, the violation of this classical tenet of learning theory is in fact commonplace in high-dimensional inference. In this talk, I will describe a high-dimensional asymptotic analysis of random feature kernel regression that yields a precise understanding of how the bias and variance behave in this simple model. I will then connect this analysis to neural network training through the Neural Tangent Kernel, and describe how a multivariate decomposition of the variance enables a more complete understanding of the rich empirical phenomena observed in practice.
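For reference, the classical statement being violated is the textbook bias-variance decomposition (written here in standard notation, not notation from the talk). For a noisy target $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and a predictor $\hat{f}_{\mathcal{D}}$ fit on a random training set $\mathcal{D}$, the expected squared error at a test point $x$ splits as

$$
\mathbb{E}_{\mathcal{D},\,\varepsilon}\!\left[\big(y - \hat{f}_{\mathcal{D}}(x)\big)^2\right]
= \underbrace{\big(f(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\Big[\big(\hat{f}_{\mathcal{D}}(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)]\big)^2\Big]}_{\text{variance}}
+ \sigma^2 .
$$

The classical tradeoff predicts a U-shaped test error as a function of model complexity; the premise of the talk is that high-dimensional models routinely defy this picture.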
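As a concrete illustration of the kind of model analyzed in the talk, the following is a minimal numerical sketch of random feature regression. It is not the speaker's analysis, and all names and parameter values are illustrative; it simply shows how sweeping the number of random features past the interpolation threshold typically produces a non-monotonic test error curve, in tension with the classical tradeoff.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy linear target in d dimensions.
d, n_train, n_test = 50, 200, 2000
w_star = rng.normal(size=d) / np.sqrt(d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_star + 0.2 * rng.normal(size=n_train)
y_test = X_test @ w_star

def random_feature_mse(n_features):
    # Fixed random first layer; only the linear readout is learned.
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    F_train = np.maximum(X_train @ W, 0.0)  # ReLU random features
    F_test = np.maximum(X_test @ W, 0.0)
    # Minimum-norm least-squares readout (the "ridgeless" limit).
    a, *_ = np.linalg.lstsq(F_train, y_train, rcond=None)
    return np.mean((F_test @ a - y_test) ** 2)

# Sweep model size through the interpolation threshold
# (n_features = n_train). Test error typically spikes near the
# threshold and then decreases again as the model grows further,
# rather than following the classical U-shaped curve.
for p in [20, 100, 180, 200, 220, 400, 1000, 4000]:
    mse = np.mean([random_feature_mse(p) for _ in range(10)])
    print(f"n_features = {p:5d}   test MSE = {mse:.3f}")

The spike near the interpolation threshold and the subsequent descent at larger widths is the qualitative phenomenon that the talk's high-dimensional asymptotic analysis characterizes precisely.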
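The bridge to neural network training mentioned in the abstract is the Neural Tangent Kernel. In standard notation (again, not notation from the talk), for a network $f(x;\theta)$ with parameters $\theta$ it is

$$
\Theta(x, x') \;=\; \big\langle \nabla_{\theta} f(x;\theta),\; \nabla_{\theta} f(x';\theta) \big\rangle ,
$$

which remains approximately constant during training for sufficiently wide networks, so gradient descent on the network behaves like kernel regression with kernel $\Theta$. This is what allows the random feature and kernel analysis to transfer to neural networks.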

Bio:
Jeffrey Pennington is a Staff Research Scientist at Google Brain. Prior to this, he was a postdoctoral fellow at Stanford University, where he was a member of the Natural Language Processing (NLP) group in the Stanford Artificial Intelligence Laboratory. He received his Ph.D. in theoretical particle physics from Stanford University while working at the SLAC National Accelerator Laboratory. Jeffrey’s research interests are multidisciplinary, ranging from calculational techniques in perturbative quantum field theory, to vector representations of words and phrases in NLP, to the theoretical analysis of wide neural networks and related kernel methods. Recently, his work has focused on deep learning in the high-dimensional regime.