Published: Sept. 3, 2021
Doug Nychka, Department of Applied Mathematics and Statistics, Colorado School of Mines

Florian Gerber, Department of Biostatistics, University of Zurich

Climate models, large spatial datasets, and harnessing deep learning for a statistical computation

Numerical simulations of the motion and state of the Earth's atmosphere and ocean yield large and complex data sets that require statistics for their interpretation. Typically, climate and weather variables take the form of space-time fields, and it is useful to describe their dependence using methods from spatial statistics. Common to these problems is the need to estimate covariance functions over space and time while accounting for the fact that the covariance may not be stationary.

This talk focuses on a new computational technique for fitting covariance functions using maximum likelihood. Estimating local covariance functions is a useful way to represent spatial dependence, but it is computationally intensive because it requires optimizing a local likelihood over many windows of the spatial field. The problem we tackle here thus consists of numerous (tens of thousands of) small spatial estimation problems, in contrast to other research that attempts a single, global estimate for a massive spatial data set.

In this work we show how a neural network (aka deep learning) model can be trained to give accurate maximum likelihood estimates based on the spatial field or its empirical variogram. Why train a neural network to reproduce a statistical estimate? The advantage is that the neural network model evaluates very efficiently, giving speedups on the order of a factor of a hundred or more. In this way, computations that could take hours are reduced to minutes or tens of seconds, which facilitates a more flexible and iterative approach to building spatial statistical models. An example of local covariance modeling is given using the large ensemble experiment created by the National Center for Atmospheric Research.
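To make the local likelihood step concrete, here is a minimal sketch (not the speakers' code) of the estimation problem that must be solved in each window: fitting the variance and range of a covariance function to one small block of a spatial field by maximum likelihood. The 8 x 8 window size, the exponential covariance family, and all parameter values are illustrative assumptions.

```python
# Minimal sketch of one local maximum likelihood fit (illustrative assumptions:
# exponential covariance, 8 x 8 window, unit grid spacing).
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

# Locations on an 8 x 8 window of a regular grid.
n = 8
xg, yg = np.meshgrid(np.arange(n), np.arange(n))
locs = np.column_stack([xg.ravel(), yg.ravel()])
D = cdist(locs, locs)  # pairwise distance matrix

def exp_cov(d, variance, a_range):
    """Exponential covariance function evaluated at distances d."""
    return variance * np.exp(-d / a_range)

# Simulate one window of a Gaussian field with known parameters.
true_var, true_range = 1.0, 3.0
Sigma = exp_cov(D, true_var, true_range)
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n * n))  # jitter for stability
z = L @ rng.standard_normal(n * n)

def neg_log_lik(theta):
    """Negative Gaussian log-likelihood; theta holds log(variance), log(range)."""
    S = exp_cov(D, np.exp(theta[0]), np.exp(theta[1]))
    _, logdet = np.linalg.slogdet(S)
    alpha = np.linalg.solve(S, z)
    return 0.5 * (logdet + z @ alpha)  # additive constant dropped

fit = minimize(neg_log_lik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print("MLE (variance, range):", np.exp(fit.x))
```

Repeating this optimization over tens of thousands of windows, each requiring many dense likelihood evaluations, is what makes the direct approach expensive.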
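And a sketch of the emulation idea under the same assumptions: simulate many windows with known parameters, summarize each by a binned empirical variogram, and train a network to map variograms back to the parameters. The bin choices, parameter ranges, and the use of scikit-learn's MLPRegressor as a stand-in for the deep learning model are assumptions for illustration, not the speakers' architecture.

```python
# Minimal sketch of training a network to emulate the maximum likelihood
# estimate from a binned empirical variogram (illustrative assumptions:
# exponential covariance, 8 x 8 windows, a small scikit-learn MLP).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

n = 8
xg, yg = np.meshgrid(np.arange(n), np.arange(n))
locs = np.column_stack([xg.ravel(), yg.ravel()])
D = cdist(locs, locs)
bins = np.arange(1, 8)  # distance bins for the empirical variogram

def empirical_variogram(z):
    """Binned empirical variogram: mean squared difference by distance."""
    sq = (z[:, None] - z[None, :]) ** 2
    return np.array([sq[(D > b - 0.5) & (D <= b + 0.5)].mean() for b in bins])

def simulate(variance, a_range):
    """Draw one Gaussian field window with an exponential covariance."""
    Sigma = variance * np.exp(-D / a_range)
    L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n * n))
    return L @ rng.standard_normal(n * n)

# Training set: random parameters, variogram features, log-parameter targets.
m = 4000
theta = np.column_stack([rng.uniform(0.5, 2.0, m),   # variance
                         rng.uniform(1.0, 5.0, m)])  # range
X = np.array([empirical_variogram(simulate(v, r)) for v, r in theta])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
net.fit(X, np.log(theta))

# Once trained, an estimate for a new window costs only a forward pass,
# which is the source of the large speedup over repeated optimization.
z_new = simulate(1.0, 3.0)
est = np.exp(net.predict([empirical_variogram(z_new)])[0])
print("NN estimate (variance, range):", est)
```

The design point is that the expensive likelihood optimization is paid once, offline, on simulated data; afterward every window is handled by a fast, fixed function evaluation.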