Published: Feb. 21, 2014

Parallel inference for massive distributed spatial data using low-rank models

Matthias Katzfuß

Department of Statistics, Texas A&M University

Date and time: Friday, February 21, 2014, 3:00 pm

Location: ECCR 265


Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference based on all data without moving the datasets. We show that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly. For many low-rank models, the required number of floating-point operations is linear in the number of data points, while the required amount of communication does not depend on the data sizes at all. After discussing several extensions and special cases, we apply our methodology to carry out spatio-temporal particle filtering inference on total precipitable water measured by three different sensor systems. This is joint work with Dorit Hammerling (NCAR).
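To illustrate the idea in the abstract, here is a minimal sketch, not the speakers' implementation, of how exact inference on the basis-function weights might be assembled from per-site summaries. It assumes a model z = B η + noise with η ~ N(0, K), and it assumes the fine-scale variation and measurement error combine into independent noise with known variances v (a diagonal covariance). The function names `local_summaries`, `combine`, and `demo`, and the simulated data, are all hypothetical.

```python
import numpy as np

def local_summaries(B, z, v):
    """Run at one data site. B: (n_j, r) basis matrix, z: (n_j,) data,
    v: (n_j,) noise variances (diagonal covariance is an assumption)."""
    W = B / v[:, None]            # V^{-1} B for diagonal V
    return W.T @ B, W.T @ z       # r x r matrix and r-vector; sizes do not depend on n_j

def combine(summaries, K):
    """Run at the central node. Sums the site summaries and returns the
    posterior mean and covariance of the basis weights eta ~ N(0, K)."""
    precision = np.linalg.inv(K)
    rhs = np.zeros(K.shape[0])
    for R_j, g_j in summaries:
        precision = precision + R_j
        rhs = rhs + g_j
    cov = np.linalg.inv(precision)
    return cov @ rhs, cov

def demo(seed=0):
    """Simulate three sites and compare distributed vs. pooled inference."""
    rng = np.random.default_rng(seed)
    r, sizes = 5, [200, 300, 150]
    K = np.eye(r)
    eta = rng.multivariate_normal(np.zeros(r), K)
    sites = []
    for n in sizes:
        B = rng.normal(size=(n, r))          # stand-in for spatial basis functions
        v = rng.uniform(0.5, 1.5, size=n)
        z = B @ eta + rng.normal(size=n) * np.sqrt(v)
        sites.append((B, z, v))
    # Distributed: each site sends only its two small summaries.
    mean_dist, cov_dist = combine([local_summaries(*s) for s in sites], K)
    # Pooled: all data moved to one place, for comparison only.
    B_all = np.vstack([s[0] for s in sites])
    z_all = np.concatenate([s[1] for s in sites])
    v_all = np.concatenate([s[2] for s in sites])
    mean_pool, cov_pool = combine([local_summaries(B_all, z_all, v_all)], K)
    return mean_dist, mean_pool, cov_dist, cov_pool
```

Under these assumptions, each site does work proportional to its number of observations (for a fixed number r of basis functions) and transmits only an r × r matrix and an r-vector, so the communicated quantities have the same size whatever the amount of data at each site. The distributed and pooled results agree up to floating-point rounding.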