Robust Multivariate Error Detection in Skewed Data with Application to Historical Radiosonde Winds
Date and time:
Friday, November 7, 2014 - 3:00pm
Quality control methods for multivariate data are generally based on robust estimates of the parameters of a particular distribution, usually the multivariate normal (MVN). However, many multivariate data-generating processes do not produce elliptical contours, and in such cases error detection based on the MVN distribution leads to many legitimate observations being erroneously flagged. In this work, we develop a non-parametric and a parametric method for identifying errors in skewed multivariate data. In the first method, we assign each multivariate observation a depth score and remove those observations whose score falls beyond a given threshold. In the second method, we develop robust estimators for the parameters of a multivariate skew-t (MST) distribution, and we use the estimated distribution to assign each observation a probability of having been generated from this MST. We test the performance of these methods in simulation against a more common MVN outlier detection method. Finally, we show how our method can be used in practice with radiosonde launches of horizontal and vertical wind components measured at 8 vertical pressure levels, where we demonstrate differences in the number of flagged outliers between a historical and a recent period and across pressure levels.
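To make the depth-score idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method presented in the talk: the abstract does not say which depth function is used, so the sketch assumes an approximate Tukey (halfspace) depth computed from random projections. The function names, the number of directions, and the threshold value are all hypothetical.

```python
import numpy as np

def approx_halfspace_depth(X, n_dirs=500, seed=0):
    """Approximate halfspace depth of each row of X using random directions.

    Assumption: halfspace depth stands in for the unspecified depth score.
    Ties within a projection are broken arbitrarily.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Draw random unit vectors to project the data onto.
    dirs = rng.normal(size=(n_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = X @ dirs.T                              # shape (n, n_dirs)
    # Rank (0 .. n-1) of every point within every projection.
    ranks = proj.argsort(axis=0).argsort(axis=0)
    at_or_below = ranks + 1                        # points at or below this one
    at_or_above = n - ranks                        # points at or above this one
    # Depth is the smallest one-sided fraction over all sampled directions.
    return np.minimum(at_or_below, at_or_above).min(axis=1) / n

def flag_low_depth(X, threshold=0.01, **kwargs):
    """Flag observations whose approximate depth is below the threshold."""
    return approx_halfspace_depth(X, **kwargs) < threshold
```

Because the depth is built from ranks rather than from a fitted covariance matrix, it does not assume elliptical contours. The threshold here is arbitrary and would need to be chosen for the data at hand.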