Working Paper No. 13-09

Orthogonalization of Categorical Data: How to Fix a Measurement Problem in Statistical Distance Metrics
Ross Knippenberg
November 2013

ABSTRACT

Policy makers depend on economists, statisticians, and other social scientists to make accurate observations and draw solid conclusions from quantitative analysis. Econometrics, for example, has come a long way in the past century and guides many decisions made today. On the other hand, some statistical procedures have seen no significant advances; they continue to be applied while their original assumptions are forgotten. The appropriateness of many of these measurements has come into question, and while the criticism is often accepted, little is done to correct them. In reality, a prolific measurement problem is being committed every day. This problem involves the use of statistical distance metrics to measure social phenomena. Such metrics are routinely used to answer questions such as: By how much have the imports of the United States changed in the past year? By how much has racial diversity changed in the past decade? Does greater ethno-linguistic diversity lead to civil conflict? These and similar questions rely on accurate multivariate distance metrics. However, all distance metrics suffer from a common calculation problem. The math itself is correct; the problem lies with an overlooked implicit assumption: that all categories are mutually orthogonal (at right angles to one another). This is a bold assumption in any context. In this paper I first show that this assumption is rarely valid, and second I suggest an orthogonalization procedure: measure the similarity or angle between categories, and then apply a transformation from spherical to rectangular coordinates. I illustrate the effect of the methodology using a simulation, a collection of potential applications, and two examples from international trade.
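To make the orthogonality assumption concrete, the following is a minimal sketch and not the paper's own procedure. It assumes that the angle between each pair of categories is already known and uses the law of cosines to compute a distance between two category vectors. The function name `oblique_distance` and the `cos_angles` matrix are hypothetical names introduced only for illustration.

```python
import math

def oblique_distance(x, y, cos_angles):
    """Distance between two category vectors whose axes meet at given angles.

    cos_angles[i][j] is the assumed cosine of the angle between categories
    i and j (1.0 on the diagonal). With the identity matrix the result is
    the ordinary Euclidean distance, i.e. the orthogonal case.
    """
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    squared = sum(
        diff[i] * diff[j] * cos_angles[i][j]
        for i in range(n)
        for j in range(n)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))

# Illustrative data: all weight moves from the first category to the second.
x = [1.0, 0.0]
y = [0.0, 1.0]

orthogonal = [[1.0, 0.0], [0.0, 1.0]]  # categories at 90 degrees
similar = [[1.0, 0.5], [0.5, 1.0]]     # categories at 60 degrees (cos = 0.5)

d_orth = oblique_distance(x, y, orthogonal)
d_sim = oblique_distance(x, y, similar)
```

In this illustration the same shift between categories is measured as a smaller distance when the two categories are similar than when they are treated as orthogonal, which is the kind of discrepancy the abstract describes.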

 

JEL classification: C43, C18, F10
Keywords: Index Number Theory, International Trade, Orthogonalization, Principal Coordinates, Law of Cosines, Distance Metrics, Minkowski Metric, Euclidean Distance, Hirschman-Herfindahl Index, Business Analytics
