Working Paper No. 13-09

**Orthogonalization of Categorical Data: How to Fix a Measurement Problem in Statistical Distance Metrics**

Ross Knippenberg

November 2013

**ABSTRACT**

Policy makers depend on economists, statisticians, and other social scientists to make accurate observations and draw solid conclusions from quantitative analysis. Econometrics, for example, has come a long way in the past century and guides many decisions made today. On the other hand, some statistical procedures have seen no significant advances; they are simply applied, and their original assumptions are forgotten. The appropriateness of many of these measurements has come into question, and while the criticism is often accepted, little is done to correct them. In reality, a prolific measurement problem is being committed every day.
This problem involves the use of statistical distance metrics to measure social phenomena. Such measurements are routinely used to answer questions like: By how much have the imports of the United States changed in the past year? By how much has racial diversity changed in the past decade? Does greater ethno-linguistic diversity lead to civil conflict?
These and similar questions rely on accurate multivariate distance metrics. However, all distance metrics suffer from a common calculation problem. No one can deny that the math is correct; rather, the problem lies with an overlooked implicit assumption: that all categories are mutually orthogonal (at right angles to one another). This is a bold assumption in any context. In this paper I
first show that this assumption is rarely valid, and second I suggest an orthogonalization procedure: measure the similarity or angle between categories, and then apply a transformation
from spherical to rectangular coordinates. I illustrate the effect of the methodology using a
simulation, a collection of potential applications, and two examples from international trade.
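
The abstract does not spell out the procedure, so the following is only a minimal sketch of the underlying idea, not the paper's implementation. It assumes the similarity between categories is summarized by a matrix of cosines (the hypothetical `cos_sim`) and applies the generalized law of cosines, in which the squared distance is the sum over all pairs of categories of the product of the two differences and the cosine of the angle between those categories. The function name `oblique_distance` and the share figures are invented for illustration.

```python
import math


def oblique_distance(x, y, cos_sim):
    """Distance between x and y when category axes are not orthogonal.

    cos_sim[i][j] is the assumed cosine of the angle between categories
    i and j (1.0 on the diagonal). With the identity matrix this reduces
    to the ordinary Euclidean distance.
    """
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    squared = sum(
        diff[i] * diff[j] * cos_sim[i][j]
        for i in range(n)
        for j in range(n)
    )
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))


# Hypothetical shares across three categories in two periods.
shares_before = [0.5, 0.3, 0.2]
shares_after = [0.3, 0.5, 0.2]

# Usual implicit assumption: every category is at right angles to every other.
orthogonal = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Assumed alternative: the first two categories are similar (cosine 0.8).
similar = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

d_orthogonal = oblique_distance(shares_before, shares_after, orthogonal)
d_similar = oblique_distance(shares_before, shares_after, similar)

print(f"orthogonal categories: {d_orthogonal:.4f}")
print(f"similar categories:    {d_similar:.4f}")
```

In this made-up case the shift happens entirely between the two similar categories, so the measured change is smaller once their similarity is taken into account than under the orthogonality assumption.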

*JEL classification*: C43, C18, F10

*Keywords*: Index Number Theory, International Trade, Orthogonalization, Principal Coordinates, Law of Cosines, Distance Metrics, Minkowski Metric, Euclidean Distance, Hirschman-Herfindahl Index, Business Analytics