Although these issues of statistical generalization apply to data symbolized by points, lines, and areas, this discussion is developed around the mapping of areas in choropleth maps. This is partly because choropleth maps are so widely used, and partly because they are difficult to execute effectively: they have an inherent weakness in that they aggregate data within areal units that do not correspond exactly with the underlying spatial distribution of the data. Focusing on choropleth mapping in the following examples allows some of these weaknesses to be revealed and discussed.
In the example below three maps are divided into quantiles of two, five, and nine categories, respectively.
B. Comparison of maps using different ranging methods
These three maps each have five ranges of data, but they were determined using different methods.
Even though these maps were developed from the same dataset, they seem to convey quite different spatial patterns. Some seem to stress the lowest values in the distribution, others the highest. The point is that cartographers use different ranging methods to generalize different types of data distributions. Each method is suited to a particular "shape" of distribution. Therefore, the first step in preparing a choropleth map is to explore the dataset and come to an understanding of its underlying distribution.
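To see why the ranging method matters, it can help to compute cutpoints for one dataset in two different ways. The sketch below is only an illustration: the sample values are hypothetical (they are not the data behind the maps above), and the two helper functions, `equal_interval_cutpoints` and `quantile_cutpoints`, are names invented here. Equal-width classes are used as a contrast to quantiles; they are one common choice, not necessarily one of the methods shown in the maps.

```python
import statistics
from bisect import bisect_right
from collections import Counter

def equal_interval_cutpoints(values, k):
    """Return k-1 interior cutpoints that split the data range into k equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def quantile_cutpoints(values, k):
    """Return k-1 interior cutpoints that place roughly equal numbers of values in each class."""
    return statistics.quantiles(values, n=k, method="inclusive")

def class_counts(values, cutpoints):
    """Count how many values fall in each class defined by the interior cutpoints."""
    counts = Counter(bisect_right(cutpoints, v) for v in values)
    return [counts[i] for i in range(len(cutpoints) + 1)]

# Hypothetical, strongly right-skewed sample (an assumption for illustration only).
values = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 20, 30, 50, 100]

equal_cuts = equal_interval_cutpoints(values, 5)
quant_cuts = quantile_cutpoints(values, 5)

print("equal-interval counts:", class_counts(values, equal_cuts))
print("quantile counts:      ", class_counts(values, quant_cuts))
```

With skewed data like this, equal-width classes crowd most observations into the lowest class, while quantile classes spread them evenly, which is why the same data can produce maps that look so different.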
6.3 Exploring your data and its "shape"
You should get to know the shape of any statistical distribution you plan to map. Plot a scattergram or histogram of the data and employ basic descriptive statistics to explore its distribution. Many automated mapping programs provide options that graph data and automatically calculate descriptive statistics like mean, mode, median, range, and standard deviation. Take advantage of these options to explore your data.
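If your mapping program does not offer these options, the same exploration can be done in a few lines of code. The following is a minimal sketch using only the Python standard library; the function names `describe` and `text_histogram` and the sample values are assumptions made for illustration.

```python
import statistics
from collections import Counter

def describe(values):
    """Return the basic descriptive statistics mentioned above."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def text_histogram(values, bins=5):
    """Draw a crude equal-width histogram; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # The maximum value is folded into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    return "\n".join(
        f"{lo + b * width:8.1f} | {'#' * counts[b]}" for b in range(bins)
    )

# Hypothetical sample, used only to show the calls.
values = [2, 3, 3, 4, 5, 5, 5, 7, 9, 14, 22, 40]

summary = describe(values)
print(summary)
print(text_histogram(values))
```

A mean well above the median, as in this sample, is a quick sign of a distribution with a long tail toward high values.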
Be aware also that mathematical transformations change the shape of a distribution--implying that the ranging method must change also.
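As an illustration of how a transformation changes shape, the sketch below compares a simple skewness measure before and after a base-10 logarithm. The sample values and the helper `skewness` are assumptions introduced here, and the log transform is just one example of a transformation.

```python
import math
import statistics

def skewness(values):
    """Population skewness: near 0 for symmetric data, positive for a long high-end tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed sample (all values must be positive for the log).
raw = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 20, 30, 50, 100]
logged = [math.log10(v) for v in raw]

print("skewness of raw values:   ", round(skewness(raw), 2))
print("skewness of logged values:", round(skewness(logged), 2))
```

The logged values are much closer to symmetric, so a ranging method chosen for the raw data would probably no longer suit the transformed data.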
Figure panels: Probability Density Function (PDF); Cumulative Distribution Function (CDF)
Not all distributions are "well-behaved". Sometimes you will encounter a bimodal (double-peaked) distribution.
6.4 Commonly employed ranging methods for assigning cutpoints
In generalizing statistical distributions, cartographers use the term "cutpoint" to refer to the boundaries between categories. All the following methods pertain to the calculation or assignment of these cutpoints. Remember, all systems of classification depend upon the use of "exhaustive" and "mutually exclusive" categories. Exhaustive means that the categories classify all values of a given data range--no values within that range are omitted from the classification system. Mutually exclusive means that any given observation can be placed in one and only one category - data categories cannot overlap. If you are using an automated mapping system, be sure that the system does not automatically assign overlapping cutpoints when it creates the map legend.
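One way to guarantee both properties is to treat each cutpoint as belonging to exactly one class. The sketch below is one possible convention, not a rule from any particular mapping package: a value equal to a cutpoint goes into the upper class. The names `classify` and `legend_overlaps` are hypothetical, and `legend_overlaps` assumes legend ranges are read as closed intervals (both ends included).

```python
from bisect import bisect_right

def classify(value, cutpoints):
    """Return the 0-based class index for value, given sorted interior cutpoints.

    Classes are half-open: below c0, [c0, c1), ..., at or above the last cutpoint.
    Every value gets exactly one index, so the classes are exhaustive and
    mutually exclusive.
    """
    return bisect_right(cutpoints, value)

def legend_overlaps(ranges):
    """Return adjacent (low, high) legend ranges that share at least one value."""
    ordered = sorted(ranges)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]

cutpoints = [10, 20]
print(classify(9.99, cutpoints), classify(10, cutpoints), classify(25, cutpoints))

# A legend printed as 0-10, 10-20 is ambiguous for a value of exactly 10.
print(legend_overlaps([(0, 10), (10, 20)]))
print(legend_overlaps([(0, 9), (10, 20)]))
```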
The method is useful for mapping rectangular (uniform) distributions. It is also useful for exploratory analysis, at times when you wish to develop a "feel" for the characteristics of a data distribution.
The method is useful for mapping rectangular distributions. It is also useful for exploratory analysis, at times when you wish to develop a "feel" for the characteristics of a data distribution.
This method can be applied effectively to data that is J-shaped with a peak at the low end of the distribution.
In this method, the widths of the category intervals are increased in size at a geometric (that is, multiplicative) rate. If your first category is 2 units wide, the second would be 2x2 or 4 units wide, the third 2x2x2 or 8 units wide, and so forth to the end of the distribution.
This method can be applied effectively to data that is J-shaped with a peak at the low end of the distribution but with a long "stretch" between low and high values.
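The doubling example above can be sketched as a small function. The name `geometric_bounds` and its parameters are assumptions made for this illustration; the starting value of 0 is arbitrary.

```python
def geometric_bounds(start, first_width, ratio, n_classes):
    """Return the n_classes + 1 class boundaries, beginning at start.

    Each class is `ratio` times as wide as the one before it.
    """
    bounds = [start]
    width = first_width
    for _ in range(n_classes):
        bounds.append(bounds[-1] + width)
        width *= ratio
    return bounds

# First class 2 units wide, each later class twice as wide: widths 2, 4, 8, 16.
print(geometric_bounds(0, 2, 2, 4))  # [0, 2, 6, 14, 30]
```

In practice you would choose the first width and the ratio so that the last boundary reaches the maximum value in your data.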
If your data is J-shaped with a peak at the high end of the distribution, the inverses of the arithmetic and geometric progressions can be employed. Inverting the cutpoints places the narrowest intervals at the high end of the distribution.
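One reading of this inversion is simply to lay out the same class widths in reverse order. The sketch below assumes that interpretation; `bounds_from_widths` is a hypothetical helper, and the widths are the doubling series 2, 4, 8, 16.

```python
from itertools import accumulate

def bounds_from_widths(start, widths):
    """Turn a list of class widths into class boundaries, beginning at start."""
    return list(accumulate(widths, initial=start))

widths = [2, 4, 8, 16]

forward = bounds_from_widths(0, widths)                   # narrow classes at the low end
inverse = bounds_from_widths(0, list(reversed(widths)))   # narrow classes at the high end

print(forward)  # [0, 2, 6, 14, 30]
print(inverse)  # [0, 16, 24, 28, 30]
```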
This is a sophisticated statistical method for splitting the data into classes. The categories are found by maximizing the variance between classes and minimizing the variance within each class.
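The idea can be demonstrated on a very small dataset by trying every possible split and keeping the one with the least within-class variation. This brute-force sketch is only an illustration of the criterion, not the algorithm that any mapping package actually uses, and it becomes far too slow for real datasets. The function names and sample values are assumptions. Because the total variation in the data is fixed, minimizing the within-class sum of squares also maximizes the between-class sum of squares.

```python
from itertools import combinations

def within_class_ss(classes):
    """Sum of squared deviations of each value from its own class mean."""
    total = 0.0
    for members in classes:
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total

def best_breaks(values, k):
    """Exhaustively search for the k contiguous classes with the smallest within-class SS."""
    data = sorted(values)
    best_score, best_classes = None, None
    # Each combination gives the positions where the sorted data is cut.
    for split in combinations(range(1, len(data)), k - 1):
        edges = (0, *split, len(data))
        classes = [data[a:b] for a, b in zip(edges, edges[1:])]
        score = within_class_ss(classes)
        if best_score is None or score < best_score:
            best_score, best_classes = score, classes
    return best_classes

# Hypothetical values with three obvious clusters.
values = [1, 2, 3, 10, 11, 12, 30, 31, 33]
print(best_breaks(values, 3))  # [[1, 2, 3], [10, 11, 12], [30, 31, 33]]
```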
This method can be applied to distributions that approximate a normal curve.
6.5 Symbolizing the Category Ranges
6.6 Statistical annotations are needed for some complex datasets
Data Sources
Connecticut Department of Labor. Web address: http://www.ctdol.state.ct.us/
Minnesota Population Center. National Historical Geographic Information System: Version 2.0. Minneapolis, MN: University of Minnesota, 2011. Web address: http://www.nhgis.org/
Wolfram|Alpha online computing environment. Web address: http://www.wolframalpha.com/
Further Reading
Coulson, Michael R.C. 1987. In the matter of class intervals for choropleth maps: With particular reference to the work of George F. Jenks. Cartographica 24 (2): 16-39.
Evans, Ian S. 1977. The selection of class intervals. Transactions of the Institute of British Geographers New Series 2: 98-124.
Jenks, George F. 1963. Generalization in statistical mapping. Annals of the Association of American Geographers 53: 15-26.
Jenks, George F. and Duane S. Knos. 1961. The use of shading patterns in graded series. Annals of the Association of American Geographers 51: 316-334.