These materials were developed by Kenneth E. Foote and Donald J. Huebner, Department of Geography, University of Texas at Austin, 1996.  These materials may be used for study, research, and education in not-for-profit applications.  If you link to or cite these materials, please credit the authors, Kenneth E. Foote and Donald J. Huebner, The Geographer's Craft Project, Department of Geography, The University of Colorado at Boulder.  These materials may not be copied to or issued from another Web server without the authors' express permission.  Copyright 2000.  All commercial rights are reserved.  If you have comments or suggestions, please contact the author or Kenneth E. Foote at ken.foote@uconn.edu.



1. The Problems of Error, Accuracy and Precision

Managing error in GIS datasets is now recognized as a substantial problem that needs to be addressed in the design and use of such systems. Failure to control and manage error can severely limit or invalidate the results of a GIS analysis. Please see the module Error, Accuracy, and Precision for an overview of the key issues.

2. Setting Standards for Procedures and Products

No matter what the project, standards should be set from the start. Standards should be established for both the spatial and non-spatial data to be added to the dataset. Issues to be resolved include the accuracy and precision required as information is placed in the dataset, conventions for naming geographic features, criteria for classifying data, and so forth. Such standards should be set both for the procedures used to create the dataset and for the final products. Setting standards involves three steps, which are taken up in the sections that follow.

3. Documenting Procedures and Products: Data Quality Reports

Standards for procedures and products should always be documented in writing or in the dataset itself. Data documentation should include information about how the data was collected and from what sources, how it was preprocessed and geocoded, how it was entered in the dataset, and how it is classified and encoded. On larger projects, one person or a team should be assigned responsibility for data documentation. Documentation is vitally important to the value and future use of a dataset. The saying is that an undocumented dataset is a worthless dataset; by and large, this is true. Without clear documentation a dataset cannot be expanded and cannot be used by other people or organizations now or in the future.

Documentation is of critical importance in large GIS projects because the dataset will almost certainly outlive the people who created it. GIS datasets for municipal, state, and AM/FM applications are usually designed to last 50-100 years, and the staff who entered the data may have long since retired when a question arises about the characteristics of their work. Written documentation is essential. Some projects actually place information about data quality and quality control directly in a GIS dataset as independent layers.
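
As a rough illustration only, the sketch below shows how such a data quality record might be kept in machine-readable form alongside a layer. The layer name, field names, and values are invented for the example and do not follow any formal metadata standard.

```python
import json
from datetime import date

# A minimal, illustrative data-quality record for a single layer.  The field
# names and values are assumptions for this sketch, not a formal metadata
# standard such as FGDC CSDGM or ISO 19115.
quality_report = {
    "layer": "parcels",
    "sources": "city engineering plats, 1:1200",
    "collected": "field survey and digitizing, 1994-1995",
    "geocoding": "digitized and transformed to NAD83 State Plane",
    "preprocessing": ["edge matching", "rubber sheeting to control points"],
    "classification_scheme": "local land-use codes, version 2",
    "estimated_horizontal_rmse_m": 1.5,
    "documented_by": "GIS data entry team",
    "documented_on": date.today().isoformat(),
}

# Store the report alongside the layer so the documentation travels with the data.
with open("parcels_quality_report.json", "w") as f:
    json.dump(quality_report, f, indent=2)
```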


4. Measuring and Testing Products

GIS datasets should be checked regularly against reality. For spatial data, this involves checking maps and positions in the field or, at least, against sources of high quality. A sample of positions can be resurveyed to check their accuracy and precision. The USGS employs a testing procedure to check the quality of its digital and paper maps, as does the Ordnance Survey. Indeed, the Ordnance Survey continues to test maps and digital datasets periodically long after they were first compiled. If too many errors crop up, or if the mapped area has changed greatly, the work is updated and corrected.
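
As a rough sketch of what such a positional check might look like, the following compares a small sample of dataset coordinates against hypothetical resurveyed positions and reports the root-mean-square error. The coordinate values are invented for illustration.

```python
import math

# Hypothetical check of positional accuracy: coordinates recorded in the GIS
# versus the same points resurveyed in the field (metres, projected grid).
gis_points    = [(500012.3, 3349987.1), (500344.8, 3350122.9), (499876.2, 3350410.5)]
survey_points = [(500011.9, 3349988.0), (500345.6, 3350122.1), (499875.1, 3350409.8)]

def horizontal_rmse(measured, reference):
    """Root-mean-square horizontal error between matched point sets."""
    squared = [(xm - xr) ** 2 + (ym - yr) ** 2
               for (xm, ym), (xr, yr) in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))

print(f"Horizontal RMSE of the sample: {horizontal_rmse(gis_points, survey_points):.2f} m")
```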

 Non-spatial attribute data should also be checked either against reality or a source of equal or greater quality. The particular tests employed will, of course, vary with the type of data used and its level of measurement. Indeed, many different tests have been developed to test the quality of interval, ordinal, and nominal data. Both parametric and nonparametric statistical tests can be employed to compare true values (those observed "on the ground") and those recorded in the dataset.

 Cohen's Kappa provides just one example of the types of test employed, this one for nominal data. The following example shows how data on land cover stored in a database can be tested against reality.  

See Attribute Accuracy and Calculating Cohen's Kappa
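
As an illustration of the calculation itself, the sketch below computes Cohen's Kappa from a small, made-up error matrix for three land-cover classes (dataset classes in rows, field observations in columns). A kappa near 1 indicates agreement well beyond chance; a value near 0 indicates agreement no better than chance.

```python
# Illustrative error matrix for three land-cover classes.  Rows are classes
# recorded in the dataset, columns are classes observed in the field; the
# counts are invented for this sketch.
matrix = [
    [45,  4,  1],   # forest
    [ 6, 30,  4],   # grassland
    [ 2,  3, 25],   # urban
]

n = sum(sum(row) for row in matrix)
observed = sum(matrix[i][i] for i in range(len(matrix))) / n          # p_o
row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2  # p_e

kappa = (observed - expected) / (1 - expected)
print(f"observed = {observed:.3f}, expected = {expected:.3f}, kappa = {kappa:.3f}")
```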


5. Calibrating a Dataset to Ascertain How Error Influences Solutions

Solutions reached by GIS analysis should be checked or calibrated against reality. The best way to do this is to check the results of a GIS analysis against findings produced by completely independent calculations. If the two agree, then the user has some confidence that the data and the modeling procedure are valid.

This process of checking and calibrating a GIS is often referred to as Sensitivity Analysis. Sensitivity analysis allows the user to test how variations in data and modeling procedure influence a GIS solution. The user varies the inputs of a GIS model, or the procedure itself, to see how each change alters the solution. In this way, the user can judge quite precisely how data quality and error will influence subsequent modeling.

This is quite straightforward with interval/ratio input data. The user tests to see how an incremental change in an input variable changes the output of the system. From this, the user can derive a "marginal sensitivity" to each input and establish "marginal weights" to compensate for error.
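
A minimal sketch of this kind of one-at-a-time test, assuming an arbitrary weighted-overlay model and a 5% perturbation step (both invented for the example):

```python
# One-at-a-time sensitivity test for interval/ratio inputs.  model() stands in
# for a GIS calculation (here an arbitrary weighted overlay); the baseline
# values and the 5% perturbation are assumptions for this sketch.
def model(slope, rainfall, soil_depth):
    return 0.5 * slope + 0.3 * rainfall + 0.2 * soil_depth

baseline = {"slope": 12.0, "rainfall": 800.0, "soil_depth": 1.2}
base_output = model(**baseline)

for name, value in baseline.items():
    step = 0.05 * value                                     # +5% incremental change
    perturbed = dict(baseline, **{name: value + step})
    marginal = (model(**perturbed) - base_output) / step    # change in output per unit of input
    print(f"{name:10s} marginal sensitivity = {marginal:.3f}")
```

Because the illustrative model is linear, the marginal sensitivities simply recover its weights; with a real GIS model they would have to be estimated numerically in just this way.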

 But sensitivity analysis can also be applied to nominal (categorical) and ordinal (ranked) input data. In these cases, data may be purposefully misclassified or misranked to see how such errors will change a solution.
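
A comparable sketch for nominal data, assuming an invented land-cover grid and a 10% misclassification rate:

```python
import random

# Sensitivity testing for nominal data: deliberately misclassify a fraction of
# land-cover cells and see how a simple summary statistic changes.  The classes,
# the 1,000-cell grid, and the 10% error rate are all assumptions.
random.seed(1)
classes = ["forest", "grassland", "urban"]
land_cover = [random.choice(classes) for _ in range(1000)]

def forest_fraction(cells):
    return sum(c == "forest" for c in cells) / len(cells)

def misclassify(cells, rate=0.10):
    """Replace a random share of cells with a different, randomly chosen class."""
    return [random.choice([k for k in classes if k != c]) if random.random() < rate else c
            for c in cells]

print(f"forest fraction, original data:      {forest_fraction(land_cover):.3f}")
print(f"forest fraction, 10% misclassified:  {forest_fraction(misclassify(land_cover)):.3f}")
```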

Sensitivity analysis can also be used during system design and development to test the levels of precision and accuracy required to meet system goals. That is, users can experiment with data of differing levels of precision and accuracy to see how they perform. If a test solution is not accurate or precise enough in one pass, the levels can be refined and tested again. Such testing of accuracy and precision is very important in large GIS projects that will generate large quantities of data. It is of little use (and tremendous cost) to gather and store data to levels of accuracy and precision beyond what a particular modeling task requires.

Sensitivity analysis can also be useful at the design stage in testing the theoretical parameters of a GIS model. It is sometimes the case that a factor, though of seemingly great theoretical importance to a solution, proves to be of little value in solving a particular problem. For example, soil type is certainly important in predicting crop yields but, if soil type varies little in a particular region, it is a waste of time to enter it into a dataset designed for this purpose. Users can check for such situations by selectively removing certain data layers from the modeling process. If removing a layer makes no difference to the solutions, then no further data entry for that layer needs to be made.
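
A minimal sketch of this layer-removal test, assuming three candidate sites, three invented layers, and a simple additive scoring model:

```python
# Test whether a layer matters to a solution: rank three candidate sites with
# and without each layer.  The layers, scores, and additive model are
# assumptions for this sketch.
layers = {
    "slope":     [3, 7, 2],   # one suitability score per candidate site
    "soils":     [5, 5, 5],   # varies little across the study area
    "land_cost": [9, 4, 6],
}

def rank_sites(active_layers):
    """Rank sites (best first) by their summed scores; lower totals rank higher."""
    totals = [sum(scores[i] for scores in active_layers.values()) for i in range(3)]
    return sorted(range(3), key=lambda i: totals[i])

full_ranking = rank_sites(layers)
for name in layers:
    reduced = {k: v for k, v in layers.items() if k != name}
    verdict = "changes" if rank_sites(reduced) != full_ranking else "does not change"
    print(f"dropping '{name}' {verdict} the ranking of sites")
```

In this toy example the "soils" layer barely varies, so dropping it leaves the ranking unchanged, which is exactly the situation described above for soil type.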

 To see how sensitivity analysis might be applied to a problem concerned with upgrading a municipal water system, go to the following section on Sensitivity Analysis.

In closing this example, it is useful to note that the results were reported in terms of rankings. No single solution was optimal in all cases, so picking a single "best" solution might be misleading. Instead, the sites are simply ranked by the number of situations in which each comes out ahead.


6. Report Results in Terms of the Uncertainties of the Data

Too often GIS projects fall prey to the problem of False Precision, that is, reporting results to a level of accuracy and precision unsupported by the intrinsic quality of the underlying data. Just because a system can store numeric solutions to four, six, or eight decimal places does not mean that all of those digits are significant. Common practice allows users to round down one decimal place below the level of measurement; beyond that, the remaining digits are meaningless.

As examples of what this means, consider:

Population figures are reported in whole numbers (5,421, 10,238, etc.), meaning that calculations based on them can be carried to one decimal place (a density of 21.5, a mortality rate of 10.3).

If forest coverage is measured to the nearest 10 meters, then calculations based on it can be rounded to the nearest 1 meter.
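
A small sketch of the same rounding rule in code, using hypothetical figures:

```python
# Hypothetical figures showing results reported one step finer than the data
# were measured, rather than to the full precision the software can store.
population = 5421            # counted in whole persons
area_km2 = 251.8

density = population / area_km2
print(f"Density: {density:.1f} persons per sq. km")   # one decimal place only

forest_width_m = 1280        # measured to the nearest 10 m
buffer_m = 0.37 * forest_width_m
print(f"Buffer width: {round(buffer_m)} m")           # rounded to the nearest 1 m
```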

A second problem is False Certainty, that is, reporting results with a degree of certitude unsupported by the natural variability of the underlying data. Most GIS solutions involve employing a wide range of data layers, each with its own natural dynamics and variability. Combining these layers can exacerbate the problem of arriving at a single, precise solution. Sensitivity analysis (discussed above) helps to indicate how much variation in one data layer will affect a solution, but GIS users should carry this lesson all the way to their final solutions. Those solutions are best reported in terms of ranges, confidence intervals, or rankings. In some cases, this involves preparing high, low, and mid-range estimates of a solution based upon the maximum, minimum, and average values of the data used in a calculation.

You will notice that the case considered above, pertaining to an optimal site selection problem, reported its results in terms of rankings. Each site was optimal in certain confined situations, but only a couple proved optimal in more than one situation. The results rank the number of times each site came out ahead in terms of total cost.

In situations where statistical analysis is possible, the use of confidence intervals is recommended. Confidence intervals establish the probability of a solution falling within a certain range (e.g., a 95% probability that the solution falls between 100 m and 150 m).
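
A minimal sketch of reporting a solution this way, assuming a handful of invented model runs and a simple normal approximation:

```python
import statistics

# Report a solution as a range rather than a single figure.  The values below
# stand in for repeated model runs with inputs varied within their error
# ranges; a normal approximation (1.96) is used for simplicity, though a
# t-multiplier would be more defensible for so few runs.
runs_m = [118.2, 124.7, 131.9, 120.4, 127.3, 122.8, 129.5, 125.1]

mean = statistics.mean(runs_m)
sem = statistics.stdev(runs_m) / len(runs_m) ** 0.5
low, high = mean - 1.96 * sem, mean + 1.96 * sem

print(f"Estimated distance: {mean:.0f} m (approximate 95% interval {low:.0f}-{high:.0f} m)")
```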
 


7. References and Supplemental Reading

Chapter 14 in Bolstad, Paul.  2005.  GIS Fundamentals: A First Text on Geographic Information Systems, 2nd ed.  White Bear Lake, MN: Eider Press.

Burrough, P.A.  1990.  Principles of Geographical Information Systems for Land Resource Assessment.  Oxford: Clarendon Press.

Chapter 8 in Chang, Kang-tsung.  2006.  Introduction to Geographic Information Systems, 3rd ed.  Boston: McGraw Hill.

Chapter 4 in Lo, C.P. and Albert K.W. Yeung.  2002.  Concepts and Techniques of Geographic Information Systems.  Upper Saddle River, NJ: Prentice Hall.

Chapter 6 in Longley, Paul A., Michael F. Goodchild, David J. Maguire, and David W. Rhind.  2005.  Geographic Information Systems and Science, 2nd ed.  Hoboken, NJ: Wiley.

USGS,  National Mapping Program Standards,  http://nationalmap.gov/gio/standards/

8. Examination and Study Questions


Last revised on 2014.9.11. ken.foote@uconn.edu