Error, Accuracy, and Precision
This situation has changed substantially in recent years. It is now generally recognized that error, inaccuracy, and imprecision can "make or break" many types of GIS projects. That is, errors left unchecked can make the results of a GIS analysis almost worthless.
The irony is that the problem of error devolves from one of the greatest strengths of GIS. GIS gain much of their power from being able to collate and cross-reference many types of data by location. They are particularly useful because they can integrate many discrete datasets within a single system. Unfortunately, every time a new dataset is imported, the GIS also inherits its errors. These may combine and mix with the errors already in the database in unpredictable ways.
One of the first thorough discussions of the problems and sources of error appeared in P.A. Burrough's Principles of Geographical Information Systems for Land Resources Assessment (1986). Now the issue is addressed in many introductory texts on GIS.
The key point is that even though error can disrupt GIS analyses, careful planning can keep error to a minimum, and methods exist for estimating its effects on GIS solutions. Awareness of the problem of error has also had the benefit of making GIS practitioners more sensitive to the limitations of GIS, which cannot deliver impossibly accurate and precise solutions.
Be aware also that GIS practitioners are not always consistent in their use of these terms. Sometimes the terms are used almost interchangeably and this should be guarded against.
Two additional terms are used as well:
3.1. Positional accuracy and precision.
This applies to both horizontal and vertical positions.
Accuracy and precision are a function of the scale at which a map (paper or digital) was created. The mapping standards employed by the United States Geological Survey specify that:
"requirements for meeting horizontal accuracy as 90 per cent of all measurable points must be within 1/30th of an inch for maps at a scale of 1:20,000 or larger, and 1/50th of an inch for maps at scales smaller than 1:20,000."
Accuracy Standards for Various Scale Maps
1:2,400 ± 6.67 feet
1:4,800 ± 13.33 feet
1:10,000 ± 27.78 feet
1:12,000 ± 33.33 feet
1:24,000 ± 40.00 feet
1:63,360 ± 105.60 feet
1:100,000 ± 166.67 feet
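The tolerances in the table follow from the quoted standard: take the allowable error on the map sheet, multiply by the scale denominator, and convert inches to feet. The following is a minimal sketch of that arithmetic in Python. The function name `horizontal_tolerance_feet` is hypothetical, and the 1:20,000 threshold is assumed to apply as worded in the quotation above.

```python
def horizontal_tolerance_feet(scale_denominator):
    """Ground distance, in feet, matching the allowable map error at a scale.

    Assumes the thresholds as quoted in the text: 1/30 inch on the map at
    1:20,000 or larger, 1/50 inch at smaller scales.
    """
    # A larger scale has a smaller denominator (1:2,400 is larger than 1:24,000).
    divisor = 30 if scale_denominator <= 20000 else 50
    ground_error_inches = scale_denominator / divisor
    return ground_error_inches / 12  # 12 inches per foot


for denominator in (2400, 4800, 10000, 12000, 24000, 63360, 100000):
    tolerance = horizontal_tolerance_feet(denominator)
    print(f"1:{denominator:,} ± {tolerance:.2f} feet")
```

Because the tolerance depends only on the scale at which the map was compiled, the same figure applies no matter how far a user later zooms in on the digital version.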
Beware of the dangers of false accuracy and false precision, that is, reading locational information from a map to levels of accuracy and precision beyond those at which it was created. This is a very great danger in computer systems that allow users to pan and zoom at will to an infinite number of scales. Accuracy and precision are tied to the original map scale and do not change even if the user zooms in and out. Zooming in and out can, however, mislead the user into believing--falsely--that the accuracy and precision have improved.
3.2. Attribute accuracy and precision
The non-spatial data linked to location may also be inaccurate or imprecise. Inaccuracies may result from mistakes of many sorts. Non-spatial data can also vary greatly in precision. Precise attribute information describes phenomena in great detail. For example, a precise description of a person living at a particular address might include gender, age, income, occupation, level of education, and many other characteristics. An imprecise description might include just income, or just gender.
3.3. Conceptual accuracy and precision
GIS depend upon the abstraction and classification of real-world phenomena. The user determines what amount of information is used and how it is classified into appropriate categories. Sometimes users may use inappropriate categories or misclassify information. For example, classifying cities by voting behavior would probably be an ineffective way to study fertility patterns. Failing to classify power lines by voltage would limit the effectiveness of a GIS designed to manage an electric utility's infrastructure. Even if the correct categories are employed, data may be misclassified. A study of drainage systems may involve classifying streams and rivers by "order," that is, where a particular drainage channel fits within the overall tributary network. Individual channels may be misclassified if tributaries are miscounted. Yet some studies might not require such a precise categorization of stream order at all. All they may need is the location and names of all streams and rivers, regardless of order.
3.4. Logical accuracy and precision
Information stored in a database can be employed illogically. For example, permission might be given to build a residential subdivision on a floodplain unless the user compares the proposed plat with floodplain maps. Then again, building may be possible on some portions of a floodplain but the user will not know unless variations in flood potential have also been recorded and are used in the comparison. The point is that information stored in a GIS database must be used and compared carefully if it is to yield useful results. GIS systems are typically unable to warn the user if inappropriate comparisons are being made or if data are being used incorrectly. Some rules for use can be incorporated in GIS designed as "expert systems," but developers still need to make sure that the rules employed match the characteristics of the real-world phenomena they are modeling.
Finally, it would be a mistake to believe that highly accurate and highly precise information is needed for every GIS application. The need for accuracy and precision will vary radically depending on the type of information coded and the level of measurement needed for a particular application. The user must determine what will work. Excessive accuracy and precision are not only costly but can also burden a project with unnecessary detail.
Burrough (1986) divides sources of error into three main categories:
Data sources may simply be too old to be useful or relevant to current GIS projects. Past collection standards may be unknown, non-existent, or not currently acceptable. For instance, John Wesley Powell's nineteenth-century survey data of the Grand Canyon lacks the precision of data that can be developed and used today. Additionally, much of the information base may have subsequently changed through erosion, deposition, and other geomorphic processes. Despite the power of GIS, reliance on old data may unknowingly skew, bias, or negate results.
Data on a given area may be completely lacking, or only partial levels of information may be available for use in a GIS project. For example, vegetation or soils maps may be incomplete at borders and transition zones and fail to accurately portray reality. Another example is the lack of remote sensing data in certain parts of the world due to almost continuous cloud cover. Uniform, accurate coverage may not be available, and the user must decide what level of generalization is necessary or whether further collection of data is required.
The ability to show detail in a map is determined by its scale. A map with a scale of 1:1000 can illustrate much finer points of data than a smaller scale map of 1:250000. Scale restricts type, quantity, and quality of data (Star and Estes 1990). One must match the appropriate scale to the level of detail required in the project. Enlarging a small scale map does not increase its level of accuracy or detail.
4.1.4. Density of Observations.
The number of observations within an area is a guide to data reliability and should be known by the map user. An insufficient number of observations may not provide the level of resolution required to adequately perform spatial analysis and determine the patterns GIS projects seek to resolve or define. As a case in point, if the contour interval on a map is 40 feet, elevation differences smaller than this cannot be resolved accurately. Lines on a map are a generalization based on the interval of recorded data; thus the closer the sampling interval, the more accurate the portrayed data.
Quite often the desired data regarding a site or area may not exist and "surrogate" data may have to be used instead. A valid relationship must exist between the surrogate and the phenomenon it is used to study but, even then, error may creep in because the phenomenon is not being measured directly. A local example of the use of surrogate data is habitat studies of the golden-cheeked warbler in the Hill Country. It is very costly (and disturbing to the birds) to inventory these habitats through direct field observation. But the warblers prefer to live in stands of old-growth cedar (Juniperus ashei). These stands can be identified from aerial photographs. The density of Juniperus ashei can be used as a surrogate measure of the density of warbler habitat. But, of course, some areas of cedar may be uninhabited or inhabited to a very high density. These areas will be missed when aerial photographs are used to tabulate habitats.
Another example of surrogate data is the electronic signals from remote sensing that are used to estimate vegetation cover, soil types, erosion susceptibility, and many other characteristics. The data is being obtained by an indirect method. Sensors on the satellite do not "see" trees, but only certain digital signatures typical of trees and vegetation. Sometimes these signatures are recorded by satellites even when trees and vegetation are not present (false positives) or not recorded when trees and vegetation are present (false negatives). Due to the cost of gathering on-site information, surrogate data is often substituted, and the user must understand that variations may occur and that assumptions, although valid, may not necessarily be accurate.
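Where some field observations are available, the false positives and false negatives described above can be counted directly. The sketch below is one way to do that in Python. The function `error_rates` and the eight sample sites are hypothetical, invented only to illustrate the comparison.

```python
def error_rates(observed, classified):
    """Compare ground observations with sensor classifications.

    Both arguments are equal-length sequences of booleans, one entry per
    sample site: True means vegetation is present (observed) or reported
    by the sensor (classified).
    """
    pairs = list(zip(observed, classified))
    false_positives = sum(1 for o, c in pairs if c and not o)
    false_negatives = sum(1 for o, c in pairs if o and not c)
    absent = sum(1 for o in observed if not o)
    present = sum(1 for o in observed if o)
    return {
        "false_positive_rate": false_positives / absent if absent else 0.0,
        "false_negative_rate": false_negatives / present if present else 0.0,
    }


# Hypothetical field check at eight sites (values invented for illustration).
observed = [True, True, True, True, False, False, False, False]
classified = [True, True, True, False, True, False, False, False]
print(error_rates(observed, classified))
```

Rates like these give the user a rough measure of how far the surrogate can be trusted before it is substituted for direct observation.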
Methods of formatting digital information for transmission, storage, and processing may introduce error in the data. Conversion of scale, projection, changing from raster to vector format, and resolution size of pixels are examples of possible areas for format error. Expediency and cost often require data to be reformatted to the "lowest common denominator" for transmission and use by multiple GIS. Multiple conversions from one format to another may create a ratchet effect similar to making copies of copies on a photocopy machine. Additionally, international standards for cartographic data transmission, storage, and retrieval are not fully implemented.
Accessibility to data is not equal. What is open and readily available in one country may be restricted, classified, or unobtainable in another. Prior to the break-up of the former Soviet Union, a common highway map that is taken for granted in this country was considered classified information and unobtainable to most people. Military restrictions, inter-agency rivalry, privacy laws, and economic factors may restrict data availability or the level of accuracy in the data.
Extensive and reliable data is often quite expensive to obtain or convert. Initiating new collection of data may be too expensive for the benefits gained in a particular GIS project, and project managers must balance their desire for accuracy against the cost of the information. True accuracy is expensive and may be unaffordable.
4.2. Errors Resulting from Natural Variation or from Original Measurements.
Although these error sources may not be as obvious, careful checking will reveal their influence on the project data.
Positional accuracy is a measurement of the variance between map features and the true position of the attribute (Antenucci and others 1991, p. 102). It is dependent on the type of data being used or observed. Map makers can accurately place well-defined objects and features such as roads, buildings, boundary lines, and discrete topographical units on maps and in digital systems, whereas less discrete boundaries such as vegetation or soil type may reflect the estimates of the cartographer. Climate, biomes, relief, soil type, drainage, and other features lack sharp boundaries in nature and are subject to interpretation. Faulty or biased field work, map digitizing and conversion errors, and scanning errors can all result in inaccurate maps for GIS projects.
Maps must be correct and free from bias. Qualitative accuracy refers to the correct labeling and presence of specific features. For example, a pine forest may be incorrectly labeled as a spruce forest, thereby introducing error that may not be known or noticeable to the map or data user. Certain features may be omitted from the map or spatial database through oversight, or by design.
Other errors in quantitative accuracy may arise from faulty calibration of the instruments used to measure specific features such as altitude, soil or water pH, or atmospheric gases. Mistakes made in the field or laboratory may be undetectable in the GIS project unless the user has conflicting or corroborating information available.
4.2.3. Sources of variation in data.
Variations in data may be due to measurement error introduced by faulty observation, biased observers, or mis-calibrated or inappropriate equipment. For example, one cannot expect sub-meter accuracy with a hand-held, non-differential GPS receiver. Likewise, an incorrectly calibrated dissolved oxygen meter would produce incorrect values of oxygen concentration in a stream.
There may also be a natural variation in data being collected, a variation that may not be detected during collection. As an example, salinity in Texas bays and estuaries varies during the year and is dependent upon freshwater influx and evaporation. If one were not aware of this natural variation, incorrect assumptions and decisions could be made, and significant error introduced into the GIS project. In any case, if the errors do not lead to unexpected results, their detection may be extremely difficult.
4.3. Errors Arising Through Processing
Processing errors are the most difficult for GIS users to detect. They must be specifically looked for, and finding them requires knowledge of both the information and the systems used to process it. These subtle errors occur in several ways and are potentially more insidious, particularly because they can occur in multiple sets of data being manipulated in a GIS project.
Different computers may not have the same capability to perform complex mathematical operations and may produce significantly different results for the same problem. Burrough (1990) cites an example in number squaring that produced a 1200% difference. Computer processing errors occur in rounding operations and are subject to the inherent limits of number manipulation by the processor. Another source of error may be faulty processors, such as the recent mathematical problem identified in Intel's Pentium(tm) chip. In certain calculations, the chip would yield the wrong answer.
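The limits of number representation can be seen by storing a coordinate at reduced precision. The following Python sketch is an illustration only and is not the example Burrough cites. It assumes a system that stores coordinates as 32-bit floating-point numbers, and the northing value is hypothetical.

```python
import struct


def to_single_precision(value):
    """Round a Python float (double precision) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical northing in metres, of the magnitude found in projected
# coordinate systems such as UTM.
northing = 4321987.65
stored = to_single_precision(northing)

print(f"original: {northing:.2f}")
print(f"stored:   {stored:.2f}")
print(f"shift:    {abs(northing - stored):.2f} m")
```

At this magnitude, adjacent 32-bit values are 0.5 units apart, so any coordinate is silently moved to the nearest half metre before any analysis begins.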
A major challenge is the accurate conversion of existing maps to digital form (Muehrcke 1986). Because computers must manipulate data in a digital format, numerical errors in processing can lead to inaccurate results. In any case, numerical processing errors are extremely difficult to detect, and detecting them may require a level of sophistication not present in most GIS workers or project managers.
4.3.2. Errors in Topological Analysis.
Logic errors may cause incorrect manipulation of data and topological analyses (Star and Estes 1990). One must recognize that data is not uniform and is subject to variation. Overlaying multiple layers of maps can result in problems such as slivers, overshoots, and dangles. Variation in accuracy between different map layers may be obscured during processing, leading to the creation of "virtual data which may be difficult to detect from real data" (Sample 1994).
4.3.3. Classification and Generalization Problems.
For the human mind to comprehend vast amounts of data, the data must be classified, and in some cases generalized, to be understandable. According to Burrough (1986, p. 137), about seven divisions of data are ideal and may be retained in human short-term memory. Defining class intervals is another problem area. For instance, the leading causes of death among males 18-25 years old would probably differ significantly from those found with a class interval of 18-40 years. Data is most accurately displayed and manipulated in small multiples. Defining a reasonable multiple and asking the question "compared to what" is critical (Tufte 1990, pp. 67-79). Classification and generalization of attributes used in GIS are subject to interpolation error and may introduce irregularities in the data that are hard to detect.
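The effect of class interval choice can be demonstrated by classifying the same values under two schemes. The sketch below uses equal-width intervals, which is only one of several possible classification methods and is chosen here as an assumption. The function names and attribute values are hypothetical.

```python
def equal_interval_breaks(values, n_classes):
    """Upper class limits that split the data range into equal-width classes."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    return [low + width * i for i in range(1, n_classes + 1)]


def classify(value, breaks):
    """Index of the first class whose upper limit is not below the value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    # Guard against rounding in the last break: keep the value in the top class.
    return len(breaks) - 1


# Hypothetical attribute values (invented for illustration).
values = [2, 3, 5, 8, 13, 21, 34, 55, 89, 100]
for n_classes in (2, 5):
    breaks = equal_interval_breaks(values, n_classes)
    classes = [classify(v, breaks) for v in values]
    print(n_classes, "classes:", classes)
```

With two classes, the values 2 through 34 fall together in a single class. With five classes, 34 is separated from the lower values, so a pattern that was hidden by the coarser scheme becomes visible.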
4.3.4. Digitizing and Geocoding Errors.
Processing errors also occur during other phases of data manipulation, such as digitizing and geocoding, overlay and boundary intersection, and rasterizing a vector map. Involuntary muscle contractions by the operator may result in spikes, switchbacks, polygonal knots, and loops. Errors associated with damaged source maps, operator error while digitizing, and bias can be checked by comparing original maps with digitized versions. Other errors are more elusive.
The point is that inaccuracy, imprecision, and error may be compounded in GIS that employ many data sources. There are two ways in which this compounding may occur.
Propagation occurs when one error leads to another. For example, if a map registration point has been mis-digitized in one coverage and is then used to register a second coverage, the second coverage will propagate the first mistake. In this way, a single error may lead to others and spread until it corrupts data throughout the entire GIS project. To avoid this problem, use the largest scale map to register your points.
Often propagation occurs in an additive fashion, as when maps of different accuracy are collated.
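Additive propagation can be illustrated with the map accuracy tolerances listed in section 3.1. The sketch below adds the tolerances of a 1:24,000 layer and a 1:100,000 layer. For comparison it also shows a root-sum-square combination, a technique not discussed in the text, which rests on the assumption that the errors in the two layers are independent. Both function names are hypothetical.

```python
import math


def worst_case_error(*errors):
    """Additive combination: every layer errs in the same direction."""
    return sum(errors)


def root_sum_square_error(*errors):
    """Combination that assumes the layer errors are independent."""
    return math.sqrt(sum(e * e for e in errors))


# Tolerances in feet for 1:24,000 and 1:100,000 maps, from section 3.1.
layer_errors = (40.00, 166.67)
print(f"additive:        {worst_case_error(*layer_errors):.2f} feet")
print(f"root sum square: {root_sum_square_error(*layer_errors):.2f} feet")
```

Under either assumption, the combined layer is no more accurate than its least accurate source.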
Cascading means that erroneous, imprecise, and inaccurate information will skew a GIS solution when information is combined selectively into new layers and coverages. In a sense, cascading occurs when errors are allowed to propagate unchecked from layer to layer repeatedly.
The effects of cascading can be very difficult to predict. They may be additive or multiplicative and can vary from situation to situation, depending on how information is combined. Because cascading can have such unpredictable effects, it is important to test for its influence on a given GIS solution. This is done by calibrating a GIS database using techniques such as sensitivity analysis. Sensitivity analysis allows users to gauge how, and how much, errors will affect solutions. Calibration and sensitivity analysis are discussed in Managing Error.
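One simple form of sensitivity analysis is to perturb each input within its expected error and observe how much the result moves. The Python sketch below does this for a hypothetical two-layer weighted overlay. The `suitability` model, its weights, the input values, and the error bounds are all assumptions made for illustration, not a method prescribed by this module.

```python
import random


def suitability(slope, distance, slope_weight=0.6, distance_weight=0.4):
    """Hypothetical weighted-overlay score; higher means more suitable."""
    return slope_weight * slope + distance_weight * distance


def sensitivity(slope, distance, slope_error, distance_error, trials=1000, seed=0):
    """Perturb each input within its error bound and report the score range."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    scores = []
    for _ in range(trials):
        s = slope + rng.uniform(-slope_error, slope_error)
        d = distance + rng.uniform(-distance_error, distance_error)
        scores.append(suitability(s, d))
    return min(scores), max(scores)


nominal = suitability(50.0, 70.0)
low, high = sensitivity(50.0, 70.0, slope_error=5.0, distance_error=10.0)
print(f"nominal score: {nominal:.1f}")
print(f"score range under perturbation: {low:.1f} to {high:.1f}")
```

If the range is wide enough to change the decision the score supports, the input errors matter for that solution and the result is better reported as a range than as a single value.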
It is also important to realize that propagation and cascading may affect horizontal, vertical, attribute, conceptual, and logical accuracy and precision.
This means in practice that GIS solutions are often best reported as ranges or rankings, or presented within statistical confidence intervals. These issues are addressed in the module, Managing Error.
7.1. Ask or look for a data quality report when you borrow or purchase data.
Many major governmental and commercial data producers work to well-established standards of accuracy and precision that are available publicly in printed or digital form. These documents will tell you exactly how maps and datasets were compiled and such reports should be studied carefully. Data quality reports are usually provided with datasets obtained from local and state government agencies or from private suppliers.
7.2. Prepare a Data Quality Report for datasets you create.
Your data will not be valuable to others unless you too prepare a data quality report. Even if you do not plan to share your data with others, you should prepare a report--just in case you use the dataset again in the future. If you do not document the dataset when you create it, you may end up wasting time later having to check it a second time. Use the data quality reports found above as models for documenting your dataset.
7.3. In the absence of a Data Quality Report, ask questions about undocumented data before you use it.
Chapter 14 in Bolstad, Paul. 2005. GIS Fundamentals: A First Text on Geographic Information Systems, 2nd. ed. White Bear Lake, MN: Eider Press.
Burrough, P.A. 1990. Principles of Geographical Information Systems for Land Resources Assessment. Oxford: Clarendon Press.
Chapter 8 in Chang, Kang-tsung. 2006. Introduction to Geographic Information Systems, 3rd. ed. Boston: McGraw Hill.
11. Examination and Study Questions
Created on 14 Oct 95. DJH. Last revised on 2006.3.15. KEF.