Data Sources for GIS

These materials were developed by Kenneth E. Foote and Margaret Lynch, Department of Geography, University of Texas at Austin, 1995.  These materials may be used for study, research, and education in not-for-profit applications.  If you link to or cite these materials, please credit the authors, Kenneth E. Foote and Margaret Lynch, The Geographer's Craft Project, Department of Geography, The University of Colorado at Boulder.  These materials may not be copied to or issued from another Web server without the authors' express permission.  Copyright © 1995-2009 All commercial rights are reserved.  If you have comments or suggestions, please contact the author or Kenneth E. Foote at k.foote@colorado.edu .


1. A Question of Range and Change: Key Issues

GIS Systems Employ a Wide Range of Data Sources

There is tremendous range in the types of data used for GIS analysis. This reflects the varied goals of the systems themselves. Since GIS may be used for applications as varied as archeological analysis, marketing research, and urban planning, the source materials can be difficult to inventory and classify comprehensively. Even within a single GIS project, the range of materials employed can be daunting. Consider, for example, a list of the sources that are being used in central Arizona to develop the database for the Salt River Project.

Although the type of materials will vary greatly from project to project, GIS practitioners should be aware of some of the most commonly available data sources. These are materials collected and published by a number of government agencies and commercial businesses and are used quite widely. Even if these sources fall outside the scope of your project, it is worth learning a bit about their characteristics and limitations.

Tremendous Change is Occurring in the Transformation from Paper to Digital Sources

Not so long ago, most GIS projects had to rely almost exclusively upon data available only in printed or "paper" form. Much of the data available for use is still published on paper, but a great deal of information is now distributed in digital formats. The ever increasing pace of this transformation from paper to digital sources has many repercussions for GIS. Data already produced in digital format will certainly ease the work and speed the process of developing GIS, but only if users learn how to employ these new sources effectively.

The digital data revolution also means that users must often search for materials in different places. To acquire some digital sources, users must contact the producers directly and work closely with them to obtain the necessary data in a usable format. The Internet and World Wide Web are also being used more and more to distribute data, and this means knowing where to look and how to search the networks.

Finally, all data sources have strengths and limitations. Digital sources are no different. It is important to understand their characteristics, costs, and benefits before using them. Learning a little about commonly employed digital formats will save much work in the long run.


2. Common Data Types and Producers

Common Public Suppliers

Local, state, and federal government agencies are major suppliers of data. A good deal of detective work is sometimes required, however, to find the data you need. This is perhaps less the case at the federal level if only because certain key agencies such as the Bureau of the Census, United States Geological Survey, Soil Conservation Service, NASA, and Federal Emergency Management Agency provide standard sorts of information for the entire nation.

A good place to start is the Federal Geographic Data Committee, Data and Services Page at http://www.fgdc.gov/dataandservices/. This links to several portal sites. Useful starting points include:

  1. GeoData.gov and data.gov
  2. USGS, Publications and Products, http://www.usgs.gov/pubprod/index.html
  3. The Global Spatial Data Infrastructure Association: http://gsdi.org/
  4. OpenStreetMap, http://www.openstreetmap.org/
  5. Wikimapia, http://wikimapia.org

Equally important are:

  1. The National Map: http://nationalmap.gov/
  2. The National Atlas: http://nationalatlas.gov/

For an extensive index of other federal on-line data sources see the USGS Publications and Data Products Page. The USGS index repeats some of the FGDC information, but also includes links to the Global Land Information System and tends to concentrate on data created through the USGS.

For other common federal data sources and helpful indexes to those sources see:

The acquisition of data from state and local providers can take more time and skill. Some states, such as Texas, have established central clearinghouses for data.

See, for example, the Texas Natural Resources Information System.

Other states offer similar clearinghouse sites. Usually these are associated with natural resource, environmental, and planning agencies:

Other state information sources might be found through the Library of Congress's State and Local Governments Index.

Clearinghouses are an important step in acquiring data, but they will rarely have everything you need. In many cases, the most effective clearinghouses provide indices to data held by other agencies, not necessarily the data itself. They simply point you to the appropriate source, and you then have to contact that source yourself.

The real problem at the local and state levels is that government agencies organize themselves--and their data--in very different ways from place to place. Because states and cities vary so greatly in their systems of organization, it sometimes takes time to discover which agency holds the information you want. In some cases, states and cities are required to collect certain types of information to comply with federal and state regulations--but again the agencies that actually compile this information may vary from place to place. This means that finding information at the local and state levels can be time consuming and may involve making phone calls, writing letters, asking questions, and visiting the agencies in person. Do not expect the data to be available in standard formats. Get to know the providers in your area by developing contacts at the agencies with which you work.

As an example, searches for data about Texas natural resources should start with the Texas Natural Resources Information System. The TNRIS may be reached through e-mail (tnris@twdb.texas.gov), conventional mail (TNRIS, P.O. Box 13231, Austin, TX 78711-3231), by phone (512/463-8337), or at http://www.tnris.org/

Local governments (cities, counties, etc.) also gather and provide information, but less is presently available on the Internet and coverage varies greatly. In Colorado, Boulder County's Land Use Department GIS is on-line, as is the City of Boulder's Planning Department (http://www.bouldercolorado.gov/index.php?option=com_content&task=view&id=3551&Itemid=1280). Data from government agencies may or may not come with adequate documentation and data quality reports. You may have to check the data or inquire how and why it was compiled.

One virtue of searching for information from government agencies is that most is in the public record and can be used for free. Sometimes, a small transfer cost is charged. At the state and local levels, however, you should always confirm that the information you gather is free of copyright restrictions. You may want to publish or sell the data you use, and copyright restrictions often exist specifically for such uses.

Private Suppliers

There are many private sources of information, some of which will be discussed below with regard to new commercial digital sources. Commercial mapmaking firms are among the largest providers, but other firms have for years supplied detailed demographic and economic information, such as data on retail trade and marketing trends. Some of this information can be quite expensive to purchase. Also, it is important to check on restrictions that might apply to the use of commercially provided data. In some cases, copyright and licensing restrictions may apply to your intended use and publication of the information. You should also ask for--and expect--documentation of the dataset pedigree, as discussed below.


3. Coping with the Digital Revolution

The Progress of the Revolution

When GIS first began to appear, users either developed their own formats for storing spatial information or employed those defined by vendors and their proprietary software. The formats were often specific to the needs of particular projects and were not intended, initially, to meet the needs of a broader range of users. It was often very difficult to share and transfer data. Users recognized that much time and money was being wasted when datasets could not be shared among a broader clientele. At the federal level, the United States Geological Survey and the Bureau of the Census were among the first agencies to begin experimenting with data formats that could serve both their needs and those of a wider public. The USGS's digital line graphs (DLG) and the Census's first dual-independent map encoding (DIME) files were the result, although both of these early formats have since been further refined into the DLG-2 and TIGER formats. The DLGs were a way of coding information drawn from the USGS's conventional paper quadrangle sheet maps, the DIME and TIGER formats a method of encoding the maps needed for effective census tabulations.

As these standards were being developed and used, the GIS world also saw a tremendous expansion in the range and coverage of spatial datasets. Large-scale digital map coverage was beginning to become available for entire states and cities. Many of these were still kept in proprietary software formats. Vendors and users, recognizing the limitations of being confined to these proprietary formats, began to develop in the 1980s far more effective and versatile file translation and conversion software. This meant that data could be imported and exported between proprietary software systems. Some vendors, such as AutoDesk, the makers of the AutoCAD computer-assisted drafting package, had tremendous success with their drawing exchange format (DXF), which allowed easy transfer of spatial data among CAD-based GIS.

The development of standards and the proliferation of conversion and translation software have had a galvanizing effect on the GIS world. Where once users had to count on digitizing datasets from scratch, they can now use a wide range of publicly available files. The growth in available files has been explosive and shows no sign of abating. The emergence of new standards, such as the recently adopted federal Spatial Data Transfer Standard, may even increase the pace of development. In beginning a GIS project, it is now wise to review the existing digital sources carefully before plunging ahead with paper sources.

Common Digital Formats

The ways in which digital data is being made available are in flux. The first standard formats were developed 20-30 years ago and are still employed for some datasets. New formats are also being developed. It is important for you to be aware of some of these formats because they are so widely used.

Commercial Formats

Occasionally, software vendors develop formats that become de facto standards for the transfer of information. In the world of GIS and cartography, the most important of these is DXF, the drawing exchange format developed by AutoDesk for the exchange of CAD files. The DXF standard is an ASCII format that describes the contents of a CAD drawing in a way that can be interpreted by other software systems. Almost all major CAD systems provide a means of importing and exporting DXF files in the AutoDesk format. Small problems often arise in converting CAD drawings to and from DXF, but the format remains a very effective way of transferring information. You should become acquainted with the use of DXF because it is so widely employed.
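
To make the idea of an ASCII exchange format concrete, the following is a minimal sketch in Python, not a full DXF reader. It assumes the common ASCII layout in which each item is a numeric group code on one line followed by its value on the next, and it handles only LINE entities. The sample text and the names read_pairs and extract_lines are illustrative assumptions; real DXF files contain many more sections and entity types, and an established library would normally be used for production work.

```python
# Minimal sketch: ASCII DXF stores data as alternating lines,
# a numeric group code followed by the value for that code.
SAMPLE_DXF = """0
SECTION
2
ENTITIES
0
LINE
8
ROADS
10
0.0
20
0.0
11
100.0
21
50.0
0
ENDSEC
0
EOF
"""

# Group codes used by a LINE entity for its start and end coordinates.
COORD_CODES = {10: "x1", 20: "y1", 11: "x2", 21: "y2"}


def read_pairs(text):
    """Return (group_code, value) tuples from ASCII DXF text."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Pair each code line with the value line that follows it.
    return [(int(lines[i]), lines[i + 1]) for i in range(0, len(lines) - 1, 2)]


def extract_lines(pairs):
    """Collect LINE entities as dicts holding the layer name and endpoints."""
    found = []
    current = None
    for code, value in pairs:
        if code == 0:
            # Group code 0 starts a new entity or structural marker,
            # so any LINE being collected is finished.
            if current is not None:
                found.append(current)
            current = {"layer": None} if value == "LINE" else None
        elif current is not None:
            if code == 8:  # group code 8 carries the layer name
                current["layer"] = value
            elif code in COORD_CODES:
                current[COORD_CODES[code]] = float(value)
    if current is not None:
        found.append(current)
    return found


segments = extract_lines(read_pairs(SAMPLE_DXF))
```

Because the format is plain text, even a short script like this can inspect a file before it is imported, which is one reason small conversion problems are usually easy to diagnose.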


4. Locating Digital Sources

The Internet and World Wide Web

Over the past ten years the Internet and Web have become the leading means of acquiring primary and secondary data. In some cases, you must still contact providers in person or by mail to obtain data, but this is changing very rapidly. The Internet is now the primary means of disseminating data produced by the federal government and is assuming that role for most state governments. At the moment, finding exactly what you are looking for on the Internet can be difficult, but navigation and focused searching of the networks are becoming easier.

Commercial Sources

Many software vendors earn a substantial income by repackaging and selling data in the proprietary forms used by their software products. Because the data is usually checked and corrected as it is repackaged, the use of these converted datasets can save time. The widespread expansion of this marketing and remarketing of data has been a boon to many users who do not wish to invest resources in building the datasets they need on a day-to-day basis--they simply buy what they need. Examples of software vendors that also sell data are:

Environmental Systems Research Institute, the makers of ArcInfo and ArcMap software, http://www.esri.com/
MapInfo, http://www.mapinfo.com/
Clark Labs, the makers of IDRISI software, http://www.clarklabs.org/index.cfm

Other firms that specialize in packaging and marketing datasets suited to a variety of software systems include Equifax and Nielsen.

These and other firms will also build datasets to a user's specifications. These are often termed "conversion" firms. They are usually contracted to build special purpose datasets for utility companies and some government agencies. These datasets are often of such special purpose they cannot be assembled from existing publicly available sources, say when an electric utility wishes to digitize its maps of its service area.  An example is RAMTECH.

Finally, some commercial firms do provide a substantial amount of geospatial data free to users. Perhaps the best example is GeoCommunity's GIS Data Depot.

Libraries

Libraries remain one of the best places to locate digital and paper sources.  Often local public libraries and university libraries are among the best places to gain access to local sources.


5. What to Look For and When to Quit

Checking on Pedigree and Quality: Becoming a Smart Shopper

As you consider available digital data, become a smart, well-informed shopper. It is said that an undocumented dataset is a worthless dataset. There is much truth in this assertion because, if you do not know what is in a dataset--its pedigree and quality, for example--you, the user, have to spend time checking it yourself. These days, you should expect to receive with your data some sort of "data quality report" from the vendor or provider. This report will provide a description of exactly what is in the file, how the information was compiled (and from what sources), and how the data was checked. The documentation for some products is quite extensive and much of the detailed information may be published separately, as it is for USGS digital products.
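
The items such a report should cover can be treated as a simple checklist. The sketch below, in Python, is only an illustration: the field names (contents, compilation_method, sources, quality_checks) are hypothetical labels that mirror the items named in the paragraph above, not any standard metadata schema, and the example record is made up.

```python
# Hypothetical field names mirroring what a data quality report should
# describe: what is in the file, how it was compiled, from what sources,
# and how it was checked.
REQUIRED_FIELDS = ("contents", "compilation_method", "sources", "quality_checks")


def missing_documentation(report):
    """Return the required fields that are absent or empty in a report dict."""
    return [field for field in REQUIRED_FIELDS if not report.get(field)]


# Illustrative record for an imaginary dataset; the values are invented.
example_report = {
    "contents": "Road centerlines for one county",
    "compilation_method": "Digitized from paper quadrangle maps",
    "sources": [],
    "quality_checks": "",
}

gaps = missing_documentation(example_report)
```

Any field that turns up in the list of gaps is something you would need to ask the provider about or verify yourself before relying on the data.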

If documentation is limited, it is important for you to consider the following questions:

Coping with Data Availability and Differing File Formats: When to Give Up

In closing, it is important for you to be aware that sometimes the costs of using and converting publicly and commercially available digital files outweigh their value. Even these days, you must still consider the possibility that you will have to build a dataset yourself. No matter how much data becomes available publicly, there is no guarantee that it will contain exactly the sort of information that you need for your project. Or perhaps the information you need is divided among a large number of files provided by different public and private suppliers in different formats. When your work involves combing through such varied files, it is sometimes easier simply to compile the needed information from paper sources, or some combination of paper and digital sources. Just because some digital sources are available does not mean that you have to use them.

Furthermore, no matter how robust the conversion software you use, there are times when you find that the costs of converting data into the form you need outweigh the benefits. Again, just because a file exists does not mean that you have to use it in your work. If you do your homework carefully and consider the strengths and weaknesses of a given dataset, you can better judge whether conversion will repay the effort. It is also critical to sample datasets and test them in your work. Do not attempt a large conversion project without first testing your procedure and the data itself.
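
One way to test a procedure before committing to a large conversion is to run it on a random sample of records and count the failures. The following Python sketch illustrates the idea under stated assumptions: convert_record is a hypothetical stand-in for whatever conversion step your project uses, and the records are invented for the example.

```python
import random


def convert_record(record):
    """Hypothetical conversion step: parse coordinate strings into floats."""
    return {"id": record["id"], "x": float(record["x"]), "y": float(record["y"])}


def trial_conversion(records, sample_size, seed=0):
    """Convert a random sample and report which records failed."""
    rng = random.Random(seed)  # fixed seed so the trial is repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    failures = []
    for rec in sample:
        try:
            convert_record(rec)
        except (KeyError, ValueError) as err:
            failures.append((rec.get("id"), str(err)))
    return len(sample), failures


# Made-up records; every tenth one has a malformed x coordinate.
records = [
    {"id": i, "x": "12.5" if i % 10 else "n/a", "y": "7.0"} for i in range(200)
]

tested, failures = trial_conversion(records, sample_size=50)
failure_rate = len(failures) / tested
```

A high failure rate in the sample is an early warning that the full conversion may cost more than it returns, and the recorded error messages show where the procedure or the data needs attention.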


6. Examination and Study Questions


Last revised 2011.10.24. KEF.