These materials were developed by Kenneth E. Foote and Donald J.
Huebner, Department of Geography, University of Texas at Austin, 1996.
These materials may be used for study, research, and education in not-for-profit
applications. If you link to or cite these materials, please credit
the authors, Kenneth E. Foote and Donald J. Huebner, The Geographer's Craft
Project, Department of Geography, The University of Colorado at Boulder.
These materials may not be copied to or issued from another Web server
without the authors' express permission. Copyright © 2000. All
commercial rights are reserved. If you have comments or suggestions,
please contact the author or Kenneth E. Foote at k.foote@colorado.edu.
1. GIS as Representations of Reality
Perhaps we should use the acronym gIs, rather than GIS for geographic information
systems. These are really geographic INFORMATION systems. It is the information
they contain that makes them so valuable.
The database is also important because its creation will often account
for up to three-quarters of the time and effort involved in developing
a geographic information system. Once an organization compiles this information,
the database may be maintained for ten to fifty years. For this reason,
shortcuts are not recommended.
It is important, however, to view these GIS databases as more
than simple stores of information. The database is used to abstract very
specific sorts of information about reality and organize it in a way that
will prove useful. The database should be viewed as a representation or
model of the world developed for a very specific application.
One reason so many software and hardware systems are used for GIS is
that each system allows users to represent and model certain types of
phenomena.
2. Basic Types of Representation: Raster and Vector Reality
One of the sharpest distinctions among GIS is the way that location is
represented in a database, as either a raster or vector position.
2.1. The Raster View of the World
A raster based system displays, locates, and stores graphical data by using
a matrix or grid of cells. A unique reference coordinate represents each
pixel either at a corner or the centroid. In turn each cell or pixel has
discrete attribute data assigned to it. Raster data resolution is dependent
on the pixel or grid size and may vary from sub-meter to many kilometers.
Because these data are two-dimensional, GISs store different kinds of
information, such as forest cover, soil type, land use, wetland habitat, or
other data, in separate layers. Layers are functionally related map features.
Generally, raster data requires less processing than vector data, but it
consumes more computer storage space. Scanning remote sensors on satellites
record data in raster format. Digital terrain models (DTM) and digital elevation
models (DEM) are examples of raster data (Koeln et al. 1994; Huxhold 1991).
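To make the cell-and-coordinate idea concrete, here is a minimal Python sketch of one raster layer. It assumes a north-up grid whose upper-left corner sits at a hypothetical origin (`x0`, `y0`) and whose square cells have side `cell_size` in map units; `RasterLayer` and its methods are illustrative names, not part of any GIS package.

```python
from dataclasses import dataclass


@dataclass
class RasterLayer:
    """One raster layer: a row-major grid of cell values (hypothetical structure)."""
    x0: float         # x coordinate of the grid's upper-left corner
    y0: float         # y coordinate of the grid's upper-left corner
    cell_size: float  # width and height of one square cell, in map units
    cells: list       # cells[row][col] holds the attribute value of that cell

    def cell_at(self, x: float, y: float) -> tuple:
        """Return the (row, col) of the cell containing map point (x, y)."""
        col = int((x - self.x0) // self.cell_size)
        row = int((self.y0 - y) // self.cell_size)  # rows count down from the top
        if not (0 <= row < len(self.cells) and 0 <= col < len(self.cells[0])):
            raise ValueError("point lies outside the raster extent")
        return row, col

    def centroid(self, row: int, col: int) -> tuple:
        """Return the map coordinates of a cell's centroid."""
        return (self.x0 + (col + 0.5) * self.cell_size,
                self.y0 - (row + 0.5) * self.cell_size)

    def value_at(self, x: float, y: float):
        """Look up the attribute stored in the cell that contains (x, y)."""
        row, col = self.cell_at(x, y)
        return self.cells[row][col]


# A hypothetical 3 x 3 land-use layer with 30 m cells; values are class codes.
land_use = RasterLayer(x0=500000.0, y0=4200090.0, cell_size=30.0,
                       cells=[[1, 1, 2],
                              [1, 3, 2],
                              [3, 3, 2]])
print(land_use.value_at(500045.0, 4200040.0))  # prints 3 (cell at row 1, col 1)
```

Resolution here is simply `cell_size`: halving it quadruples the number of cells needed to cover the same area, which is one reason raster data consumes more storage.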

2.2. The Vector View of the World
A vector based system displays graphical data as points, lines or curves,
or areas with attributes. Points in a vector system are defined by Cartesian
coordinates (i.e., x and y) and by computational algorithms applied to those
coordinates. Lines or arcs are series of ordered points. Areas or polygons
are also stored as ordered lists of points, but making the beginning
and end points the same node closes and defines the shape. Topological
models define the connectivity of vector based systems. Vector systems
are capable of very high resolution (0.001 inch or finer), and
graphical output is similar to hand-drawn maps. This system works well
with azimuths, distances, and points, but it requires complex data structures
and is less compatible with remote sensing data. Vector data requires less
computer storage space, and maintaining topological relationships is easier
in this system. Digital line graphs (DLG) and TIGER files are examples
of vector data (Koeln et al. 1994; Huxhold 1991).
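The three vector primitives can be sketched as ordered coordinate lists. In this hypothetical Python sketch (Python 3.9 or later), a point is an `(x, y)` tuple, a line is an ordered list of points, and a polygon is an ordered list whose first and last points are the same node. The shoelace formula is included only to show that a closed ring defines a measurable area.

```python
import math

Point = tuple[float, float]  # subscripted built-in types need Python 3.9+


def is_closed(ring: list[Point]) -> bool:
    """A ring is closed when its beginning and end points are the same node."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def line_length(line: list[Point]) -> float:
    """Sum of the straight segments joining consecutive ordered points."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def polygon_area(ring: list[Point]) -> float:
    """Planar area of a closed ring, using the shoelace formula."""
    if not is_closed(ring):
        raise ValueError("polygon ring must begin and end at the same point")
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


# Hypothetical features in map units.
well: Point = (3.0, 4.0)                                  # a point
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]               # a line
parcel = [(0.0, 0.0), (0.0, 10.0), (20.0, 10.0), (20.0, 0.0), (0.0, 0.0)]  # a polygon

print(line_length(road), polygon_area(parcel))  # prints 10.0 200.0
```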

2.3 Graphical Comparison of Raster and Vector Systems
It is important to stress that any given real-world situation can be represented
in either raster or vector mode; the choice is up to the user.
Each of these systems of representation has its advantages and
disadvantages:
| Method | Advantages | Disadvantages |
|---|---|---|
| Raster | Simple data structure; compatible with remotely sensed or scanned data; simple spatial analysis procedures | Requires greater storage space on computer; depending on pixel size, graphical output may be less pleasing; projection transformations are more difficult; more difficult to represent topological relationships |
| Vector | Requires less disk storage space; topological relationships are readily maintained; graphical output more closely resembles hand-drawn maps | More complex data structure; not as compatible with remotely sensed data; software and hardware are often more expensive; some spatial analysis procedures may be more difficult; overlaying multiple vector maps is often time consuming |
It should also be stressed that data modeled in one system can be converted
into the other. That is, raster data can be vectorized and vice versa.
Many systems even allow data modeled in raster form to be overlaid on vector
data and vice versa. In this graphical example,
an aerial photo (raster) is overlaid with supplemental vector information.
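Conversion from vector to raster (rasterization) can be sketched by asking, for each cell, whether its centroid falls inside a polygon. The Python below uses a standard ray-casting point-in-polygon test. The grid origin, cell size, and parcel are hypothetical, and GIS software generally uses more efficient algorithms for the same job.

```python
def point_in_polygon(x: float, y: float, ring: list) -> bool:
    """Ray casting: count ring edges crossed by a ray running east from (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):                      # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring: list, x0: float, y0: float, cell_size: float,
              n_rows: int, n_cols: int) -> list:
    """Grid of 1 (cell centroid inside polygon) or 0; row 0 is the top row."""
    grid = []
    for row in range(n_rows):
        cy = y0 - (row + 0.5) * cell_size
        grid.append([1 if point_in_polygon(x0 + (col + 0.5) * cell_size, cy, ring) else 0
                     for col in range(n_cols)])
    return grid


# A hypothetical square parcel rasterized onto a 4 x 4 grid of 10 m cells
# whose upper-left corner is at (0, 40).
parcel = [(8.0, 8.0), (8.0, 28.0), (28.0, 28.0), (28.0, 8.0), (8.0, 8.0)]
grid = rasterize(parcel, x0=0.0, y0=40.0, cell_size=10.0, n_rows=4, n_cols=4)
for row in grid:
    print(row)
```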
3. Organizing Attribute Data
GIS use raster and vector representations to model location, but they
must also record information about the real-world phenomena positioned
at each location and the attributes of these phenomena. That is, the GIS
must provide a linkage between spatial and non-spatial data. These linkages
make the GIS "intelligent" insofar as the user can store and examine information
about where things are and what they are like.
The relationship can be diagrammed as a linkage between:
- Location <<< >>> What Is There
- Spatial Data <<< >>> Non-Spatial Data
- Geographic Features <<< >>> Attributes
At the most abstract level, this is a relationship between:
- A Locational Symbol <<< >>> Its Meaning
In a raster system, this symbol is a grid cell location in a matrix. In
a vector system, the locational symbol may be a zero-dimensional point;
a one-dimensional line, curve, boundary, or vector; or a two-dimensional
area, region, or polygon.
The linkage between symbol and meaning is established by giving
every geographic feature at least one unique means of identification, a
name or number usually just called its ID. Non-spatial attributes of the
feature are then stored, usually in one or more separate files, under this
ID number. In other words, locational information
is linked to specific information in a database.
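The ID linkage can be sketched as two separate stores filed under the same feature IDs: one holding location, one holding non-spatial attributes. The IDs, coordinates, and attribute fields in this Python sketch are hypothetical.

```python
# Spatial data: feature ID -> location (simple (x, y) points here).
locations = {
    101: (512300.0, 4201750.0),
    102: (512410.0, 4201890.0),
    103: (512550.0, 4201600.0),
}

# Non-spatial data, stored separately but filed under the same IDs.
attributes = {
    101: {"name": "Fire Station 3", "type": "fire station"},
    102: {"name": "Oak Elementary", "type": "school"},
    103: {"name": "City Library", "type": "library"},
}


def describe(feature_id: int) -> dict:
    """Follow the ID from a locational symbol to its meaning."""
    return {"id": feature_id, "location": locations[feature_id], **attributes[feature_id]}


def locate(feature_type: str) -> list:
    """Go the other way: from an attribute value to the locations that have it."""
    return [locations[fid] for fid, attrs in attributes.items()
            if attrs["type"] == feature_type]


print(describe(102)["name"], locate("school"))
```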
It is important to realize that this non-spatial data can be filed
away in several different forms depending on how it needs to be used and
accessed. Perhaps the simplest method is the flat file or spreadsheet,
where each geographic feature is matched to one row of data.
3.1 Flat Files and Spreadsheets
A flat file or spreadsheet is a simple
method for storing data. All records in this database have the same number
of "fields." Individual records have different data in each field, with
one field serving as a key to locate a particular record. For example,
your social security number may be the key field in a record of your name,
address, phone number, sex, ethnicity, place of birth, date of birth, and
so on. For a person or a tract of land, there could be hundreds of fields
associated with the record. When the number of fields grows large, a
flat file becomes cumbersome to search. Also, the key field is usually determined
by the programmer, and searching by other fields may be difficult
for the user. Although this type of database is simple in structure,
expanding the number of fields usually entails reprogramming. Additionally,
adding new records is time consuming, particularly when there are numerous
fields. Other methods offer more flexibility and responsiveness in GIS.
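A flat file can be sketched as a list of records that all share the same fields, plus an index on the key field. In this hypothetical Python sketch (records and field names are invented for illustration), a lookup by the key is a single step, while a search on any other field has to examine every record.

```python
import csv
import io

# Every record has the same fields; "parcel_id" serves as the key field.
FLAT_FILE = """parcel_id,owner,land_use,area_sqft
01-4567,Garcia,residential,10000
01-5632,Nguyen,residential,7000
06-3962,Baker LLC,office,5750
"""

records = list(csv.DictReader(io.StringIO(FLAT_FILE)))

# An index on the key field gives direct access to a record.
by_key = {rec["parcel_id"]: rec for rec in records}


def find_by_key(parcel_id: str) -> dict:
    """One dictionary lookup, however many records there are."""
    return by_key[parcel_id]


def find_by_field(field: str, value: str) -> list:
    """Any non-key search must examine every record in turn."""
    return [rec for rec in records if rec[field] == value]


print(find_by_key("06-3962")["owner"])
print([r["parcel_id"] for r in find_by_field("land_use", "residential")])
```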
3.2 Hierarchical Files
Hierarchical files store data in
more than one type of record. This method is usually described as a "parent-child,
one-to-many" relationship. One field is key to all records, but data in
one record does not have to be repeated in another. This system allows
records with similar attributes to be associated together. The records
are linked to each other by a key field in a hierarchy of files. Each record,
except for the master record, is linked to a higher-level record by
a key field "pointer." In other words, one record may lead to another and
so on in a descending pattern. An advantage is that when the
relationship is clearly defined and queries follow a standard routine,
a very efficient data structure results. The database is arranged according
to its use and needs. Access to different records is readily provided,
and is also easy to deny to a user by not furnishing that particular file of the
database. One disadvantage is that one must access the master record,
with the key field determinant, in order to link "downward" to other records.
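A hierarchical file can be sketched as records that each hold a pointer to their higher-level record and a list of pointers to the records below them. The county, tract, and parcel levels in this Python sketch are hypothetical; note that a search can only move downward from wherever it starts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    key: str
    data: dict
    children: list = field(default_factory=list)              # pointers "downward"
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Record") -> "Record":
        """Attach a lower-level record and point it back to this one."""
        child.parent = self
        self.children.append(child)
        return child


def find(record: Record, key: str) -> Optional[Record]:
    """Depth-first search that can only move from a record to its children."""
    if record.key == key:
        return record
    for child in record.children:
        hit = find(child, key)
        if hit is not None:
            return hit
    return None


def path_from_master(record: Record) -> list:
    """Follow parent pointers back up to the master record."""
    keys = []
    while record is not None:
        keys.append(record.key)
        record = record.parent
    return keys[::-1]


# Master record -> census tracts -> parcels (all hypothetical).
county = Record("county", {"name": "Example County"})
tract_1 = county.add_child(Record("tract-1", {"population": 4200}))
tract_2 = county.add_child(Record("tract-2", {"population": 3100}))
tract_1.add_child(Record("01-4567", {"use": "residential"}))
tract_2.add_child(Record("06-3962", {"use": "office"}))

print(path_from_master(find(county, "06-3962")))  # ['county', 'tract-2', '06-3962']
print(find(tract_1, "06-3962"))                   # None: not reachable from tract-1
```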
3.3 Relational Files
Relational files connect different
files or tables (relations) without using internal pointers. Instead,
a common data field, such as a shared ID, is used to join or associate records.
The link is not hierarchical. A "matrix of tables" is used to store the information.
As long as the tables share a common field, they may be combined by the user
to form new inquiries and data output. This is the most flexible system
and is particularly suited to SQL (Structured Query Language). Queries
are not limited by a hierarchy of files, but instead are based on relationships
from one type of record to another that the user establishes. Because of
its flexibility, this system is the most popular database model for GIS.
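The relational approach can be sketched with Python's built-in `sqlite3` module: two tables share a common field (`parcel_id`) and are joined only when a query asks for it. The table names, columns, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parcels (parcel_id TEXT PRIMARY KEY, land_use TEXT, height_ft INTEGER);
    CREATE TABLE owners  (parcel_id TEXT, owner TEXT);

    INSERT INTO parcels VALUES ('01-4567', 'Residential', 35),
                               ('05-3759', 'Residential', 60),
                               ('06-9977', 'Office',      60);
    INSERT INTO owners  VALUES ('01-4567', 'Garcia'),
                               ('05-3759', 'Riverside Apartments'),
                               ('06-9977', 'Baker LLC');
""")

# The user, not the file structure, decides how the tables are related.
rows = conn.execute("""
    SELECT o.owner, p.land_use
    FROM parcels AS p
    JOIN owners AS o ON o.parcel_id = p.parcel_id
    WHERE p.height_ft >= 60
    ORDER BY o.owner
""").fetchall()
print(rows)  # [('Baker LLC', 'Office'), ('Riverside Apartments', 'Residential')]
```

Asking a different question, such as which owners hold office parcels, needs only a different `SELECT`; neither table has to be restructured.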
3.4 Flat, Hierarchical, and Relational Files Compared
| Structure | Advantages | Disadvantages |
|---|---|---|
| Flat Files | Fast data retrieval; simple structure and easy to program | Difficult to process multiple values of a data item; adding new data categories requires reprogramming; slow data retrieval without the key |
| Hierarchical Files | Adding and deleting records is easy; fast data retrieval through higher level records; multiple associations with like records in different files | Pointer path restricts access; each association requires repetitive data in other records; pointers require large amount of computer storage |
| Relational Files | Easy access and minimal technical training for users; flexibility for unforeseen inquiries; easy modification and addition of new relationships, data, and records; physical storage of data can change without affecting relationships between records | New relations can require considerable processing; sequential access is slow; method of storage on disks impacts processing time; easy to make logical mistakes due to flexibility of relationships between records |
Now, let us consider a few examples of matching applications to database
structures:
- Exploratory research--flat files are easy to organize, and space is not a particular problem
- Government agencies--hierarchical systems are particularly attractive
- Planning and development--relational systems might be justified for flexibility
4. Representing Relationships
GIS have the power to record more than location and simple attribute information.
In some situations, we will want to examine spatial relationships based
upon location, as well as functional and logical relationships among geographic
features.
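Several of the spatial relationships listed below reduce to simple computations on coordinates. This Python sketch assumes planar (projected) coordinates in meters and hypothetical feature names; it computes distance, direction as a bearing from grid north, and a neighborhood (proximity) query by radius.

```python
import math

# Hypothetical point features in a projected coordinate system (meters).
features = {
    "school":   (1000.0, 1000.0),
    "clinic":   (1300.0, 1400.0),
    "landfill": (4000.0, 1000.0),
}


def distance(a: str, b: str) -> float:
    """Straight-line distance between two features."""
    return math.dist(features[a], features[b])


def bearing(a: str, b: str) -> float:
    """Direction from a to b in degrees clockwise from grid north (0-360)."""
    (x1, y1), (x2, y2) = features[a], features[b]
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360.0


def neighborhood(center: str, radius: float) -> list:
    """Features within `radius` of `center` (a proximity query)."""
    return sorted(name for name in features
                  if name != center and distance(center, name) <= radius)


print(distance("school", "clinic"), round(bearing("school", "landfill"), 6))
print(neighborhood("school", 1000.0))  # ['clinic']
```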
Spatial Relationships
- Absolute and relative location
- Distance between features
- Proximity of features
- Features in the "neighborhood" of other features
- Direction and movement from place to place
- Boolean relationships of "and," "or," "inside," "outside," "intersecting," "non-intersecting," etc.
Functional Relationships among Geographic Features and Their Attributes.
This includes information about how features are connected and interact
in real-life terms. A road network might be classified functionally from
the largest superhighway down to the most isolated rural road or suburban
cul-de-sac based upon their role in the overall transportation system.
Minor roads and suburban streets "feed" major highways, but are not directly
connected to them. As another example, in assessing wildlife habitats, various
environmental conditions function together to define the optimal living
environments for certain species. Within cities, ownership is a functional
classification of great importance, as are land use and zoning classifications.
Logical Relationships among Geographic Features and Their Attributes.
Logical relationships involve "if-then" and "and-or" conditions that must
exist among features stored in the dataset. For example, no land may be
permitted to be zoned for residential use if it lies within a river's five-year
floodplain. Development may be disallowed in the habitat of some endangered
species.
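Rules like these can be sketched as "if-then" tests evaluated against each feature's attributes. The parcels and the `in_floodplain` and `endangered_habitat` flags below are hypothetical; in practice such flags would typically be derived by overlaying the parcel layer with floodplain and habitat layers.

```python
# Hypothetical parcels with attributes derived from map overlays.
parcels = [
    {"id": "A", "proposed_zoning": "residential", "in_floodplain": True,  "endangered_habitat": False},
    {"id": "B", "proposed_zoning": "residential", "in_floodplain": False, "endangered_habitat": False},
    {"id": "C", "proposed_zoning": "commercial",  "in_floodplain": False, "endangered_habitat": True},
]


def violations(parcel: dict) -> list:
    """Return the logical rules that a proposed zoning would break."""
    problems = []
    # IF zoned residential AND inside the floodplain THEN not permitted.
    if parcel["proposed_zoning"] == "residential" and parcel["in_floodplain"]:
        problems.append("residential use in floodplain")
    # IF inside endangered-species habitat THEN development is disallowed.
    if parcel["endangered_habitat"]:
        problems.append("development in endangered-species habitat")
    return problems


approved = [p["id"] for p in parcels if not violations(p)]
print(approved)  # ['B']
```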
Databases can be designed to represent, model, and store information
about these relationships as needed for particular applications.
5. The Example of Topological Relationships
Topology is one of the most useful relationships maintained in many spatial
databases. It is defined as the mathematics of connectivity or adjacency
of points or lines that determines spatial relationships in a GIS. The
topological data structure logically determines exactly how and where points
and lines connect on a map by means of nodes (topological junctions). The
order of connectivity defines the shape of an arc or polygon. The computer
stores this information in various tables of the database structure. Because
the information is stored in a logical and ordered relationship, missing information,
such as a missing line segment of a polygon, is readily apparent. A GIS manipulates,
analyzes, and uses topological data in determining data relationships.
Network analysis uses topological modeling for determining shortest
paths and alternate routes. For example, a GIS for emergency service dispatch
may use topological models to quickly ascertain alternative routes for emergency
vehicles. Automobile commuters perform a similar mental task by altering
their route to avoid accidents and traffic congestion. Likewise an electrical
utility GIS could rapidly determine different circuit paths to route electricity
when service is interrupted by equipment damage. Similarly, political redistricting
planners could use certain algorithms to determine logical relationships
between population groups and areas for district boundaries.
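Shortest-path routing over a topological network is commonly done with Dijkstra's algorithm. The text does not name a particular algorithm, so treat this Python sketch as one standard choice; the street network, node names, and lengths are hypothetical. Closing an arc models a blocked road or a damaged line.

```python
import heapq


def shortest_path(graph: dict, start: str, goal: str, closed: frozenset = frozenset()):
    """Dijkstra's algorithm over an undirected network.

    graph maps each node to {neighbor: length}; arcs listed in `closed`
    as (node, node) pairs are skipped, e.g. to model a blocked road.
    Returns (total_length, [nodes...]), or (inf, []) if goal is unreachable.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue                                  # stale queue entry
        for nbr, length in graph[node].items():
            if (node, nbr) in closed or (nbr, node) in closed:
                continue
            new_dist = dist + length
            if new_dist < best.get(nbr, float("inf")):
                best[nbr] = new_dist
                previous[nbr] = node
                heapq.heappush(queue, (new_dist, nbr))
    return float("inf"), []


# Hypothetical street network; lengths in kilometers.
roads = {
    "station":  {"main_st": 2.0, "river_rd": 4.0},
    "main_st":  {"station": 2.0, "bridge": 3.0},
    "river_rd": {"station": 4.0, "bridge": 2.5},
    "bridge":   {"main_st": 3.0, "river_rd": 2.5, "hospital": 1.0},
    "hospital": {"bridge": 1.0},
}

print(shortest_path(roads, "station", "hospital"))
print(shortest_path(roads, "station", "hospital", frozenset({("main_st", "bridge")})))
```

The first call takes Main Street (6.0 km); with the Main Street arc closed, the second falls back to River Road (7.5 km), the same rerouting the dispatch and utility examples describe.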
To see how topology is represented or modeled, it
is useful to consider an example of how connections are coded into
a database. This involves recording more than just the absolute location
of points, lines, and regions.
The first step is to record the location of all "nodes,"
that is, the endpoints and intersections of lines and boundaries.

Based upon these nodes, "arcs" are defined. These arcs have endpoints,
but they are also assigned a direction indicated by the arrowheads. The
starting point of the vector is referred to as the "from node" and the
destination the "to node." The orientation of a given vector can be assigned
in either direction, as long as this direction is recorded and stored in
the database.

By keeping track of the orientation of arcs, it is possible to use this
information to establish routes from node to node or place to place. Thus,
if we want to move from node 3 to node 1, we can locate the necessary
connections in the database.
Now, "polygons" are defined by arcs. To define a given
polygon, trace around its area in a clockwise direction recording the component
arcs and their orientations. If an arc has to be followed in its reverse
orientation to make the tracing, it is assigned a negative sign in the
database.
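The signed-arc convention can be checked mechanically: follow each positive arc from its from-node to its to-node, follow each negative arc in reverse, and confirm the walk returns to its starting node. The node coordinates, arc letters, and polygons in this Python sketch are hypothetical stand-ins rather than the values in the diagram the text refers to. Arc D is used forward by polygon B and in reverse (-D) by polygon A, and the clockwise check assumes the y axis points north.

```python
# Hypothetical node coordinates (y axis pointing north) and arc table.
NODES = {1: (0.0, 0.0), 2: (0.0, 10.0), 3: (10.0, 10.0), 4: (10.0, 0.0)}
ARCS = {"D": (1, 3), "E": (1, 2), "F": (2, 3), "G": (3, 4), "H": (4, 1)}  # (from, to)

# Each polygon is its clockwise list of arcs; "-" marks an arc followed in reverse.
POLYGONS = {"A": ["E", "F", "-D"], "B": ["D", "G", "H"]}


def ring_nodes(signed_arcs: list) -> list:
    """Return the node sequence traced by a signed arc list, checking that it closes."""
    if not signed_arcs:
        raise ValueError("a polygon needs at least one arc")
    nodes = []
    for token in signed_arcs:
        start, end = ARCS[token.lstrip("-")]
        if token.startswith("-"):
            start, end = end, start               # follow the arc against its direction
        if nodes and nodes[-1] != start:
            raise ValueError(f"arc {token} does not continue from node {nodes[-1]}")
        if not nodes:
            nodes.append(start)
        nodes.append(end)
    if nodes[0] != nodes[-1]:
        raise ValueError("arcs do not close the polygon")
    return nodes


def is_clockwise(nodes: list) -> bool:
    """A negative shoelace sum means the ring is traced clockwise."""
    pts = [NODES[n] for n in nodes]
    return sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:])) < 0


for name, arcs in POLYGONS.items():
    ring = ring_nodes(arcs)
    print(name, ring, is_clockwise(ring))
# A [1, 2, 3, 1] True
# B [1, 3, 4, 1] True
```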

Finally, for each arc, one records which polygon lies to the left and
right side of its direction of orientation. If an arc is on the edge of
the study area, it is bounded by the "universe."

Now that this information has been recorded in the database, it is possible
to pose questions about connectivity and location. For example:
- What polygons adjoin polygon A? To find the solution, we first look to see what arcs define polygon A, then we check to see what other polygons are defined by these arcs in their negative orientation.
- What is the shortest route from node 3 to node 2? Trace all arc paths that lead from node 3 to node 2, and sum their lengths by calculating distances from the node list. Choose the path with the shortest total length.
- What polygon is directly across from polygon B along arc D? Search for the polygon that is defined by the inverse (negative) of arc D.
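The three queries can be sketched against an arc table that stores, for each arc, its from-node, to-node, and the polygons on its left and right. The data are hypothetical: a square split by diagonal arc D into polygons A and B, with the universe written `U`. Because a clockwise trace keeps the polygon on its right, polygon A lies on the right of the arcs it uses forward and on the left of arc D, which it uses in reverse. Route-finding here enumerates every simple path, as the text describes; that is fine for a tiny example but not for large networks.

```python
import math

# Node list: node id -> (x, y). Hypothetical coordinates, y axis pointing north.
NODES = {1: (0.0, 0.0), 2: (0.0, 10.0), 3: (10.0, 10.0), 4: (10.0, 0.0)}

# Arc list: arc id -> (from node, to node, left polygon, right polygon).
# "U" stands for the universe outside the study area.
ARCS = {
    "D": (1, 3, "A", "B"),
    "E": (1, 2, "U", "A"),
    "F": (2, 3, "U", "A"),
    "G": (3, 4, "U", "B"),
    "H": (4, 1, "U", "B"),
}


def adjoining(polygon: str) -> set:
    """Polygons that share at least one arc with `polygon` (universe included)."""
    result = set()
    for _, _, left, right in ARCS.values():
        if left == polygon:
            result.add(right)
        elif right == polygon:
            result.add(left)
    return result


def across(polygon: str, arc: str) -> str:
    """The polygon on the other side of `arc` from `polygon`."""
    _, _, left, right = ARCS[arc]
    if polygon not in (left, right):
        raise ValueError(f"arc {arc} does not bound polygon {polygon}")
    return right if polygon == left else left


def arc_length(arc: str) -> float:
    """Straight-line length computed from the node list."""
    start, end, _, _ = ARCS[arc]
    return math.dist(NODES[start], NODES[end])


def shortest_route(start: int, goal: int) -> tuple:
    """Enumerate every simple arc path from start to goal and keep the shortest.

    Arcs may be travelled either way; a reversed arc is recorded as "-id".
    """
    best = (math.inf, [])

    def walk(node, visited, path, length):
        nonlocal best
        if node == goal:
            if length < best[0]:
                best = (length, path)
            return
        for arc_id, (frm, to, _, _) in ARCS.items():
            for here, there, label in ((frm, to, arc_id), (to, frm, "-" + arc_id)):
                if here == node and there not in visited:
                    walk(there, visited | {there}, path + [label],
                         length + arc_length(arc_id))

    walk(start, {start}, [], 0.0)
    return best


print(sorted(adjoining("A")))  # ['B', 'U']
print(shortest_route(3, 2))    # (10.0, ['-F'])
print(across("B", "D"))        # A
```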
Arc-node topology, as this is called, was developed several decades ago
as a convenient way to store information of this sort. It is used to encode
information in the US Census Bureau's TIGER boundary files and is
the basis of the spatial modeling system used by the Arc/Info software
system.
6. Object-oriented Databases
The methods of file organization discussed above depend upon the careful
description of real-world phenomena in terms of their attributes, such
as height, weight, or age. It is these attributes that are stored in the
database and together they provide a sort of abstracted depiction of the
real-world feature. Much recent attention has focused on how to organize
this information in ways that more readily represent the way users gather
and use information about the world around them. That is, humans recognize
"objects" immediately in terms of their totality or "wholeness." Houses
and skyscrapers are recognized immediately by form and function. The differences
can be described in terms of the underlying attributes, but people recognize
these from experience.
The idea of an "object-oriented" database is to organize information
(that is, group attributes) into the sorts of "wholes" that people recognize.
Instead of "decomposing" each feature into a distinctive list of attributes,
emphasis is placed on "grouping" the attributes of a given object into
a unit or template that can be stored or retrieved by its natural name.
Consider the following situation involving two ways of organizing
information about buildings zoned for different uses.
This information can be broken down into attributes, as follows:
| Parcel | Use | Height | Minimum Lot Size | Maximum Number of Dwelling Units |
|---|---|---|---|---|
| 01-4567 | Residential | 35 ft | 10,000 SF | 1 |
| 01-5632 | Residential | 35 ft | 7,000 SF | 2 |
| 04-6781 | Residential | 40 ft | 43,560 SF | 23 |
| 05-3759 | Residential | 60 ft | 43,560 SF | 54 |
| 06-3962 | Office | 40 ft | 5,750 SF | 0 |
| 06-9977 | Office | 60 ft | 5,750 SF | 0 |
To organize this information differently, let us first define some "templates"
that reflect the different "objects" we wish to include in the database.
| Template | Tokens and Requirements |
|---|---|
| SF Single Family | Token 1 = Large Lot; Token 3 = Duplex |
| MF Multi-family | Token 1 = Low Density; Token 5 = High Density |
| LO Limited Office | Must specify predominant use; Maximum Height = 40 ft; Minimum Lot Size = 5,750 SF |
| GO General Office | Must specify predominant use; Maximum Height = 60 ft; Minimum Lot Size = 5,750 SF |
Once these are created, information can be added to our database by referring
to the template. The template maintains in one place all attributes held
in common by a certain class of object. It may be the case that slight
differences exist between objects of a given category. These differences
can be stored as "tokens" or additional attributes.
| Feature Number | Token | Description |
|---|---|---|
| SF-1 | 1 | Single Family; Height = 35 ft; Large Lot |
| SF-1 | 3 | Family Residence; Height = 35 ft; Duplex |
| MF-2 | 2 | Multi-family; Height = 40 ft; Low Density |
| MF-5 | 5 | Multi-family; Height = 60 ft; High Density |
| LO | 40 | Limited Office; Neighborhood Needs |
| GO | 50 | |
Although templates and tokens may be stored in two different files, it
is easy to see how this method of organization changes the database. It
is not merely a process of simplification. By using templates, users can
enter and retrieve data in terms of "real" items. A query might ask for
all "Single Family Houses."
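Templates and tokens map naturally onto classes and instances. In this hypothetical Python sketch, each template stores its shared attributes once, each feature records only its template, its token, and any extra attributes, and a query retrieves features by the template's natural name. The pairing of parcel numbers with templates is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Attributes held in common by one class of object, stored once."""
    code: str
    name: str
    shared: dict = field(default_factory=dict)
    tokens: dict = field(default_factory=dict)   # token -> variant description


@dataclass
class Feature:
    """A database entry that refers to a template and records only its differences."""
    parcel: str
    template: Template
    token: int
    extra: dict = field(default_factory=dict)

    def attributes(self) -> dict:
        """Combine the template's shared attributes with this feature's own."""
        attrs = {"use": self.template.name, **self.template.shared, **self.extra}
        if self.token in self.template.tokens:
            attrs["variant"] = self.template.tokens[self.token]
        return attrs


SF = Template("SF", "Single Family", {"max_height_ft": 35}, {1: "Large Lot", 3: "Duplex"})
LO = Template("LO", "Limited Office", {"max_height_ft": 40, "min_lot_sf": 5750})
GO = Template("GO", "General Office", {"max_height_ft": 60, "min_lot_sf": 5750})

database = [
    Feature("01-4567", SF, 1),
    Feature("01-5632", SF, 3),
    Feature("06-3962", LO, 40, {"predominant_use": "Neighborhood Needs"}),
    Feature("06-9977", GO, 50),
]


def query(template_name: str) -> list:
    """Retrieve features by the natural name of their template."""
    return [f.parcel for f in database if f.template.name == template_name]


print(query("Single Family"))     # ['01-4567', '01-5632']
print(database[1].attributes())
```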
Object-oriented databases thus have the advantage of organizing
information in ways that users often find easier to use. The database has
an intuitive feel because it employs the categories that users employ
naturally in day-to-day life. For this reason, object-oriented databases
are gaining increased attention in GIS.
7. The Idea of Expert Systems
If a database has been designed to store information about spatial, functional,
and logical relationships, the user can pose more complex questions of
the data. That is, the user can program the system to consider a variety
of spatial, functional, and logical conditions during query or analysis.
Such efforts result in what are termed expert systems or,
if carried further, artificially intelligent systems. At
their simplest, expert systems allow the user to set "rules" that must
be followed as data is analyzed. These rules are written to mirror the
way an experienced user would compare or judge data. As more and more rules
are written, the system becomes more adept or "expert" at finding solutions
with less directed guidance by users.
The point of expert systems is to build sets of rules that reflect
the sorts of comparisons and judgments that experienced users would make.
By programming these rules into the system, more and more of the work of
decision making can be passed on to the computer system--including complex
comparisons that may be difficult or time consuming for even experienced
users to undertake.
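A rule base of this kind can be sketched as a small forward-chaining engine: each rule lists the facts it requires and the fact it concludes, and the engine keeps firing rules until nothing new can be concluded. Forward chaining is one common expert-system strategy, not necessarily the one the authors had in mind, and the site-suitability rules and facts are hypothetical.

```python
# Each rule: (name, facts that must all hold, fact concluded). Hypothetical rules.
RULES = [
    ("R1", {"slope < 15%", "soil drains well"}, "buildable"),
    ("R2", {"buildable", "outside floodplain"}, "suitable for housing"),
    ("R3", {"suitable for housing", "within 1 km of transit"}, "priority site"),
]


def infer(facts: set, rules=RULES):
    """Forward chaining: fire any rule whose conditions hold, repeat until stable.

    Returns the enlarged fact set and the names of the rules fired, in order.
    """
    known = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            if conclusion not in known and conditions <= known:
                known.add(conclusion)
                fired.append(name)
                changed = True
    return known, fired


site_facts = {"slope < 15%", "soil drains well", "outside floodplain",
              "within 1 km of transit"}
conclusions, trace = infer(site_facts)
print(trace)  # ['R1', 'R2', 'R3']
```

Adding a rule to `RULES` extends what the system can conclude without changing `infer`, which is the sense in which a rule-based system grows more "expert" as rules accumulate.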
Such systems are of interest to GIS practitioners in many fields, including
urban planning and resource analysis. Complex issues involving zoning and
land use can often be written in terms of rules that need to be followed.
At the same time, following rules is only a step toward "intelligence."
The difference between expert systems and artificial intelligence is much
in debate. But to be truly "intelligent" a system must be able to "learn,"
"think," or "reason," perhaps really to write its own rules from experience.
The definition of artificial intelligence is, in fact, still a contentious
issue. So far, it has been very difficult to program computer systems to
provide a semblance of human thought processes. Yet, the potential of such
systems makes the effort irresistible. The idea that computer systems might
one day be able to reason about real-world environmental and geographical
problems and issues is a reason why GIS theorists maintain an interest
in developments in the area of artificial intelligence.
8. Required Reading
- Chapter 5: Pages 85-110 in Antenucci, J.C., Brown, K., Croswell, P.L., Kevany, M.J. 1991. Geographic Information Systems. New York and London: Chapman & Hall.
- Pages 36-61 in Huxhold, W.E. 1991. An Introduction to Urban Geographic Information Systems. New York and Oxford: Oxford University Press.
9. References and Supplemental Reading
Antenucci, J.C., Brown, K., Croswell, P.L., Kevany, M.J. 1991. Geographic Information Systems. New York and London: Chapman & Hall.
Burrough, P.A. 1986. Principles of Geographical Information Systems for Land Resource Assessment. New York: Oxford University Press.
Huxhold, W.E. 1991. An Introduction to Urban Geographic Information Systems. New York and Oxford: Oxford University Press.
Koeln, G.T., Cowardin, L.M., Strong, L.L. 1994. Geographic information systems. In T.A. Bookhout, ed. Research and Management Techniques for Wildlife and Habitats. Fifth Edition. Bethesda: The Wildlife Society. pp. 540-566.
Walker, J.D., Black, R.A., Linn, J.K., Thomas, A.J., Wiseman, R., and D'Attilio, M.G. 1996. Development of Geographic Information Systems-Oriented Database for Integrated Geological and Geophysical Applications. GSA Today: A Publication of the Geological Society of America 6(3):2-7.
Last revised on 2000.3.27. LNC
