4th International Conference on Integrating GIS and Environmental Modeling (GIS/EM4):
Problems, Prospects and Research Needs. Banff, Alberta, Canada, September 2 - 8, 2000.


A Semantic GIS Database Model for the Exploration of Spatio-Temporal Environmental Data

GIS/EM4 No. 197

Jeremy L. Mennis
Donna J. Peuquet
Diansheng Guo

Abstract

The purpose of this research is to develop a semantic GIS database model that better facilitates the exploration and analysis of spatio-temporal environmental data sets than conventional GIS database models. The goals in the design of this semantic database model are two-fold: 1) to efficiently manage and integrate diverse sets of spatio-temporal environmental data so that those data may be queried, visualized, and entered into statistical analyses, and 2) to allow for the explicit representation of the environmental entities, dynamic processes, and relationships that compose an environmental system in a manner that is intuitive and useful to the researcher. This database model integrates cognitive and computational knowledge representation techniques and is developed using the object-oriented database management system Poet (Poet Inc., San Mateo, CA). As a demonstration of its utility, we implement the database model using a spatio-temporal meteorological data set and the derived storm entities that are captured by these observational data.

Keywords

Data model, database representation, semantic representation, conceptual modeling, object-oriented.


Introduction

Geographic Information Systems (GIS) are becoming an increasingly popular tool for handling digital environmental data. GIS provides spatial data management, analysis, and display functions and facilitates the integration of diverse sources of data, qualities that make GIS an ideal tool with which to explore the vast amount of environmental data that is becoming available. However, the database models and analytical techniques of currently available GIS packages are most appropriate for the representation and analysis of simple, spatially discrete, and temporally static entities, such as roads and parcel boundaries. As environmental scientists have turned to GIS for the management and analysis of environmental data, this representational focus has proved very limiting since most environmental entities are complex, spatially 'fuzzy' (having non-discrete boundaries), and dynamic (Burrough and Frank 1995).

The purpose of this research is to develop a GIS database model that better facilitates the exploration and analysis of spatio-temporal environmental data sets. The goals in the design of this database model are two-fold: 1) to efficiently manage and integrate diverse sets of spatio-temporal environmental data so that those data may be queried, visualized, and entered into statistical analyses, and 2) to allow for the explicit representation of the environmental entities, dynamic processes, and relationships that compose an environmental system in a manner that is intuitive and useful to the researcher. Meeting these goals demands a database model that not only efficiently manages large quantities of observational data, but also represents semantics: higher-level knowledge and interpreted meaning about the represented domain beyond that explicitly encoded by the raw, observational data (Peckham and Maryanski 1988). In this paper, we describe the development of such a semantic GIS database model. As a demonstration of its utility, we implement the database model using a spatio-temporal meteorological data set and the derived storm entities that are captured by these observational data.

Conceptual Framework

A number of researchers have looked to research in cognition for representing semantics in GIS (e.g. Peuquet 1988, Nyerges 1991, Usery 1993). Often, these approaches are combined with object-oriented modeling and Artificial Intelligence (AI) knowledge representation techniques as a means of formalizing the principles of knowledge representation in a digital environment (e.g. Worboys 1994, Gahegan and Flack 1999). We have drawn from this research in order to develop a conceptual framework, called the Pyramid framework, with which to guide the implementation of the semantic GIS database model, and we review it briefly here. A more detailed explanation is provided in Mennis et al. (forthcoming). The Pyramid framework is composed of two separate, yet interrelated parts, the Data Component and the Knowledge Component (figure 1). The Data Component can be conceptualized as a multi-dimensional, spatio-temporally referenced 'hypercube' of observational data that is akin to the 'feature space' concept commonly cited in the analysis of remote sensing imagery. The Data Component stores spatio-temporally referenced observational data, such as spectral, climate, vegetation, and other environmental attributes, that may be queried and visualized to reveal embedded spatio-temporal patterns and relationships. The multiple dimensions of the Data Component can be divided into three categories: location (position in the three spatial dimensions), time (position along a time line), and theme (a set of observations, measurements, or attribute values associated with a particular location and time).

Figure 1. Overview of the Pyramid framework: Data Component and Knowledge Component.

The Knowledge Component stores information about higher-level semantic 'objects,' the geographic entities or processes that are described by the data. Information concerning an object's location, time, and 'composition' (the observational data that indicate the presence of the object) is stored as a reference to a portion of the Data Component. All objects are also placed within two hierarchical relationship structures central to cognitive knowledge representation and object-oriented modeling, taxonomy (generalization) and partonomy (aggregation). The taxonomy structure groups similar objects within a category and stores a rule-base that describes how those objects may be identified within the data space. These rules may be derived from expert knowledge or from inductive analysis of the observational data. The rule-base is encoded within a template, akin to a frame as described in the classical AI literature (Minsky 1975). The template can be used to formulate a query that extracts occurrences of the semantic object from the data space. The partonomy structure allows for the representation of part-whole relationships among objects.

Application Context

The case study application for the implementation of the database focuses on the representation of storms in the Susquehanna River Basin of Pennsylvania using a spatio-temporal data set of meteorological observations. This data set contains minimum temperature, maximum temperature, and total precipitation data for each day between the years 1986 - 1996. In its original format, each day's observations over the entire region are stored as a 97 x 91 cell rectangular grid with a resolution of approximately four kilometers. Each grid was generated from an interpolation of observations at approximately 150 climate stations throughout the region (the actual number of stations varies throughout the history of the data set).

Population of the proposed database using this data set clearly differs between the Data Component and the Knowledge Component of the Pyramid framework. The Data Component concerns the representation of the observational data, in this case the daily maximum and minimum temperature and precipitation data. In this implementation, the multiple dimensions of the Data Component describe the two dimensional location array and time line of data observations as well as the values of the meteorological observations associated with each location and time.

Implementation of this data set within the context of the Knowledge Component is more complex than that of the Data Component. The Knowledge Component represents the conceptual entities, and their properties and categorization, that are interpreted from the data. In the context of this data set and application, the Knowledge Component represents the set of individual storm phenomena captured by the observational data. These storm phenomena are represented within the Knowledge Component as semantic objects, with their associated spatio-temporal and behavioral properties, as well as by their instantiation within the location, time, and theme observational data within the Data Component.

Implementation

Overview

The Pyramid framework was implemented using an object-oriented database platform, which provides an implementation environment closer in structure to the conceptual design than relational or object-relational database platforms can provide. We used the object-oriented database Poet (Poet, Inc., San Mateo, CA) for implementation because it supports the relationship structures and query capabilities that the Pyramid framework demands, has an intuitive graphic user interface, and supports customization through a variety of programming languages. We use Java as the development programming language.

Data Component

Recall that the Data Component is defined by a spatio-temporally referenced multi-dimensional hypercube of location, time, and theme perspectives. We experimented with a number of different implementation options and ultimately decided on a Data Component data structure that emphasizes query efficiency over cost of storage. This structure provides a class, called AttValue, that represents one attribute value of a certain type (e.g. daily precipitation, figure 2). This class contains one location and one time object for spatio-temporal referencing.
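As an illustration only, the following Java sketch shows what a class of this kind might look like and what the storage trade-off amounts to for the case study grid. The field names, the Location helper, and the StorageEstimate class are assumptions made for this sketch and are not taken from the Poet implementation; the date range of 1 January 1986 to 31 December 1996 is likewise an assumed reading of the data set description.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Grid cell position (row, column) within the study-area grid. */
final class Location {
    final int row;
    final int col;

    Location(int row, int col) {
        this.row = row;
        this.col = col;
    }
}

/** One observed value of one theme at one location and one time. */
final class AttValue {
    final String theme;      // hypothetical label, e.g. "precipitation"
    final Location location; // spatial reference
    final LocalDate time;    // temporal reference at daily resolution
    final double value;

    AttValue(String theme, Location location, LocalDate time, double value) {
        this.theme = theme;
        this.location = location;
        this.time = time;
        this.value = value;
    }
}

/** Hypothetical helper that counts the objects a fully populated data space needs. */
final class StorageEstimate {
    /** AttValue objects needed when every cell, day, and theme is stored. */
    static long objectCount(int rows, int cols, LocalDate first, LocalDate last, int themes) {
        long days = ChronoUnit.DAYS.between(first, last) + 1; // both end dates included
        return (long) rows * cols * days * themes;
    }
}
```

Under the assumed date range (4,018 days), calling StorageEstimate.objectCount(97, 91, LocalDate.of(1986, 1, 1), LocalDate.of(1996, 12, 31), 3) gives 106,400,658 objects for the three themes, which indicates the storage cost that a one-object-per-value design accepts in exchange for simple spatio-temporal referencing.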



Figure 2. Overview of the data structure of the Data Component and Knowledge Component, presented in UML (Unified Modeling Language). All relationships between classes shown here are aggregation relationships with 'one-to-one' multiplicity, except where indicated.


One AttValue object also contains an AttDescription object that describes the name and other information associated with that attribute, including its maximum and minimum values in the data set. The Metadata class stores range values for all the dimensions in the Data Component: location, time, and theme. This information may be retrieved by the user to provide information on what data are available in the database. The Metadata class may also be instantiated to form a range query on the Data Component in order to retrieve a 'portion' of the data space: a range in location, time, and specific themes and ranges in theme values. Such a query returns a collection (set) of AttValue objects that may be visualized or otherwise analyzed, as discussed below.
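The range query described above can be sketched in plain Java as an in-memory filter. This is a simplified stand-in, not the Poet query mechanism: the field names, the default bounds, the restriction to a single theme per query, and the select method are all assumptions made for illustration, and AttValue is pared down to the fields the filter needs.

```java
import java.time.LocalDate;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Pared-down stand-in for one attribute value in the Data Component. */
final class AttValue {
    final String theme;
    final int row;
    final int col;
    final LocalDate time;
    final double value;

    AttValue(String theme, int row, int col, LocalDate time, double value) {
        this.theme = theme;
        this.row = row;
        this.col = col;
        this.time = time;
        this.value = value;
    }
}

/** Hypothetical range descriptor with inclusive bounds on location, time, and one theme. */
final class Metadata {
    String theme;
    int minRow = 0;
    int maxRow = Integer.MAX_VALUE;
    int minCol = 0;
    int maxCol = Integer.MAX_VALUE;
    LocalDate start = LocalDate.MIN;
    LocalDate end = LocalDate.MAX;
    double minValue = Double.NEGATIVE_INFINITY;
    double maxValue = Double.POSITIVE_INFINITY;

    /** True when the value lies inside every range of this descriptor. */
    boolean matches(AttValue v) {
        return v.theme.equals(theme)
            && v.row >= minRow && v.row <= maxRow
            && v.col >= minCol && v.col <= maxCol
            && !v.time.isBefore(start) && !v.time.isAfter(end)
            && v.value >= minValue && v.value <= maxValue;
    }

    /** Returns the 'portion' of the data space selected by this descriptor. */
    Set<AttValue> select(List<AttValue> dataSpace) {
        Set<AttValue> hits = new LinkedHashSet<>();
        for (AttValue v : dataSpace) {
            if (matches(v)) {
                hits.add(v);
            }
        }
        return hits;
    }
}
```

A linear scan of this kind is only meant to show the semantics of the query; a database platform would be expected to answer the same request through its own indexing and query language.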

Knowledge Component

The primary class in the Knowledge Component is the Category class which represents a category, or type, of environmental entity (figure 2). The Thing class represents an actual observed geographic entity. A Category object has a collection of Thing objects, e.g. there are a number of individual observed storms that may be categorized as a certain type of storm. A Thing object is composed of an aggregation of AttValue objects. In other words, an observed geographic entity is defined as a 'portion' of the Data Component with an extent in the location, time, and one or more theme dimensions.
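The relationship between these classes might be sketched in Java as follows. This is a minimal illustration under stated assumptions: the method names (add, cellCount, themes, addThing) and the pared-down AttValue are hypothetical and are not drawn from the implemented class library.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Pared-down stand-in for one attribute value in the Data Component. */
final class AttValue {
    final String theme;
    final int row;
    final int col;
    final LocalDate time;
    final double value;

    AttValue(String theme, int row, int col, LocalDate time, double value) {
        this.theme = theme;
        this.row = row;
        this.col = col;
        this.time = time;
        this.value = value;
    }
}

/** An observed geographic entity: an aggregation of AttValue objects. */
class Thing {
    private final List<AttValue> composition = new ArrayList<>();

    void add(AttValue v) {
        composition.add(v);
    }

    List<AttValue> composition() {
        return Collections.unmodifiableList(composition);
    }

    /** Number of distinct grid cells covered by this entity. */
    int cellCount() {
        Set<String> cells = new TreeSet<>();
        for (AttValue v : composition) {
            cells.add(v.row + "," + v.col);
        }
        return cells.size();
    }

    /** Themes that take part in this entity's composition. */
    Set<String> themes() {
        Set<String> names = new TreeSet<>();
        for (AttValue v : composition) {
            names.add(v.theme);
        }
        return names;
    }
}

/** A type of environmental entity that holds the observed Things of that type. */
class Category {
    final String name;
    private final List<Thing> things = new ArrayList<>();

    Category(String name) {
        this.name = name;
    }

    void addThing(Thing t) {
        things.add(t);
    }

    List<Thing> things() {
        return Collections.unmodifiableList(things);
    }
}
```

Because a Thing only holds references to AttValue objects, its extent in location, time, and theme can be derived from its composition on demand rather than stored separately.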

Any geographic domain may be represented through the specification and extension of the Category and Thing classes. The reason for separating the representation of an observed entity (Thing) from its categorical identity (Category) concerns the difference between the properties of a category and an observed geographic entity. Consider the properties of an actual observed storm entity. A storm, for example, has a specific size, duration, and total precipitation. The category of storm, however, has more general properties that may apply to all individual storm entities - a range in size, duration, and total precipitation. In the database, the properties of a Category object are used to identify and recognize individual Thing objects, i.e. the properties of the Category of storm are used to extract individual observed storms from the Data Component.

Because the relevant properties of geographic entities vary from one particular application domain to another, there are two libraries of properties available, one for the Category class and one for the Thing class, that may be added to specifications of the Category and Thing classes as the user sees fit. These two libraries are similar to one another except that a property that is associated with the Thing class describes an actual observed entity (e.g. average size) while a property associated with the Category class describes a range in values that defines a criterion for membership within that category (e.g. minimum and maximum size). Other user defined properties may be added when the Thing and Category classes are extended in order to represent a particular geographic domain.

Storm Representation

A hierarchy of storm types is defined by extending the Category class. We have developed a typology of storms that makes the best use of the available data given its spatial and temporal resolution. This typology defines four types of storms that are defined by their size, duration, and severity (indicated by a storm's average daily precipitation). Figure 3 shows this typology as it is extended from the Category class: the StormCategory class extends the Category class, the Local and Regional classes extend the StormCategory class, the LocalMild and LocalSevere classes extend the Local class, and the RegionalMild and RegionalSevere classes extend the Regional class. Note that only the latter four classes are instantiated - the StormCategory, Local, and Regional classes are all abstract.



Figure 3. The Category class and Thing class extended for the representation of storms in the Knowledge Component, presented in UML. All relationships between classes that are shown here are generalization relationships.


All four types of storms share the same generic types of category properties. In other words, it is the properties of size, duration, and severity that are used to identify actual observed storms of a given type, although the values for those properties vary from type to type. Those relevant property classes may therefore be included in the upper levels of the storm type hierarchy in the StormCategory class and inherited by those storm type classes that will actually be instantiated. Each type of storm is specified to have a range of acceptable values in size, duration, and severity. Table 1 describes the four non-abstract classes of storm types and their properties and property values. Note that some of the types of storms do not have values for certain properties, e.g. there is no maximum size limitation for the Regional storm type.




                            Type of Storm
Properties                  LocalMild   LocalSevere   RegionalMild   RegionalSevere
Min. Size (sq. km)          16          16            101            101
Max. Size (sq. km)          100         100           -              -
Min. Duration (days)        1           1             1              1
Max. Duration (days)        1           1             -              -
Min. Avg. Prcp. (in/day)    0.01        1.01          0.01           1.01
Max. Avg. Prcp. (in/day)    1           -             1              -


Table 1. Categories of storms and their properties.
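To make the typology concrete, the hierarchy and the values of Table 1 can be sketched in Java. The sketch rests on several assumptions that are not stated in the text: bounds are treated as inclusive, a property without a value in Table 1 is treated as unbounded, size and duration ranges are placed in the Local and Regional classes with severity in the leaf classes, and the Range class, the accepts method, and the StormTypology helper are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/** Inclusive range; a null bound stands for a property with no value in Table 1. */
final class Range {
    final Double min;
    final Double max;

    Range(Double min, Double max) {
        this.min = min;
        this.max = max;
    }

    boolean contains(double x) {
        return (min == null || x >= min) && (max == null || x <= max);
    }
}

/** Placeholder for the generic Category class of the Knowledge Component. */
abstract class Category { }

/** Declares the three property types once so that all storm types inherit them. */
abstract class StormCategory extends Category {
    final Range sizeSqKm;
    final Range durationDays;
    final Range avgPrcpInPerDay;

    StormCategory(Range sizeSqKm, Range durationDays, Range avgPrcpInPerDay) {
        this.sizeSqKm = sizeSqKm;
        this.durationDays = durationDays;
        this.avgPrcpInPerDay = avgPrcpInPerDay;
    }

    /** True when all three observed properties fall within this category's ranges. */
    boolean accepts(double size, double duration, double avgPrcp) {
        return sizeSqKm.contains(size)
            && durationDays.contains(duration)
            && avgPrcpInPerDay.contains(avgPrcp);
    }
}

abstract class Local extends StormCategory {
    Local(Range prcp) { super(new Range(16.0, 100.0), new Range(1.0, 1.0), prcp); }
}

abstract class Regional extends StormCategory {
    Regional(Range prcp) { super(new Range(101.0, null), new Range(1.0, null), prcp); }
}

final class LocalMild extends Local {
    LocalMild() { super(new Range(0.01, 1.0)); }
}

final class LocalSevere extends Local {
    LocalSevere() { super(new Range(1.01, null)); }
}

final class RegionalMild extends Regional {
    RegionalMild() { super(new Range(0.01, 1.0)); }
}

final class RegionalSevere extends Regional {
    RegionalSevere() { super(new Range(1.01, null)); }
}

/** Hypothetical helper that finds the first storm type accepting the given properties. */
final class StormTypology {
    static final List<StormCategory> TYPES = Arrays.asList(
        new LocalMild(), new LocalSevere(), new RegionalMild(), new RegionalSevere());

    /** Returns the simple class name of the matching type, or null if none matches. */
    static String classify(double sizeSqKm, double durationDays, double avgPrcp) {
        for (StormCategory c : TYPES) {
            if (c.accepts(sizeSqKm, durationDays, avgPrcp)) {
                return c.getClass().getSimpleName();
            }
        }
        return null;
    }
}
```

Under this literal reading of Table 1, a value that falls between two tabulated bounds (for example, an average precipitation between 1 and 1.01 in/day) matches no storm type, so an implementation would need to decide how such cases are handled.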



The Thing class is also extended for the representation of the storm domain to the StormThing class (figure 3). Each category of storm (e.g. LocalMild) will therefore have a collection of StormThing objects that represent actual storms that are observed in the observational data. The StormThing class has a set of properties that are particularly relevant for representing individual storms. These properties include the birth, death, lifespan, and average precipitation of the storm. The values for each of these attributes may be calculated by calling methods from the StormThing class.
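A minimal Java sketch of such methods is given below. It assumes that birth and death are the earliest and latest observation days in the storm's composition, that lifespan counts both of those days, and that average precipitation is the mean over all of the storm's precipitation values; these definitions, the PrcpValue helper, and the method names are assumptions made for illustration rather than the implemented definitions.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one precipitation AttValue (inches per day). */
final class PrcpValue {
    final int row;
    final int col;
    final LocalDate day;
    final double inches;

    PrcpValue(int row, int col, LocalDate day, double inches) {
        this.row = row;
        this.col = col;
        this.day = day;
        this.inches = inches;
    }
}

/** An observed storm whose properties are computed from its composition. */
class StormThing {
    private final List<PrcpValue> composition;

    StormThing(List<PrcpValue> composition) {
        if (composition.isEmpty()) {
            throw new IllegalArgumentException("a storm needs at least one observation");
        }
        this.composition = new ArrayList<>(composition);
    }

    /** Earliest day on which the storm is observed. */
    LocalDate birth() {
        LocalDate first = composition.get(0).day;
        for (PrcpValue v : composition) {
            if (v.day.isBefore(first)) {
                first = v.day;
            }
        }
        return first;
    }

    /** Latest day on which the storm is observed. */
    LocalDate death() {
        LocalDate last = composition.get(0).day;
        for (PrcpValue v : composition) {
            if (v.day.isAfter(last)) {
                last = v.day;
            }
        }
        return last;
    }

    /** Lifespan in days, counting both the birth and the death day. */
    long lifespanDays() {
        return ChronoUnit.DAYS.between(birth(), death()) + 1;
    }

    /** Mean precipitation over every cell and day in the composition. */
    double averagePrecipitation() {
        double sum = 0.0;
        for (PrcpValue v : composition) {
            sum += v.inches;
        }
        return sum / composition.size();
    }
}
```

Computing these values from the composition, rather than storing them, keeps a storm's properties consistent with the observational data it references.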

The Category and Thing classes are tied together via the Engine class (not shown), which takes the property values of a Category object (e.g. LocalMild) and uses those values to execute a query on the Data Component to extract a set of StormThing objects. This query makes use of Poet's implementation of Object Query Language (OQL).
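The text does not specify how candidate storms are delineated within a day's grid, so the following Java sketch substitutes one plausible technique purely for illustration: 4-connected component grouping of 'wet' cells, followed by a filter on category bounds. It does not use Poet or OQL. The wet-cell threshold, the cell area of 16 sq. km (from the approximately four kilometer resolution), and the names StormCandidate, extract, and filter are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Summary of one candidate storm found on a single day's grid. */
final class StormCandidate {
    final int cells;      // number of connected wet grid cells
    final double avgPrcp; // mean precipitation over those cells (in/day)

    StormCandidate(int cells, double avgPrcp) {
        this.cells = cells;
        this.avgPrcp = avgPrcp;
    }

    double sizeSqKm(double cellAreaSqKm) {
        return cells * cellAreaSqKm;
    }
}

/** Illustrative engine: groups wet cells, then filters by category bounds. */
final class Engine {
    private static final int[][] STEPS = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

    /** Groups 4-connected cells with precipitation >= wetThreshold into candidates. */
    static List<StormCandidate> extract(double[][] prcp, double wetThreshold) {
        int rows = prcp.length;
        int cols = prcp[0].length;
        boolean[][] seen = new boolean[rows][cols];
        List<StormCandidate> out = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (seen[r][c] || prcp[r][c] < wetThreshold) {
                    continue;
                }
                int cells = 0;
                double sum = 0.0;
                Deque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[] {r, c});
                seen[r][c] = true;
                while (!stack.isEmpty()) {
                    int[] cur = stack.pop();
                    cells++;
                    sum += prcp[cur[0]][cur[1]];
                    for (int[] s : STEPS) {
                        int nr = cur[0] + s[0];
                        int nc = cur[1] + s[1];
                        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) {
                            continue;
                        }
                        if (seen[nr][nc] || prcp[nr][nc] < wetThreshold) {
                            continue;
                        }
                        seen[nr][nc] = true;
                        stack.push(new int[] {nr, nc});
                    }
                }
                out.add(new StormCandidate(cells, sum / cells));
            }
        }
        return out;
    }

    /** Keeps candidates whose size and severity lie within the inclusive bounds. */
    static List<StormCandidate> filter(List<StormCandidate> in, double cellAreaSqKm,
            double minSize, double maxSize, double minPrcp, double maxPrcp) {
        List<StormCandidate> out = new ArrayList<>();
        for (StormCandidate s : in) {
            double size = s.sizeSqKm(cellAreaSqKm);
            if (size >= minSize && size <= maxSize
                    && s.avgPrcp >= minPrcp && s.avgPrcp <= maxPrcp) {
                out.add(s);
            }
        }
        return out;
    }
}
```

With the LocalMild values of Table 1, a call such as Engine.filter(Engine.extract(grid, 0.01), 16.0, 16, 100, 0.01, 1.0) would keep only small, light-precipitation candidates; an unbounded property can be passed as Double.POSITIVE_INFINITY.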

Conclusion

We have demonstrated here a semantic database model that allows for the representation of not only observational data but also the conceptual entities that may be captured by those data. This database model integrates the strengths of cognitive and computational representation to offer a generic database representation platform that may be extended to represent a specific geographic domain. Note that while our study has focused explicitly on meteorological phenomena, the overall structure of this database model may be applied to the analysis of many different types of observational data in a variety of analytical settings. For example, the semantic database model presented here is well suited for exploring multi- and hyper-spectral satellite imagery as it can support the representation of both the 'raw' imagery as well as the interpreted 'objects' or regions (e.g. the extraction of roads or land cover classes) that may be derived from that imagery. In addition, while our case study presents a simple rule-base approach to the extraction of storm entities from the meteorological data, other approaches, such as those associated with data mining, may also be applied within the context of the semantic database model.

We believe this approach to incorporating semantics into GIS database representation is an important step in advancing data exploration capabilities for environmental data analysis, as it offers the opportunity for experts to explicitly represent their own conceptual models of a given environmental domain within the context of the database. We envision that this semantic database model will form part of a larger data exploration system where a user may develop and iteratively revise their conceptual model of a geographic domain based on patterns that may be either statistically or visually recognized in the observational data. Continuing research in this area focuses on developing user interfaces for interacting with the semantic representations that the database model offers as well as the integration of more advanced inductive and deductive classification techniques within the data exploration system.

Acknowledgements

We would like to thank Alan MacEachren, Mark Gahegan, Dan Haug, Isaac Brewer, and Masa Takatsuka for their valued contributions to the development of this research. This research was supported in part by U.S. Environmental Protection Agency grant #R825195-01-0.

References

Burrough P.A., Frank A.U. 1995. Concepts and paradigms in spatial information: are current geographical information systems truly generic? International Journal of Geographical Information Systems 9 (2): 101-116.

Gahegan M., Flack J. 1999. The integration of scene understanding within a geographic information system: a prototype approach for agricultural applications. Transactions in GIS 3 (1): 31-49.

Mennis J.L., Peuquet D.J., Qian L. forthcoming. A conceptual framework for incorporating cognitive principles into geographical database representation. International Journal of Geographical Information Science.

Minsky M. 1975. A framework for representing knowledge. In: Winston P.H., editor. The Psychology of Computer Vision. New York: McGraw-Hill. p 211-277.

Nyerges T.L. 1991. Geographic information abstractions: conceptual clarity for geographic modeling. Environment and Planning A 23: 1483-1499.

Peckham J., Maryanski F. 1988. Semantic data models. ACM Computing Surveys 20 (3): 153-189.

Peuquet D.J. 1988. Representations of geographic space: toward a conceptual synthesis. Annals of the Association of American Geographers 78 (3): 375-394.

Usery L. 1993. Category theory and the structure of features in geographic information systems. Cartography and Geographic Information Systems 20 (1): 5-12.

Worboys M.F. 1994. Object oriented approaches to geo-referenced information. International Journal of Geographical Information Systems 8 (4): 385-389.


Authors

Jeremy L. Mennis, Department of Geography
Pennsylvania State University, 302 Walker Building, University Park, Pennsylvania, USA 16802.
Email: jmennis@gis.psu.edu, Tel: +1-814-865-6421, Fax: +1-814-863-7943.

Donna J. Peuquet, Professor, Department of Geography
Pennsylvania State University, 302 Walker Building, University Park, Pennsylvania, USA 16802.
Email: peuquet@geog.psu.edu, Tel: +1-814-863-0390, Fax: +1-814-863-7943.

Diansheng Guo, Department of Geography
Pennsylvania State University, 302 Walker Building, University Park, Pennsylvania, USA 16802.
Email: dguo@psu.edu, Tel: +1-814-865-3020, Fax: +1-814-863-7943.