4th International Conference on Integrating GIS and Environmental Modeling (GIS/EM4):
Problems, Prospects and Research Needs. Banff, Alberta, Canada, September 2 - 8, 2000.



Using remote sensing, GIS and artificial intelligence to evaluate landslide susceptibility levels:
Application in the Bolivian Andes


GIS/EM4 No. 228

Stéphane Péloquin

Q. Hugh J. Gwyn



Abstract

A model based on linear discriminant analysis was developed that can reproduce the Landslide Susceptibility (LS) map prepared independently by an Expert. The model evaluates the contribution of Topographic, Geoecological and Drainage thematic factors and variables. Overall accuracies of 89% and 78% were attained, depending on whether 2 or 3 classes of susceptibility are used. The model identified the most important variables from an initial set of 44. The major variables were mainly topographic (readily available) and geoecological (not always available). Using the list of model variables, we then evaluated the ability of three satellite sensors to provide the geoecological variables. The overall accuracies based on SPOT, Landsat and RADARSAT range from 75 to 80%. The sensors performed about equally, though Landsat had the highest score. Because there is no absolute basis for evaluating an LS map, the modeling results are considered very successful.


Keywords
Natural Hazard Mapping, Landslide Susceptibility mapping, Remote Sensing, GIS, Expert System.


 

Introduction

 

Landslide susceptibility (LS) mapping is a complex task that requires experience and fine expertise to produce reliable maps. Over the years, the need for such information has increased significantly because people tend to move into geomorphologically fragile and hazardous regions of the world in response to population growth (Bruce, 1993). These maps help decision makers to plan urban development more accurately and to protect local residents more efficiently (Leroi, 1996; Schuster, 1996).

 

Unfortunately, this information is almost nonexistent because of the number and complexity of the steps needed to produce LS documents (Cruden and Varnes, 1995).

 

As of today, the best LS maps are produced by expert earth scientists with years of experience, who apply traditional methodologies (i.e. air photos, existing maps and field work) as their basic sources of information (Van Westen, 1993; Leroi, 1996). These experts are recognised as being able to generate highly reliable results. Unfortunately, there are too few of them in the world to meet the current demand.

 

This paper presents the modelling steps that we have developed to reproduce an LS level map originally created by an expert using air photos. The reproduction involved the application of an artificial intelligence technique to extract the expert knowledge in order to develop a mathematical susceptibility model. Our results show that it is possible to generate these documents automatically with an accuracy ranging from 78% to 89%. The methodology allows less experienced people to generate LS maps and thereby greatly increases the availability of such information.

This research focused on two specific objectives.

 

1.      Identification of a methodology that automatically extracts, makes explicit and makes available all the implicit knowledge used by an expert to map LS levels;

2.      Automatic reproduction of the expert’s map, using both remote sensing and ancillary data, with a precision equal to what an expert can achieve.

 

Study site

 

Our study area is in the vicinity of Cochabamba, Bolivia. This city of 700 000 people is located in the middle of the country and is its third major city. Cochabamba lies on a lacustrine plain at an altitude of 2 500 m, and is surrounded by 39 watersheds that reach an average altitude of 4 200 m. The watersheds are part of the eastern Andean Cordillera. More specifically, our study concentrates on a 20 km² pilot watershed named Taquiña. It has been studied intensively since 1991 by PROMIC, a bilateral (Bolivia-Switzerland) agency with the mandate of managing all the watersheds surrounding the city. To date, the PROMIC multidisciplinary team has accumulated, from field surveys and air photo interpretation, all the data necessary for land-use, vegetation, soil, geology and geomorphology mapping of 6 watersheds, including Taquiña.

 

 

Background

 

The literature shows that LS mapping is usually carried out through a diversity of methodologies that can be grouped into three classes and that commonly share three analytical steps:

 

1.      Identification and mapping of a series of topographical, ecological, geological and geomorphological factors directly or indirectly correlated to instability phenomena;

2.      Evaluation of the contributing weight of each selected factor;

3.      Overall classification into homogeneous LS zones.

 

The methodology types are differentiated mainly on the basis of the following aspects: 1) data acquisition techniques (field surveys, air photos and satellite data); 2) identification of possible contributing factors; 3) weight assignment techniques.

Our methodology is a combination of deductive and inductive procedures. As we demonstrate, we have profited from both expert knowledge and statistical mapping of LS, which brings our method to a higher level of objectivity.

 

Even today, the use of remote sensing is mostly limited to the visual recognition of landslide variables. The potential of this rich data source is still judged by its capability to identify scars produced by ancient landslides (Leir et al., 1996; Greenbaum et al., 1996; Van Westen, 1993; Soeters et al., 1993). Two recent surveys concerning the use of remote sensing for LS mapping reported that remote sensing data are too complex to use and too expensive, leaving air photos and field surveys as the main data sources (Mantovani et al., 1996). We believe that the benefits of remote sensing are much more extensive and that satellite data can replace air photos for establishing geoecological conditions prior to the final LS evaluation.

 

Many authors have used GIS as their principal tool to gather and process information for LS mapping (Luzi and Pergalani, 1996; Mejia-Navarro et al., 1994). However, the majority of these studies did not take full advantage of the tool, and no new methodologies were developed in the new environment. For example, no study integrating remote sensing, fuzzy logic and artificial intelligence has been developed as of today. We believe that such a system would substantially increase the quality of the final results.

 

Knowledge Base development

 

By definition a Knowledge Base (KB) corresponds to a set of facts and rules needed to reach the solution to a certain problem. For this specific step our goal is to bring together all the information that the expert used intuitively to perform his evaluation.

 

For this specific study, two datasets were used.

 

The first dataset corresponds to the LS map drawn by a very experienced Expert, who prepared the interpretation from 1:20 000 air photos. This was done without using any of the existing thematic maps in order to preserve the objectivity of his research. The map shown in Figure 1 constitutes the final interpretation of LS in the pilot zone. Three LS classes were identified. The characteristics of each class are also presented in Figure 1.

 

Our objectives are based on the following hypotheses:

 

1.      The map produced by the expert constitutes the best evaluation of the instability conditions within Taquiña that can be made;

2.      The field characteristics required to evaluate LS are measurable from standard field surveys;

3.      The spatial distribution of LS classes can be explained with confidence by a combination of geoecological factors that can be expressed concretely and adequately (Varnes, 1984).

 

 

LS class    Characteristics

Low         Stable zones representing no danger.

Medium      Relatively stable zones that could present some danger if a triggering factor should occur.

High        Fragile zones characterized by conditions favorable to an eventual landslide that could be triggered at any moment.

 

Figure 1. Expert LS map and class characteristics of Cuenca Taquiña

 

 

The second dataset corresponds to all existing information concerning the geoecological, topographical and drainage conditions of Taquiña. This information constitutes the main reference themes that we have selected. It was provided by PROMIC and results from extensive field surveys and air photo interpretation (Salinas, 1995; Claure et al., 1994).

 

Data Hierarchy

 

The hierarchical organization of the KB is divided into three levels (Figure 2). The first level corresponds to the 3 major themes, from which we have identified 9 factors that form the second level. Each factor matches a specific aspect of a selected theme. For example, the vegetation conditions constitute one of the 3 factors taken from the geoecological theme; the slope is a factor taken from the topographical theme. The last level includes 44 variables extracted from the factors. Each variable represents a unique legend class of a specific factor.

 

This information was chosen on the basis of the existing literature and of our combined knowledge of the study area and of instability phenomena.

 

Data Formatting

 

The diversity of our KB gives us the potential to identify the expert’s motivations in dividing the area into LS levels. Prior to those steps, we homogenized all the variables into raster layers with the same geographic projection, spatial resolution and pixel size. We then brought each layer to its basic format: a binary map showing the presence or absence of a given class. This was done with a routine that applies Boolean rules to isolate the needed information automatically (Figure 3).
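The Boolean isolation step can be sketched as follows. This is only a minimal illustration, assuming a factor layer stored as a small grid of legend codes; the function name `binary_layers` and the vegetation codes are hypothetical and do not come from the PROMIC data or from the routine actually used.

```python
def binary_layers(factor_grid):
    """Split one categorical raster into one 0/1 grid per legend class."""
    classes = sorted({code for row in factor_grid for code in row})
    return {
        code: [[1 if cell == code else 0 for cell in row] for row in factor_grid]
        for code in classes
    }


# Hypothetical 3 x 3 vegetation factor with three legend classes.
vegetation = [
    ["forest", "forest", "shrub"],
    ["shrub", "grass", "grass"],
    ["grass", "grass", "forest"],
]

layers = binary_layers(vegetation)
for name, grid in layers.items():
    print(name, grid)
```

Each legend class becomes one presence/absence variable, so every pixel carries a value of 1 in exactly one layer of a given factor.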

 

 

 

 

Figure 2. Knowledge Base hierarchy scheme

 

 

 

Figure 3. Rule-based methods for creating variables

 

 

LS Model development

 

To develop the LS model, we took advantage of linear discriminant analysis (LDA). This multivariate statistical technique is often employed in Earth Sciences to locate mineralized zones. Surprisingly, it is rarely used to establish LS levels, even though a few authors have used it successfully (Baeza and Corominas, 1996; Carrara et al., 1992).

 

LDA corresponds to a regression analysis that identifies the differences between the three LS classes with reference to the KB content. To establish these differences mathematically, the algorithm weights and linearly combines the informative variables so as to separate, or discriminate, each class statistically (Fisher, 1936). The selection and linear combination of variables constitutes the susceptibility model in the form of a canonical function. From this model it is then possible to map zones outside the study area where the geoecological conditions are similar.
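To make the idea of a weighted linear combination concrete, the sketch below computes Fisher's discriminant for the simplest possible case: two classes and two binary variables. It is a hedged illustration only, not the three-class stepwise analysis run in SPSS; the sample values, the variable meanings (steep slope, bare soil) and all function names are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the class mean (2 x 2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_weights(class_a, class_b):
    """Return w = Sw^-1 (mean_b - mean_a) and the midpoint threshold."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, ma), scatter_matrix(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    threshold = sum(w[k] * (ma[k] + mb[k]) / 2 for k in range(2))
    return w, threshold


def classify(sample, w, threshold):
    score = w[0] * sample[0] + w[1] * sample[1]
    return "High" if score > threshold else "Low"


# Hypothetical training samples: (steep_slope, bare_soil) as 0/1 values.
low_samples = [(0, 0), (0, 0), (1, 0), (0, 1)]
high_samples = [(1, 1), (1, 1), (1, 0), (0, 1)]

w, threshold = fisher_weights(low_samples, high_samples)
print(w, threshold)
print(classify((1, 1), w, threshold), classify((0, 0), w, threshold))
```

The weights play the role of the coefficients of the canonical function: a new location is scored from its own variable values alone, which is what allows the model to be applied outside the training area.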

 

Statistical evaluation

 

We randomly selected 603 samples evenly distributed over the LS map prepared by the Expert, which represents 33 samples/km². They were then divided into two subgroups: the first (306 samples) was used to train the model; the second (297 samples) was used to evaluate the model accuracy.
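A split of this kind can be sketched as below. The subgroup sizes are those reported above; the integer sample identifiers, the fixed seed and the function name `split_samples` are assumptions made for the illustration.

```python
import random


def split_samples(sample_ids, n_train, seed=0):
    """Shuffle the sample identifiers and cut them into two subgroups."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


sample_ids = list(range(603))  # hypothetical identifiers for the 603 samples
training, validation = split_samples(sample_ids, n_train=306)
print(len(training), len(validation))  # 306 297
```

Keeping the two subgroups disjoint is what makes the validation accuracy an honest estimate of how the model behaves on samples it has not seen.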

 

The KB information concerning each training sample was exported to SPSS (version 4) so as to build a binary database with values of 0 and 1 indicating the absence or presence of each of the 44 variables. Before applying stepwise LDA, we calculated a correlation matrix to identify redundant information, which resulted in 5 variables being removed and left 39 variables. Stepwise LDA integrates the variables one by one, starting with the variable that explains the highest percentage of the overall variability (Klecka, 1975). During the iteration process, each variable is added to the model and a discriminant score is calculated at each step to verify whether the newly entered variable increases or decreases the overall discrimination.
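The correlation screening can be illustrated with the short sketch below. It is a simplified stand-in for the matrix computed in SPSS: the 0.9 cut-off, the variable names and the sample values are hypothetical, since the criterion actually used to declare two variables redundant is not stated here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(vx * vy)


def drop_redundant(columns, threshold=0.9):
    """Keep variables in order, skipping any that is strongly correlated
    (positively or negatively) with a variable already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical binary variables observed at six training samples.
columns = {
    "slope_steep": [1, 1, 0, 0, 1, 0],
    "slope_gentle": [0, 0, 1, 1, 0, 1],  # exact complement of slope_steep
    "bare_soil": [1, 0, 0, 1, 1, 0],
}

print(drop_redundant(columns))  # ['slope_steep', 'bare_soil']
```

Binary layers derived from the same factor are natural candidates for removal, because the absence of one class often implies the presence of another.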

 

Scenarios

 

The first scenario considered all 39 variables entered. It estimates the capability of reproducing the expert’s knowledge when all three themes (geoecological, topographical and drainage) are available in the zone to be mapped. However, we also generated alternative scenarios calculated on a selection of the available variables. These served as a simulation to estimate the model precision when the KB in a study area is limited. To do so, three alternative scenarios were evaluated. Each scenario groups a subset of the available knowledge, as shown in Figure 4.

 

Remote sensing data

 

Taquiña is the most studied of the 39 watersheds; its 38 neighbors are covered only by topographic information.

 

 

 

 

Figure 4. Alternative scenarios as a function of available Knowledge Base.

 

Since our model uses knowledge extracted from thematic information such as geomorphology, vegetation and land use, we had to find an alternative to compensate for this lack of knowledge outside Taquiña. In this second phase we anticipated a decrease in model accuracy where thematic information is missing. To fill this gap, we evaluated the potential of remote sensing to map the thematic variables selected as important for the model. Three image types were evaluated: SPOT (XS panchromatic), TM and RADARSAT (Fine mode). Once the original model was established, we replaced the thematic variables obtained from traditional mapping with variables extracted from the classification of each image type. The model was then recalculated and the results compared in order to estimate the potential of each sensor and of multisensor fusion.

 

Results and Discussion

 

The results for each scenario are calculated by applying the model to the validation subgroup samples. This evaluation method is more conservative and severe, but it permits a better evaluation of the precision and reliability of each scenario. Applying the model to the training subgroup samples instead would obviously overestimate the results.

 

The resulting discriminant score calculated for each pixel is a number between 1 and 3. This value was rounded to the nearest integer to correspond to one of the three classes (1=Low, 2=Medium and 3=High). To evaluate the model precision, we counted the number of samples whose predicted class matches the ground truth, that is, the Expert’s map.
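The rounding and counting steps can be sketched as follows. The scores and expert classes are invented for the example; clamping to the range 1 to 3 is an added safeguard, assumed here in case a score falls slightly outside that range, and is not described in the original procedure.

```python
import math

LABELS = {1: "Low", 2: "Medium", 3: "High"}


def score_to_class(score):
    """Round half up (Python's round() rounds halves to even), then clamp
    the result to the valid class range 1..3."""
    nearest = math.floor(score + 0.5)
    return min(3, max(1, nearest))


def overall_accuracy(scores, expert_classes):
    """Fraction of samples whose rounded score equals the expert class."""
    predicted = [score_to_class(s) for s in scores]
    hits = sum(p == e for p, e in zip(predicted, expert_classes))
    return hits / len(expert_classes)


# Hypothetical discriminant scores and expert classes for six samples.
scores = [1.2, 1.6, 2.4, 2.9, 3.1, 1.4]
expert = [1, 2, 3, 3, 3, 1]

print([LABELS[score_to_class(s)] for s in scores])
print(overall_accuracy(scores, expert))  # 5 of 6 samples agree
```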

 

We also evaluated the model performance when only the two extreme susceptibility classes (Low and High) are considered. We believe that, for many projects, mapping these two classes is adequate to provide the required security level.

 

Accuracy rates of 89% and 78% are obtained for two and three LS classes, respectively, when all 39 variables are integrated into the model. These results show that the expert’s knowledge can be reproduced with a high level of success and that the intellectual process can be recreated numerically with accuracy. They are slightly higher than those obtained in other studies with similar objectives (Carrara, 1983, 1988; Neuland, 1976).

 

The classification accuracies for each class within each scenario are shown in Table 1. The two extreme classes (High and Low) obtained the highest scores compared to the Medium susceptibility class. That the two extremes High and Low should be accurately defined in the numerical technique is intuitively acceptable. A question is raised with respect to the lower accuracy of the Medium class. Is this the result of a lack of information to isolate the specific set of variables characterizing this class or is it due to inconsistent interpretation on the part of the Expert?
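A per-class accuracy of this kind can be computed as in the sketch below. It assumes that the accuracy of a class is the share of the samples the Expert assigned to that class that the model also assigns to it; this definition and the sample labels are assumptions made for the illustration.

```python
from collections import Counter


def per_class_accuracy(predicted, expert):
    """For each expert class, the fraction of its samples predicted correctly."""
    totals = Counter(expert)
    hits = Counter(e for p, e in zip(predicted, expert) if p == e)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Hypothetical validation samples.
expert = ["Low", "Low", "Low", "Medium", "Medium",
          "High", "High", "High", "High", "Medium"]
predicted = ["Low", "Low", "Medium", "Medium", "High",
             "High", "High", "High", "Low", "Low"]

accuracy = per_class_accuracy(predicted, expert)
for label, value in accuracy.items():
    print(f"{label}: {value:.0%}")
```

In this invented example the Medium class scores lowest because its samples are confused with both neighboring classes, the same pattern discussed above.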

 

Clearly, when all the variables are incorporated, the success rate increases sharply, demonstrating the importance of geoecological variables. With three LS classes, the overall accuracy differs by 20 percentage points between Scenario I and Scenario IV. This suggests that the expert was strongly influenced by geoecological conditions when establishing the Medium LS class.

 

Of the relevant discriminant variables selected by the LDA model, five were related to geoecological conditions. These results are very interesting considering that the Expert’s map does not represent an absolute basis for comparison. It would be generally accepted that the Expert Map is only 75 to 80% accurate at best. Thus it might be argued that the Expert Map and the Model results are equivalent.

 

 

Table 1. LS level classification accuracy

               Three LS classes (accuracy)      Two LS classes (accuracy)
               OA     Low    Medium   High      OA
Scenario I     59%    70%    43%      69%       82%
Scenario II    63%    72%    52%      68%       85%
Scenario III   70%    80%    59%      72%       86%
Scenario IV    79%    89%    69%      75%       89%

OA: Overall accuracy

 

 

Following the application of the LDA model based on the complete set of map data, we evaluated the modeling approach using specific geoecological variables supplied by three satellite sensors: SPOT HRV/XS, Landsat TM and RADARSAT 1. The same training and validation sample sites were used in this second phase of modeling. The accuracy of these new models allows us to estimate the reliability of LS mapping in areas where geoecological information might not be available (Table 2). The results are all in the range of 75% (± 3%) in comparison with the Expert Map. They are lower than those based on field data, and the differences are mainly related to inaccuracies in the classification of the images. The data also suggest that all sensors provide approximately equal results, with TM offering a slightly higher score.

 

 

Table 2. LS overall accuracy results using different remotely sensed data

Image type          HRV/XS   TM     RADARSAT   RADARSAT+TM
Two LS classes      77%      76%    74%        77%
Three LS classes    72%      74%    72%        75%

 

 

Finally, the map presented in Figure 5 was produced by applying the model automatically over the study area, using all the needed variables within a GIS package. The raw data were first incorporated, and all the steps (commands) required to produce the final LS level map were run from an independent script available through a user-friendly interface. The map clearly shows the correlation between the ground truth and the model-generated map. It also shows the natural generalization that results from air photo interpretation, compared with our modeling method, which computes a score for each pixel.

 

 

[Figure 5: two map panels, Expert’s Results and Model Results; legend: Low, Medium, High]

 

 

Figure 5. Comparison between Expert’s map and Model generated map

 

 

Conclusions

 

We have demonstrated the possibility of reproducing an Expert’s knowledge for LS mapping. Using a document created by an Expert, we have extracted and statistically characterized the specific knowledge, giving a better understanding of the conditions that characterize each LS level. These results constitute a first significant step toward the automation of LS mapping, in which artificial intelligence, remote sensing data and GIS are integrated to solve a complex geospatial problem. In this specific case, the model that was developed will be used to extrapolate the LS mapping outside the study area, where geoecological conditions are equivalent. This will be done using a specialized interface dedicated to this application. The use of this system does not require any specific expertise in Geosciences, GIS or Remote Sensing.

 

References used

 

Baeza, C. and Corominas, J. 1996. Assessment of shallow landslide susceptibility by means of statistical techniques. Proceedings of the Seventh International Symposium on Landslides, Trondheim, Norway, June 17-21, vol. 1, p. 147-152.

 

Bruce, J.P. 1993. Natural disasters and global change. IDNR Newsletter no 15, Observatorio Vesuviano, p. 3.

 

Carrara, A. 1983. Multivariate models for landslide hazard evaluation. Mathematical Geology, vol. 15, no 3, p. 403-427.

 

Carrara, A. 1988. Landslide hazard mapping by statistical methods. A black box approach. Workshop on Natural Disasters in European Mediterranean Countries, Perugia, Italy, p. 205-224.

 

Carrara, A., Cardinali, M. and Guzzetti, F. 1992. Uncertainty in assessing landslide hazard and risk. ITC Journal, no 2, p. 172-183.

 

Claure, B., Maldonado, J., Vargas, O. and Valenzuela, C. R. 1994. A conceptual approach to evaluate watershed hazards: the Tunari watershed, Cochabamba, Bolivia. ITC Journal, no 3, p. 283-291.

 

Cruden, D.M. and Varnes D.J. 1995. Landslide types and processes. In Landslides: Investigation and Mitigation. Transportation Research Board, National Academy of Sciences, Special Report 247, Washington D.C., p. 76-90.

 

Fisher, R.A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics, vol. 7, p. 179-188.

 

Greenbaum, D., McDonald, A.J.W. and Marsh, S.H. 1996. Rapid method of landslide hazard mapping. 11th Thematic Conference and Workshops on Applied Geologic Remote Sensing, Las Vegas, Nevada, February 27-29, p. I-287-I-295.

 

Klecka, W.R. 1975. Discriminant Analysis. In Statistical Package for the Social Sciences, Chapter 23, McGraw-Hill, New York, p. 434-467.

 

Leir, M.C., Singhroy, V.H. and Savigny, S.V. 1996. Landslide and lineament mapping with airborne SAR. 11th Thematic Conference on Remote Sensing for Exploration Geology, Las Vegas, February 27-29, vol. 2, p. 405-414.

 

Leroi, E. 1996. Landslide hazard - risk maps at different scales: Objectives, tools and developments. Proceedings of the Seventh International Symposium on Landslides, Trondheim, Norway, June 17-21, vol. 1, p. 35-51.

 

Luzi, L. and Pergalani, F. 1996. Applications of statistical and GIS techniques to slope instability zonation (1:50 000 Fabriano geological map sheet). Soil Dynamics and Earthquake Engineering, vol. 15, p. 83-94.

 

Mantovani, F., Soeters, R. and Van Westen, C.J. 1996. Remote sensing techniques for landslide studies and hazard zonation in Europe. Geomorphology, vol. 15, p. 213-225.

 

Mejia-Navarro, M., Wohl, E.E. and Oaks, S.D. 1994. Geological hazards, vulnerability, and risk assessment using GIS: model for Glenwood Springs, Colorado. Geomorphology, vol. 10, p. 331-354.

 

Neuland, H. 1976. A predictive model of landslip. Catena, vol. 3, p. 215-230.

 

Salinas, R. 1995. Capacidad de uso mayor de la tierra y su relación con el uso actual en la Cuenca Taquiña. PROMIC, Cochabamba, internal document, 19 p.

 

Soeters, R., Van Westen, C.J. and Rengers, N. 1993. Mountain hazard mapping making use of remote sensing and geographic information systems. 25th International Symposium of Remote Sensing and Global Environmental Change, Graz, Austria, April 4-8, p. 54-65.

 

Van Westen, C.J. 1993. Application of geographic information systems to landslide hazard zonation. ITC, Publication no 15, 245 p.

 

Varnes, D.J. and Commission on Landslides and other Mass-Movements - IAEG 1984. Landslide hazard zonation: a review of principles and practice. The UNESCO Press, 63 p.

 


AUTHORS

Stéphane Péloquin, Vice-Président

INFOTIERRA Inc., North Hatley (Québec), Canada.

peloquin@infotierra.com Tel: (819) 864-6027

 

Q. Hugh J. Gwyn, Professor

CARTEL (Centre d’applications et de recherches en télédétection)

Université de Sherbrooke, Sherbrooke (Québec), Canada.

hgwyn@courrier.usherb.ca Tel: (819)821-8000 (2187)