
2 Methodical Background

2.5 Multi-Source Data Integration and GIS in Vegetation Mapping

Advances in space and computer technology over the last decades, together with the increased availability of digital geographic information about the Earth’s surface, mean that digital land cover classification is increasingly based not just on multispectral data from one sensor, but on data sets that combine data from different sources. The integration of complementary data in order to enhance the information extracted from a data set can involve different combinations of data and different integration methods. Possible data combinations include the integration of different kinds of remote sensing data (multispectral, panchromatic, textural) from one or multiple sensors, or the integration of remote sensing data with ancillary geographical data such as digital elevation model (DEM) data with the help of geographical information system (GIS) techniques.

Most authors integrate multispectral and textural data at the pixel level by including the texture features as additional channels in the data set (e.g. Treitz & Howarth 2000b). Various methods are used for the integration of multi-sensor remotely sensed data. The data can be fused at the pixel level (e.g. sharpening a multispectral image with a higher resolution panchromatic image), at the feature or object level (e.g. after segmentation) and at the decision level (after first classifying them separately) (Pohl & van Genderen 1998). Data integration at the pixel level in particular requires that the data are precisely co-registered to each other. Scarth et al. (2001) use local variance derived from high resolution video data to optimise a geometric-optical model applied to Landsat TM data for the estimation of forest structure parameters. Kiema (2002) combines SPOT panchromatic data, SPOT-derived texture data and multispectral Landsat TM data by simply adding channels, having resampled the Landsat data to 10 m pixel size. Amarsaikhan & Douglas (2004) integrate SPOT XS multispectral optical data and SAR data, creating new, fused image channels through band ratioing and intensity-hue-saturation transformation, among other techniques. For the integration of remotely sensed data, pixel-based methods such as the additional channel concept and channel fusion for enhancing the resolution of a multispectral image are most common.
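The additional channel concept described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `local_variance` and `add_texture_channel`, the 3 x 3 window and the use of local variance as the texture measure are choices made for the example, not the procedure of any of the cited studies, and both layers are assumed to be already co-registered on the same grid.

```python
import numpy as np

def local_variance(band, size=3):
    """Variance in a size x size moving window (edges padded by reflection)."""
    pad = size // 2
    padded = np.pad(band.astype(float), pad, mode="reflect")
    # View of all size x size neighbourhoods, shape (rows, cols, size, size)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.var(axis=(-2, -1))

def add_texture_channel(multispectral, panchromatic, size=3):
    """Append a texture layer to a (bands, rows, cols) stack as an extra channel.

    Hypothetical helper: assumes both inputs share the same pixel grid.
    """
    texture = local_variance(panchromatic, size)
    if texture.shape != multispectral.shape[1:]:
        raise ValueError("layers must be co-registered on the same grid")
    return np.concatenate([multispectral, texture[np.newaxis]], axis=0)
```

The resulting stack can be passed to any per-pixel classifier, which then treats the texture layer exactly like a spectral band.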

Apart from the integration of remotely sensed data and data layers that are directly derived from them (e.g. texture parameters or vegetation indices) there are also other sources of geographic information which can be included in the classification process in the form of ancillary data.

Ancillary data can be defined as non-remotely-sensed data, collected independently and used to assist in the analysis or classification of remotely sensed data (Campbell 1996). They may consist of digital elevation data or digitised soil and geological maps, among others. Not all types of ancillary data that may be available for an area are compatible with the remotely sensed data. The additional information should only be used if it is significant for the distribution of the classes which are to be mapped. It must be possible to integrate all the data in a GIS setting, so the ancillary data must be in digital form and in the same reference system, as well as accurately registered to the other data layers. They should also be compatible with respect to scale (level of detail). Some ancillary data derived from thematic maps contain discrete classes, in contrast to the continuous nature of the remotely sensed data. In addition, the map-derived information may not be metric (quantitative) at all but nominal or qualitative (e.g. soil types). These differences in data types limit the methodological options for data integration.

Topographic (DEM-derived) data have been used to improve vegetation mapping based on satellite imagery since the early days of satellite remote sensing (e.g. Strahler et al. 1978). The topographic variables elevation, slope and aspect can be seen as surrogates for climatic data such as temperature and moisture conditions (Woodcock et al. 2002). The natural vegetation of an area is directly influenced by the local moisture regime, temperature regime, nutrient availability and solar radiation. These environmental gradients are in turn determined by climate, topography and geology (Franklin 1995). While climatic variables are most important for determining the vegetation patterns of large areas, at landscape scales topography modifies the macroclimatic variables.

Elevation affects temperature and precipitation and can thus correspond to a vertical zonation of vegetation. Aspect and slope determine the local insolation regime, and slope is also related to hydrology and thus soil moisture. Other topographic variables which can be derived from DEMs and which are related to vegetation patterns include the slope curvature and the specific catchment area (Florinsky 1998, Franklin 1995). It is possible to model the potential vegetation using topographic variables (Felicísimo et al. 2002) and/or other ancillary data, and then to use remotely sensed data only to differentiate between natural vegetation (e.g. forest) on the one hand and disturbed areas or land use types on the other. For the purpose of improved vegetation mapping, however, it is only practical to use such GIS methods for stratifying the study area into areas of potential vegetation if sufficiently accurate and detailed maps of the controlling environmental variables (or their surrogates) are available, or if these variables are easier to map than the vegetation itself (Franklin 1995). Modelling vegetation distributions with GIS also requires extensive knowledge about the relationships between the known environmental variables and the local vegetation types.
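As an illustration of how slope and aspect can be derived from a gridded DEM, the following sketch uses simple finite differences. It is not the algorithm of any particular GIS package. The function name `slope_aspect` is hypothetical, and the code assumes a north-up grid (row 0 is the northern edge, columns increase eastwards) with square cells and elevation in the same units as `cell_size`.

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Return slope and aspect (both in degrees) from a 2-D elevation array.

    Assumptions: row 0 is the northern edge, columns increase eastwards,
    and elevation uses the same units as cell_size.
    """
    # np.gradient returns the derivative along rows first, then along columns
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    dz_dx = dz_dcol    # change towards the east
    dz_dy = -dz_drow   # change towards the north (row index grows southwards)

    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect as the compass bearing of the downslope direction,
    # clockwise from north; it is not meaningful where slope is zero.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect
```

For a surface that rises uniformly towards the east, the downslope direction is west, so the sketch reports an aspect of 270 degrees.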

There are three stages at which ancillary environmental data can be incorporated with the satellite imagery for vegetation classification (Hutchinson 1982):

- before classification for a stratification of the study area;

- during classification (inclusion as additional variables in a classification or for modifying prior probabilities in a maximum likelihood classification);

- after classification for post-classification sorting.

Stratification divides the study area into smaller areas or strata. Its first aim is to reduce the number of potential classes in any sub-area, making it possible to assign different classes to spectrally similar units depending on which stratum they belong to. The second aim is to reduce the variation within strata, considering that the spectral characteristics of one informational class tend to vary over distance, for example because of varying atmospheric conditions or different substrate (Hutchinson 1982). This is a deterministic approach requiring the creation of strata masks using the ancillary data (Florinsky 1998). The approach is most promising for large study areas. Helmer et al. (2002) stratified the area of Costa Rica into 10 sub-areas based on geoclimatic and geologic maps, before mapping vegetation formations using Landsat TM data. This approach helped in the mapping of forest formations, but the abrupt boundaries between strata also resulted in some misclassifications where the real transition between vegetation formations was more gradual and patchy than predicted by the strata boundaries. Stratification has to be conducted with care, because incorrect stratification criteria will prevent correct classification later on. The separate classification of the strata requires an adequate number of training samples for every class in every sub-area.

Inconsistent training sets for classes in neighbouring strata may result in class boundary offsets in the final merged map (Hutchinson 1982).
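The effect of stratification on spectrally similar units can be sketched as follows. The example substitutes a simple minimum-distance-to-means classifier for whatever classifier a real study would use, and the function name `classify_by_stratum`, the class names and the mean vectors are hypothetical values chosen for illustration only.

```python
import numpy as np

def classify_by_stratum(pixels, strata, training):
    """Classify each stratum separately with its own set of class means.

    pixels:   (n, bands) array of spectral vectors
    strata:   (n,) array of stratum ids taken from an ancillary mask
    training: {stratum id: {class name: mean spectral vector}}
    Pixels in strata without training data keep the label None.
    """
    labels = np.empty(len(pixels), dtype=object)
    for stratum_id, class_means in training.items():
        mask = strata == stratum_id
        names = np.array(list(class_means), dtype=object)
        means = np.array([class_means[name] for name in names], dtype=float)
        # Euclidean distance of every pixel in the stratum to every class mean
        dist = np.linalg.norm(pixels[mask][:, None, :] - means[None, :, :], axis=2)
        labels[mask] = names[dist.argmin(axis=1)]
    return labels

# Hypothetical training means: the same spectral signature maps to
# different informational classes depending on the stratum.
training = {
    1: {"lowland forest": [0.2, 0.4], "pasture": [0.5, 0.5]},
    2: {"montane forest": [0.2, 0.4], "scrub": [0.6, 0.3]},
}
pixels = np.array([[0.2, 0.4], [0.2, 0.4]])
strata = np.array([1, 2])
labels = classify_by_stratum(pixels, strata, training)
```

Because every stratum needs its own class statistics, the sketch also makes the sampling requirement visible: each entry in `training` has to be supported by training samples collected within that stratum.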

When including ancillary data in the classification itself, the most straightforward approach is to use the ancillary data as additional variables in a multivariate classification. The new data are simply combined with the remotely sensed data as additional channels. This technique has been called the ‘logical channel’ or ‘stacked vector’ approach (e.g. Benediktsson et al. 1990). Dymond & Johnson (2002) use this probabilistic approach for the combination of modelled biophysical data and Landsat TM spectral data, to improve the classification of Canadian mountain forest. Treitz & Howarth (2000b) use terrain variables successfully as additional channels in a linear discriminant analysis. In contrast, Arora & Mathur (2001) find that the inclusion of slope and aspect as additional channels in a maximum likelihood classification (MLC) does not improve the average accuracy of their land cover classification. The additional channel method can introduce new problems in the classification (Florinsky 1998), because of different data types and the difficulty of obtaining sufficient samples to represent the range of all variables for each class. There is a danger that the classification accuracy decreases for land cover classes which are independent of the additional variables used. These classes may thus need to be treated separately during the classification (Strahler et al. 1978). Another way to include ancillary information during the classification is to modify the prior probabilities in a maximum likelihood classification depending on the ancillary data (Strahler 1980), but this approach has not been widely adopted. Liu et al. (2002b) integrate satellite and ancillary data by assigning conditional probabilities to all data layers in a rule-based expert system classifier using Bayesian probability reasoning.
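The idea of modifying prior probabilities can be illustrated with a Gaussian maximum likelihood discriminant in which the prior of each class varies from pixel to pixel. This is a minimal sketch assuming normally distributed classes; the function name `mlc_with_priors` and the way the priors are supplied (one row per pixel, for example looked up from an elevation zone) are assumptions made for the example and are not taken from Strahler (1980).

```python
import numpy as np

def mlc_with_priors(pixels, means, covs, priors):
    """Gaussian maximum likelihood classification with per-pixel priors.

    pixels: (n, bands)           spectral vectors
    means:  (k, bands)           class mean vectors
    covs:   (k, bands, bands)    class covariance matrices
    priors: (n, k)               prior probability of each class for each
                                 pixel, e.g. derived from ancillary data
    Returns the index of the class with the highest discriminant value.
    """
    n, k = priors.shape
    scores = np.empty((n, k))
    with np.errstate(divide="ignore"):   # a prior of 0 gives log(0) = -inf
        log_priors = np.log(priors)
    for c in range(k):
        diff = pixels - means[c]
        inv_cov = np.linalg.inv(covs[c])
        # Squared Mahalanobis distance of every pixel to the class mean
        maha = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        _, log_det = np.linalg.slogdet(covs[c])
        scores[:, c] = log_priors[:, c] - 0.5 * log_det - 0.5 * maha
    return scores.argmax(axis=1)
```

A pixel that lies spectrally halfway between two classes is assigned to whichever class the ancillary data make more probable at its location, which is the intended effect of the method. Estimating a reliable prior for every combination of class and ancillary category is what drives the sampling effort.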

Both the stacked-vector approach and the prior probabilities method require very intensive sampling to adequately characterise the relationship between the classes and both the spectral and the ancillary data (Hutchinson 1982). Sampling costs can be the limiting factor in studies of tropical mountain forests with difficult access, which may rule out the use of these methods in some cases.

An alternative to these probabilistic approaches is the use of ‘artificial neural network’ (machine learning) approaches (see chapter 2.6). Using ancillary data as additional input for neural network classifications has the advantage that these systems are non-parametric, so that input data can be of any type (Florinsky 1998). This can be a more suitable method for the inclusion of terrain parameters in a classification than the stacked vector approach in MLC (Arora & Mathur 2001), at least if appropriate neural network models are used and if the training samples are representative (Benediktsson et al. 1990).

After classification, ancillary variables can be used to divide a spectral class into different informational classes according to, for example, elevation or slope (Hutchinson 1982). The sorting technique is derived from GIS overlay analysis. Post-classification sorting, like stratification, is a deterministic approach and requires an understanding of the relationship between the classes of interest and the ancillary data for the formulation of sorting rules. The method can be used to resolve confusion between spectral classes using elevation data (Liu et al. 2002a, Colby & Keating 1998), other topographic variables (Vogelmann et al. 1998, Hutchinson 1982) or other GIS layers providing information about soil types, the location of water bodies, or other features (Sader et al. 1995). Post-classification sorting is a simple, inexpensive way to make use of ancillary data and expert knowledge. It has the additional advantage that, being applied after the classification, it can be confined to ‘problem classes’ only, and errors made in the construction of sorting rules can be easily undone.
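A post-classification sorting rule of this kind amounts to a raster overlay operation. The sketch below splits one spectral class into two informational classes at an elevation threshold; the function name `sort_by_elevation`, the class codes and the 1000 m threshold are hypothetical and serve only to show the mechanism.

```python
import numpy as np

def sort_by_elevation(class_map, elevation, spectral_class,
                      threshold, low_class, high_class):
    """Split one spectral class into two classes using an elevation rule.

    Only pixels labelled spectral_class are touched; all other classes
    are passed through unchanged. The input map is not modified, so a
    faulty rule can be discarded and the sorting repeated.
    """
    sorted_map = class_map.copy()
    target = class_map == spectral_class
    sorted_map[target & (elevation < threshold)] = low_class
    sorted_map[target & (elevation >= threshold)] = high_class
    return sorted_map

# Hypothetical codes: spectral class 2 becomes 20 below 1000 m and 21 above.
class_map = np.array([[1, 2], [2, 2]])
elevation = np.array([[100.0, 500.0], [1500.0, 2000.0]])
sorted_map = sort_by_elevation(class_map, elevation, 2, 1000.0, 20, 21)
```

Working on a copy of the classified map mirrors the advantage noted above: the rule is applied to the ‘problem class’ only, and the original classification remains available if the rule turns out to be wrong.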

Some authors integrate ancillary data at several stages: Ma et al. (2001) use topographic data both as additional channels during the classification and for post-classification sorting. Ehlers et al. (2003) use a digital surface model for a stratification of their image, as part of a stacked vector during classification and also during post-processing.