
2 Methodical Background

2.4 Image Segmentation

Image segmentation aims at dividing an image into a number of spatially continuous, non-overlapping regions which are homogeneous with respect to some characteristic or characteristics (Pal & Pal 1993). Image segmentation followed by per-segment (per-object) classification instead of per-pixel classification is one way to reduce within-class variability and noise in the classification of textured high resolution imagery. Using the statistics of segments (groups of pixels) instead of single pixels can increase class separabilities and classification accuracy (Lobo 1997). In contrast to block averaging, the aggregated groups of pixels which are the result of segmentation processes are intended to be meaningful image units, representing image objects or parts of objects (‘object primitives’). This allows the use of object oriented image processing methods, analysing image units which carry more semantic information than single pixels. This method also avoids the reduction of the spatial resolution entailed by block averaging and makes use of the spatial information which is present in the image.

Sometimes the borders between objects are known beforehand, for example in the form of vector data depicting the boundaries of agricultural fields or forest stands in a GIS database, making it possible to use per-field classification techniques on that basis (Aplin et al. 1999, Kayitakire et al. 2002). As long as such vector data is not available, image segmentation is the prerequisite for object oriented image processing.

Image segmentation techniques have long been applied in the field of computer vision and pattern recognition, as segmentation is the foundation of low level vision. Haralick & Shapiro (1985) and Pal & Pal (1993) present reviews of segmentation methods. However, not all of the methods which are developed for the segmentation of e.g. grey tone images or images used in the field of medicine are also suitable for segmenting multi-spectral satellite images of forested areas.

In order to group contiguous pixels in a meaningful way, some kind of homogeneity criterion has to be found which links these pixels, while excluding pixels from neighbouring regions representing different objects of interest. For the purpose of forest type classification, objects of interest are regions of a common forest type covering all the elements of this forest type, so the desired segments may be quite heterogeneous with respect to the spectral properties of the included pixels (Pekkarinen 2002). This makes it more difficult to join pixels to meaningful units and to find the borders between different vegetation types.

The techniques which are used to segment remotely sensed images can be grouped into pixel-, edge- and region-based methods and their combinations.

Pixel-based segmentation methods start by grouping the pixels using thresholding or clustering in the feature space (see also chapter 2.6). This can be regarded as unsupervised classification rather than segmentation, because spatially unconnected areas can end up belonging to the same cluster. Segments are then defined as connected components of the same cluster (Haralick & Shapiro 1985). Such methods, based on the image histogram or multiple-band feature space, are not suitable for the segmentation of noisy or textured images (Pal & Pal 1993).
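The two steps of this pixel-based approach can be sketched as follows. This is a minimal illustration, assuming a single-band grey-value image, a simple threshold as the feature-space "clustering", 4-connectivity, and an illustrative threshold value; none of these choices come from the cited works.

```python
import numpy as np
from collections import deque

def threshold_then_label(img, thresh):
    """Pixel-based segmentation sketch: (1) group pixels in feature
    space by a grey-value threshold, (2) define segments as spatially
    connected components of the same cluster."""
    clusters = (img >= thresh).astype(int)      # step 1: "clustering"
    labels = np.full(img.shape, -1, dtype=int)
    next_label = 0
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            if labels[y, x] != -1:
                continue
            # step 2: BFS flood fill over 4-neighbours in the same cluster
            labels[y, x] = next_label
            q = deque([(y, x)])
            while q:
                cy, cx = q.popleft()
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                            and labels[ny, nx] == -1
                            and clusters[ny, nx] == clusters[cy, cx]):
                        labels[ny, nx] = next_label
                        q.append((ny, nx))
            next_label += 1
    return labels

# tiny example: two dark areas that are spatially unconnected
img = np.array([[1, 1, 9],
                [1, 9, 9],
                [9, 9, 1]], dtype=float)
labels = threshold_then_label(img, thresh=5.0)
```

In the example, the two dark areas fall into the same cluster but end up as two different segments, which illustrates why the clustering step alone is unsupervised classification rather than segmentation.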

Edge-based segmentation methods employ edge detection algorithms to find the edges of the different regions of an image. Edge operators such as the Sobel filter are used to enhance edges (points of abrupt changes in grey values). Thresholding is employed to determine the edge pixels in the enhanced image and finally, these edge pixels are linked to contours. If these contours surround a region, a segment is created, but gaps often remain in the detected edges, so that the segmentation cannot be completed with this method alone (Pekkarinen 2002, Haralick & Shapiro 1985).
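The first two stages of this approach, Sobel edge enhancement followed by thresholding, can be sketched as below; contour linking is omitted. The grey-value test image and the threshold value are illustrative choices, not taken from the cited works.

```python
import numpy as np

def sobel_edges(img, thresh):
    """Edge enhancement with the Sobel operator, then thresholding
    of the gradient magnitude to mark edge pixels (contour linking
    not shown). Border pixels are left unmarked."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)   # horizontal gradient
    ky = kx.T                                  # vertical gradient
    h, w = img.shape
    mag = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = img[y-1:y+2, x-1:x+2]
            gx = np.sum(kx * win)
            gy = np.sum(ky * win)
            mag[y, x] = np.hypot(gx, gy)       # gradient magnitude
    return mag >= thresh

# a vertical step edge in grey values between columns 2 and 3
img = np.zeros((5, 6))
img[:, 3:] = 10.0
edges = sobel_edges(img, thresh=20.0)
```

The abrupt change in grey values is detected on both sides of the step, while homogeneous areas stay below the threshold.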

The majority of authors who seek to segment remotely sensed images use region-based methods. These methods involve region growing, region merging, region splitting or combinations thereof. Pure region splitting, where the image is split into successively smaller segments (e.g. by quartering) as long as the resulting segments are not homogeneous enough, is usually not used in practice because of its squarish, artificial-looking results (Haralick & Shapiro 1985). Region growing involves the identification of starting points (seed pixels) from which regions are grown in an agglomerative manner by joining similar neighbouring pixels to them. In the case of centroid linkage region growing, the average value of the growing region is updated each time a new pixel joins. If pixels are rejected because they are not similar enough, they form starting points for new regions. St-Onge & Cavayas (1997) use this method for the segmentation of simulated and real high resolution forest images, based on the similarity of a multidimensional value composed of three texture parameters. They come to the conclusion that precise segmentation is easier for closed than for open forest stands. Hill (1999) uses a combination of edge detection and region growing to segment separate bands of a Landsat TM image with the aim of reducing the spectral overlap between tropical forest types.
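Centroid linkage region growing can be sketched as follows. This is a simplified single-band illustration: the raster-scan seed selection, the 4-connectivity and the similarity tolerance `tol` are assumptions for the sketch, not details of the cited methods.

```python
import numpy as np
from collections import deque

def centroid_linkage(img, tol):
    """Centroid linkage region growing (sketch): unlabelled pixels act
    as seeds; a region grows by joining neighbouring pixels whose value
    lies within `tol` of the running region mean, and that mean is
    updated after every join. Rejected pixels later seed new regions."""
    h, w = img.shape
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            labels[sy, sx] = next_label
            total, count = img[sy, sx], 1       # running centroid
            q = deque([(sy, sx)])
            while q:
                cy, cx = q.popleft()
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny, nx] == -1
                            and abs(img[ny, nx] - total / count) <= tol):
                        labels[ny, nx] = next_label
                        total += img[ny, nx]    # update the region mean
                        count += 1
                        q.append((ny, nx))
            next_label += 1
    return labels

# tiny example with a dark and a bright area
img = np.array([[1.0, 1.2, 8.0],
                [1.1, 1.3, 8.2],
                [8.1, 8.0, 8.1]])
labels = centroid_linkage(img, tol=1.0)
```

In the example the bright pixels are rejected by the first (dark) region and form a region of their own, as described above.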

Region merging consists of merging adjacent regions if they conform to a homogeneity criterion. Several techniques involving region merging have been developed to deal with textured data. Lobo (1997) starts by repeatedly running an edge preserving smoothing algorithm over the first principal component of his multispectral image, creating small homogeneous regions. Then he employs an iterative region merging technique to merge adjacent regions if they are the most similar neighbours to each other (local mutual best fitting criterion) and if their distance in the multispectral space does not exceed a user-defined threshold. Pekkarinen (2002) suggests a two-phase approach for the segmentation of a high resolution image of a forested landscape. The method consists of a low-level segmentation on a per-pixel basis (clustering and labelling connected components) and a second step using a region merging technique. However, using the derived segments did not significantly improve the results of his tree volume estimations. Fransson et al. (1999) perform a segmentation on a SPOT optical satellite image and transfer the resulting segment boundaries to noisy SAR data in order to be able to extract per-segment information from both optical and radar data. Beaulieu (2004) describes the development of a region merging technique for SAR data with speckle noise which includes a shape criterion favouring merges which produce compact segments.
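One pass of the local mutual best fitting rule described by Lobo (1997) can be sketched as follows: two adjacent regions merge only if each is the other's most similar neighbour and their spectral distance stays below a threshold. The one-dimensional region means, the adjacency graph and the threshold value are illustrative inputs, not data from the cited study.

```python
def mutual_best_fit_merges(means, adjacency, max_dist):
    """Sketch of one region merging pass under a local mutual best
    fitting criterion. `means` maps region id -> spectral mean,
    `adjacency` maps region id -> list of neighbouring region ids."""
    def best(r):
        # most similar neighbour of region r by absolute mean difference
        return min(adjacency[r], key=lambda n: abs(means[r] - means[n]))

    merges = []
    for r in sorted(adjacency):
        b = best(r)
        # r < b avoids reporting each mutual pair twice
        if r < b and best(b) == r and abs(means[r] - means[b]) <= max_dist:
            merges.append((r, b))
    return merges

# four regions: two dark (0, 1) and two bright (2, 3)
means = {0: 10.0, 1: 11.0, 2: 30.0, 3: 29.0}
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
merges = mutual_best_fit_merges(means, adjacency, max_dist=5.0)
```

Region 0's most similar neighbour is 1 and vice versa, so they merge; 0 and 2 are adjacent but too dissimilar, so the dark and bright pairs stay apart.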

A similar shape criterion is also part of the heterogeneity criterion used to decide which regions to merge in the segmentation technique embedded in the ‘eCognition’ software. This segmentation technique starts with single pixels as the basic regions or image object primitives, and merges neighbouring object primitives in subsequent steps. In order to let regions grow simultaneously in the whole image and to end up with comparable image object primitives of similar sizes, subsequent merges are placed as far as possible from each other in the image, and each region is treated once per cycle, looking for the pair with the ‘local mutual best fitting’ in the vicinity of each object. Two adjacent regions are thus merged if they fulfil the local mutual best fitting condition and if their degree of fitting (defined as the change in heterogeneity caused by the proposed merge, weighted by the region size) does not exceed a user-defined threshold. The user can set a so-called ‘scale parameter’ which defines the threshold and thus influences the size of the resulting image object primitives. It is also possible to construct a hierarchical network of image objects or object primitives at different scales, from the pixel level to large objects (multiresolution segmentation). The degree of fitting, which is calculated for every possible merge, is a combination of the influence this merge would have on the heterogeneity in the feature space (‘spectral heterogeneity’) and a criterion describing the change in spatial compactness (‘shape heterogeneity’). The heterogeneity values are weighted by the size of the objects, so that smaller regions are favoured for merges. The heterogeneity in feature space is calculated, for a user-defined number of image channels or layers of the source data, as the sum of the standard deviations of the values in each channel, weighted by the channel weights (Baatz & Schäpe 2000, Baatz et al. 2002).
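The spectral part of this degree of fitting can be sketched as the size-weighted change in the channel-weighted sum of standard deviations, following the description after Baatz & Schäpe (2000). The shape criterion of the full eCognition measure is omitted, and the example regions are hypothetical data.

```python
import numpy as np

def spectral_fitting_cost(region_a, region_b, weights):
    """Sketch of the spectral heterogeneity change for a candidate
    merge: each region is an (n_pixels, n_channels) array, and the
    heterogeneity of a region is the channel-weighted sum of the
    standard deviations of its pixel values. The change is weighted
    by region size, so merges of small regions are favoured."""
    def h(region):
        return sum(w * region[:, c].std() for c, w in enumerate(weights))

    merged = np.vstack([region_a, region_b])
    return (len(merged) * h(merged)
            - len(region_a) * h(region_a)
            - len(region_b) * h(region_b))

# illustrative single-channel regions of four pixels each
a = np.full((4, 1), 5.0)
b = np.full((4, 1), 5.0)    # spectrally identical to a
c = np.full((4, 1), 50.0)   # spectrally very different
```

Merging the two identical regions costs nothing, while merging spectrally dissimilar regions yields a large heterogeneity increase, which a suitable scale parameter would reject.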

In recent years, this segmentation technique offered by the eCognition software has been tested in a number of studies (e.g. Tufte 2003, Mitri & Gitas 2004). Due to its ability to produce meaningful-looking segments even in textured high resolution data, the software was used for the segmentation of IKONOS data by Giada et al. (2003) to gain information about tents in a refugee camp, by Meinel et al. (2001) for an urban land use classification, by Hay et al. (2003) to extract image-objects from an IKONOS sub-image of a landscape with a mix of forestry and agriculture, and by Wang et al. (2004c) for mangrove mapping. Other forest applications are described by Herrera et al. (2004), who segment scanned aerial photographs with 3 m spatial resolution, and by de Kok et al. (2000). Koch et al. (2002, 2003) use eCognition for a segmentation based forest classification using a combination of Landsat ETM+ and IRS pan (5.6 m resolution). They conclude that this is a suitable method for the classification of forest types but that some information is lost due to the generalisation entailed by the treatment of larger objects as a whole. Wang et al. (2004c) also point out that, while segmentation and object-based processing reduce within-class variabilities and increase the spectral separability of the classes, there is the risk of incorporating pixels from different classes into one object primitive, resulting in misclassifications. This ‘mixed object’ effect could not be avoided in their attempts to separate different mangrove types.

Possible per-segment (object-oriented) classification methods are discussed in chapter 2.6.