
3.12 Model Evaluation

Model evaluation is of paramount importance for checking the predictive power of the habitat suitability map, which is composed of pixels carrying HS values from 0 to 100. The higher the value of the suitability index, the more suitable the habitat is for the focal species. Most evaluation measures applied in former studies were based on presence-absence models using an HS threshold of 0.5: below the threshold the habitat is assumed to be unsuitable, while above it the habitat is expected to be suitable for the focal species.

Many evaluators are also based on a confusion matrix that counts presence and absence evaluation points (Fielding and Bell, 1997). Presence-only model evaluation was applied in this study because absence data were not available for tigers. Presence-only models are more difficult to evaluate than presence-absence models because standard statistics such as Kappa cannot be applied: half of the confusion matrix is missing, so specificity cannot be assessed (see Table 16).
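To illustrate why specificity drops out, the four confusion-matrix cells of a presence-absence model can be counted as follows. This is a minimal sketch with made-up data and a 0.5 threshold; the function and variable names are illustrative, not part of any software used in the study.

```python
# Minimal sketch: confusion-matrix cells for a presence-absence model
# with an HS threshold of 0.5 (toy data, not the study's actual points).

def confusion_counts(hs_values, observed):
    """hs_values: predicted suitability in [0, 1]; observed: 1 = presence, 0 = absence."""
    tp = sum(1 for hs, obs in zip(hs_values, observed) if hs >= 0.5 and obs == 1)
    fp = sum(1 for hs, obs in zip(hs_values, observed) if hs >= 0.5 and obs == 0)
    fn = sum(1 for hs, obs in zip(hs_values, observed) if hs < 0.5 and obs == 1)
    tn = sum(1 for hs, obs in zip(hs_values, observed) if hs < 0.5 and obs == 0)
    return tp, fp, fn, tn

hs = [0.9, 0.7, 0.4, 0.2, 0.8, 0.3]
obs = [1, 1, 1, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(hs, obs)
sensitivity = tp / (tp + fn)   # computable from presences alone
specificity = tn / (tn + fp)   # requires absences -- unavailable in presence-only data
```

With presence-only data, the `obs == 0` rows never exist, so `fp` and `tn` cannot be counted and the specificity line cannot be evaluated.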

Figure 34: An example of a habitat suitability map computed with the ENFA model. The color bar on the right represents the habitat suitability range (0 to 100); lighter shading denotes more suitable areas and darker shading denotes less suitable ones.

Table 16: The confusion matrix of model predictions against actual observations; (a) for presence-absence models and (b) for presence-only models, where half of the matrix is missing.

(a)           Observed 1   Observed 0
Predicted 1   True +       False +
Predicted 0   False -      True -

(b)           Observed 1   Observed 0
Predicted 1   True +       (missing)
Predicted 0   False -      (missing)

Among the evaluation measures for presence-only models, the continuous Boyce index has proved to be the most accurate for computing predictive power (Hirzel et al., 2006).

Biomapper software provides this threshold-independent evaluator: the threshold constraint is relieved by partitioning the HS range into several bins instead of only two groups. Two frequencies are then calculated for each class i, the predicted frequency (Pi) and the expected frequency (Ei); the former is given by Equation 6 and the latter by Equation 7.

Pi = pi / ∑j pj (eq. 6)

where pi = number of evaluation points predicted by the model in HS class i, and ∑j pj = total number of evaluation points.

Ei = ai / ∑j aj (eq. 7)

where ai = number of cells covered by HS class i, and ∑j aj = total number of cells in the whole study area.

Finally, the predicted-to-expected (P/E) ratio Fi for each class is calculated by Equation 8:

Fi = Pi / Ei (eq. 8)

From these values a predicted-to-expected ratio (Fi) curve can be derived that reflects model quality through measures such as robustness, HS resolution and deviation from randomness (Fi = 1). If Fi < 1, the class contains fewer evaluation points than expected by chance and can be denoted as an unsuitable class; conversely, high suitability classes possess values of Fi > 1 (Hirzel et al., 2006). A good model shows a monotonic increase of the Fi curve (see the yellow dotted line in Figure 35).
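The chain from Equation 6 through Equation 8 can be sketched numerically. The following is an illustrative implementation over fixed HS classes with synthetic data; the binning and variable names are assumptions, not Biomapper's internals.

```python
# Sketch of Equations 6-8: predicted frequency P_i, expected frequency E_i,
# and P/E ratio F_i over fixed HS classes (synthetic data for illustration).

import numpy as np

def pe_ratios(eval_hs, map_hs, bins):
    """eval_hs: HS at evaluation (presence) points; map_hs: HS of all map cells."""
    p, _ = np.histogram(eval_hs, bins=bins)   # p_i: evaluation points per class
    a, _ = np.histogram(map_hs, bins=bins)    # a_i: map cells per class
    P = p / p.sum()                           # eq. 6
    E = a / a.sum()                           # eq. 7
    return np.where(E > 0, P / np.where(E > 0, E, 1), np.nan)   # eq. 8

rng = np.random.default_rng(42)
map_hs = rng.uniform(0, 100, 10_000)          # uniform map: E_i roughly equal per class
eval_hs = rng.uniform(50, 100, 31)            # presences concentrated at high HS
F = pe_ratios(eval_hs, map_hs, bins=np.arange(0, 101, 25))
# low HS classes give F_i < 1 (unsuitable), high HS classes give F_i > 1
```

For this synthetic "good" model, the Fi values rise with habitat suitability, the monotonic pattern the text describes.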

However, the Boyce index is sensitive to the number of HS classes and to their boundaries (Boyce et al., 2002). To tackle this issue, a moving window of width W (e.g. W = 10 on the HS range from 0 to 100) is used instead of fixed classes. The first window covers the suitability range (0, 10), and its Fi value is plotted (red dotted line in Figure 35) at the midpoint of the window, 10/2 = 5. The window is then shifted a small amount to the right to cover the range (1, 11), and its Fi value is plotted over the HS value of 6. This process continues until the window arrives at the end of the possible range (90, 100). Through this iteration a continuous Boyce index can be computed, forming a smooth P/E curve. In this study, following the recommendation of Hirzel et al. (2006), a window size of 20 was used, as it yielded the best results.
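The moving-window computation can be sketched as follows. The window width, step, and the rank-correlation summary of the curve follow the concept in Hirzel et al. (2006), but the implementation details below (including the crude tie handling in the rank step) are illustrative assumptions, not Biomapper's code.

```python
# Sketch of the continuous Boyce index: slide a window of width W over the
# HS range, compute F = P/E per window, and summarize the monotonicity of
# the resulting curve with a Spearman rank correlation.

import numpy as np

def spearman(x, y):
    # rank via double argsort (stable sort so tied values keep index order;
    # a simplification -- proper tie handling would use average ranks)
    rx = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    ry = np.argsort(np.argsort(y, kind="stable"), kind="stable")
    return np.corrcoef(rx, ry)[0, 1]

def continuous_boyce(eval_hs, map_hs, width=20, step=1, hs_max=100):
    mids, F = [], []
    for lo in np.arange(0, hs_max - width + step, step):
        hi = lo + width
        P = np.mean((eval_hs >= lo) & (eval_hs < hi))   # predicted frequency
        E = np.mean((map_hs >= lo) & (map_hs < hi))     # expected frequency
        if E > 0:
            mids.append(lo + width / 2)                 # plot F at window midpoint
            F.append(P / E)
    return spearman(np.array(mids), np.array(F))

map_hs = np.linspace(0, 100, 10_000)    # uniform synthetic map
eval_hs = np.linspace(60, 100, 200)     # presences only at high HS
b = continuous_boyce(eval_hs, map_hs)   # close to +1 for this "good" model
```

A value near +1 indicates a consistently increasing P/E curve; values near 0 indicate a model no better than random.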

A cross-validation process can be applied to calculate a confidence interval for assessing the performance of the applied ENFA model. It evaluates and compares algorithms by dividing the data into two segments: one is used to train the model and the other to validate it (Refaeilzadeh et al., 2008). K-fold cross-validation is the basic form for the optimal use of small data sets to calibrate and evaluate a model (see Figure 36).

Figure 35: Computing the continuous Boyce index using a moving window of width 10. The first window covers the suitability range (0, 10); its Fi value is plotted (red dotted line) at the midpoint of the window (10/2 = 5). (Modified from Hirzel, 2006).


This study was based on a small dataset of tiger presence points (31). Hence, cross-validation was used to split the dataset randomly into k equally sized independent partitions. The k-1 partitions were used to calibrate the model, and evaluation was carried out on the left-out partition. This process is repeated k times, with a different partition left out for validation each time, and the central tendency and variance are assessed from the k evaluations. Depending on the number of species presence points, the number of partitions can be varied between 3 and 10. The shape, variance and confidence interval of the curves resulting from cross-validation allow meaningful interpretation: the variance reflects model robustness, whereas the confidence interval represents the model's sensitivity to the calibration points. A perfect model yields a constantly increasing linear curve, giving good information across all HS values. Figure 37 shows examples of a good model that exhibits this monotonic increase of the Fi curve and a bad model whose Fi curve drops in high-HS areas (Hirzel, 2006).
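The partitioning step described above can be sketched as follows, with n = 31 and k = 3 as in this study; the fold-assignment details (random seed, split function) are illustrative, not Biomapper's.

```python
# Sketch of k-fold partitioning for a small presence dataset (n = 31, k = 3):
# each point falls in exactly one validation fold; the other k-1 folds are
# used for calibration (model fitting itself is only a placeholder comment).

import numpy as np

def kfold_partitions(n, k, seed=0):
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)             # k near-equal partitions

folds = kfold_partitions(31, 3)
sizes = [len(f) for f in folds]               # 31 points split as 11 + 10 + 10
for i, val_idx in enumerate(folds):
    cal_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # calibrate the ENFA model on cal_idx, evaluate the Boyce index on val_idx
```

Because every point is left out exactly once, the k Boyce index values obtained this way support the central-tendency and variance estimates discussed above.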

In this study, for model accuracy assessment, six groups of EGVs were categorized as level 1 of the ENFA modelling. The highest-scoring EGVs were then picked out and further ENFA models were run until the preliminary and final (best) ENFA models were obtained. A variety of ENFA models covering all possible combinations of the best EGVs from variable selection, with at least six EGVs at a time, were performed (see Appendix VIII).

Altogether, 129 combinations (eight out of nine, seven out of nine and six out of nine, without replication) were performed. The model with the highest Boyce index value and, at the same time, good Fi curve characteristics was chosen as the best (final) model.

Figure 36: Procedure of the three-fold/partition cross-validation process (k = 3); the darker colored data sets are used for calibration/training while the lighter one is used for validation (Modified from Refaeilzadeh et al., 2008).
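The count of 129 model runs follows directly from the binomial coefficients; a quick check using only the standard library:

```python
# Verifying the number of EGV combinations: 8, 7 or 6 variables chosen
# out of 9, without replication.

from math import comb

counts = {r: comb(9, r) for r in (8, 7, 6)}   # {8: 9, 7: 36, 6: 84}
total = sum(counts.values())
print(total)   # 129
```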

For the model with the highest Boyce index value, a 3-fold cross-validation (as in Figure 36) was applied based on Huberty's rule in Biomapper 4. The data set was randomly split into 3 partitions, of which 2 were used to calibrate the model while the remaining one was used to evaluate it. Mean and standard deviation, as well as median and 90% confidence interval, were used to assess the central tendency and variance of the model. The evaluation outputs can be found in the results section.