A comparison of indicators reflecting the status of the North Sea benthos

G. Van Hoey, H. L. Rees, and E. Vanden Berghe

5.6.1 Introduction

Interest in the use of summary measures (indicators) to quantify the responses of benthic communities to natural and human-induced changes has increased in recent years, especially to meet the requirements of initiatives such as the United Nations Convention on Biological Diversity (1992), the EU Water Framework Directive (European Communities, 2000), the European Marine Strategy (Commission of the European Communities, 2002), and OSPAR (see Lanters et al., 1999 and ICES, 2004). A wide variety of measures exists to describe biological status or change (Diaz et al., 2004). In their simplest form, the primary variables (typically the total number of individuals or species, or the total biomass) are employed as stand-alone summary measures. Derived univariate measures such as the Hill diversity series (Whittaker, 1972; Hill, 1973; Magurran, 1988) add a layer of complexity and typically combine the numerical dominance and species richness components of diversity. Finally, the multimetric methodologies (below) combine a variety of primary or derived univariate measures in one index or, in a more complex form, employ outputs from multivariate analysis.

In Europe, the development of multimetric methodologies has been stimulated by the requirements of the EU Water Framework Directive (Borja et al., 2007). Many countries have used the AZTI Marine Biotic Index (AMBI) combined with other measures of diversity in their multimetric approach (including the UK, Spain, Portugal, Denmark, and Norway; Borja et al., 2007). Other approaches include a method developed by Swedish scientists, based on Hurlbert’s index (Rosenberg et al., 2004), while the German approach is based on taxonomic spread (MarBIT; Meyer et al., 2006). The Dutch methodology is based on a hierarchical approach incorporating three levels and aims to evaluate not only the benthos, but also the associated habitats and ecosystem functioning (NIOO-method; Ysebaert and Herman, 2004).

The NSB 1986/2000 datasets provide an opportunity to evaluate a range of commonly used measures over wider spatial and temporal scales than is normally feasible. The findings are intended to complement assessments elsewhere in this report. This section investigates which of a range of univariate measures (employed singly or in combination as multimetric derivations) are the most suitable for identifying and quantifying any changes to the North Sea benthos between 1986 and 2000. Our aim was not to test exhaustively the relative efficiency of the wide range of measures currently available (we selected a subset; see Material and methods below). Some of our observations, nevertheless, may be useful in prioritizing the use of selected measures according to the objectives of different investigations, including evaluations of quality status under OSPAR/ICES auspices and the developing European Marine Strategy.

5.6.2 Material and methods

5.6.2.1 Data origin

For all analyses, only the NSB data for the nearest matching stations between 1986 and 2000 were used. Thus, after excluding stations separated by a distance of more than 40 km, 156 matching stations were selected. The assemblages employed in this account were based on those identified from the outcome of cluster analysis after fourth-root transformation as described in Section 5.2.

5.6.2.2 Selection of measures for evaluating indicator utility

Singular (i.e. unimetric) measures of community structure

The following commonly used summary measures were selected: density (N), species richness (S), the Shannon–Wiener index (H’), Simpson’s index, and Hurlbert’s index (ESn). Density is expressed as individuals per m² (ind. m⁻²), and species richness as the number of species found at a station. The Shannon–Wiener index (H’; Shannon and Weaver, 1949) considers both species richness and the evenness component of diversity. Simpson’s index is a more explicit measure of the latter, i.e. the proportional numerical dominance of species in a sample (Simpson, 1949). These two indices are sensitive to sample size, whereas Hurlbert’s index is less so. Hurlbert’s index gives the expected number of species (ES) in a randomly selected subset of individuals, e.g. 50 (hence ES(50)), as used in this section (Hurlbert, 1971).
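The three diversity measures can be illustrated with a short calculation. The sketch below is not taken from the NSB analysis; it assumes the standard textbook forms of the indices (Shannon–Wiener with a selectable log base, Simpson's dominance in its finite-sample form, and Hurlbert's rarefaction formula), and the function names are hypothetical.

```python
import math
from math import comb

def shannon(counts, base=2.0):
    """Shannon-Wiener H' from per-species counts (log base is an assumption)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n, base) for c in counts if c > 0)

def simpson_dominance(counts):
    """Simpson's dominance: sum n_i(n_i - 1) / (N(N - 1))."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two individuals")
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

def hurlbert_es(counts, m=50):
    """Expected number of species in a random subsample of m individuals."""
    n = sum(counts)
    if n < m:
        raise ValueError("sample holds fewer than m individuals")
    total = comb(n, m)
    # For each species: probability that at least one individual is drawn.
    return sum(1 - comb(n - c, m) / total for c in counts if c > 0)

# Illustrative counts for one station (made-up values).
station = [40, 25, 15, 10, 5, 3, 1, 1]
print(shannon(station), simpson_dominance(station), hurlbert_es(station, 50))
```

Because ES(50) is evaluated for a fixed number of individuals, it can be compared between samples of different size, which is the property the text refers to.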

Combined (multimetric) univariate measures

Only the multimetric approaches m-AMBI (Spain), IQI (UK), DKI (Denmark), NQI (Norway), and BQI (Sweden) will be described and discussed. The methodologies were, at the time of analyses, accepted by these countries as appropriate to meet the needs of the EU Water Framework Directive, though most were still undergoing a process of validation.

The incorporation of the AMBI (AZTI Marine Biotic Index) is common in the multimetric approaches m-AMBI, IQI, DKI, and NQI and, for that reason, we also examined its performance separately. For the AMBI index, benthic species are assigned to five ecological groups ranging from sensitive to highly tolerant of stress. The complementary Biotic Coefficient is calculated according to the percentage of each ecological group within a sample.

Further details are described in Borja et al. (2000). The other measures employed within the multimetric approaches can differ depending on the formulations (below).
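As an illustration of how the Biotic Coefficient follows from the group percentages, the sketch below uses the group weights 0, 1.5, 3, 4.5, and 6 given by Borja et al. (2000); these weights are not stated in this section, and the function name is hypothetical.

```python
# Weights for ecological groups I to V, as published by Borja et al. (2000).
AMBI_WEIGHTS = (0.0, 1.5, 3.0, 4.5, 6.0)

def biotic_coefficient(group_percentages):
    """Biotic Coefficient from the percentage abundance of groups I to V."""
    if len(group_percentages) != 5:
        raise ValueError("expected five ecological groups")
    if abs(sum(group_percentages) - 100.0) > 1e-6:
        raise ValueError("percentages must sum to 100")
    weighted = sum(w * p for w, p in zip(AMBI_WEIGHTS, group_percentages))
    return weighted / 100.0

# Made-up sample dominated by sensitive and indifferent species.
print(biotic_coefficient([50, 30, 15, 5, 0]))
```

A sample made up entirely of group I species gives a coefficient of 0, and one made up entirely of group V species gives 6.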

The Spanish methodology (m-AMBI) combines AMBI, species richness, and the Shannon–Wiener index in a multimetric approach based on factor analysis (FA), which was used to determine the Ecological Quality Ratio (EQR) for each of the “typologies”, with their corresponding references (Borja et al., 2004; Bald et al., 2005; Muxika et al., 2007). The outcome of factor analysis was applied to the results of each station and in each sampling period, and virtual reference stations for “high” and “bad” ecological status were considered in the analysis. Data for analysis were log(1+x)-transformed and standardized by subtracting the mean and dividing by the standard deviation in order to achieve a normal distribution of the data. The FA solution was rotated (using the Varimax rotation method) in order to simplify the interpretation of the results, and the scores of the first three factors were extracted. After obtaining the sampling stations’ relative positions (extracted FA scores), the projection of each sampling station onto the axis connecting both reference stations was calculated in the new three-dimensional space created by the FA. The Euclidean distance of each projection to the virtual station possessing a “bad” status was measured.
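The final projection step can be illustrated geometrically. The sketch below assumes that the factor scores are already available as three-dimensional coordinates and, as an assumption not stated in the text, scales the distance to the "bad" reference by the distance between the two reference stations so that a station at the "high" reference scores 1. The function name and coordinates are hypothetical, and the factor analysis itself is not reproduced.

```python
import math

def projected_score(station, bad, high):
    """Project a station onto the bad-high axis and return the scaled
    Euclidean distance of that projection from the 'bad' reference."""
    axis = [h - b for h, b in zip(high, bad)]
    axis_len_sq = sum(a * a for a in axis)
    if axis_len_sq == 0:
        raise ValueError("reference stations coincide")
    # Scalar position of the orthogonal projection along the axis.
    t = sum((s - b) * a for s, b, a in zip(station, bad, axis)) / axis_len_sq
    projection = [b + t * a for b, a in zip(bad, axis)]
    # Scaling by the reference separation is an assumption of this sketch.
    return math.dist(projection, bad) / math.sqrt(axis_len_sq)

bad_ref = (0.0, 0.0, 0.0)
high_ref = (2.0, 0.0, 0.0)
print(projected_score((1.0, 5.0, -3.0), bad_ref, high_ref))
```

Components of a station's position that lie off the axis do not affect the score; only its position along the line between the two references matters.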

The UK’s IQI (Infaunal Quality index) used AMBI, Simpson’s Index (as a measure of evenness in the apportioning of individuals among the species), and the number of taxa as parameters. The individual measures have been weighted and combined within the multimetric, in order to best describe the changes in the benthic invertebrate community caused by anthropogenic pressure (A. Miles, UK Environment Agency, pers. comm.).

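$$\mathrm{IQI} = \frac{\left(0.38 \times \mathrm{AMBI}_{\mathrm{IQI}}\right) + \left(0.08 \times (1-\lambda')_{\mathrm{IQI}}\right) + \left(0.54 \times S_{\mathrm{IQI}}^{\,0.1}\right) - 0.4}{0.6}$$

where $\mathrm{AMBI}_{\mathrm{IQI}}$ is $\left(1 - (\mathrm{AMBI\ BC} \div 7)\right) \div \left(1 - (\mathrm{AMBI\ BC} \div 7)\right)_{\mathrm{MAX}}$; $(1-\lambda')_{\mathrm{IQI}}$ is $(1-\lambda') \div (1-\lambda')_{\mathrm{MAX}}$; and $S_{\mathrm{IQI}}$ is $S \div S_{\mathrm{MAX}}$. Each metric is normalized to a maximum value expected for that metric. The MAX parameters relate to the reference condition for that metric for a specific habitat which, in this exercise, is defined as the maximum value observed in the dataset for each benthic assemblage.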
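The IQI expression can be evaluated directly once the reference (MAX) values are known. The sketch below follows the weights given above; the function and parameter names are hypothetical, and the reference values are passed in as the maxima of the already transformed metrics, which is one reading of the normalization described in the text.

```python
def iqi(ambi_bc, simpson_lambda, n_taxa,
        ref_ambi_term, ref_evenness_term, ref_taxa):
    """Infaunal Quality Index from AMBI BC, Simpson's lambda', and taxa count.

    ref_ambi_term     -- reference (maximum) value of 1 - BC/7
    ref_evenness_term -- reference (maximum) value of 1 - lambda'
    ref_taxa          -- reference (maximum) number of taxa
    """
    ambi_term = (1 - ambi_bc / 7) / ref_ambi_term
    evenness_term = (1 - simpson_lambda) / ref_evenness_term
    richness_term = (n_taxa / ref_taxa) ** 0.1
    return (0.38 * ambi_term + 0.08 * evenness_term
            + 0.54 * richness_term - 0.4) / 0.6

# Made-up station compared with made-up assemblage maxima.
print(iqi(ambi_bc=2.0, simpson_lambda=0.15, n_taxa=35,
          ref_ambi_term=0.9, ref_evenness_term=0.95, ref_taxa=60))
```

A station that matches the reference on all three metrics scores 1, because the weights sum to 1 and the final step rescales the range 0.4 to 1 onto 0 to 1.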

The Danish methodology (DKI) used a combination of the Shannon–Wiener index, AMBI, the number of species, and the number of individuals.

$$\mathrm{DKI} = \frac{\left(1 - \frac{\mathrm{AMBI}}{7}\right) + \frac{H}{H_{\mathrm{MAX}}}}{2} \times \frac{\left(1 - \frac{1}{S}\right) + \left(1 - \frac{1}{N}\right)}{2}$$

where $H$ is the Shannon–Wiener index with log base 2; $H_{\mathrm{MAX}}$ is the reference value that $H$ can reach in undisturbed conditions; $N$ is the number of individuals; and $S$ is the number of species. The factors containing $N$ and $S$ only have a significant effect when the numbers of individuals and species are <10.
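A direct transcription of the DKI expression shows how the small-sample correction behaves. This is a minimal sketch with hypothetical function and parameter names; the example values are made up.

```python
def dki(ambi, shannon_h, h_max, n_species, n_individuals):
    """Danish index: mean of the AMBI and Shannon terms, multiplied by a
    correction that only matters when S or N is small."""
    quality = ((1 - ambi / 7) + shannon_h / h_max) / 2
    correction = ((1 - 1 / n_species) + (1 - 1 / n_individuals)) / 2
    return quality * correction

# With S = 40 and N = 800 the correction is close to 1.
print(dki(ambi=1.5, shannon_h=3.8, h_max=5.0, n_species=40, n_individuals=800))
# With S = 2 and N = 2 the same quality term is halved.
print(dki(ambi=1.5, shannon_h=3.8, h_max=5.0, n_species=2, n_individuals=2))
```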

The Norwegian methodology (NQI) includes AMBI, the number of individuals, the Shannon–Wiener index, and the diversity index SN (combination of number of species and individuals) in its multimetric approach.

$$\mathrm{NQI} = 0.5 \times \left(1 - \frac{\mathrm{AMBI}}{7}\right) + 0.5 \times \frac{SN}{2.7} \times \frac{N}{N + 5}$$

where $SN = \ln(S) \div \ln(\ln(N))$, $S$ is the number of species, and $N$ is the number of individuals.
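The NQI expression can be computed in two steps: first SN, then the weighted sum. The sketch below uses hypothetical function names and adds a guard, not mentioned in the text, for very small samples where ln(ln(N)) is not positive.

```python
import math

def sn_index(n_species, n_individuals):
    """SN = ln(S) / ln(ln(N)); needs ln(ln(N)) > 0, i.e. N of at least 3."""
    if n_species < 1 or n_individuals < 3:
        raise ValueError("SN is not defined for such a small sample")
    return math.log(n_species) / math.log(math.log(n_individuals))

def nqi(ambi, n_species, n_individuals):
    """Norwegian index: equal weights on the AMBI term and the SN term,
    the latter damped by N / (N + 5) for small samples."""
    ambi_term = 1 - ambi / 7
    sn_term = (sn_index(n_species, n_individuals) / 2.7
               * n_individuals / (n_individuals + 5))
    return 0.5 * ambi_term + 0.5 * sn_term

# Made-up station values.
print(nqi(ambi=1.8, n_species=35, n_individuals=600))
```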

In the BQI (Benthic Quality index) of Rosenberg et al. (2004), the Hurlbert index was used to categorize benthic species according to their sensitivity to disturbance. They assumed that tolerant species are mainly found in disturbed environments and so mainly occur at stations with low ES(50), whereas sensitive species mainly occur at stations with high ES(50). Based on this assumption, a species tolerance level ($ES(50)_{0.05}$) was calculated, which reveals the minimum ES(50) value for 5% of each macrofauna population. On this basis, the BQI was proposed:

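$$\mathrm{BQI} = \left(\sum_{i=1}^{n} \left(\frac{A_i}{\mathrm{tot}A} \times ES50_{0.05i}\right)\right) \times \log(S + 1)$$

where $A_i \div \mathrm{tot}A$ is the mean relative abundance of species $i$, and $S$ is the number of species at the station.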
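The BQI is an abundance-weighted mean of species tolerance values, scaled by a richness term. The sketch below assumes a base-10 logarithm (the base is not stated here) and assumes that every species at the station has a tolerance value; the function name, species names, and values are hypothetical.

```python
import math

def bqi(abundances, tolerance_values):
    """Benthic Quality Index for one station.

    abundances       -- dict: species -> abundance A_i at the station
    tolerance_values -- dict: species -> ES(50)0.05 tolerance value
    """
    present = {sp: a for sp, a in abundances.items() if a > 0}
    total = sum(present.values())
    if total == 0:
        raise ValueError("station has no individuals")
    # A missing tolerance value raises KeyError (a choice of this sketch).
    weighted = sum((a / total) * tolerance_values[sp]
                   for sp, a in present.items())
    # Base-10 logarithm is an assumption.
    return weighted * math.log10(len(present) + 1)

station = {"species_a": 30, "species_b": 10, "species_c": 5}
tolerances = {"species_a": 8.0, "species_b": 12.5, "species_c": 4.0}
print(bqi(station, tolerances))
```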

Analysis

The changes observed in the selected measures between 1986 and 2000 for the entire North Sea and for the major assemblages were visualized using box plots (without outliers). The Kruskal–Wallis test was used for testing significant differences between the groups, and the Mann–Whitney U-test was used for pairwise testing between the groups.
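For readers unfamiliar with the pairwise test, the sketch below computes the Mann–Whitney U statistic with a two-sided p-value from the normal approximation (tie-corrected, without continuity correction). It is an illustration in pure Python, not the software used for this report; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred, and exact p-values are better for small samples. Function names and data are hypothetical.

```python
import math
from collections import Counter

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all values identical
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Made-up index values for one assemblage in the two surveys.
values_1986 = [0.71, 0.74, 0.78, 0.80, 0.69]
values_2000 = [0.66, 0.70, 0.72, 0.65, 0.68]
print(mann_whitney_u(values_1986, values_2000))
```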

5.6.3 Results

5.6.3.1 Overall values

There were no marked differences in the univariate measures between 1986 and 2000 (Figure 5.6.1). The averages of the different diversity indices were slightly lower in 2000 than in 1986, whereas the density was higher. However, the overall difference was significant only for ES(50) and density (Kruskal–Wallis; p <0.01). The Biotic Coefficient (BC(AMBI)) showed a slight increase (not significant), but no shift of status.


Figure 5.6.1. Box plots comparing different indices between the overall (i.e. North Sea-wide) data of 1986 and 2000.

The comparison between 1986 and 2000 in the multimetric methods demonstrates that all methodologies show a slight decrease of the average value in 2000, but only NQI and BQI give significant differences (Kruskal–Wallis, p <0.05 and p <0.01, respectively; Figure 5.6.2).

Calculations at the level of the entire North Sea may be too robust, i.e. insensitive to changes occurring on smaller spatial scales. Therefore, it is appropriate to examine changes at the benthic assemblage level (see Section 5.2).


Figure 5.6.2. Box plots comparing different multimetric indices between the overall (i.e. North Sea-wide) data of 1986 and 2000.

5.6.3.2 Benthic assemblages

Density

In most assemblages, the average density was higher in 2000 than in 1986 (Figure 5.6.3). In the assemblages of the central North Sea and Southern Bight, the density increased (Dogger Bank (D12), Oyster Ground (D2), Southern Bight (F1)), decreased (D13), or remained stable (Southern Bight (F2)). In the northern North Sea at a depth of >100 m, the density decreased or increased, depending on the assemblage type (A, E1, E2, and G). In the assemblages off the English east coast (B, C, E3, and E4), the density increased, as it did in the assemblage along the German–Danish coast (D11). In the rest of the North Sea, an increase in density was observed between 1986 and 2000. However, the differences were only significant for assemblages B, E2, and G (p <0.05).

Figure 5.6.3. Box plots comparing density for 1986 (brown) and 2000 (green) across assemblages identified in Section 5.2.

ES(50)

In the Northern assemblages (A, E1, and G), an increase in expected number of species was observed (Figure 5.6.4). The other assemblages elsewhere in the North Sea were characterized by a decrease in expected number of species. Only the assemblage at the Oyster Ground (D2) showed no obvious changes in expected number of species. Overall, only the assemblages C, E4 (English east coast), and D12 (Dogger Bank) showed a significant decrease in expected number of species (p <0.05).

Other measures of diversity (S, Shannon, Simpson)

Box plots for these measures are not shown, but the trends are summarized in Table 5.6.1. The averages were higher in 2000 compared with 1986 for the northern North Sea assemblages (significantly so for A and E1; for assemblage G, only the number of species differed significantly). This trend (though not significant) was also observed for the assemblages situated between 50–100 m depth (B, C, E2, E3, and E4). All the other assemblages showed a similar average value or a decrease for the different diversity measures.


Figure 5.6.4. Box plots comparing ES(50) for 1986 (brown) and 2000 (green) across assemblage types identified in Section 5.2.

Biotic Coefficient (BC(AMBI))

Figure 5.6.5 shows the BC values for each assemblage. A scattered pattern of increases (more polluted) and decreases (less polluted) was observed, none of which was significant. There was no switch in status for any of the assemblages, and all could be classified as having a slightly polluted status (average 1.2 < BC < 3.3) according to Borja et al. (2000).

The biotic coefficient increased in the northern North Sea assemblages (A, E1, E2, and G), as it did in the assemblages at depths >50 m (E3 and E4). The assemblages of the Oyster Ground (D2) and the Dogger Bank (D12) also showed increases in the BC, while the southern North Sea assemblages F1 and F2 showed a slight decrease.


Figure 5.6.5. Box plots comparing the AMBI Biotic Coefficient for 1986 (brown) and 2000 (green) across assemblage types identified in Section 5.2.

m-AMBI, IQI, NQI, DKI

These multimetric indices are described together because most employ the same measures, and they are strongly correlated with each other (Borja et al., 2007).

A comparison of the different multimetric indices between 1986 and 2000 highlighted mostly a decrease of the ecological quality (EcoQ) score in the central and southern North Sea, whereas the northern North Sea assemblages showed an increase in the EcoQ score (Figure 5.6.6). The NQI showed less variability than the others. Some of the observed changes in the assemblages between 1986 and 2000 were significant. The decrease in the Southern Bight assemblage F2 was significant for all indices except the IQI; the same pattern of significance applied to the Northern assemblages (A and E1). The decrease in the Dogger Bank assemblage D12 was only significant for the NQI and IQI. The increase in the English east coast assemblage B was significant for the DKI and m-AMBI. The m-AMBI also showed significant differences for assemblages D11 (decrease) and G (increase).

The status of the different assemblages according to the Danish methodology was high (boundary between high/good = 0.72), whereas according to the m-AMBI, the status was good for all assemblages (0.55 < good <0.85). According to the IQI classification, the status of most assemblages was high (high/good = 0.8). The EcoQ values calculated with the NQI revealed that the status of the assemblages fluctuated around the high/good boundary (0.75). A difference in status between high and good between 1986 and 2000 was only observed for some assemblages with the NQI.


Figure 5.6.6. Box plots comparing m-AMBI, IQI, NQI, and DKI for 1986 (brown) and 2000 (green) across assemblage types identified in Section 5.2.

BQI

The Swedish index, which differs in formulation from the AMBI-related indices, showed a contrasting trend for some assemblages. In the Northern assemblages A and G and in the English east coast assemblage B, the average BQI value decreased (Figure 5.6.7). The differences were significant for assemblages B, D11, D12, D2, and F2.


Figure 5.6.7. Box plots comparing the BQI of 1986 (brown) and 2000 (green) across the assemblage types identified in Section 5.2.

5.6.4 Discussion

The evaluation of changes between 1986 and 2000, based on a variety of indicator measures, shows that several factors can influence the observed patterns. For example, sampling effort, taxonomic precision, and variable distances between matched (1986/2000) stations could confound the identification of trends in quality status. We conclude that the observed changes in the northern North Sea were mainly caused by improved taxonomic sufficiency in 2000.

For 1986, the available data for each station were pooled across variable numbers of replicates and/or variable surface areas sampled depending on the device used (see Section 3 and Künitzer et al., 1992). Similar considerations with the potential to affect the interpretation of indicator values apply to the 2000 survey. For example, a 0.07 m² corer was used in Dutch waters, while a 0.25 m² corer was used by FRS (Scotland) for sampling in parts of the northern North Sea and at some offshore stations extending south to the German Bight. Differences in surface area (if not in efficiencies of the sampling gear) may be compensated for by pooling replicates and, for the 2000 dataset, a degree of parity was achieved in this way. However, it should be noted that the majority of the stations sampled in the southern North Sea using a large corer did not contribute to the reduced dataset in 1986 and 2000, as closer matches were found with those sampled by grab.

Despite the potentially confounding influences of sampling and analytical differences between 1986 and 2000, we consider that the present evaluation of selected indicator measures served a useful purpose, especially in light of the need by the developing European Marine Strategy for information on the relative utility of indicators of status and change, along with reference settings for the North Sea benthos, i.e. on the scale of whole sea areas. In this respect, the data from the North Sea Benthos surveys of 1986 and 2000 provide a valuable resource for testing.

Table 5.6.1. Summary of statistical comparisons between selected measures (1986 and 2000). Key: + indicates an increase in value; = indicates equal; – indicates a decrease in value. Significant differences (Mann–Whitney U-test) appear in parentheses. An increase in the value of the biotic coefficient (BC(AMBI)) signifies a decrease in status. This is in contrast to the other multimetric methodologies, where an increase in values signifies an increase in status. Columns are grouped into unimetric and multimetric measures.

On a “global” scale (i.e. the whole North Sea), the different measures gave no clear pattern (only the increase in density and the decreases in ES(50) and in the multimetric indices NQI and BQI were significant), but it is recognized that such an approach may mask considerable regional or local variability. Therefore, in order to facilitate smaller scale evaluation, the benthic assemblages defined in Section 5.2 were employed and, at this level, trends were observed for some assemblages, depending on the measures employed (Table 5.6.1).

The Northern assemblages (A, E1, and G) showed mainly an increase in diversity. This was reflected in a variety of measures, including most of the multimetric indices. According to the classification schemes associated with the latter indices, this implies an improvement in quality status. However, we consider that this finding may be explained largely by sampling and analytical factors (see above).

The assemblages located between the 50 m and 100 m depth contours showed a variable pattern. For example, assemblage E2 in the north-central North Sea showed decreases in density, ES(50), and number of species and an increase in the Shannon–Wiener and Simpson diversity indices. This resulted in a decrease in the multimetric indices (except DKI).

Assemblages B and C along the English east coast generally showed a marginally enhanced quality status according to values of the multimetric indices, which could be linked to increases of the diversity indices, except ES(50). (Note that the large FRS core samples did not contribute to the reduced 2000 data for the English east coast because all stations were exactly matched with 1986 stations and sampled by 0.1 m² Day or Hamon grabs). In the central North Sea (E3 and E4), the univariate indices gave a scattered pattern, whereas the multimetric indices mainly showed a decrease, although none was significant.

Assemblage D12 at the Dogger Bank showed a decrease in most measures, except density. Three of the five multimetric indices showed a significant decrease in values. This is consistent with other observations on changes at the Dogger Bank, which could be linked to changes in the hydroclimate under the influence of the North Atlantic Oscillation, resulting in enhanced current velocities, a limitation of organic input, and a more sandy environment in the vicinity (see Section 5.2 for more details). However, the magnitude of the observed decreases in values was relatively small and did not signify a change of the ecological status according to the classification systems employed. These factors may also explain the observed changes in indices for assemblage D2 (Oyster Ground) and D13 (spatially scattered in the central North Sea), where a decrease in values was detected, although these were not significant.

Changes to the status of assemblages of the Southern Bight of the North Sea (F1 and F2) and the eastern North Sea (German–Danish coast, D11) were similarly linked to NAO-induced changes affecting the diversity and species composition in those areas (see Section 5.2 for more details). The multimetric values decreased, which might suggest a decline in the ecological quality of those areas, especially for assemblage F2 (four of five indices were significantly lower in 2000 than in 1986). However, the decreases were again relatively small and insufficient to indicate a switch in ecological status according to the classification systems employed.

We conclude that no marked differences in the status of the North Sea benthos were observed between 1986 and 2000, especially when evaluated on a wide geographical scale. However, differences on smaller scales (i.e. within certain assemblages) and of relatively small magnitude were detected between 1986 and 2000. Most of the existing ecological evaluation methodologies give an indication of changes, but they do not always react in the same way. The four multimetric approaches based on AMBI showed mostly the same pattern and were also strongly correlated (R ≥ 0.732) with each other (Table 5.6.2). This confirms the findings of the intercalibration exercise in Borja et al. (2007). The BQI, which is based on a different approach of evaluating the sensitivity of species, was also significantly correlated, but with much lower R values (0.233–0.406).

Table 5.6.2. Spearman rank correlation values between the different indices. The highlighted values were not significant (p >0.05).

          IQI      DKI      NQI      m-AMBI   BQI      BC       H’       Simp     ES(50)
DKI       0.835
NQI       0.860    0.787
m-AMBI    0.817    0.821    0.732
BQI       0.406    0.233    0.402    0.391
BC       −0.179   −0.140   −0.172    0.099    0.078
H’        0.346    0.361    0.377    0.532    0.228    0.156
Simp      0.284    0.345    0.308    0.447    0.187    0.078    0.939
ES(50)    0.184    0.041    0.182    0.155    0.597    0.052    0.497    0.484
S         0.389    0.281    0.429    0.571    0.242    0.229    0.694    0.482    0.315
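As a reminder of what the values in Table 5.6.2 measure, the sketch below computes a Spearman rank correlation by ranking both variables (tied values share their mean rank) and taking the Pearson correlation of the ranks. It is an illustration only; the function names and example values are hypothetical and are not NSB data.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)

# Made-up index values for six stations.
index_a = [0.72, 0.80, 0.65, 0.77, 0.69, 0.83]
index_b = [0.70, 0.78, 0.66, 0.81, 0.64, 0.85]
print(spearman_r(index_a, index_b))
```

Because only ranks enter the calculation, two indices that order the stations identically give a value of 1 even if they are on different numerical scales.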

These multimetric indices were based on different combinations of univariate measures, and it is to be expected that they would be significantly correlated with these when tested. This is true for most indices (Table 5.6.2), but some curiosities require explanation. For instance, the m-AMBI showed no correlation with the Biotic Coefficient (BC(AMBI)), but was relatively strongly correlated with the Shannon–Wiener index and the number of species. This means that, for this dataset, the m-AMBI evaluation is mainly weighted by the changes in the diversity indices. Also, concerning the other multimetric indices, the correlation was strongest
