• Keine Ergebnisse gefunden

Mandallaz, D. (1995). Geostatistical Methods for Double Sampling Schemes. In M. Köhl, P. Bachmann, P. Brassel, & G. Preto (Eds.), The Monte Verità Conference on Forest Survey Designs. «Simplicity versus Efficiency» and Assessment of Non-Timber Resour

N/A
N/A
Protected

Academic year: 2022

Aktie "Mandallaz, D. (1995). Geostatistical Methods for Double Sampling Schemes. In M. Köhl, P. Bachmann, P. Brassel, & G. Preto (Eds.), The Monte Verità Conference on Forest Survey Designs. «Simplicity versus Efficiency» and Assessment of Non-Timber Resour"

Copied!
9
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

4.2 Geostatistical Methods for Double Sampling Schemes Daniel Mandallaz

4.2.1 Introduction

This paper presents a simple and efficient geostatistical estimation technique for double sampling schemes. An important example is combined forest inventory.in which terrestrial plot data is combined with auxiliary information provided by remote sensing.thematic maps.previous inventories etc. The technique can also be used for the estimation of ratios, which is often a relevant quantity in non-timber assessment (e.g. percentage of damaged trees). The statistically interested reader unfamiliar with geostatistical concepts and techniques can find an excellent exposition in (Cressie, 1991). Proofs, further developments and details pertaining to the present work can be found in (Mandallaz, 1993).

4.2.2 Formalism

The first difficulty when applying geostatistics to forest inventory is to clearly define the underlying regionalized variables. Let us consider a forest area V c R2 (i.e a subdomain of the plane ) with N trees whose centres are located at the points u1 e R2• Let Y; be the variable of interest (e.g timber volume, basal area, 0 -1 indicator variables etc.) for the i-th tree, assumed to be error free. The quantities to be estimated are of the form,

- 1

Yv = --o A(Vo) uiEVo'

IY.

(2.1)

or ratios thereof, for some arbitrary domain V0 c V ; 1.(V0) denoting the surface area. It turns out that most inventory schemes can be described in terms of the regionalized variable defined by:

(2.2) z(x) =

I,

Y;<p(x -ul' d1)

u1eV

where d1 e D is usually the diameter at breast height (Mandallaz, 1991). The bivariate function

<p(u, d) is positive and must satisfy the condition:

f

<p(u, d)du = 1 'vd e D

(2)

In practice ci,(

u.

d) :I:- 0 only when

u.

lies _in a bounded set depending on d. Therefore. the sum defining z(x) only extends over trees in a neighborhood of

x.

For instance :

if

lul � r

then

else q,(u. d) = 0 yields the simple circular plot technique, whereas

else cj>(u,d) = 0

gives the angle count (Bitterlich) technique with angle a.. The main point is that. up to boundary effects (Mandallaz. 1991):

I,Y, = J z(x)dx

u1eV0 V0

(2.3)

Thus. the summation over a finite population of trees is equivalent to the integration of a regionalized variable z(x) (in the sense of Matheron, 1970: essentially the only condition is that the integral of z(x) must be a physically meaningfull and additive quantity) over a domain of the plane.

which is precisely what geostatistics is about. In the classical design-based approach. z(x) is the realization of a random variable because one draws the point

x

according to some random mechanism.

In the geostatistical approach, z(x) is viewed as the realization of a stochastic process Z(x) at the point X.

In many non-timber assessment problems the regionalized variable z(x) is given by the problem at hand and must not be constructed according to (2.2). The second-phase , e.g. the terrestrial inventory, yields

the

value z(x) for a finite number of points in the second-phase sample

x, x

e s2• The auxiliary information must be correlated with z(x) and is available at points in the first phase sample x e s1 ; in the following discussion only standard double sampling schemes, for which s2 c s1 ,will be considered. It is assumed that the auxiliary information can be adequately described by a p-dimen­

sional vector A(x) e RP. In practice, most components of A(x) are 0-1 variables coding purely qualitative information. lbis information is transformed, by means of a prediction model.into a quantitative regionalized variable z(x) directly related to z(x) (e.g z(x) = A(x)� for a linear mcx:lel), so that. by definition, the following decomposition holds:

(2.4) z(x) = z(x)

+

r(x)

(3)

Often, the interpretation of (2.4) is that the forest can be viewed as the sum of a regular stand structure (expressed by the prediction process z(x)) and of erratic local fluctuations (expressed by the residual process r(x )). It is important to note that (2.4) actually defines the residual process, so that the prediction model is not assumed to be "true".

To simplify, it is assumed that the model is fully specified a priori. Otherwise, fitting the model to the actual data (z(x), A(x): x e s2) can be a source of difficulties (Mandallaz, 1993). The underlying processes are assumed to be intrinsic, i.e. satisfying the conditions:

(2.S) E(U(x + h ) - U(x)) = 0 , v'x, h

E(U(x + h) - U(x))2 = 2yu (h) 'v'x, h

where U stands for either Z,Z, or R and Yu (h) is the so called (semi-) variogram. In practice, the variograms will have to be estimated before going into the proper estimation process (see Cressie, 1991, for a review). E( ) denotes the expectation with respect to the underlying stochastic model. We shall assume furthermore that the prediction and residual processes are uncorrelated.

In the geostatistical framework, the objective of forest inventory is therefore to estimate the quantity:

(2.6) z(V0) = -- Jz(x)dx for some V1 0 c V

A(Vo ) yo on the basis of either of the following data sets

{z(x). x e s2: z(x), x e sJ

{r(x) = z(x) - z(x), x e s2: z(x), x e s1 }

4.2.3 Double Kriging

1his method is the geostatistical version of the design-based regression estimate and yields the best (i.e.

with minimal mean square error) linear unbiased estimate of the form:

z° (Vo) = :�:)-1r(x() +

L

µ,z(x, )

les2 !es,

(3.1)

Under the previous assumptions, this is equivalent to the separate kriging of predictions and residuals (Mandallaz, 1993), i.e. the optimal coefficients in (3. 1) and the Lagrange's multipliers t1 , t2

(4)

1)1,fYR (Xi - xk ) + 't1 = YR (Xk,Vo ) for k e s2 ies2

L

µfYz(X1 - xk ) + 't2 = rz (Xk,Vo) for k e s1 les1

The mean square error is then found to be:

E{

z·cvo) - Z(Vo)} 2 =

LA,Y

R (x, . Vo) -yR(Vo• Vo) + 't1

IESz

+

Lµ,y

z (Xp Vo) - y z<Vo, Vo) + 't2

les,

where following notations have been used:

(3.2)

(3.3)

(3.4)

and similarly for Yz(X, Vo), Yz<Vo, Vo) ; these quantities must be evaluated by numerical integration.

In other words, the kriging point estimates and mean square errors of predictions and residuals are added; hence the term double kriging.

It is worth noting that the classical regression estimates are a special case of equation (3. 1) with

(5)

4.2.4 Estimation of Ratios

In many applications, one has to estimate quantities of the fonn:

(4.1)

e.g. percentages of trees with some characteristics, or mean timber volume per tree, etc. Note that in most circumstances, it would be wrong to consider quantities like:

th fu . z1 (x) . all cldi . as e nction --- 1s gener y not even a tive.

Z2(x)

The technique we propose to estimate r(V0 ) , and particularly its mean square error, differs from the standard procedure as presented in (Journel and Huijbregts, 1978). Instead, it rests upon a geostatistical refonnulation of Fieller's theorem (Cox, 1967), which is more in line with the design-based approach.

For each p e R1 we define the following random variable:

(4.2)

Note that from the definitions, one has :

(4.3)

Let 8 (p) be the double kriging estimate of S(p); for each given p , one needs to determine the * variograms of the "predictions" z1 (x) -pz2 (x) and of the "residuals" E1 (x) -pe2(x). This can be done either by direct model fitting of the empirical variograms of these two new processes, or by using pre-existing models of variograms and cross-variograms together with the relation

(6)

'Yu-9v (h) = Yu (h) + p2yv (h) - 2pyu_v(h) for any two processes

U,V.

Obviously, the first technique is more time conswning, but also more robust and instructive, since it models the relevant variogram directly.

* *

Toe estimate r = r (� ) of the ratio r(V0) is defined, because of (4.3), as the solution of the equation

Toe procedure is iterative, namely:

* * *

8 (p) = O, i.e. 8 (r ) = 0

(4.4)

(4.5)

* *

where

z,

( � ) ,

z

2 (� ) are the double kriging estimates of the true values z 1 (V0) . Z2 (V0) . The scheme (4.5) is based on the following argument:

* *

*

*

Set ,;.+1 = rn + En in (4.2) to obtain 0(rn+1) = 0(,;. ) - Enz2( � ) . Because of (4.3), this suggests immediately to set:

which is precisely the iteration scheme (4.5).

* * We assume convergence and define Jim rn = r

Let 0 (p) = 8(p) + e(p), where e(p) is the estimation error satisfying E e(p) = O and

2

* * * * *

E e (p) = MSE(S (p)) . Hence, one can write 0 = 8 (r ) = 0(r )+ e(r ) and therefore

so that r ( � ) is asymptotically unbiased, with the mean square error approximately given by: *

(7)

(4.6)

in perfect analogy with the design-based approach (Mandallaz, 1991). In practice it appears that 0, 1 , 2 iterations suffice. It is worth noting that the above procedure also reduces the bias of the first iteration estimate.

4.2.5 Illustration

We now briefly illustrate double kriging and compare it with design-based regression techniques. The data originates from an intensive inventory performed on a Swiss Plateau forest enterprise of 218 ha.

Systematic cluster sampling was used to sample the auxiliary information; this information was provided by a stand map based on aerial photographs. The nominal cluster size was 5 (note that the effective cluster size, i.e. the number of points of a cluster falling into the forest area, is also a random variable,with a mean value of 4) ; the first phase sample resulted in 298 clusters and 1203 points. The ter-restrial inventory rests upon 300 m2 plots with the same cluster structure on a 1 :4 subgrid (73 cluster, 298 plots). A small area of 17 ha has been fully inventoried with deter-mination of the tree co­

ordinates (4784 trees) to allow for validation. The quantities of interest are stem and basal area densities, as well as the percentage of non-healthy trees. The prediction model is a pure main effects ANOV A whose factors are:

(1) development stages (4 levels) (2) degree of mixture (2 levels) (3) crown closure index (2 levels)

The model is fitted with the actual data (the coefficients of determination were 0.5 for stem and 0.2 for basal area, Mandallaz, 1991, 1993). External models based on a yield table give essentially the same results. To estimate the density of non-healthy trees, the prediction of the total stem density was multiplied by a prediction of the percentage of non-healthy trees as obtained by a logistic model with the same factors as the above ANOV A model. The calculations required were perfonned with the software BLUEPACK (Ecole des Mines de Paris, 1990) on a VAX 9000-420. The results are summarized in tables I and 2 below.

(8)

Table 1: Entire Domain, 218 ha Point estimate

Kriging/standard error ( )

Stem density

Basal area density (m2/ba)

Percentage of non-healthy trees

Table 2: Small area, 17 ha Point estimate

Kriging/staodard error ( )

Stem density

Basal area density (m2/ba)

Percentage of non-healthy trees

Double Kriging

325.84 (1 1 .15)

31.17 (0.71)

18.83 (1 .20)

True value

280.23 -

29.60 -

24.20 -

Design-based

325.08 (12.12)

31.23 (0.89)

18.90 (1.60)

Double Design-

Kriging based

281.67 257.14

(26.91) (48.59)

29.46 24.00

(1.39) (3.70)

22.13 26.40

(3.80) (7.30)

The superiority of double kriging, with respect to bias and error, is striking for the small area estimation problem; a finding which is further substantiated if one splits the small area in squares of either 1 ha or 0.25 ha. The true values lie in the 95% confidence intervals for slightly more than 95% of

(9)

the squares. For such small areas the design-based estimates are either not even defined or usually useless.

The above results and other analysis of this data set provide statistical evidence that:

(1) The larger the domain, the smaller the difference between the kriging and design-based point and error estimates.

(2) The smaller the domain, the more pronouced the superiority of double kriging in terms of bias and error.

There are further empirical and theoretical arguments to support the view that these findings should hold for a wide range of forest inventories (Matheron, 1989; Mandallaz, 1991, 1993);

4.2.6 Conclusions

Double kriging is a simple geostatistical method particularly well suited to combined forest inventories based on double sampling schemes. It appears to be more efficient than the design-based regression estimate, especially for small area estimation problems.

4.2. 7 References

BLUEPACK, 1990: Manual, ENSM, Paris.

Cox, D.R., 1967: Fieller's theorem and a generalization. Biometrika 54, 567-572.

CRESSIE, N., 1991: Statistics for Spatial Data. New York, John Wiley & Sons.

JOURNEL, A.G.; HUIJBREGTS, C., 1978: Mining Geostatistics. London, Academic Press.

MANDALLAZ, D., 1991: A unified approach to sampling theory for forest inventory based on infinite population and super-population models. PhD thesis no 9378, ETH Ztlrich, Chair of foest inventory and planning, ETII Ztlrich.

MANDALLAZ, D., 1993: Geostatistical methods for double sampling schemes: application to combined forest inventories, Chair of forest inventory and planning, ETII Zurich.

MATHERON, G., 1970: La theorie des variables regionalisees et ses applications, Cahiers du centre de morphologie mathematique, no 5, Fontainebleau, France.

MATHERON, G., 1989: Estimating and choosing. Berlin, Springer.

Referenzen

ÄHNLICHE DOKUMENTE

There are several types of management units, i.e., Timber Supply Areas (TSAs), Tree Farm Licenses (TFLs), Parks, and other privately and publicly owned lands. Overall, the

This could possibly mean that the level of precision for timber estimates achieved in many inventories will have to be reduced, but it is in the long range not that

Given that the FHM program is still in its infancy with respect to the monitoring of changes in forest health, I am only suggesting an approach to the analysis of

Estimation of the amount of internal structural damage (decay) in standing trees is necess ary for assessing the amount of timber volume available for harvest. Determining which

N-tree distance sampling estimates of basal area per acre were compared to those of fixed radius plot and variable radius point sampling via simulation in four northern

Forest policy has traditionally aimed at a sustained yield of timber and the institutional framework underlying the timber production and processing, but the scope has

To conserve inventory funds, promote data sharing, eliminate information gaps, and to promote more consistency in resource mapping and inventory approaches, the FS issued

According to ANDRZEJEWSKI and WEIGL E (1993), Poland possesses one of the highest ranges of biodiversity characteristic for the lowlands of Europe, especially as regards bog