
Brief review of programs with examples

In the document Data Management Multivariate • 2A (pages 39-70)

4 Operating the computer programs

4.4 Brief review of programs with examples

INIT reads the data in the input file ORIG. It computes species frequencies and counts the number of species per releve. Coded data are substituted by numerical values. The first three lines in ORIG and the scores on the subsequent cards are transformed as specified in the further four options of vector transformations, explained in Section 3.3. Their effect can be inspected in the output if specified on the fourth input line. If the data set to be

.EOR. is the notation for the end-of-record, .EOF. the same for the end-of-file card. The file

TYPE "YES" FOR SIGNUM TRANSFORMATION:

N O - - - -Card 2

SPECIES NO.:

4.4.2 Program EDIT

Unidimensional vectors are constructed containing the identification numbers of the species, releves and their group membership. These are used in subsequent runs of program TABS to construct ordered vegetation tables and contingency tables, and in program IDEN to reallocate new elements. EDIT can also produce a new input file with fewer elements than file ORIG.

The options available, 15 in all, allow the user to construct new data vectors based on the output of previous runs of other programs or according to a specified personal preference. Some of the options may change (edit) a vector already established in a previous run of EDIT. Such is the case if, for example, the group sequence is changed (option 08, see below) or if new observations are added (option 14). For this, the user must activate the edit mode (see example).

Local input

This program begins by asking for the editing mode. If the answer is "YES", options 09 to 15 refer to an already established vector. Otherwise, the manipulations are performed on the sequences of releve and species numbers as they appear in the input file ORIG. The latter is always used if the option is between 01 and 07. With these options the editing mode is disregarded by the program.

After typing the option on line 2, options 07 to 14 require additional input lines (cards). These have to be set up as follows (see also examples):

Option 07: Specify your own releve sequence by typing 1 number per line (I4). Separate the groups by lines containing "0000" and terminate your input with "-100".

Option 08: Specify your new group sequence by typing 1 group number per line (I4). End the input with "0000".

Option 09: Specify NX (I4). Every NXth line will be retained.

Option 10: Specify the lowest frequency NX (I4). Lines with frequencies greater than or equal to NX will be retained.

Option 11: Specify the first and last identification number of any subset to be retained (2I4).

Option 12: Type the identification numbers of the lines to be deleted (1 number per line, end with "0000").

Option 13: Specify the identification numbers of the lines to be retained (1 number per line, end with "0000").

Option 14: Type new line numbers, 1 per line (I4), and end with "0000".

After terminating the input for the releves, the options for the species have to be specified using the same logic as outlined above (see example).
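The input convention for option 07 can be illustrated with a small parser. This is only a sketch in Python, not part of the original FORTRAN programs: the function name `parse_option07` is hypothetical, and it assumes each line carries one integer in its first four columns (format I4), that a line reading "0000" closes a group, and that "-100" ends the input.

```python
def parse_option07(lines):
    """Split an option-07 input stream into groups of releve numbers.

    Assumptions (not taken from the program source): one integer per line
    in columns 1-4, "0000" separates groups, "-100" terminates the input.
    """
    groups, current = [], []
    for line in lines:
        value = int(line[:4])          # I4 field: first four columns
        if value == -100:              # terminator
            break
        if value == 0:                 # group separator
            if current:
                groups.append(current)
            current = []
        else:
            current.append(value)
    if current:                        # last group may lack a separator
        groups.append(current)
    return groups


example = ["   3", "   1", "0000", "   2", "   5", "-100"]
groups = parse_option07(example)
```

Here `groups` holds two lists, the first with releves 3 and 1 and the second with releves 2 and 5.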

After forming the new vectors for species and releves, the program asks whether the original data file ORIG should be reduced. This has an effect only if fewer species or releves are used than are present in ORIG. The new (smaller) file REOR then serves as a basis for further computations in the same way as file ORIG. A reduction is never needed merely to print a vegetation table in program TABS.

Example 2a

OPT02: DECREASING FREQUENCY (FROM INIT)
OPT03: SEQUENCE ON FIRST AXIS (FROM GRID)
OPT04: GROUPS DERIVED FROM GRID
OPT05: GROUPS DERIVED FROM CLTR OR IDEN
OPT06: TAKE LINES FROM PROGRAM RANK
OPT07: SPECIFY OWN GROUPS
OPT08: CHANGE GROUP SEQUENCE
OPT09: TAKE EVERY NX-TH LINE
OPT15: RETAIN PREVIOUS CLASSIFICATION
TYPE OPT:

0 7 - - - -- - - -- - - C a r d 2 7

INPUT: TYPE NUMBERS OF SPECIES/RELEVES,
OPT02: DECREASING FREQUENCY (FROM INIT)
OPT03: SEQUENCE ON FIRST AXIS (FROM GRID)
OPT15: RETAIN PREVIOUS CLASSIFICATION
TYPE OPT:

0 1 - - - -Card 4 1

TOTAL 11

THE FOLLOWING SEQUENCE HOLDS

1 2 3 4 5 6 7

Example 2b

For this, the CDC system cards are the same as in Example 2a, but the input cards are as follows:

NO 05 NO 1)4 0

This run, too, uses results from other programs (Section 4.2). On the DEC System 1090, the following example is available when EX EDIT is typed:

*** EDIT ***

** RELEVES **

TYPE "YES" TO CHANGE AN ALREADY EXISTING CLASSIFICATION: NO

NO
OPT01: SEQUENCE OF INPUT (FROM INIT)
OPT02: DECREASING FREQUENCY (FROM INIT)
OPT03: SEQUENCE ON FIRST AXIS (FROM GRID)
OPT04: GROUPS DERIVED FROM GRID
OPT05: GROUPS DERIVED FROM CLTR OR IDEN
OPT06: TAKE LINES FROM PROGRAM RANK
OPT07: SPECIFY OWN GROUPS
OPT08: CHANGE GROUP SEQUENCE
OPT09: TAKE EVERY NX-TH LINE
OPT15: RETAIN PREVIOUS CLASSIFICATION

Card 1

THE FOLLOWING SEQUENCE HOLDS

6 2 5 1 7 3 9 8 4

OPT02: DECREASING FREQUENCY (FROM INIT)
OPT03: SEQUENCE ON FIRST AXIS (FROM GRID)
OPT04: GROUPS DERIVED FROM GRID
OPT05: GROUPS DERIVED FROM CLTR OR IDEN
OPT06: TAKE LINES FROM PROGRAM RANK
OPT07: SPECIFY OWN GROUPS
OPT08: CHANGE GROUP SEQUENCE
OPT09: TAKE EVERY NX-TH LINE
OPT15: RETAIN PREVIOUS CLASSIFICATION

Card 3

TYPE OPT:

0 4 - - - -Card 4 q

TOTAL 11

THE FOLLOWING SEQUENCE HOLDS

1 9 3 5 2 8

Both species and releves are now classified into 4 groups each.

4.4.3 Program TABS

A vegetation table is printed. For option 1, the raw data are used with their order unchanged. For option 2, the arrangement of releves and species follows the solution in the previous run of program EDIT. The group labels are also printed. When option 3 is specified, two different data sets are combined into one using the catalog numbers of the species (see Section 4.2).

TABS prints only 1-character codes. If the original data have numerical values and a run of TABS is needed to get a contingency table, an optional FORMAT card containing a FORTRAN A1 conversion code can be typed on input to overcome this restriction. This in effect overrides the FORMAT given on Card 5 in the original input file (Section 4.2).

Whenever an ordered vegetation table is printed, the program can also construct frequency and contingency tables. The contingency table is stored in file LIN for further processing in program CIAC. In this, the absolute frequencies of species and releve groups are used. The frequency table, stored as file TAST, contains class values:

Relative frequency of species within a group    Class value

0                                               0 (empty)
>0 to <0.2                                      1
≥0.2 to <0.4                                    2
≥0.4 to <0.6                                    3
≥0.6 to <0.8                                    4
≥0.8 to 1.0                                     5
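The class values of the frequency table can be expressed as a small lookup. The following Python sketch is an illustration only; the function name `frequency_class` is hypothetical, and it assumes that class 1 starts just above zero, since a relative frequency of exactly 0 is reserved for class 0 (empty).

```python
from bisect import bisect_right

# Lower bounds of classes 2 to 5, as listed in the frequency table.
CLASS_BOUNDS = [0.2, 0.4, 0.6, 0.8]


def frequency_class(relative_frequency):
    """Map a relative frequency in [0, 1] to the class value 0..5."""
    if not 0.0 <= relative_frequency <= 1.0:
        raise ValueError("relative frequency must lie between 0 and 1")
    if relative_frequency == 0.0:
        return 0                       # species absent from the group
    # Each bound reached moves the value up by one class.
    return 1 + bisect_right(CLASS_BOUNDS, relative_frequency)


# A species found in 3 of 10 releves of a group falls into class 2.
class_value = frequency_class(3 / 10)
```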

File TAST can be further processed in the same way as file ORIG. The same applies to file CO, containing a combined data set.

Local input

Four input cards (lines on DEC System 1090) are needed as can be seen in Example 3. The first input option decides which file is to be used. If two different tables are combined, as specified on the second card, the first card is disregarded by the program. After entering an optional FORMAT, card 4 asks for the printing of contingency and frequency tables.

The latter is mandatory but is not effective if none of the groups are present in the data.

The program will fail to run (without warning) if one tries to produce an ordered vegetation table without running program EDIT first!

Example 3

TABLE NO. 6

TYPE "YES" FOR FREQUENCY- AND CONTINGENCY TABLES:

Y E S - - - -Card 4

4.4.4 Program RESE

A square, full-storage resemblance matrix (file RES) with elements relating either the rows (releves) or the columns of the data matrix from file TABLE is generated. Different types resemblance measured refer to columns (species).

The second input card (line) specifies the resemblance function r(x,y). The following definitions are relevant:

$s_x = \sum x,\quad s_y = \sum y,\quad s_{xx} = \sum x^2,\quad s_{yy} = \sum y^2,\quad s_{xy} = \sum xy$

1. Cross product of centered data:

$r(x,y) = s_{xy} - s_x s_y / n$

7. Van der Maarel's coefficient:

$r(x,y) = s_{xy} / (s_{xx} + s_{yy} - s_{xy})$

others. If the answer is "YES", the identification number of this row (column) will be specified on the fourth card (see Example 4a). If the answer is "NO", the fourth card will control the output of the entire data matrix. This is very large and can be suppressed if the data set to be analysed is large.
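The two resemblance functions whose formulas survive in this section (options 1 and 7) can be computed directly from the sums s_x, s_y, s_xx, s_yy and s_xy. The Python sketch below is a minimal illustration with hypothetical function names; it assumes that n is the number of elements in each vector and does not reproduce the remaining RESE options.

```python
def sums(x, y):
    """Return n and the sums s_x, s_y, s_xx, s_yy, s_xy for two vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return len(x), sx, sy, sxx, syy, sxy


def cross_product_centered(x, y):
    """Option 1: cross product of centered data."""
    n, sx, sy, _, _, sxy = sums(x, y)
    return sxy - sx * sy / n


def van_der_maarel(x, y):
    """Option 7: van der Maarel's coefficient."""
    _, _, _, sxx, syy, sxy = sums(x, y)
    return sxy / (sxx + syy - sxy)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
r1 = cross_product_centered(x, y)   # 28 - 6 * 12 / 3
r7 = van_der_maarel(x, y)           # 28 / (14 + 56 - 28)
```

For identical vectors option 7 returns 1, because the denominator reduces to s_xx.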

Example 4a

On DEC System 1090, the command EX RESE gives the following output:

*** RESE ***

Example 4b

This, too, is based on results from Example 1. The CDC system cards remain the same, but the input cards are as follows:

The run produces the following results:

R l NO YES

*** RESE ***

4.4.5 Program IDEN

New data points are assigned to existing groups (file LIN). Data points which have a lower similarity value to the nearest group than that specified, are not assigned but are given the group number 99. These are considered as the outliers of the sample (see Example 5).
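The assignment rule can be sketched as follows. This Python fragment is an assumption-laden illustration, not the IDEN algorithm itself: how the similarity between a new data point and a group is computed is not stated here, so the sketch simply takes those similarities as given. The names `assign_to_group` and `OUTLIER` are hypothetical.

```python
OUTLIER = 99  # group number given to unassigned data points


def assign_to_group(similarities, threshold):
    """Return the nearest group, or 99 if it is not similar enough.

    `similarities` maps each existing group number to the similarity of
    the new data point to that group (how this is obtained is assumed).
    """
    nearest = max(similarities, key=similarities.get)
    if similarities[nearest] < threshold:
        return OUTLIER
    return nearest


assigned = assign_to_group({1: 0.35, 2: 0.80, 3: 0.10}, threshold=0.5)
rejected = assign_to_group({1: 0.35, 2: 0.40, 3: 0.10}, threshold=0.5)
```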

Local input GROUP (INCLUDE DECIMAL POINT):

1.J. 0 - - - -Card 1 CLTR (cluster analysis) that were in the file NRA.

4.4.6 Program RANK

The ranking procedure uses the independent components derived from the sum of squares.

These are determined for each species by projecting vectors:

(a) Center the data matrix X within the species to obtain a new matrix A with elements

$A_{hj} = (X_{hj} - \bar{X}_h)/Q_h$ (9)

$\bar{X}_h$ and $Q_h$ represent respectively the mean and a factor of standardization in species h. $X_{hj}$ is the value of species h in releve j.

(b) Compute the cross products S = AA' with a characteristic element

$S_{hi} = \sum_j A_{hj} A_{ij}, \quad j = 1, \ldots, n$ (10)

(c) Compute dispersions and find the maximum,

$SS = \max\left(\sum_h S_{h1}^2/S_{11}, \ldots, \sum_h S_{hp}^2/S_{pp}\right), \quad h = 1, \ldots, p$ (11)

In these, n indicates the sample size (number of releves) and p the number of species.

Declare rank 1 for the species m associated with SS. The quantity SS is a measure of redundancy in the sample of p species with respect to species m.

(d) Compute the residual,

$S_{hi} := S_{hi} - Y_{hm} Y_{im}$, for any h, i = 1, ..., p, (12)

in which

$Y_{hm} = S_{hm}/\sqrt{S_{mm}}$ and $Y_{im} = S_{im}/\sqrt{S_{mm}}$. (13)

(e) Compute a new value of SS from the elements of the residual S and declare rank 2 for the corresponding species. Continue (step d) until all species are ranked.
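Steps (a) to (e) can be followed in a few lines of NumPy. The sketch below is an illustration under stated assumptions, not the RANK source: the function name `rank_species` is hypothetical, the standardization factor Q_h is taken to be 1, species are rows of X, and the stopping criterion is read as a fraction of the total sum of squares.

```python
import numpy as np


def rank_species(X, stop=1.0, tol=1e-12):
    """Rank species (rows of X) by their specific sums of squares."""
    X = np.asarray(X, dtype=float)
    A = X - X.mean(axis=1, keepdims=True)     # step (a), assuming Q_h = 1
    S = A @ A.T                               # step (b), formula (10)
    total = np.trace(S)
    order, shares = [], []
    candidates = np.ones(S.shape[0], dtype=bool)
    while candidates.any() and sum(shares) < stop:
        diag = np.diag(S)
        usable = candidates & (diag > tol * total)
        if not usable.any():
            break                             # no residual variation left
        ss = np.full(S.shape[0], -np.inf)
        # step (c), formula (11): column sums of squares over the diagonal
        ss[usable] = (S[:, usable] ** 2).sum(axis=0) / diag[usable]
        m = int(np.argmax(ss))
        y = S[:, m] / np.sqrt(diag[m])        # formula (13)
        S = S - np.outer(y, y)                # step (d), formula (12)
        order.append(m)
        shares.append(float(ss[m] / total))
        candidates[m] = False
    return order, shares


# Two uncorrelated species; the first carries 80 % of the sum of squares.
X = [[0.0, 0.0, 2.0, 2.0],
     [0.0, 1.0, 0.0, 1.0]]
order, shares = rank_species(X)
```

Because each residual step removes exactly the sum of squares accounted for by the ranked species, the shares add up to 1 once all species with remaining variation have been ranked.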

Steps (a) and (b) are performed by RESE, option 1. The stopping criterion in RANK is specified by the user. Species which account for more than a specified threshold limit of sums of squares have their records written in NRA.

RANK will give meaningful results only if the resemblance measure used is the covariance or some quantity related to the covariance. Such is the case with the resemblance measures in options 1, 2 and 3 in program RESE.

Local input

Program RANK starts by asking for the stopping criterion. This is the minimum of the cumulative variance to be reached with all ranked species (a fraction of 1). The initial (dependent) variances are printed on request (input card 2).

RANK may stop before reaching the stopping criterion. This will be the case if the algorithm detects an error due to insufficient precision in the computation.

Example 6

4.4.7 Program CLTR

to form the initial clusters. New points will join an already existing cluster if their similarity to the nearest member in the cluster is greater than that to any other data point outside the cluster.

In complete linkage analysis the process starts by finding the most similar data points to form the initial clusters. New points will join a cluster if their similarity to the most dissimilar member of a cluster is a maximum.
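The difference between the two joining rules comes down to which cluster member represents the cluster. A minimal Python sketch, assuming a similarity matrix row for the candidate point is already available (the function name and arguments are hypothetical):

```python
def similarity_to_cluster(similarity_row, members, method):
    """Similarity of one data point to a cluster.

    single:   similarity to the nearest (most similar) member
    complete: similarity to the most dissimilar member
    """
    values = [similarity_row[i] for i in members]
    if method == "single":
        return max(values)
    if method == "complete":
        return min(values)
    raise ValueError("method must be 'single' or 'complete'")


# Similarities of a candidate point to data points 0..3; cluster = {1, 2}.
row = [1.0, 0.9, 0.2, 0.5]
single = similarity_to_cluster(row, [1, 2], "single")
complete = similarity_to_cluster(row, [1, 2], "complete")
```

Under single linkage the candidate is close to the cluster (0.9), whereas under complete linkage it is judged by its weakest link (0.2).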

In single and complete linkage analysis group resemblance is based on distances or positive, the user is informed about the transformations that have automatically been applied.

No special restrictions exist when single or complete linkage has been chosen. Since squared Euclidean distances are used in minimum variance clustering, the similarity measures r corresponding to options 1, 2, 3, 6 and 7 in program RESE are transformed according to:

$d^2 = 2(1 - r(x,y))$ (15)

Note that in the case of resemblance measures 1 or 2, formula (15) will only hold true if transformation to unit length (option 2) has been chosen in program INIT! There is no need for transformation of the measures 4 and 5, where the squared distances are computed.
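Why formula (15) requires vectors of unit length can be checked numerically: for unit vectors x and y with scalar product r, the squared Euclidean distance is x'x + y'y - 2x'y = 2(1 - r). The Python sketch below verifies this for one pair of vectors; the function names are hypothetical and the example data are invented for illustration.

```python
import math


def unit_length(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def squared_distance_from_similarity(r):
    """Formula (15): squared distance from a scalar-product similarity."""
    return 2.0 * (1.0 - r)


x = unit_length([3.0, 4.0])
y = unit_length([1.0, 0.0])
r = sum(a * b for a, b in zip(x, y))               # scalar product
direct = sum((a - b) ** 2 for a, b in zip(x, y))   # squared Euclidean distance
via_formula = squared_distance_from_similarity(r)
```

Without the scaling to unit length the terms x'x and y'y no longer equal 1, and the two values differ.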

The program then requests input for the size of the dendrogram; print the dendrogram on one page only if further processing is planned (see Example 7). The last input

When using DEC System 1090, type EX CLTR to get CLTR going. The output includes:

OPT3: MIN.VARIANCE CLUSTERING/ TYPE OPT:

2 - - - -Card 1 (CDC) 2

TRANSFORMATION OF RESEMBLANCE VALUES (R):

ABS(MIN(R)) IS ADDED

4.4.8 Program PCAB

An R type analysis is computed. The input is the centered raw data preprocessed by INIT, and a similarity matrix from program RESE (option 1, 2 or 3). The first 6 sets of component coefficients (direction cosines) and component scores (coordinates) are written on the disk (file COOR).

Principal component analysis describes a set of objects in terms of a reduced number of new variables. Since the new variables are uncorrelated, they are efficient as ordination axes. The basic (linear) transformation is specified by

$Y_{ij} = a_{1i} X_{1j} + \ldots + a_{pi} X_{pj}$ (16)

This is a linear transformation of species quantities $X_{1j}, \ldots, X_{pj}$ to yield the ith component score $Y_{ij}$ for releve j. The component coefficients $a_{1i}, \ldots, a_{pi}$ relate the species to the ith component. Because they are direction cosines, they satisfy the condition

$a_{1i}^2 + \ldots + a_{pi}^2 = 1$.

The objective of the algorithm is to supply component coefficients and to compute scores. If the objects to be ordinated are releves, the R-algorithm will proceed as follows:

(a) Center the original data within the species to obtain the centered data matrix A with elements

$A_{hj} = (X_{hj} - \bar{X}_h)/S_h, \quad h = 1, \ldots, p;\ j = 1, \ldots, n$ (17)

where $X_{hj}$ is the original value of species h in releve j, $\bar{X}_h$ the mean of species h and $S_h$ is a factor of adjustment.

(b) Compute cross products between the species, R = AA'. Matrix R is generated in program RESE (options 1, 2 or 3). If $S_h$ in formula (17) is equal to n-1 (option 2 in program RESE), then R is a covariance matrix. If $S_h$ is the standard deviation in species h (option 3 in RESE), then R is a correlation matrix.

(c) Find the eigenvalues $\lambda_1, \ldots, \lambda_m$ and the corresponding eigenvectors $a_1, \ldots, a_m$ of R. It is assumed that the eigenvectors (with elements representing component coefficients) are standardized so that

$a_i' a_i = 1$.

(d) Compute component scores for the releves according to Y = a'A (second set of coordinates in the printout of PCAB, Example 8). A characteristic element $Y_{ij}$ from the matrix Y represents the component score of the jth releve on the ith component.

The dispersion of the ith component is equal to the ith eigenvalue of matrix R. The efficiency of the ith component in expressing the total linear variation in the sample is given by $\lambda_i / \sum_k \lambda_k$ with k = 1, ..., m.
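Steps (a) to (d) of the R-algorithm can be traced with NumPy. The sketch below is an illustration under assumptions, not the PCAB code: the function name `r_type_pca` is hypothetical, the adjustment factor in formula (17) is taken to be 1 (cross products of centered data), species are rows and releves are columns, and `numpy.linalg.eigh` stands in for whatever eigenvalue routine the program uses.

```python
import numpy as np


def r_type_pca(X):
    """Return eigenvalues, coefficients a, scores Y and efficiencies."""
    X = np.asarray(X, dtype=float)
    A = X - X.mean(axis=1, keepdims=True)     # step (a), adjustment factor 1
    R = A @ A.T                               # step (b): cross products
    eigenvalues, eigenvectors = np.linalg.eigh(R)   # step (c), unit length
    descending = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[descending]
    a = eigenvectors[:, descending]           # columns are direction cosines
    Y = a.T @ A                               # step (d): component scores
    efficiency = eigenvalues / eigenvalues.sum()
    return eigenvalues, a, Y, efficiency


# Invented data: 3 species (rows) in 5 releves (columns).
X = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [2.0, 1.0, 4.0, 3.0, 6.0],
     [5.0, 3.0, 1.0, 2.0, 0.0]]
eigenvalues, a, Y, efficiency = r_type_pca(X)
```

The sum of squared scores on each component reproduces the corresponding eigenvalue, which is the statement that the dispersion of the ith component equals the ith eigenvalue of R.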

The actual results of PCAB depend on the runs of preceding programs. If a crossproduct, covariance or correlation matrix of releves is generated in RESE, the data will be centered within releves and the elements of the eigenvectors will be adjusted so that their sums of squares are equal to the associated eigenvalues. The component scores will become species coordinates (second set of coordinates in the printout of PCAB). Should a resemblance matrix of species be generated, then the data will be centered within the species. After similar adjustments of the eigenvectors, the elements are seen on the printout as the first set of coordinates. The component scores represent releve coordinates (second set of coordinates).

If the raw data represent deviations from expectations (option 3 for vector transformations in program INIT), and the elements of the eigenvectors are adjusted as specified in formula (28), the reciprocal ordering scores will be obtained for the species. For releves, the scores will be obtained by averaging according to (29). It follows that to derive reciprocal ordering scores, the user must call option 3 for vector transformations in program INIT and option 1 for the resemblance measure in program RESE.

Local input

The output of PCAB overwrites any set of old ordination coordinates previously stored in COOR. Note that if the centering in RESE was chosen to be within the releves, such as in Example 4b, which produced the input for PCAB, the component coefficients will not be meaningful as coordinates for releves and should not be stored (type "NO" on input card, Example 8). However, the component scores for species are meaningful as ordination coordinates. One should therefore perform the analysis on the scalar products of species (option 1, 2 or 3 in program RESE), to render component scores that are meaningful as ordination coordinates for releves. If coordinates for species have been previously computed, they will be retained (type "NO" on input card). In either of these cases the algorithm is R-type.

In reciprocal ordering, however, both sets of coordinates are meaningful and may be used directly for further processing (type "YES" on input card).

Example 8

This example is dependent on a previous run of Example 4b (see explanations in the example on page 34). The following system and input cards are used on the CDC:

PCAS, 8216 ,01160000, 1'14. PERM!', TAPE14.

ATTACH,OLDPL,lALIB.

CALL,PCAB.

REWIND, PCAB.

F'1N, 8'e2, I=PCAB.

ATTACH,TAPE11,TABLE.

ATTACH,TAPE12,NRA.

ATTACH,TAPE13,RES.

ATTACH,TAPE15,COOR.

COPYBF,TAPE15,TAPE14.

REWIND,TAPE14.

RETURN,TAPE15.

LGO.

P\JRGE,OXR,Pd=8215.

RETURN,COOR.

CATALOG,TAPE14,COOR.

AUDIT.

.EOR.

YES

.EOR.

.EOF.

When using DEC System 1090, type EX PCAB to run PCAB. The output includes:

*** PCAB ***

COMPONENT COEFFICIENTS WILL BE DIRECTION COSINES OF RELEVES.

COMPONENT SCORES WILL BE COORDINATES OF SPECIES.

THE ANALYSIS IS BASED ON RESEMBLANCE MEASURE NO. 1

TYPE "YES" TO ALSO STORE COMPONENT COEFFICIENTS:

YES Card 1

(CDC) YES

TERMINAL ERROR= 0

EIGENVALUES:

.417E-02 .359E-02 .773E-03 .428E-03 .218E-03 .178E-03 .104E-03 .190E-04 .000E+00

PERCENTAGES:

43.990 37.859 8.155 4.513 2.304 1.879 1.100 0.201 0.000

NO., COMPONENT COEFFICIENTS, FIRST 6 AXES

 1  -0.01700   0.00004   0.00455   0.00641  -0.00051   0.00525
 2  -0.01997   0.01969  -0.00680  -0.00050   0.00830  -0.00079
 3   0.02007  -0.00097  -0.01176  -0.00400   0.00531   0.00471
 4   0.00860  -0.03837   0.01144   0.00451   0.00578  -0.00371
 5  -0.02122  -0.00663  -0.00437   0.00868  -0.00610   0.00350
 6  -0.01873   0.02330   0.00404  -0.00122  -0.00141  -0.00806
 7   0.02770  -0.00704  -0.01402   0.00270  -0.00526  -0.00518
 8  -0.01449  -0.01785   0.00287  -0.01631  -0.00413   0.00147
 9   0.03505   0.02782   0.01404  -0.00027  -0.00197   0.00282

NO., COMPONENT SCORES

 1  -0.03778   0.01885  -0.00556  -0.00382  -0.00332   0.00158
 2   0.02334   0.00474   0.00604  -0.00625   0.00021   0.00591
 3   0.00951  -0.03040  -0.00564  -0.00158   0.00140  -0.00667
 4   0.00082  -0.02970  -0.00633   0.00970  -0.00411   0.00350
 5   0.01652   0.02141   0.00759   0.00267  -0.00855  -0.00614
 6  -0.01750   0.01306  -0.00288   0.01112   0.00184   0.00177
 7   0.00672  -0.00270  -0.01279  -0.01048  -0.00271   0.00247
 8   0.02668   0.00781   0.00498   0.00435   0.00376   0.00389
 9   0.00872   0.02040  -0.00932   0.00029   0.00669  -0.00451
10  -0.01625  -0.01462   0.01387  -0.00309  -0.00176  -0.00071
11  -0.02078  -0.00886   0.01003  -0.00292   0.00654  -0.00107

4.4.9 Program GRID

Grid analysis uses the ordination coordinates from PCAB to recognize clusters. Cluster contents and coordinates are written on tape. Scatter diagrams may be plotted by ORDB to present the groups. The algorithm proceeds through the steps as outlined (Wildi, 1979):

1. Compute coordinates on m component axes (PCAB). Designate the jth coordinate in the ith releve by $x_{ij}$.

2. Determine the range $r_j$ for the coordinates on axis j, such that

$r_j = \max\{x_{1j}, \ldots, x_{nj}\} - \min\{x_{1j}, \ldots, x_{nj}\}$. (18)

Letter n signifies the number of releves.

3. Transform the coordinates so that all sample points are in a hypercube which has unit side lengths. Since the lengths of the edges equal the maximum range, $\max(r_j,\ j = 1, \ldots, m)$, in the original coordinates, the transformed coordinates are given by

$y_{ij} = (x_{ij} - \min[x_{ij}]) / \max(r_j)$, for i = 1, ..., n; j = 1, ..., m. (19)

4. Determine the segment $z_{ij}$ in which a given releve i falls on axis j:

$z_{ij} = \mathrm{integer}(y_{ij}[w_j + 1])$. (20)

In this, $w_j$ is the number of segments on the jth axis. The values of $z_{ij}$ will range from 1 to $w_j$.

5. Determine the number of releves in each segment by counting all the points with identical $z_{ij}$ labels.

6. Identify the densest segment q, declare it as a region of local maximum density and continue with step 9.

7. Identify the next densest segment q and denote its density as d.

8. Check the neighbourhood of this segment for densities greater than d. The neighbourhood of any segment i includes all segments having any combination of $z_{ij}$ values between $z_{ij} - 1$ and $z_{ij} + 1$, except $z_{ij}$, for all dimensions j. If the density of none of the neighbouring segments exceeds d, declare segment q a local maximum and continue with step 9. Otherwise go to step 7.

9. Compute centroid coordinates of releves within the densest segment q according to

$C_{qj} = \frac{1}{q} \sum_{i \in q} y_{ij}$, for j = 1, ..., m (21)

where $i \in q$ indicates that all terms in the summation are from the qth segment.

10. Shift segment q a little and determine coordinates for the midpoint according to:

$M_{qj} = U_{qj} + f(U_{qj} - C_{qj})$, for j = 1, ..., m. (22)

In this, f, the shifting factor, which multiplies the vector $\overrightarrow{U_{qj}C_{qj}}$, is arbitrarily set equal to 1. $U_{qj}$ represents a coordinate of the midpoint of the unshifted segment obtained by

$U_{qj} = (z_{ij} - 0.5) \cdot 1/w_j$. (23)

11. Count the number of points in the shifted segment by checking the coordinates. Designate this count by d'.

12. If the number of releves in the shifted segment is greater than in the unshifted, accept the density within the shifted segment as a new approximation of local maximum, denote the shifted as segment q, and go to step 9. If not, go to step 13.

13. Establish the centroid (Cqi) of the points in the unshifted segment as the final approximation to the nodum and go to step 7, or stop if

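The first part of the grid procedure (steps 2 to 6) can be sketched in NumPy as follows. This is an illustration under assumptions, not the GRID code: the function name `grid_segments` is hypothetical, releves are rows and axes are columns, and the segment label is computed as floor(y * w) + 1, clipped to w, so that labels stay within 1 to w as the text requires. That labelling rule is an assumption and differs in form from formula (20) as printed. The shifting of segments in steps 9 to 13 is not covered.

```python
from collections import Counter

import numpy as np


def grid_segments(x, w):
    """Scale coordinates to the unit hypercube and count releves per segment.

    x: n releves (rows) by m axes (columns); w: segments per axis.
    Returns the scaled coordinates, the segment labels, the counts per
    segment and the densest segment.
    """
    x = np.asarray(x, dtype=float)
    ranges = x.max(axis=0) - x.min(axis=0)        # formula (18)
    y = (x - x.min(axis=0)) / ranges.max()        # formula (19)
    # Assumed labelling rule: segments numbered 1..w on every axis.
    z = np.minimum(np.floor(y * w).astype(int) + 1, w)
    counts = Counter(tuple(int(v) for v in row) for row in z)   # step 5
    densest = max(counts, key=counts.get)         # step 6
    return y, z, counts, densest


# Invented coordinates for four releves on two axes, two segments per axis.
points = [[0.0, 0.0], [0.1, 0.1], [0.05, 0.0], [1.0, 1.0]]
y, z, counts, densest = grid_segments(points, w=2)
```

In this example three releves share segment (1, 1), which is therefore declared the first region of local maximum density.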
