Correlation coefficients - Data analysis technique

4.3 Data analysis technique

4.3.2 Correlation coefficients

A correlation coefficient is a measure of the statistical relationship between two or more random variables. The direction and strength of the correlation are described by the sign and the absolute value of the correlation coefficient, respectively. The correlation coefficient takes values between −1 (perfect anticorrelation) and 1 (perfect correlation), while 0 describes a lack of correlation (uncorrelated variables). In the case of two variables, the correlation is positive when both variables follow the same trend, and negative (anticorrelation) when they follow opposite trends. The closer the absolute value of the correlation coefficient is to one, the stronger the dependence between the variables. When the correlation coefficient is near zero, the association is weak.

There are many types of correlation coefficients: Pearson’s linear correlation coefficient measures a linear association, Spearman’s rank-order correlation coefficient describes a monotonic (possibly non-linear) relationship, and the cross-correlation coefficient is used to analyse the association between 2D arrays (images). Here, we present a brief overview of the correlation coefficients that are used in this thesis.

Pearson’s correlation coefficient

Pearson’s correlation coefficient measures the linear relationship between two variables (x and y). We follow Press et al. (2002) in assuming that the variables follow a bivariate normal distribution, that x and y are stochastically independent, and that the relation between them is linear (y = ax + b). The linear correlation coefficient (r) is then:

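$$ r \equiv \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, $$

where $n$ is the number of measurements and $\bar{x}$, $\bar{y}$ are the means of x and y, respectively.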

Pearson’s correlation coefficient is most often used in studies of linear relationships.

In Fig. 4.4, we present examples of scatter plots for different relationships between two variables, together with their correlation coefficients. The scaling and order of the data do not influence the correlation coefficient; however, it is sensitive to outliers.

For a more detailed description of Pearson’s correlation coefficient we refer to Press et al. (2002). The Pearson linear correlation coefficient is calculated by the CORRELATE³ function in IDL (from version 4.0).
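As a minimal illustration of the properties described in this subsection, the following Python sketch computes r as the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations. It is written in Python rather than IDL purely for illustration; the function name `pearson_r` and the sample values are assumptions and do not come from the thesis data.

```python
import math

def pearson_r(x, y):
    """Pearson's linear correlation coefficient (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if den == 0.0:
        raise ValueError("r is undefined when one variable is constant")
    return num / den

# Made-up sample: y follows roughly 2x with small deviations
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r = pearson_r(x, y)
# A positive rescaling and a shift of one variable leave r unchanged
r_scaled = pearson_r([10.0 * xi + 3.0 for xi in x], y)
# Reordering the (x, y) pairs together leaves r unchanged
r_reversed = pearson_r(x[::-1], y[::-1])
# One strongly deviating point is enough to change r substantially
r_outlier = pearson_r(x + [7.0], y + [-20.0])

print(r, r_scaled, r_reversed, r_outlier)
```

In this made-up sample the single added outlier is sufficient to turn a strong positive correlation into a negative one, which is the sensitivity referred to above. Note that rescaling by a negative factor would flip the sign of r.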

Cross-correlation coefficient

The cross-correlation coefficient is used to identify similar patterns in image analysis. This coefficient describes the relationship between the intensity patterns of two or more images. If M and N are two-dimensional arrays of intensities with the same size (i × j), then the cross-correlation coefficient is defined as:

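$$ r \equiv \frac{\sum_{i}\sum_{j} (M_{ij} - \bar{M})(N_{ij} - \bar{N})}{\sqrt{\sum_{i}\sum_{j} (M_{ij} - \bar{M})^2 \, \sum_{i}\sum_{j} (N_{ij} - \bar{N})^2}}, $$

where the sums run over all pixels and $\bar{M}$, $\bar{N}$ are the mean intensities of M and N, respectively.

The images in Fig. 4.5 demonstrate the use of the cross-correlation coefficient to describe the relationship between the original image and images contaminated by noise.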
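The behaviour of the coefficient for noisy and complement images can be sketched in a few lines of Python. This sketch assumes the zero-lag, mean-subtracted and normalised form of the cross-correlation, so that r lies between −1 and 1. It does not reproduce the IDL routine, and the helper names (`cross_correlation`, `add_noise`), the synthetic test pattern and the noise levels are all assumptions chosen for illustration.

```python
import math
import random

def cross_correlation(m, n):
    """Zero-lag normalised cross-correlation of two equal-sized 2D arrays.

    Illustrative helper; not the IDL implementation.
    """
    if len(m) != len(n) or any(len(a) != len(b) for a, b in zip(m, n)):
        raise ValueError("both arrays must have the same size")
    flat_m = [v for row in m for v in row]
    flat_n = [v for row in n for v in row]
    mean_m = sum(flat_m) / len(flat_m)
    mean_n = sum(flat_n) / len(flat_n)
    num = sum((a - mean_m) * (b - mean_n) for a, b in zip(flat_m, flat_n))
    den = math.sqrt(sum((a - mean_m) ** 2 for a in flat_m)
                    * sum((b - mean_n) ** 2 for b in flat_n))
    if den == 0.0:
        raise ValueError("undefined for an array of constant intensity")
    return num / den

def add_noise(image, sigma, rng):
    """Return a copy of the image with Gaussian noise of width sigma added."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]

rng = random.Random(0)  # fixed seed so that the run is repeatable
size = 16
# Synthetic smooth intensity pattern standing in for an image
image = [[math.sin(0.4 * i) + math.cos(0.3 * j) for j in range(size)]
         for i in range(size)]
peak = max(v for row in image for v in row)
# Complement image: bright and dark regions are exchanged
complement = [[peak - v for v in row] for row in image]

r_self = cross_correlation(image, image)
r_low_noise = cross_correlation(image, add_noise(image, 0.1, rng))
r_high_noise = cross_correlation(image, add_noise(image, 3.0, rng))
r_complement = cross_correlation(image, complement)

print(r_self, r_low_noise, r_high_noise, r_complement)
```

The image correlated with itself gives r = 1, its complement gives r = −1, and the coefficient drops as the noise level grows.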

In our study, we use the IDL function C_CORRELATE⁴ to calculate the cross-correlation coefficient. For more details about the cross-correlation we refer to Fuller (1995).

³ CORRELATE: https://www.harrisgeospatial.com/docs/CORRELATE.html

⁴ C_CORRELATE: https://www.harrisgeospatial.com/docs/C_CORRELATE.html

Panels: a) r = 1.0, b) r = 0.75, c) r = 0.5, d) r = 0.25, e) r = 0.0, f) r = −0.5, g) r = −0.75, h) r = −1.0

Figure 4.4: Correlation scatter plots and Pearson’s correlation coefficients. In the case of perfect correlation (a), all points lie on a straight line and both variables increase together. The correlation coefficient decreases with increasing scatter of the points (b, c, d). r = 0 indicates a lack of dependence between the variables (e). When two variables follow opposite trends, the correlation coefficient is negative (f, g, h). In perfect anti-correlation (h), all points lie exactly on a straight line and r = −1.

Spearman’s correlation coefficient

Spearman’s rank-order correlation coefficient measures the association between two variables (x and y) without assuming that it is linear. It applies to any monotonic (not necessarily linear) relation between the variables, whatever their distribution.

The algorithm for computing this correlation is presented by Press et al. (2002). In this method, the values x_i and y_i are each replaced by their rank within their own variable, and tied values are assigned the mean of the ranks they share. The Spearman correlation is then computed as the linear correlation of the ranks.

Spearman’s correlation coefficient is largely insensitive to outliers. It is therefore used to analyse highly scattered or poor-quality data. It can also be used instead of Pearson’s correlation coefficient to study a linear relationship between variables when the assumption of a normal distribution does not apply or when the number of outliers is significant. However, Pearson’s correlation is a quantitative measure of an association, whereas Spearman’s correlation coefficient is a qualitative measure. Spearman’s correlation is symmetric under exchange of the variables.
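The rank-based procedure and the properties listed above can be sketched as follows. This is a minimal Python illustration, assuming that ties receive the mean of the ranks they share and that the coefficient is Pearson’s r applied to the ranks. The helper names (`midranks`, `pearson_r`, `spearman_r`) and the sample values are assumptions introduced only for this example.

```python
import math

def midranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson's linear correlation coefficient (illustrative helper)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                    * sum((b - mean_y) ** 2 for b in y))
    return num / den

def spearman_r(x, y):
    """Spearman's coefficient: the linear correlation of the ranks."""
    return pearson_r(midranks(x), midranks(y))

x = [float(v) for v in range(1, 11)]

# Monotonic but non-linear relation: Spearman gives 1, Pearson does not
y_cubic = [v ** 3 for v in x]
rs_cubic = spearman_r(x, y_cubic)
rp_cubic = pearson_r(x, y_cubic)

# Linear relation with one extreme value: the ranks are unaffected
y_outlier = x[:-1] + [1000.0]
rs_outlier = spearman_r(x, y_outlier)
rp_outlier = pearson_r(x, y_outlier)

print(midranks([10, 20, 20, 30]))
print(rs_cubic, rp_cubic, rs_outlier, rp_outlier)
```

In this example the outlier keeps its rank, so Spearman’s coefficient stays at 1 while Pearson’s r drops markedly. An outlier that also changes the rank order would lower Spearman’s coefficient too, only less strongly. Exchanging the two arguments of `spearman_r` gives the same value, which reflects the symmetry noted above.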

In Fig. 4.6, we illustrate the use of Spearman’s correlation coefficient. The IDL function R_CORRELATE⁵ calculates Spearman’s correlation coefficient.

⁵ R_CORRELATE: https://www.harrisgeospatial.com/docs/R_CORRELATE.html

Panels: a) original (r = 1.0), b) r = 0.75, c) r = 0.5, d) r = 0.25, e) r = −1.0, f) r = −0.75, g) r = −0.25, h) r = 0.0

Figure 4.5: A series of images with different levels of added noise and their cross-correlation coefficients. The cross-correlation coefficient is calculated between the image without noise (a) and each of the other images. The correlation of the image without noise with itself (a) gives r = 1, a perfect correlation. An increasing noise level (growing differences between the images) causes a decrease of the correlation coefficient (b, c, d). The same analysis is applied to the complement image (e) and to the complement images with noise (f, g). The complement images (e, f, g) are anti-correlated with the original one (a). Two images without common patterns (h) have r = 0.

Panels: a) r = 1.0, b) r = 0.75, c) r = 0.5, d) r = 0.25, e) r = 0.0, f) r = −0.5, g) r = −0.75, h) r = −1.0

Figure 4.6: Scatter plots with non-linear correlation and Spearman’s correlation coefficients. A perfect correlation (a), r = 1, corresponds to a positive monotonic relationship between two variables (not necessarily a linear relationship). The correlation coefficient decreases with increasing scatter of the points (b, c, d). For r = 0 (e), the variables are uncorrelated. A negative correlation coefficient indicates anti-correlation (f, g, h).

From the document: Small-scale structures in the upper atmosphere of the Sun (pages 32-36)