
4 Methods of data analysis

4.2 Method of regression analysis


For this, a certain accepted probability of error is defined, in this study 5 percent.

Thus, a significant result can be correctly transferred to the population with a probability of 95 percent. In correlation analysis, the significance of the result is tested by the T-test. The null hypothesis for this test is formulated as follows (Aaker et al. 2004, p. 513):

$H_0: \rho = 0$

Where:

ρ = Spearman rank correlation coefficient

Thus, the null hypothesis can be rejected if the Spearman correlation coefficient differs significantly from 0. As explained by Köhler et al. (1996, p. 104), the T-value for the Spearman correlation coefficient can be calculated in the same way as for the Pearson correlation coefficient, provided that the sample size is ≥ 12. The formula for the empirical T-value reads as follows (Aaker et al. 2004, p. 514):

$T_{emp} = \frac{\rho \cdot \sqrt{n - 2}}{\sqrt{1 - \rho^2}}$

Where:

T_emp = empirical T-value
ρ = Spearman rank correlation coefficient
n = number of observations

The calculated empirical T-value is compared to the critical T-value given in the table of the t-distribution ("Student's t-distribution"). The null hypothesis can be rejected when the empirical T-value is higher than the respective critical T-value. Reading off the critical T-value from the table requires the chosen level of significance as well as the degrees of freedom, which are n − 2.
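
As an illustration, the following minimal sketch computes this test in Python; the data and the two-sided 5-percent level are assumptions for demonstration, not values from the study, and SciPy's spearmanr is used for the coefficient itself:

```python
# Minimal sketch: T-test for a Spearman rank correlation, as described above.
# Data and significance level are illustrative, not taken from the study.
import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10, 11, 13, 14, 15, 17, 18])
y = np.array([1, 3, 4, 6, 9, 8, 12, 11, 14, 16, 15, 19])

rho, _ = stats.spearmanr(x, y)  # Spearman rank correlation coefficient
n = len(x)                      # number of observations (here n >= 12)

# Empirical T-value: T_emp = rho * sqrt(n - 2) / sqrt(1 - rho^2)
t_emp = rho * np.sqrt(n - 2) / np.sqrt(1 - rho**2)

# Critical T-value for alpha = 5 percent (two-sided) and n - 2 degrees of freedom
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

# H0 (rho = 0) is rejected if |T_emp| exceeds the critical T-value
print(f"rho = {rho:.3f}, T_emp = {t_emp:.3f}, T_crit = {t_crit:.3f}")
print("reject H0" if abs(t_emp) > t_crit else "do not reject H0")
```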


Regression analysis can be used, for example, to explain an increase of sales as a function of a decreased price and/or other factors influencing the sales volume (Hair et al. 1998, pp. 141).

The researcher must have a clear view of which variable is the dependent and which are the independent variables. An example of a hypothesis to be tested by simple regression analysis would be: "The sold amount of a product can be explained by the price of this product". If more than one explanatory variable is included in the model, the hypothesis for a multiple regression analysis could be formulated as: "The sold amount of a product can be explained by the price of the product and by the costs of advertisement". If the researcher is familiar enough with the variables, she/he can, in addition, assume which of the variables will be the more important influencing factor.

In the following, the focus is on linear regression analysis. This method assumes that all investigated associations between variables are linear. Backhaus et al. (2000, pp. 6) explain that regression analysis solves two problems. (1) The first is to compute the association of the dependent and independent variable within the sample. The linear relationship between the variables is described by the following regression equation (based on Hair et al. 1998, pp. 153), given that only one explanatory variable has been included in the calculation:

$\hat{y} = a + bx + e$

Where:

ŷ = estimated value of the dependent variable
a = intercept; constant; point of intersection of the regression line with the y-axis
b = regression coefficient; gradient of the regression line
x = independent variable
e = residual; prediction error

For multiple regression analysis, the regression equation is extended to n independent variables and their regression coefficients:

$\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + e$

The regression analysis aims to calculate the point of intersection of the regression line with the y-axis and the regression coefficient(s) which describe the gradient of the regression line.

Once these values are determined, statements can be made in the style of: "When the x-value increases by one unit, the y-value will change by the value of the regression coefficient" (see Voß 2000, p. 130). With regression analysis, the weight of the independent variables' contribution to explaining the dependent variable is investigated (Hair et al. 1998, pp. 148).

The regression line is computed in such a way that the distance between the measured values and the estimated values located on the regression line is as small as possible. Thus, each measured value has a corresponding theoretical value, which is located on the estimated regression line. a and b have to be computed so as to minimise the sum of the squared deviations between all measured values and their corresponding theoretical values. This procedure is called the least-squares method (Aaker et al. 2004, pp. 517).
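
A minimal sketch of the least-squares computation of a and b; the arrays x and y are invented example data, not values from the study:

```python
# Minimal sketch: least-squares estimation of intercept a and coefficient b.
# The data are invented for demonstration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])  # measured dependent variable

# b = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
x_mean, y_mean = x.mean(), y.mean()
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a = y_mean - b * x_mean  # the regression line passes through (x_mean, y_mean)

y_hat = a + b * x        # theoretical values located on the regression line
e = y - y_hat            # residuals; prediction errors

# a and b minimise the sum of squared deviations between measured and
# theoretical values:
print(f"a = {a:.3f}, b = {b:.3f}, sum of squared residuals = {np.sum(e**2):.3f}")
```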

(2) The second problem to be solved by regression analysis is to test whether the association determined in the sample can be regarded as valid for the population, i.e. to assess the quality of the estimated regression equation. This is conducted in several steps.


First, for the entire regression equation it is checked to what extent the dependent variable is explained by the regression model. This is shown by the coefficient of determination, r², and by the result of the F-test.

The coefficient of determination, r², is the ratio of the summed squared deviations of the estimated y-values from the arithmetic mean of the observed y-values ("explained variance") to the summed squared deviations of the individual observed y-values from the arithmetic mean of all observed y-values ("total variance"). This can be expressed by the following formula (Backhaus et al. 2000, p. 22):

$r^2 = \frac{\text{explained variance}}{\text{total variance}} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

Where:

r² = coefficient of determination
i = placeholder for the y-values 1–n
ȳ = arithmetic mean of all observed values of the dependent variable
ŷ = estimated value of the dependent variable

r² can reach values between 0 and 1. The higher the explained variance, the nearer r² will be to 1. If it were exactly 1, the regression model would be able to completely explain the value of the dependent variable.

As Backhaus et al. (2000, p. 24) state, the value of r² is influenced by the number of regression coefficients within the regression model. With each independent variable added to the model, the value of r² will increase even if the influence of the added variable is only random. Therefore, in multiple regression analysis, r² is corrected by the degrees of freedom, taking into account the number of observations as well as the number of included independent variables. The result is called the "adjusted coefficient of determination". It is calculated according to the following formula (Backhaus et al. 2000, p. 24):

$r^2_{adjusted} = r^2 - \frac{n_x \cdot (1 - r^2)}{n_y - n_x - 1}$

Where:

r²_adjusted = adjusted coefficient of determination
r² = coefficient of determination
n_x = number of independent variables
n_y = number of observations
n_y − n_x − 1 = degrees of freedom
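
Both coefficients can be illustrated with a short sketch; the observed and estimated values below are invented, and the names n_x and n_y follow the notation above:

```python
# Minimal sketch: coefficient of determination r^2 and its adjusted variant.
# Observed and estimated values are illustrative.
import numpy as np

y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])       # observed y-values
y_hat = np.array([2.0, 4.1, 6.1, 8.0, 10.0, 12.1])  # estimated y-values
n_x = 1            # number of independent variables
n_y = len(y)       # number of observations

explained = np.sum((y_hat - y.mean()) ** 2)  # "explained variance"
total = np.sum((y - y.mean()) ** 2)          # "total variance"
r2 = explained / total

# r2_adjusted = r2 - n_x * (1 - r2) / (n_y - n_x - 1)
r2_adj = r2 - n_x * (1 - r2) / (n_y - n_x - 1)

print(f"r2 = {r2:.4f}, adjusted r2 = {r2_adj:.4f}")
```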

By calculating r², it can be assessed how well the regression line fits the observed values of the dependent variable. This is the descriptive side of the regression analysis. In addition, the researcher is usually interested in assessing whether the estimated model is also valid for the population, i.e. whether the conditions found within the sample can be used to make a generalised statement concerning the influence of the independent variables on the dependent variable. This is investigated by the F-statistic.

The F-test is based on the hypothesis that the "true" regression coefficients, i.e. the coefficients of the "true" regression equation describing the cause-effect relationship of the variables in the population, must differ from 0 in order to influence the dependent variable (Backhaus et al. 2000, p. 25). Thus, H0 is formulated as follows (based on Backhaus et al. 2000, p. 25):

$H_0: \beta_1 = \beta_2 = \dots = \beta_n = 0$

The null hypothesis states that no influence of the independent variables would exist if all regression coefficients were 0. This is tested by the F-test, for which an empirical F-value is calculated and then compared to a theoretical value given in tables of the F-distribution. If the empirical F-value exceeds the critical theoretical F-value, H0 has to be rejected. The conclusion is that not all values of β₁₋ₙ are 0 and that therefore an influence of the independent variables exists in the population.

Before looking up the theoretical F-value in the table, a level of significance has to be determined by the researcher. In this study, a probability of error α of 5 percent is required.

This means that the probability of rejecting the null hypothesis although it is valid for the population has to be lower than 5 percent for a significant result. In addition to the probability of error, the degrees of freedom of the explained variance, n_x, as well as the degrees of freedom of the unexplained variance, n_y − n_x − 1, are needed for reading off the critical F-value from the table.

The empirical F-value is calculated according to the following formula (on the basis of Backhaus et al. 2000, p. 26):

$F_{emp} = \frac{\text{explained variance} / n_x}{\text{unexplained variance} / (n_y - n_x - 1)} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / n_x}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n_y - n_x - 1)}$

Where:

F_emp = empirical F-value
y = observed value of the dependent variable
ŷ = estimated value of the dependent variable
ȳ = arithmetic mean of all observed values of the dependent variable
i = placeholder for the y-values 1–n
n_x = number of independent variables; degrees of freedom of the explained variance
n_y = number of observations
n_y − n_x − 1 = degrees of freedom of the unexplained variance
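
A minimal sketch of this F-test, reusing the invented values from the previous sketch (the 5-percent level matches the study; everything else is illustrative):

```python
# Minimal sketch: F-test for the regression model as a whole.
import numpy as np
from scipy import stats

y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])       # observed y-values
y_hat = np.array([2.0, 4.1, 6.1, 8.0, 10.0, 12.1])  # estimated y-values
n_x = 1            # independent variables (df of the explained variance)
n_y = len(y)       # observations

explained = np.sum((y_hat - y.mean()) ** 2)   # explained variance
unexplained = np.sum((y - y_hat) ** 2)        # unexplained variance

# F_emp = (explained / n_x) / (unexplained / (n_y - n_x - 1))
f_emp = (explained / n_x) / (unexplained / (n_y - n_x - 1))

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, dfn=n_x, dfd=n_y - n_x - 1)

# H0 (all regression coefficients are 0) is rejected if F_emp > F_crit
print(f"F_emp = {f_emp:.2f}, F_crit = {f_crit:.2f}")
print("reject H0" if f_emp > f_crit else "do not reject H0")
```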

After the goodness of fit of the entire regression model has been assessed, and given that a significant influence of the variables has been found, i.e. not all regression coefficients are zero, the calculated values of the individual regression coefficients are checked for the significance of their influence on the value of the dependent variable. This is investigated by the T-test. H0 is again formulated as follows (Backhaus et al. 2000, p. 29):


$H_0: \beta_i = 0$

The empirical T-value is computed according to the following formula (Backhaus et al. 2000, p. 30):

$T_{emp} = \frac{b_i}{s_{b_i}}$

Where:

T_emp = empirical T-value for the regression coefficient of the independent variable i
b_i = regression coefficient of the independent variable i
s_bi = standard error of the regression coefficient of the independent variable i
i = placeholder for the independent variables 1–n

The empirical T-value is compared to the theoretical T-value shown in the table of the t-distribution. Here, the chosen probability of error α, in this study 5 percent, and the degrees of freedom of the unexplained variance, n_y − n_x − 1, have to be taken into consideration for finding the needed critical T-value.

As the t-distribution is symmetric around its mean of zero, the T-value can be both negative and positive. Therefore, the absolute value of the empirical T-value is compared to the theoretical T-value taken from the table. If the absolute value of the empirical T-value is higher than the respective theoretical T-value, H0 is rejected, which means that the regression coefficient in the population differs significantly from zero and thus has a significant influence on the dependent variable.
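
The following sketch illustrates this test for a simple regression with one independent variable; the expression for the standard error s_b is the textbook formula for this case, and the data are again invented:

```python
# Minimal sketch: T-test for an individual regression coefficient.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n_x, n_y = 1, len(y)

# Least-squares fit (see the earlier sketch)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
e = y - (a + b * x)  # residuals

# Standard error of b for one regressor:
# s_b = sqrt(sum(e^2) / (n_y - n_x - 1)) / sqrt(sum((x - x_mean)^2))
s_b = np.sqrt(np.sum(e**2) / (n_y - n_x - 1)) / np.sqrt(np.sum((x - x.mean()) ** 2))

t_emp = b / s_b  # empirical T-value

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n_y - n_x - 1)  # two-sided critical value

# H0 (beta_i = 0) is rejected if |T_emp| exceeds the critical T-value
print(f"b = {b:.3f}, T_emp = {t_emp:.2f}, T_crit = {t_crit:.2f}")
print("reject H0" if abs(t_emp) > t_crit else "do not reject H0")
```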
