How to interpret linear regressions
Introduction to MENA economics, WiSe 12/13
Introduction:
Linear regression is a method widely used in economics to test for a possible causal relationship:
Does X have an effect on Y?
Y is called the dependent variable; it is the one we are trying to explain. X is the independent variable, the one we think does the explaining. For example, if we are trying to figure out whether investments in education have an effect on economic growth (=GDP), investments in education would be our independent variable and GDP our dependent variable. X (independent) = Investments in education. Y (dependent) = GDP.
How to interpret linear regression outputs:
1. Look at the p-value (or ***)
a. The p-value tells you how likely it would be to see an effect at least this strong purely by chance if the independent variable actually had no effect on the dependent variable. The smaller the p-value, the more confident you can be that the independent variable really explains the dependent one.
b. If it's smaller than 0.05 then you say the variable is significant (for example, if p is 0.03 then it is significant, because 0.03 < 0.05)
i. You say: The variable is significant at the 5% level
ii. Alternatively: We can say with 95% confidence that this result is significant
c. If it's smaller than 0.01 it's even better! You can say it is significant at the 1% level (or 99% confident)
d. In some cases, instead of the p-value you have stars (*; **; ***)
i. The stars tell you how confident you can be that the results are significant; the more stars, the better
ii. * = 90% confidence; ** = 95% confidence; *** = 99% confidence
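The star rule above can be written as a small lookup. This is only a sketch: the function name is made up for illustration, and the cut-offs (10%, 5%, 1%) are the ones listed in this guide. Other tables sometimes use different cut-offs, so always check the note under the regression table.

```python
def significance_stars(p_value: float) -> str:
    """Translate a p-value into stars using the 10% / 5% / 1% cut-offs."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "***"   # significant at the 1% level
    if p_value < 0.05:
        return "**"    # significant at the 5% level
    if p_value < 0.10:
        return "*"     # significant at the 10% level
    return ""          # not significant at any of the usual levels


# A few made-up p-values to show the mapping.
for p in (0.003, 0.03, 0.08, 0.40):
    print(p, significance_stars(p) or "not significant")
```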
2. If you want you can also look at the t-statistic. It will tell you the same as the p-value (in fact, the p-value is calculated from the t-statistic). If t (in absolute terms, that is, regardless of whether it is positive or negative) is bigger than 2, the variable is significant at the 5% level.
3. Interpret the coefficients. If the coefficient is positive, it means that an increase in the (independent) variable increases the dependent variable. If it's negative, this means that an increase in the (independent) variable decreases the dependent variable.
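a. The bigger the number (in absolute terms), the more effect it has on the dependent variable. Keep in mind that the size of a coefficient also depends on the units the variable is measured in.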
b. [BEWARE] If the values are in Log we are using percentage changes!
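i. If the dependent variable is LOG_GDP and one of the independent variables is GLOB_ECON (not in logs) and the coefficient is 1.233, we would interpret it this way: An increase of one unit in Economic Globalization increases GDP by roughly 100 × 1.233 = 123.3% (percent). With a logged dependent variable and an unlogged independent variable, the coefficient has to be multiplied by 100 to get the percentage change; this rule is an approximation that works best for small coefficients.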
ii. If the dependent variable is LOG_GDP and one of the independent variables is Log_MilitarySpending, and the coefficient is 0.33, we would interpret it this way: An increase of 1 percent in military spending increases GDP by 0.33% (percent).
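For anyone who wants to check the log rules with numbers, here is a minimal sketch. The function names and the coefficient 0.02 are invented for illustration; 0.33 is the military spending coefficient from the example. The standard rules are: with a logged dependent variable and an unlogged regressor, one extra unit changes Y by about 100 × coefficient percent; with both in logs, a 1% rise in X changes Y by about coefficient percent. The functions compute the exact change so it can be compared with the shortcut.

```python
import math


def pct_effect_log_level(coef: float, delta_x: float = 1.0) -> float:
    """Exact % change in Y when X rises by delta_x units (Y in logs, X not)."""
    return 100.0 * (math.exp(coef * delta_x) - 1.0)


def pct_effect_log_log(coef: float, pct_x: float = 1.0) -> float:
    """Exact % change in Y when X rises by pct_x percent (both in logs)."""
    return 100.0 * ((1.0 + pct_x / 100.0) ** coef - 1.0)


# Hypothetical small coefficient: the shortcut 100 * 0.02 = 2% is very close.
print(pct_effect_log_level(0.02))

# Log-log case with the coefficient 0.33: a 1% rise in X gives about 0.33%.
print(pct_effect_log_log(0.33))
```

The shortcut and the exact value drift apart as the coefficient gets larger, which is why the rule of thumb is safest for small coefficients.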
4. Interaction terms [for example, DEMOG*RES in the mock exam (Probeklausur)]: We use interaction terms to see how the effect of one independent variable changes when we take another variable into account [How does GDP react when there is a population increase in resource-rich countries (DEMOG*RES)?]
a. If the coefficient of DEMOG*RES were positive, this would mean that a population increase in a resource-rich country increases GDP more than a population increase in a resource-poor country.
b. If the coefficient of DEMOG*RES were negative (as in the mock exam), this would mean that a population increase in a resource-rich country has less effect on GDP than a population increase in a resource-poor country.
c. The ones that may come up in the exam (according to the theory) are DEMOG*RES and RES*INST (Resource Curse!) [and also others; I will add them when I go through the script once more]
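The logic of an interaction term can be shown with a little arithmetic. In this sketch the coefficients are invented (they are NOT the ones from the mock exam), and RES is assumed to be a 0/1 dummy (1 = resource-rich), which the guide does not state. The total effect of DEMOG is its own coefficient plus the interaction coefficient times RES.

```python
def marginal_effect(b_main: float, b_interaction: float, moderator: float) -> float:
    """Effect of one more unit of X on Y when the model contains X and X*Z."""
    return b_main + b_interaction * moderator


# Hypothetical coefficients, chosen only for illustration.
B_DEMOG = 0.40        # coefficient of DEMOG on its own
B_DEMOG_RES = -0.25   # coefficient of the interaction DEMOG*RES

# Assumption: RES = 0 for resource-poor and RES = 1 for resource-rich countries.
effect_resource_poor = marginal_effect(B_DEMOG, B_DEMOG_RES, moderator=0)
effect_resource_rich = marginal_effect(B_DEMOG, B_DEMOG_RES, moderator=1)

print("resource-poor:", effect_resource_poor)
print("resource-rich:", effect_resource_rich)
```

With a negative interaction coefficient the effect in resource-rich countries comes out smaller than in resource-poor ones, which is case b above.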
5. Diminishing marginal returns: This means that an increase in the independent variable (for example GDP) affects the dependent variable (for example inequality), but each additional unit of increase affects the dependent variable less, until it reaches a maximum. After this maximum, an increase in the independent variable has the opposite effect on the dependent variable.
a. Example: The best example for this is the Kuznets curve from the lecture (both for the environment and for inequality). In the case of the environment, this means that when a country is poor (low GDP) an increase in GDP decreases environmental quality. This happens until the turning point is reached (at a certain GDP); afterwards an increase in GDP increases environmental quality.
b. In the regression we can see it when the same independent variable appears twice, the second time with ^2 (squared).
i. In (probably) all the regressions we will see, the first one will be positive and the second one will be negative (a Kuznets curve)
ii. The only regressions likely to have this are the effects of GDP on inequality and on environmental quality (if I see another one I will add it)
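When a variable appears both on its own and squared, the point where the effect flips can be calculated from the two coefficients: turning point = -b1 / (2 × b2). The sketch below uses invented coefficients and made-up names, and assumes the variable is log GDP, so the result is converted back with exp().

```python
import math


def turning_point(b_linear: float, b_squared: float) -> float:
    """Value of x where b_linear*x + b_squared*x^2 reaches its peak or trough."""
    if b_squared == 0:
        raise ValueError("no turning point without a squared term")
    return -b_linear / (2.0 * b_squared)


def slope_at(x: float, b_linear: float, b_squared: float) -> float:
    """Effect of a small increase in x at the point x."""
    return b_linear + 2.0 * b_squared * x


# Hypothetical coefficients: positive linear term, negative squared term.
B_LOG_GDP = 0.80
B_LOG_GDP_SQ = -0.05

peak_log_gdp = turning_point(B_LOG_GDP, B_LOG_GDP_SQ)
peak_gdp = math.exp(peak_log_gdp)  # back from logs to GDP units

print("turning point in logs:", peak_log_gdp)
print("turning point in GDP units:", round(peak_gdp))
print("effect below the peak:", slope_at(7.0, B_LOG_GDP, B_LOG_GDP_SQ))
print("effect above the peak:", slope_at(9.0, B_LOG_GDP, B_LOG_GDP_SQ))
```

Below the turning point the effect is positive, above it the effect is negative, which is the inverted-U shape of the Kuznets curve.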
6. The constant (C) just tells us which value the dependent variable would have IF all the independent variables were 0
a. Note: When the dependent variable is in LOG form, the constant can NOT be interpreted meaningfully
b. → You just say: “Because the dependent variable is in LOG form, the constant cannot be interpreted meaningfully”
If you are not that strong in interpreting regressions I strongly suggest you focus more on the theory (which will give you the right explanations for the regressions) and then you will already have the right answer. You will just have to make the numbers you see fit the story you already have.
Why do we use so many (independent) variables?
We use so many variables to get a ceteris paribus effect, which means “all other things held constant”. We interpret it this way: if everything else is held constant (frozen), X has an effect on Y. This is important because there might be other variables which cause both X and Y (correlation does not imply causation). For example: if we ran a regression with life expectancy as our dependent variable (Y) and shoe size as our only independent variable (X), the result would be that shoe size determines how long a person lives. But that makes no sense! What happened is that there is a third variable, sex, that explains both (men tend to have bigger feet and shorter lives, while women tend to have smaller feet and longer lives). So if we ran a regression with life expectancy as the dependent variable and both shoe size and sex as independent variables, we would see that shoe size has NO effect on life expectancy.
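The shoe size story can be reproduced with simulated data. Everything in this sketch is invented (the numbers, the variable names, the helper functions); it is not data from the course. Life expectancy is generated so that it depends only on sex, never on shoe size. "Controlling for sex" is done here by subtracting each group's average, which gives the same shoe size coefficient as a regression that includes a sex dummy.

```python
import random


def slope(xs, ys):
    """OLS slope of y on x in a simple regression with a constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def demean_by_group(values, groups):
    """Subtract each group's mean, i.e. hold the group variable constant."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
male = [rng.randint(0, 1) for _ in range(n)]
# Invented data: sex drives BOTH shoe size and life expectancy.
shoe = [38 + 5 * m + rng.gauss(0, 1.5) for m in male]
life = [83 - 6 * m + rng.gauss(0, 4.0) for m in male]

naive = slope(shoe, life)  # shoe size alone
controlled = slope(demean_by_group(shoe, male), demean_by_group(life, male))

print("without controlling for sex:", round(naive, 3))
print("controlling for sex:", round(controlled, 3))
```

Without the control, shoe size appears to shorten life; once sex is held constant the coefficient is close to zero.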
Example of Interpreting linear regressions
This is how I would do it [WHICH DOES NOT MEAN IT IS RIGHT!]: The example is taken from the exercise session (Übung) on GINI
● We are trying to explain how globalization (ECOGLOB, SOCGLOB, POLGLOB) and income (LOGGDP) affect inequality (GINI) using panel data from 1970 to 2009, with regions as control variables.
○ [We can tell they used panel data because of the 70_90 which means they used data from all those years]
○ From now on, it is important to remember that the higher the GINI coefficient the higher the inequality
○ Remember that everything with LOGs is in percentages
● From looking at the p-values, we can tell that economic (0.0027 < 0.05) and political (0.023 < 0.05) globalization are significant at the 5% level, whereas social globalization (0.1852 > 0.05) is insignificant, which means we cannot interpret its coefficient meaningfully. Changes in GDP per capita (LOG_GDP & LOG_GDP^2) also seem to have an effect on inequality (significant at the 1% level).
● Economic globalization seems to affect inequality positively (Because it has a positive coefficient). In fact, an increase in 1 unit of economic globalization increases the Gini coefficient by 0.0039%. → The more economic globalization there is, the more inequality.
● Political globalization, in turn, seems to decrease inequality. An increase in one unit of the Political Globalization Index decreases the Gini Coefficient by 0.0019%
● From the regression, we can tell that economic growth (measured with LOG_GDP & LOG_GDP^2) has diminishing marginal returns on inequality. This means that at first an increase in GDP increases inequality, but each additional increase increases inequality less, until a maximum is reached. After that point, an increase in GDP decreases inequality.
● This seems to be in line with the theory (Kuznets curve), which tells us that in countries with high GDP an increase in GDP decreases inequality, whereas in a country with low GDP an increase in GDP increases inequality.
● [OPTIONAL: I don’t think this is necessary, but can help you have a better grade]
● Our model explains 70.37% of the variance in the GINI coefficient; we can tell this by looking at the R^2.
● The constant (C) cannot be interpreted meaningfully, because the GINI coefficient is in LOG form. If it could be interpreted, it would tell us which value GINI would have if all the independent variables were 0.
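For the R^2 mentioned in the example, this is roughly what the number measures: one minus the share of the variation in the dependent variable that the model fails to explain. The values below are invented (they are not the GINI data from the exercise) and the function name is just for illustration.

```python
def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1.0 - ss_res / ss_tot


# Made-up observed values and model predictions.
actual = [30.0, 35.0, 40.0, 45.0, 50.0]
predicted = [32.0, 34.0, 41.0, 43.0, 50.0]

print(f"The model explains {100 * r_squared(actual, predicted):.2f}% of the variance")
```

An R^2 of 0.7037 would be read the same way: 70.37% of the variance is explained.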
If you have any questions email me :) rrosasl@gmail.com Have fun!