Generalized Linear Models and Network Analysis – Project 0 (a/b)

Academic year: 2021

The data file Spirometry.xlsx (available from the link below) contains information about spirometric examinations of 79 healthy male non-smokers from two different regions of Styria.

1. (Data Access) Access the data. You can find it either at http://www.stat.tugraz.at/courses/files/Spirometry.xlsx (sheet bacteria) or by clicking on Aimu Data at http://www.stat.tugraz.at/courses/regression_analysis.html (only for those of you who have trouble using read.xls from within R).

Factor level M stands for Murau, a typical region in Upper Styria with plenty of woods and mostly without any industry. The other level, A, refers to Aichfeld, which is the center of F1 and MotoGP racing and also has a lot of industry around it. It is therefore of central interest to check whether there are region-specific differences in the Forced Vital Capacity (FVC, the volume of air that can forcibly be blown out after full inspiration, measured in centiliters) and the Forced Expiratory Volume in 1 second (FEV1, the volume of air that can forcibly be blown out in the first second after full inspiration, measured in centiliters).

The data also contain each person's age in years, body height in cm, and body weight in kg.

2. (Exploratory Data Analysis) Start with a basic analysis of each variable of interest and then carefully study the relationship between the response and each potential predictor variable.

3. (Linear Regression) Find a proper regression model for FVC (or FEV1) depending on age, weight, and height.

Assess your resulting regression model with respect to departures from the normal distribution and from the assumption of constant variance (homoscedasticity) by means of suitable plots.

4. (Box-Cox Transformation) Now search for the optimal Box-Cox transformation. Use a meaningful value close to the estimate λ̂ just found and transform your response variable.

Then check whether the same set of predictors as in the linear model is still necessary. Test the general necessity of such a transformation (H0: λ = 1), the adequacy of a square root transformation (H0: λ = 1/2), and the adequacy of a log-transformation (H0: λ = 0). What are your findings?

Compare the goodness-of-fit of the linear model with that of the optimal Box-Cox model.

Has the structure in the respective diagnostic plots of the Box-Cox model improved compared with that of the multiple linear regression model from before?
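The search for λ̂ and the three tests in task 4 can be sketched as a grid search over the profile log-likelihood, followed by likelihood-ratio tests against a chi-squared distribution with one degree of freedom. This is only a minimal sketch using the Python standard library: the data are synthetic stand-ins (not the Spirometry.xlsx measurements), the function names `ols_rss`, `boxcox`, and `profile_loglik` are hypothetical, and only age and height are used as predictors for brevity.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares from the least-squares fit of y on the columns of X."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def boxcox(y, lam):
    """Box-Cox transform; the log is the limiting case lam -> 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(X, y, lam):
    """Profile log-likelihood of lam on the original y scale (constants dropped)."""
    n = len(y)
    rss = ols_rss(X, boxcox(y, lam))
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(v) for v in y)

# Synthetic stand-in data (ASSUMPTION: not the real spirometry measurements).
random.seed(1)
X, y = [], []
for _ in range(79):
    age = random.uniform(20, 60)
    height = random.uniform(160, 195)
    X.append([1.0, age - 40.0, height - 178.0])  # intercept + centred predictors
    y.append(math.exp(6.1 - 0.004 * (age - 40.0) + 0.01 * (height - 178.0)
                      + random.gauss(0, 0.12)))

grid = [i / 20 for i in range(-40, 41)]  # lambda in [-2, 2], step 0.05
lam_hat = max(grid, key=lambda lam: profile_loglik(X, y, lam))
ll_hat = profile_loglik(X, y, lam_hat)

CHI2_1_95 = 3.841  # 95% quantile of the chi-squared distribution with 1 df
print(f"lambda_hat = {lam_hat}")
for lam0 in (1.0, 0.5, 0.0):
    lr = 2.0 * (ll_hat - profile_loglik(X, y, lam0))
    print(f"H0: lambda = {lam0}: LR = {lr:.2f}, reject at 5%: {lr > CHI2_1_95}")
```

Because the profile log-likelihood includes the Jacobian term, its values at λ = 1 and at λ̂ refer to the same (untransformed) response scale, which is what makes the goodness-of-fit comparison between the linear model and the Box-Cox model meaningful. In R, `boxcox` from the MASS package plots this kind of profile log-likelihood.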

5. (Generalized Linear Model) Now model the response by a GLM assuming a normal distribution and a log-link function to ensure positive means. How does the model fit compare with that of a standard linear model with identity link function?

Would a GLM based on gamma responses with log-link function give even better results?
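To see how the normal and the gamma GLM with log-link differ mechanically, the following sketch fits both by iteratively reweighted least squares (IRLS). It is a minimal illustration under stated assumptions, not the course's method: the data are synthetic, only age is used as a predictor, and `wls` and `fit_log_link` are hypothetical helper names. In practice one would use a GLM routine such as `glm` in R.

```python
import math
import random

def wls(X, z, w):
    """Weighted least squares: solve (X'WX) b = X'Wz by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(wi * r[i] * r[j] for r, wi in zip(X, w)) for j in range(p)]
         + [sum(wi * r[i] * zi for r, zi, wi in zip(X, z, w))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def fit_log_link(X, y, family, n_iter=50, tol=1e-10):
    """IRLS for a GLM with log link; family is 'gaussian' or 'gamma'."""
    # Start from OLS on log(y), which is usually close to both fits.
    beta = wls(X, [math.log(v) for v in y], [1.0] * len(y))
    for _ in range(n_iter):
        eta = [sum(b * x for b, x in zip(beta, r)) for r in X]
        mu = [math.exp(e) for e in eta]
        # Working response for the log link: z = eta + (y - mu) / mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # IRLS weights 1 / (V(mu) * g'(mu)^2): mu^2 for Gaussian, 1 for gamma.
        w = [m * m if family == "gaussian" else 1.0 for m in mu]
        new = wls(X, z, w)
        done = max(abs(a - b) for a, b in zip(new, beta)) < tol
        beta = new
        if done:
            break
    return beta

# Synthetic stand-in data (ASSUMPTION: not the real spirometry measurements).
random.seed(2)
X, y = [], []
for _ in range(79):
    age = random.uniform(20, 60)
    X.append([1.0, age - 40.0])  # intercept + centred age
    y.append(math.exp(6.1 - 0.004 * (age - 40.0) + random.gauss(0, 0.12)))

for family in ("gaussian", "gamma"):
    print(family, [round(b, 4) for b in fit_log_link(X, y, family)])
```

The only difference between the two fits is the weight vector: the normal model weights each observation by the squared fitted mean, whereas the gamma model weights all observations equally on the log scale. The deviances of the two families are not on the same scale, so a comparison of fit would typically rely on a likelihood-based criterion such as AIC rather than on the raw deviances.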

Try to graphically compare the prediction regions under the normal and under the gamma model (both using the log-link) for a person from region M (weight 80 kg and height 180 cm) as a function of age (shown on the horizontal axis).

For the specialists amongst you, it could be a challenge to also construct pointwise 95 % confidence intervals (under both these GLMs) and to compare them graphically. Especially check the area where both bands are nicely overlapping. Hint: remember what we have done to find confidence intervals for counts in the section on log-linear Poisson models.
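One common way to obtain such pointwise intervals under a log-link, and presumably what the hint alludes to (this is an assumption, since the referenced section is not part of this sheet), is to build the interval on the scale of the linear predictor and then back-transform it. For a covariate vector $x_0$ (here: region M, weight 80 kg, height 180 cm, and a given age):

```latex
\hat\eta_0 = x_0^\top \hat\beta, \qquad
\widehat{\mathrm{se}}(\hat\eta_0) = \sqrt{x_0^\top \, \widehat{\mathrm{Cov}}(\hat\beta) \, x_0},
\qquad
\mathrm{CI}_{0.95}(\mu_0) =
\Bigl[ \exp\bigl(\hat\eta_0 - z_{0.975}\, \widehat{\mathrm{se}}(\hat\eta_0)\bigr),\;
       \exp\bigl(\hat\eta_0 + z_{0.975}\, \widehat{\mathrm{se}}(\hat\eta_0)\bigr) \Bigr],
\qquad z_{0.975} \approx 1.96 .
```

The resulting interval for the mean $\mu_0 = \exp(\eta_0)$ is always positive and generally asymmetric around $\hat\mu_0$. Since the dispersion parameter is estimated in both the normal and the gamma model, a $t$ quantile with the residual degrees of freedom may be used in place of $z_{0.975}$.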
