Generalized Linear Models and Network Analysis – Project 2
This project is a statistical analysis of data on the passengers of the RMS Titanic.
When it was launched on May 31, 1911, it was the largest ship in the world. Less than a year later, on April 14, 1912, it struck an iceberg.
1. (Titanic Data) One of the best-known data sets with categorical content is based on records of the 2201 people on board the Titanic. This data set is available from http://www.rdatamining.com/data/titanic.raw.rdata and contains information about Class with levels First, Second, Third, and Crew, the passenger's Age with the two levels Adult and Child, the Gender with Female or Male, and the survival status (Survived) with levels Yes and No.
2. (Logistic Regression) Find a suitable model for the survival status and interpret how survival depends on the other attributes.
Hint: do not get confused when carefully checking the summary of your model fit. Therein, NA stands for not available and this is caused by . . . (any idea?).
3. (Odds and Odds Ratios) Calculate all the relevant odds, and the odds ratios that are especially relevant for you (probably for someone with Age = Adult and your own level of Gender).
Determine the odds ratio for your favorite class that allows a comparison of the age levels (to answer the question: would you have been better off as a child than as an adult?).
Which type of passenger has (under the model you have found) the best chance of survival, and which the worst?
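As a reminder of the definitions used in item 3, here is a minimal Python sketch of odds and odds ratios. The function names and all numbers in it are placeholders chosen for illustration; they are not taken from the Titanic data and not from any fitted model.

```python
import math


def odds(yes, no):
    """Odds of an event: count with the event divided by count without it."""
    return yes / no


def odds_ratio(a_yes, a_no, b_yes, b_no):
    """Odds ratio of group A relative to group B from a 2 x 2 table of counts."""
    return odds(a_yes, a_no) / odds(b_yes, b_no)


# Placeholder counts (survived, died) for two hypothetical groups.
group_a = (30, 10)
group_b = (20, 80)

odds_a = odds(*group_a)                  # 30 / 10 = 3.0
odds_b = odds(*group_b)                  # 20 / 80 = 0.25
or_ab = odds_ratio(*group_a, *group_b)   # 3.0 / 0.25 = 12.0

# In a logistic regression, the odds ratio for a one-unit change in a
# predictor (or for a factor level versus its reference level) is the
# exponential of the corresponding coefficient. The value is hypothetical.
beta_example = 1.0
or_from_model = math.exp(beta_example)

print(odds_a, odds_b, or_ab, or_from_model)
```

An odds ratio above 1 means the first group has higher odds of the event than the second; a value below 1 means lower odds.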
In what follows we focus on Problem 5.2 in Alan Agresti’s book on Categorical Data Analysis.
This is a pretty basic analysis and should be easily done by every student after attending the classes.
1. (Data Definition) For the 23 space shuttle flights before the Challenger mission disaster in 1986, the table below shows the temperature at the time of the flight and whether at least one primary O-ring suffered thermal distress.
Ft = flight number, °F = temperature, TD = thermal distress (1 = yes, 0 = no)
Ft  °F  TD    Ft  °F  TD    Ft  °F  TD    Ft  °F  TD    Ft  °F  TD
 1  66   0     2  70   1     3  69   0     4  68   0     5  67   0
 6  72   0     7  73   0     8  70   0     9  57   1    10  63   1
11  70   1    12  78   0    13  67   0    14  53   1    15  67   1
16  75   0    17  70   0    18  81   0    19  76   0    20  79   0
21  75   1    22  76   0    23  58   1
2. (Logistic Regression) Use logistic regression to model the effect of temperature on the probability of thermal distress. Is a quadratic temperature effect necessary? Plot a figure of the fitted model and interpret its shape.
3. (Confidence Interval) Estimate the probability of thermal distress at 31°F, the temperature at the place and time of the Challenger flight. Also construct a 95% confidence interval for it.
4. (Confidence Interval) Now construct a confidence interval for the effect of temperature on the odds of thermal distress, and test the statistical significance of the effect.
Hint: $\mathrm{odds}(t) = \Pr(TD = 1 \mid t) / \Pr(TD = 0 \mid t) = \exp(\beta_0)\,\exp(\beta_t)^{t}$, i.e. a confidence interval for $\exp(\beta_t)$ is to be considered.
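One possible way to approach items 2 to 4 is sketched below in Python, using only the standard library. It fits the linear-logit model by Newton-Raphson on the table above, then forms Wald intervals for the probability at 31°F (built on the logit scale and transformed back) and for the odds ratio per degree. The centered parametrization logit(p) = a + b (t − m), the function names, and the choice of Wald rather than likelihood-based intervals are assumptions of this sketch, not requirements of the exercise; b plays the role of βt in the hint. The quadratic-effect check and the plot are not covered.

```python
import math
from statistics import NormalDist

# Temperature (°F) and thermal-distress indicator (TD) for flights 1-23,
# copied from the table above.
temps = [66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78,
         67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58]
td = [0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
      0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1]


def sigmoid(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit(p) = a + b * (x - m), with m = mean(x).

    Returns (a, b, m, var_a, cov_ab, var_b). The (co)variances are the
    entries of the inverse information matrix at the last evaluated iterate.
    """
    m = sum(x) / len(x)
    xc = [xi - m for xi in x]  # centering improves numerical conditioning
    a = b = 0.0
    for _ in range(max_iter):
        p = [sigmoid(a + b * xi) for xi in xc]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, xc))
        # Information matrix (negative Hessian), 2 x 2 and symmetric.
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, xc))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 system with the explicit inverse.
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        if abs(da) + abs(db) < tol:
            break
        a, b = a + da, b + db
    return a, b, m, i11 / det, -i01 / det, i00 / det


a, b, m, var_a, cov_ab, var_b = fit_logistic(temps, td)
z = NormalDist().inv_cdf(0.975)  # normal quantile for a two-sided 95% interval

# Item 3: probability at 31 °F, Wald interval on the logit scale.
d = 31 - m
eta = a + b * d
se_eta = math.sqrt(var_a + 2 * d * cov_ab + d * d * var_b)
p31 = sigmoid(eta)
p31_lo, p31_hi = sigmoid(eta - z * se_eta), sigmoid(eta + z * se_eta)

# Item 4: odds ratio per 1 °F increase, exp(b), with Wald interval and test.
se_b = math.sqrt(var_b)
or_hat = math.exp(b)
or_lo, or_hi = math.exp(b - z * se_b), math.exp(b + z * se_b)
wald_z = b / se_b
p_value = math.erfc(abs(wald_z) / math.sqrt(2))  # two-sided normal p-value

print("intercept at t = 0:", a - b * m, "slope:", b)
print("P(TD = 1 | 31 °F):", p31, "95% CI:", (p31_lo, p31_hi))
print("odds ratio per °F:", or_hat, "95% CI:", (or_lo, or_hi))
print("Wald z:", wald_z, "p-value:", p_value)
```

Note that 31°F lies far below the lowest temperature in the table (53°F), so the estimate in item 3 is an extrapolation, which is reflected in the width of its interval.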