Generalized Linear Models and Network Analysis – Project 2

Academic year: 2021


This part is a statistical analysis of data on the people on board the RMS Titanic.

When it was launched on May 31, 1911, it was the largest ship in the world. Less than a year later, on April 14, 1912, it collided with an iceberg.

1. (Titanic Data) One of the best-known data sets with categorical content is based on records of the 2201 persons on board the Titanic. This data set is available from http://www.rdatamining.com/data/titanic.raw.rdata and contains information about Class with levels First, Second, Third, and Crew, the person's Age with the two levels Adult and Child, the Gender with levels Female and Male, and the survival status (Survived) with levels Yes and No.

2. (Logistic Regression) Find a suitable model for the survival status and interpret how survival depends on the other attributes.

Hint: do not get confused when carefully checking the summary of your model fit. There, NA stands for "not available", and this is caused by . . . (any idea?).

3. (Odds and Odds Ratios) Calculate all the relevant odds, and the odds ratios that are especially relevant for you (probably for someone with Age = Adult and your own level of Gender).

Determine the odds ratio for your favorite class that allows a comparison by age (to answer the question: would it have been better to be a child instead of an adult?).

Which type of passenger generally has (under the model you have found) the best chance of survival, and which the worst?
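As a rough illustration of the quantities asked for in item 3, the sketch below computes odds and a sample odds ratio from survival counts, together with a Wald interval on the log scale. The counts are hypothetical placeholders, not the Titanic counts, and the helper names `odds` and `odds_ratio_ci` are made up for this example. Under a main-effects logistic model the corresponding odds ratio would instead be read off as the exponential of the fitted Age coefficient.

```python
import math


def odds(yes, no):
    """Odds of survival: number of survivors per non-survivor."""
    return yes / no


def odds_ratio_ci(yes_a, no_a, yes_b, no_b, z=1.96):
    """Sample odds ratio of group A versus group B with a Wald interval.

    The interval is built on the log scale, where the standard error of
    the log odds ratio is sqrt(1/yes_a + 1/no_a + 1/yes_b + 1/no_b).
    """
    or_hat = odds(yes_a, no_a) / odds(yes_b, no_b)
    se = math.sqrt(1 / yes_a + 1 / no_a + 1 / yes_b + 1 / no_b)
    log_or = math.log(or_hat)
    return or_hat, (math.exp(log_or - z * se), math.exp(log_or + z * se))


# Hypothetical counts for one class (NOT taken from the Titanic data).
child = {"Yes": 30, "No": 20}
adult = {"Yes": 100, "No": 200}

or_hat, ci = odds_ratio_ci(child["Yes"], child["No"], adult["Yes"], adult["No"])
print("odds (child):", odds(child["Yes"], child["No"]))
print("odds (adult):", odds(adult["Yes"], adult["No"]))
print("odds ratio child vs adult:", or_hat, "approx. 95% CI:", ci)
```

An odds ratio above 1 here would mean that, within the chosen class, the odds of survival were higher for children than for adults. The real counts can be obtained by cross-tabulating the data set.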


In what follows we focus on Problem 5.2 in Alan Agresti's book Categorical Data Analysis.

This is a fairly basic analysis that every student should be able to carry out after attending the classes.

1. (Data Definition) For the 23 space shuttle flights before the Challenger mission disaster in 1986, the table below shows the temperature at the time of the flight and whether at least one primary O-ring suffered thermal distress.

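Ft = flight number, F = temperature (°F), TD = thermal distress (1 = yes, 0 = no). Each row lists up to five flights as (Ft, F, TD) triples.

Ft F TD Ft F TD Ft F TD Ft F TD Ft F TD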

1 66 0 2 70 1 3 69 0 4 68 0 5 67 0

6 72 0 7 73 0 8 70 0 9 57 1 10 63 1

11 70 1 12 78 0 13 67 0 14 53 1 15 67 1

16 75 0 17 70 0 18 81 0 19 76 0 20 79 0

21 75 1 22 76 0 23 58 1

2. (Logistic Regression) Use logistic regression to model the effect of temperature on the probability of thermal distress. Is a quadratic temperature effect necessary? Plot a figure of the fitted model and interpret its shape.

3. (Confidence Interval) Estimate the probability of thermal distress at 31 °F, the temperature at the place and time of the Challenger flight. Also construct a 95% confidence interval for it.

4. (Confidence Interval) Now construct a confidence interval for the effect of temperature on the odds of thermal distress, and test the statistical significance of the effect.

Hint: $\mathrm{odds}(t) = \Pr(TD = 1 \mid t) / \Pr(TD = 0 \mid t) = \exp(\beta_0)\,\exp(\beta_t)^{t}$, i.e. a confidence interval for $\exp(\beta_t)$ is to be considered.
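The mechanics behind items 2 to 4 can be sketched without any statistics package. The block below is a minimal illustration only: it fits a one-covariate logistic model by Newton-Raphson, then forms Wald intervals for a predicted probability (on the logit scale, transformed back) and for the multiplicative effect on the odds, exp(slope). The data are a hypothetical toy sample, not the shuttle table, and `fit_logistic`, `expit`, `temp`, `distress`, and the prediction point `x0` are names and values assumed for this example. In practice a standard GLM routine would normally be used.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit Pr(y = 1 | x) = b0 + b1 * x.

    Returns (b0, b1, cov), where cov is the inverse Fisher information
    evaluated at the final estimates.
    """
    b0 = b1 = 0.0
    for step in range(iterations + 1):
        p = [expit(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Entries of the 2x2 Fisher information and its inverse.
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
        if step == iterations:
            break  # cov now corresponds to the final (b0, b1)
        # Score vector, then the Newton update: beta += cov * score.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        b0 += cov[0][0] * g0 + cov[0][1] * g1
        b1 += cov[1][0] * g0 + cov[1][1] * g1
    return b0, b1, cov


# Hypothetical toy data (NOT the shuttle table): covariate and 0/1 response.
temp = [50, 55, 60, 65, 70, 75, 80]
distress = [1, 1, 0, 1, 0, 0, 0]

b0, b1, cov = fit_logistic(temp, distress)
z = 1.96  # standard normal quantile for an approximate 95% interval

# Wald interval for the probability at x0: build it for the linear
# predictor, then map both endpoints through the inverse logit.
x0 = 45  # hypothetical prediction point outside the observed range
eta = b0 + b1 * x0
se_eta = math.sqrt(cov[0][0] + 2 * x0 * cov[0][1] + x0 ** 2 * cov[1][1])
p_hat = expit(eta)
p_ci = (expit(eta - z * se_eta), expit(eta + z * se_eta))

# Wald interval and test for the effect on the odds, exp(b1).
se_b1 = math.sqrt(cov[1][1])
or_hat = math.exp(b1)
or_ci = (math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1))
wald_z = b1 / se_b1  # compare with +/- 1.96 for a two-sided 5% test

print("estimates:", b0, b1)
print("Pr(distress) at x0:", p_hat, "CI:", p_ci)
print("odds multiplier per degree:", or_hat, "CI:", or_ci, "Wald z:", wald_z)
```

Building the interval on the logit scale keeps both endpoints between 0 and 1, which a symmetric interval around the estimated probability would not guarantee. When the prediction point lies far from the observed temperatures, the interval is typically very wide, which is worth commenting on in the interpretation.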
