Academic year: 2021


5 Model Development

5.1 The Task

Which explanatory variables should appear in the model formula, and in which form?

Example: construction cost of nuclear power plants

Var.  Explanation                                          Type         Transf.
K     Construction cost                                    amount       log
G     Capacity                                             amount       log
D     Date of permission                                   continuous   –
WZ    Waiting time between application and permission      amount       –
BZ    Construction time                                    amount       –
Z     Follow-up plant (existing plant on site)             binary       –
NE    Site in the NE of the US                             binary       –
KT    Cooling tower                                        binary       –
BW    Reactor by Babcock-Wilcox                            binary       –
N     Number of plants built earlier by the same           count        sqrt
      architects/engineers, +1
KG    Partial price guarantee of the project-leading       binary       –
      enterprise


c  “First aid” transformations:

d  $$\log_{10}(K) = \beta_0 + \beta_1 \log_{10}(G) + \beta_2 D + \beta_3 WZ + \beta_4 BZ + \beta_5 Z + \beta_6 NE + \beta_7 KT + \beta_8 BW + \beta_9 \sqrt{N} + \beta_{10} KG + E$$
   (E: random error)
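The first-aid transformations in the "Transf." column can be applied record by record before fitting. A minimal sketch, assuming a hypothetical record layout (a dict keyed by the variable names of the table) and taking N as defined in the table (already including the +1); the numbers are illustrative only, not the original data:

```python
import math

def design_row(rec):
    """Return (response, regressors) after the first-aid transformations:
    log10 for the amounts K and G, square root for the count N, others unchanged."""
    y = math.log10(rec["K"])
    x = [
        math.log10(rec["G"]),
        rec["D"], rec["WZ"], rec["BZ"], rec["Z"], rec["NE"],
        rec["KT"], rec["BW"],
        math.sqrt(rec["N"]),  # N as defined in the table (count, +1)
        rec["KG"],
    ]
    return y, x

# Hypothetical record with illustrative values (not from the original data set).
example = {"K": 500.0, "G": 1000.0, "D": 68.5, "WZ": 14, "BZ": 46,
           "Z": 0, "NE": 1, "KT": 0, "BW": 0, "N": 9, "KG": 1}
y, x = design_row(example)
```

The order of `x` follows β1 to β10 in the model formula, so a fitted coefficient vector can be matched term by term.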

e  A single term → t test for one $\beta_j$.
   A factor (nominal variable) → F test for several $\beta_j$.
   Does a significance test make sense in this context?
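For reference, a standard form of the F test for the q coefficients of a factor compares the residual sum of squares of the full model (p coefficients including the intercept) with that of the reduced model without the factor, using the notation $SSQ^{(E)}$ that appears later in this chapter:

$$F = \frac{\bigl(SSQ^{(E)}_{\text{reduced}} - SSQ^{(E)}_{\text{full}}\bigr)/q}{SSQ^{(E)}_{\text{full}}/(n-p)}$$

Under the null hypothesis that these q coefficients are all zero (and normal errors), F follows an $F_{q,\,n-p}$ distribution. For q = 1 this F statistic equals the square of the t statistic of the single coefficient.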


Coefficients:
               Value  Std. Error  t value  Pr(>|t|)  Signif
(Intercept) -6.02586     2.34729    -2.57     0.018  *
lg10(G)      0.69254     0.13713     5.05     0.000  ***
D            0.09525     0.03580     2.66     0.015  *
WZ           0.00263     0.00955     0.28     0.785
BZ           0.00229     0.00198     1.16     0.261
Z           -0.04573     0.03561    -1.28     0.213
NE           0.11045     0.03391     3.26     0.004  **
KT           0.05340     0.02970     1.80     0.087  .
BW           0.01278     0.04537     0.28     0.781
sqrt(N)     -0.02997     0.01780    -1.68     0.107
KG          -0.09951     0.05562    -1.79     0.088  .
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


5.2 Automatic Model Selection

a  Stepwise backwards

b  Delete WZ!
   Delete BW, BZ, Z, √N and KT!

Coefficients:
              Value  Std. Error  t value  Pr(>|t|)  Signif
(Intercept) -3.4612      1.1458    -3.02     0.005  **
log10(G)     0.6629      0.1295     5.12     0.000  ***
D            0.0610      0.0160     3.82     0.001  ***
NE           0.0831      0.0330     2.52     0.018  *
KG          -0.1844      0.0424    -4.35     0.000  ***
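The backward steps in b can be sketched in plain Python: fit the model, find the term with the smallest |t| value, delete it if it is not significant, and refit. The stopping rule |t| < 2 (roughly a 5% test) is an assumption of this sketch; software implementations often use AIC or F tests instead, and this is not claimed to reproduce the output above. The data are made up so that y depends on x1 only.

```python
import math

def ols(X, y):
    """Least squares fit; returns coefficients and their standard errors."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[p:] for row in aug]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2 for i in range(n))
    s2 = rss / (n - p)  # estimated error variance
    se = [math.sqrt(s2 * inv[a][a]) for a in range(p)]
    return beta, se

def backward_elimination(data, y, t_crit=2.0):
    """data: dict mapping term name -> list of values. Returns the retained terms.
    Repeatedly deletes the term with the smallest |t| while that |t| is below t_crit."""
    terms = list(data)
    while terms:
        X = [[1.0] + [data[t][i] for t in terms] for i in range(len(y))]
        beta, se = ols(X, y)
        # index 0 is the intercept, which is never deleted
        tvals = {t: abs(beta[k + 1] / se[k + 1]) for k, t in enumerate(terms)}
        weakest = min(tvals, key=tvals.get)
        if tvals[weakest] >= t_crit:
            break
        terms.remove(weakest)
    return terms

# Made-up data: y depends on x1 only; u and v are unrelated to y.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
u = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
v = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
e = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]  # small "error" pattern
y = [2.0 + 3.0 * a + 0.1 * b for a, b in zip(x1, e)]
kept = backward_elimination({"x1": x1, "u": u, "v": v}, y)
```

Deleting one term at a time matters: after each deletion the remaining t values change, which is why the slide deletes WZ first and only then the next group.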

c  Stepwise forward...


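e  All subsets.

f  Criteria:
   1. “Coefficient of determination” $R^2$, or multiple correlation $R$,
   2. value of the F test statistic for the model,
   3. P value of the F test,
   4. estimated error variance $\widehat\sigma^2$,
   5. “adjusted” coefficient of determination: $R^2_{adj} = 1 - \frac{n-1}{n-p}\,(1 - R^2)$,
   6. $C_p := SSQ^{(E)}/\widehat\sigma^2_m + 2p - n = n\,\bigl(MSE/\widehat\sigma^2_m - 1 + 2p/n\bigr)$,
      with $MSE = SSQ^{(E)}/n$ and $\widehat\sigma^2_m$ the error variance estimated from the largest model,
   7. Akaike's information criterion AIC $\approx C_p$.

   Larger models are not always better!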
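Criteria 1, 5, 6 and 7 can all be computed from residual sums of squares. A minimal sketch, assuming p counts the intercept, `sigma2_full` is the error variance from the largest model, and AIC is taken in the usual Gaussian form $n\log(SSQ^{(E)}/n) + 2p$ (up to an additive constant); all numbers are made up:

```python
import math

def selection_criteria(rss, sst, n, p, sigma2_full):
    """Criteria for a candidate model with p coefficients (including the intercept).
    rss: SSQ(E) of the candidate; sst: total sum of squares of Y around its mean;
    sigma2_full: error variance estimated from the largest model."""
    r2 = 1.0 - rss / sst
    r2_adj = 1.0 - (n - 1) / (n - p) * (1.0 - r2)
    cp = rss / sigma2_full + 2 * p - n
    aic = n * math.log(rss / n) + 2 * p  # Gaussian AIC up to a constant
    return {"R2": r2, "R2_adj": r2_adj, "Cp": cp, "AIC": aic}

# Made-up numbers: a small model and a larger one with slightly smaller SSQ(E).
crit = selection_criteria(rss=40.0, sst=100.0, n=20, p=3, sigma2_full=2.5)
crit_big = selection_criteria(rss=38.0, sst=100.0, n=20, p=6, sigma2_full=2.5)
```

The larger model has the higher $R^2$ (0.62 vs. 0.60), but its adjusted $R^2$ is lower and its $C_p$ and AIC are larger, which is exactly why plain $R^2$ cannot be used to compare models of different size.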


h  $C_p$ in the example: add KT and √N!
   P value for KG: 0.049.


i  Lasso (penalized regression): penalty on large coefficients.
   Minimize
   $$Q\langle\beta;\lambda\rangle = \sum_i R_i^2 + \lambda \sum_j |\beta_j| .$$
   λ: weight of the penalty.
   Varying λ → some coefficients = 0 → model selection.
   Standardize the $X^{(j)}$ to render the $\beta_j$ comparable.

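   Adaptive Lasso: attach weights to the $\beta_j$ in the penalty, $\lambda \sum_j w_j |\beta_j|$,
   with weights $w_j = 1/|\widehat\beta_j|$ and $\widehat\beta_j$ from a preliminary Lasso estimate.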

   Lasso is suitable for large numbers of $X^{(j)}$, including $p \gg n$
   → genomics, proteomics.
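One common way to minimize $Q\langle\beta;\lambda\rangle$ is cyclic coordinate descent with soft thresholding; it is shown here as an illustration, not as the method used for the example. Setting the derivative with respect to one $\beta_j$ to zero (criterion without a factor 1/2) gives $\beta_j = S(z_j, \lambda/2)/\sum_i x_{ij}^2$, where $z_j$ is the inner product of $x^{(j)}$ with the partial residuals. The sketch assumes standardized columns and centered Y (so no intercept), a fixed number of sweeps instead of a convergence check, and made-up data:

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for Q(beta; lam) = sum_i R_i^2 + lam * sum_j |beta_j|.
    Assumes standardized columns of X and centered y, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals R_i for the start value beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            ss = sum(v * v for v in col)
            # inner product of x^(j) with the partial residual (fit without term j)
            z = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n))
            new = soft_threshold(z, lam / 2.0) / ss
            delta = new - beta[j]
            if delta != 0.0:
                resid = [resid[i] - col[i] * delta for i in range(n)]
                beta[j] = new
    return beta

# Made-up standardized design with orthogonal columns; y = 3 x1 + 0.5 x2.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.5, -2.5, 2.5, -3.5]
path = {lam: lasso(X, y, lam) for lam in (0.0, 8.0, 30.0)}
```

With λ = 0 the least squares coefficients (3, 0.5) come back; with λ = 8 the weaker coefficient is set exactly to 0 while the stronger one is shrunk to 2; with λ = 30 both are 0. This is the "some coefficients = 0 → model selection" effect.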


[Figure: Lasso coefficient paths in the example; horizontal axis “bounds” (0.0 to 1.0), vertical axis “coeff | bounds” (−0.4 to 0.6); one path per explanatory variable.]


j  Choice of the weight λ of the L1 penalty term:
   “cross validation”, 10-fold.
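The 10-fold cross validation for λ can be sketched as follows: split the observations into 10 folds, fit on 9 of them, measure the squared prediction error on the held-out fold, and pick the λ with the smallest total error. To stay short, the sketch uses a simplification that is an assumption here: a Lasso with a single centered regressor and no intercept, which has a closed-form solution. The random split with a fixed seed and the data are also hypothetical:

```python
import random

def kfold_indices(n, k=10, seed=1):
    """Randomly split the indices 0..n-1 into k folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def lasso_1d(x, y, lam):
    """Lasso with a single centered regressor and no intercept:
    minimizes sum_i (y_i - b x_i)^2 + lam * |b| in closed form (soft thresholding)."""
    z = sum(a * b for a, b in zip(x, y))
    ss = sum(a * a for a in x)
    shrunk = max(abs(z) - lam / 2.0, 0.0)
    return (shrunk if z > 0 else -shrunk) / ss

def cv_choose_lambda(x, y, lambdas, k=10):
    """Return the lambda with the smallest k-fold cross-validated squared prediction error."""
    folds = kfold_indices(len(y), k)
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            b = lasso_1d([x[i] for i in train], [y[i] for i in train], lam)
            err += sum((y[i] - b * x[i]) ** 2 for i in test)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Made-up, noise-free data: y = 2 x, so no penalty (lambda = 0) predicts best.
x = [i - 19.5 for i in range(40)]
y = [2.0 * v for v in x]
chosen = cv_choose_lambda(x, y, [0.0, 5.0, 50.0])
```

With real, noisy data the cross-validated error typically has its minimum at some λ > 0; the noise-free example only checks that the mechanics pick the λ with the best out-of-fold predictions.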

k  Is the “best” model the true model?
   Consider several models as the result of the analysis.
   Among all “good” models, choose one or more suitable ones
   by plausibility and subject-matter knowledge!
   Exploratory data analysis will NOT find the “true” model,
   but several models that fit the data well.


5.3 Collinearity

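a  Model $Y = X\beta + E$.
   If $X$ is singular (its rank is smaller than its number of columns), the $X^{(j)}$'s are collinear:
   $$X \text{ singular} \iff \det\langle X^T X\rangle = 0 \iff \text{there is a } c \neq 0 \text{ with } Xc = 0 \iff \text{there is a } j \text{ with } x_i^{(j)} = \tilde c_0 + \sum_{k\neq j} \tilde c_k\, x_i^{(k)} \text{ for all } i.$$
   The parameter is not unique, since $X\beta = X(\beta + \gamma c)$ for arbitrary $\gamma$.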
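A tiny numerical check of this non-uniqueness, with made-up numbers: the column x3 is exactly x1 + x2, so c = (0, 1, 1, −1) satisfies Xc = 0, and adding any multiple γc to β leaves the fitted values unchanged.

```python
# Columns: intercept, x1, x2 and x3 = x1 + x2 (an exact linear relation).
X = [[1, 1, 2, 3],
     [1, 2, 0, 2],
     [1, 3, 1, 4],
     [1, 4, 5, 9]]
c = [0, 1, 1, -1]  # X c = 0, since x1 + x2 - x3 = 0 in every row

def matvec(A, v):
    """Matrix-vector product A v for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

beta = [1, 2, -1, 0.5]
gamma = 7  # arbitrary
beta_alt = [b + gamma * cj for b, cj in zip(beta, c)]

fit = matvec(X, beta)
fit_alt = matvec(X, beta_alt)  # same fitted values from a very different beta
```

Both coefficient vectors, (1, 2, −1, 0.5) and (1, 9, 6, −6.5), describe the data equally well, so the data cannot decide between them. Deleting one of the three collinear columns removes the ambiguity.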

b  Solution: delete a column!
   Caution: the interpretation of the parameters may change!


c  Approximate collinearity → parameters ill-determined.

[Figure: data and estimated model for Y over the nearly collinear regressors x(1) and x(2); labels “estimated” and “model”.]


d  Large standard errors of the estimates → coefficients insignificant.

e  How to detect collinearity?
   – Standard errors of the $\widehat\beta_j$'s.
   – Is there a relation $x_i^{(j)} \approx \tilde c_0 + \sum_{k\neq j} \tilde c_k\, x_i^{(k)}$?
     This is a regression problem! Coefficient of determination $R_j^2$,
     or variance inflation factor $VIF_j = 1/(1 - R_j^2)$.
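$VIF_j$ can be computed literally as described: regress $x^{(j)}$ on an intercept and all other regressors, take $R_j^2$, and invert $1 - R_j^2$. A plain-Python sketch using the normal equations (a QR-based least squares routine would be numerically safer in practice); the data are made up:

```python
def r_squared_on_others(cols, j):
    """R^2 of regressing column j on an intercept and all other columns (least squares)."""
    n = len(cols[j])
    others = [c for k, c in enumerate(cols) if k != j]
    X = [[1.0] + [c[i] for c in others] for i in range(n)]
    y = cols[j]
    p = len(X[0])
    # Normal equations X'X b = X'y as an augmented matrix, Gaussian elimination with pivoting.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    b = [0.0] * p
    for a in reversed(range(p)):  # back substitution
        b[a] = (A[a][p] - sum(A[a][k] * b[k] for k in range(a + 1, p))) / A[a][a]
    fitted = [sum(X[i][a] * b[a] for a in range(p)) for i in range(n)]
    mean = sum(y) / n
    sst = sum((v - mean) ** 2 for v in y)
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - sse / sst

def vif(cols):
    """Variance inflation factors VIF_j = 1 / (1 - R_j^2), one per column."""
    return [1.0 / (1.0 - r_squared_on_others(cols, j)) for j in range(len(cols))]

# Made-up regressors: orthogonal pair vs. nearly collinear pair.
vifs_orth = vif([[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]])
vifs_near = vif([[1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.05, 4.0, 4.95]])
```

Orthogonal regressors give VIF = 1; the nearly collinear pair gives VIFs in the hundreds, i.e., the variances of their coefficient estimates are inflated by that factor.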


f  What remedies against collinearity?
   – Choice of experimental conditions.
g  – Linear transformation of the $x^{(j)}$'s, e.g., sum and difference,
     or the “more important” variable plus the residuals of the other one.
h  – Delete the variable with the highest $R_j^2$! (Usually insignificant!)
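Why sum and difference help: $\mathrm{cov}(x_1 + x_2,\, x_1 - x_2) = \mathrm{var}(x_1) - \mathrm{var}(x_2)$, so for two regressors with equal variance (e.g., after standardization) the transformed pair is uncorrelated, whatever the correlation of the original pair. A check with made-up numbers:

```python
def cov(u, v):
    """Sample covariance (denominator n - 1)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def corr(u, v):
    """Sample correlation coefficient."""
    return cov(u, v) / (cov(u, u) * cov(v, v)) ** 0.5

# Two correlated regressors with equal variance (made-up numbers).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 3.0, 5.0, 4.0]
s = [a + b for a, b in zip(x1, x2)]  # sum
d = [a - b for a, b in zip(x1, x2)]  # difference
```

Here x1 and x2 have correlation 0.8, while the sum and the difference are uncorrelated. The coefficients of s and d then answer different questions (common level vs. contrast), which is the change of interpretation the slide warns about.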

i* Ridge regression = penalized regression. Penalize squared $\beta_j$:
   $$Q\langle\beta;\lambda\rangle = \sum_i R_i^2 + \lambda \sum_j \beta_j^2 .$$
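Setting the derivative of this $Q\langle\beta;\lambda\rangle$ to zero gives $(X^TX + \lambda I)\,\beta = X^TY$, so the ridge estimate exists even when $X^TX$ is (nearly) singular. A sketch assuming centered (ideally standardized) columns and centered Y, so that no intercept is fitted; the data are made up:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge(X, y, lam):
    """Minimizer of sum_i R_i^2 + lam * sum_j beta_j^2, i.e. (X'X + lam I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) + (lam if a == b else 0.0) for b in range(p)]
           for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)

# Made-up, nearly collinear centered regressors and centered response.
Xc = [[-2.0, -1.9], [-1.0, -1.1], [0.0, 0.05], [1.0, 1.0], [2.0, 1.95]]
yc = [-4.1, -1.9, 0.1, 2.0, 3.9]
b_ls = ridge(Xc, yc, 0.0)     # lambda = 0: ordinary least squares
b_ridge = ridge(Xc, yc, 1.0)  # penalized: coefficients shrunk toward 0
```

For λ > 0 the squared length of the coefficient vector is always smaller than for least squares; with nearly collinear regressors this trades a little bias for a large reduction in the variance of the estimates.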


5.4 Strategies of Model Development

a  Model selection is an interplay between
   – available knowledge from subject matter & statistics,
   – residual analysis, “detective work”,
   – automatic model selection procedures,
   – residual analysis, “detective work”,
   – principle of simplicity,
   – assessment of plausibility and critique by subject-matter knowledge.


0. Read the data, define variable names (sounds trivial...),
   check plausibility (screening), get acquainted with the data.
1. “First aid” transformations.
2. A large model:
   all variables (main effects), or
   the result of a stepwise forward selection.


3. Examination of the random part:
   outliers in residuals,
   distribution of residuals,
   equality of variances,
   independence of errors.
   In view of the results, it may be warranted to
   – transform the target variable,
   – introduce weights,
   – use robust methods (if not done routinely).


4. Non-linearities.
5. Automatic model selection.
6. Add variables.
7. Interactions.
8. Influential observations.
9. Critique by subject-matter knowledge.
10. Examine fit.
11. Revision.
12. Check deleted terms again.

Celebrate!


b  Example: construction cost.
   Question: does the price guarantee help?
   Detective work gives the most convincing answer!


Messages: Model Development

1. Automatic model selection procedures are a helpful tool,
   but they do not find “the truth”!
2. Model selection is an interplay between
   – available knowledge from subject matter & statistics,
   – residual analysis, “detective work”,
   – automatic model selection procedures,
   – residual analysis, “detective work”,
   – principle of simplicity,
   – plausibility & critique by subject-matter knowledge.
