
Can one extract causal information from high-dimensional observational data?


Applied Multivariate Statistics – Spring 2012 (not relevant for exam)


What is a causal effect?


3 Markus Kalisch, ETH Zurich

Figure: drowning accidents and ice cream sales. Does one cause the other?

Another example: Smoking


Scenario 1: Observe 1000 smokers and count the incidence of lung cancer.

Scenario 2: Make 1000 randomly chosen people smoke and count the incidence of lung cancer.

In general, the two incidences are different: Scenario 1 observes, Scenario 2 intervenes.
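The difference between the two scenarios can be made concrete with a small simulation. Everything in this sketch is invented for illustration and is not data. A hypothetical binary trait G makes people both more likely to smoke and more likely to develop lung cancer. Scenario 1 selects smokers from the population. Scenario 2 sets smoking by intervention.

```python
import random

# Invented model (hypothetical, not data): a binary trait G makes people both
# more likely to smoke and more likely to develop lung cancer.
P_G = 0.3                                  # P(G = 1)
P_SMOKE = {0: 0.1, 1: 0.8}                 # P(smoker | G)
P_CANCER = {(0, 0): 0.01, (1, 0): 0.05,    # P(cancer | smoker, G)
            (0, 1): 0.10, (1, 1): 0.20}

def person(rng, force_smoke=None):
    """Draw one person; if force_smoke is given, smoking is set by intervention."""
    g = int(rng.random() < P_G)
    s = int(rng.random() < P_SMOKE[g]) if force_smoke is None else force_smoke
    c = int(rng.random() < P_CANCER[(s, g)])
    return s, c

def scenario1(n, rng):
    """Observe: sample people until n smokers are found; return their cancer incidence."""
    smokers = cancers = 0
    while smokers < n:
        s, c = person(rng)
        if s:
            smokers += 1
            cancers += c
    return cancers / n

def scenario2(n, rng):
    """Intervene: make n randomly chosen people smoke; return their cancer incidence."""
    return sum(person(rng, force_smoke=1)[1] for _ in range(n)) / n

rng = random.Random(0)
# Exact values under this model: 0.0515 / 0.31 ≈ 0.166 (observe) vs 0.095 (intervene).
print(f"Scenario 1 (observe):   {scenario1(100_000, rng):.3f}")
print(f"Scenario 2 (intervene): {scenario2(100_000, rng):.3f}")
```

In this model the whole gap comes from G. Observed smokers carry G far more often than randomly chosen people do (about 77% versus 30%).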

What is a causal effect?

A causal effect is a CHANGE BY INTERVENTION: the change in an outcome when we actively intervene on another variable.

How to find causal effects?

Experimental data

Two groups of plots, identical in all aspects (sunlight, water, soil quality, …); only one group receives fertilizer. In practice, this is achieved by randomized assignment of the plots to the groups, which makes the groups comparable in all aspects on average.

Any difference in outcome is due to the fertilizer, since everything else was equal.

Sometimes, randomized controlled experiments are

• too expensive (gene experiments),

• too time-consuming (gene experiments),

• unethical (HIV treatment), or

• just not practical (smoking).

Observational data

If an experiment is impossible, we can only observe, e.g., the fields of two farmers.

The groups are not guaranteed to be identical in all aspects (sunlight, water, soil quality, …).

Is the difference in outcome due to the fertilizer? We can't tell!
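The contrast between the randomized experiment and the two observed farmers can be sketched in a few lines. All numbers are hypothetical. Soil quality is the only hidden difference between fields, and the fertilizer adds 2 units of yield. In the observational setting, the farmer with the better soil is the one who fertilizes.

```python
import random

TRUE_EFFECT = 2.0  # hypothetical yield gain caused by the fertilizer

def difference_in_means(n, rng, randomized):
    """Mean yield of fertilized minus unfertilized fields in a hypothetical model
    yield = TRUE_EFFECT * fertilized + 3 * soil_quality + noise."""
    treated, control = [], []
    for _ in range(n):
        soil = rng.gauss(0, 1)                  # soil quality: not controlled by us
        if randomized:
            fertilized = rng.random() < 0.5     # coin flip, independent of soil
        else:
            fertilized = soil > 0               # the farmer with better soil fertilizes
        crop = TRUE_EFFECT * fertilized + 3.0 * soil + rng.gauss(0, 1)
        (treated if fertilized else control).append(crop)
    return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(0)
print(f"randomized experiment: {difference_in_means(20_000, rng, True):.2f}")   # close to 2
print(f"two farmers observed:  {difference_in_means(20_000, rng, False):.2f}")  # about 6.8 in theory
```

Under randomization the soil quality is balanced between the groups, so the difference estimates the fertilizer effect. In the observational setting, the difference mixes the fertilizer effect with the soil effect, and the data alone cannot separate the two.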


How to find causal effects?

Can one extract causal information from observational data alone?


Goal of this talk

Given observational data, IDA consistently estimates a set of possible causal effects, even in high dimensions.

• One element of the set is the true causal effect; bounds on the set are useful.

• Does not replace randomized experiments.

• Helps to prioritize and design randomized experiments.

Example

• Yeast: Saccharomyces cerevisiae

• What are the causal effects among the thousands of genes?

Approach: Model the expression of each gene as a random variable. Can we use the joint distribution of gene expression to extract causal information?

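Distribution oracle

“Here is a distribution oracle. Now find the causal effect!”

The oracle knows the joint distribution exactly and can answer questions about it, e.g., which conditional independencies hold among the variables.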

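Outline in theory

Distribution oracle → causal structure → causal effects, where the last step uses the do-calculus with known causal structure. IDA covers the whole path from the distribution oracle to the causal effects.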

Pearl’s do-operator

• Notation for a causal intervention:

P(Y=y | do(X=x))

“distribution of Y, if there is an intervention on variable X”

• Causal effect:

C(x') = d/dx E[Y | do(X=x)], evaluated at x = x'

“rate of change of the expected value of Y, as the value imposed on X by the intervention changes”
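The definition of C(x') can be checked numerically. The sketch below uses an invented nonlinear structural equation model (not from the slides) in which E[Y | do(X=x)] = x², so that C(x') = 2x'. It simulates the intervention by overwriting the equation for X and then differentiates numerically.

```python
import random

def mean_y_do_x(x, n=50_000, seed=0):
    """Monte Carlo estimate of E[Y | do(X=x)] in a hypothetical nonlinear SEM:
        Z := N(0,1),  X := Z + N(0,1),  Y := X**2 + 2*Z + N(0,1).
    do(X=x) replaces the equation for X by X := x; Z keeps its own distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0, 1)
        total += x ** 2 + 2 * z + rng.gauss(0, 1)
    return total / n

def causal_effect(x_prime, h=0.01):
    """C(x') = d/dx E[Y | do(X=x)] at x = x', via a central finite difference.
    Both sides reuse the same seed (common random numbers), so the noise cancels."""
    return (mean_y_do_x(x_prime + h) - mean_y_do_x(x_prime - h)) / (2 * h)

for xp in (0.0, 1.0):
    print(f"C({xp}) = {causal_effect(xp):.3f}")  # 0.000, then 2.000 (= 2 * x')
```

For comparison, the observational slope d/dx E[Y | X=x] in this model is 2x + 1, not 2x. Observing a large X also signals a large Z, because E[Z | X=x] = x/2. Intervening on X carries no such signal.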


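In general, P(Y=y | X=x) ≠ P(Y=y | do(X=x)).

Example: pick a random day.

P(rain | wet) = high: if the ground is wet, it has probably rained.

P(rain | do(wet)) = P(rain) = low: wetting the ground ourselves does not make rain more likely.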
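Here is the rain example in numbers, as a minimal sketch with invented probabilities (not from the slides). Observing "wet" updates the belief about rain via Bayes' rule. Setting "wet" by intervention instead removes the factor P(wet | rain) from the factorization (truncated factorization), so the probability of rain stays at P(rain).

```python
# Hypothetical numbers (not from the slides): it rains on 10% of days; the ground is
# wet after rain with probability 0.9 and otherwise with probability 0.05 (sprinklers, ...).
P_RAIN = 0.1
P_WET_GIVEN = {True: 0.9, False: 0.05}   # P(wet | rain), P(wet | no rain)

def p_rain_given_wet():
    """Observational: Bayes' rule on the factorization P(rain) * P(wet | rain)."""
    rain_and_wet = P_RAIN * P_WET_GIVEN[True]
    dry_and_wet = (1 - P_RAIN) * P_WET_GIVEN[False]
    return rain_and_wet / (rain_and_wet + dry_and_wet)

def p_rain_do_wet():
    """Interventional: do(wet) replaces the factor P(wet | rain) by 'wet = 1',
    so the distribution of rain is left untouched."""
    rain_and_wet = P_RAIN * 1.0
    dry_and_wet = (1 - P_RAIN) * 1.0
    return rain_and_wet / (rain_and_wet + dry_and_wet)

print(f"P(rain | wet)     = {p_rain_given_wet():.3f}")  # 0.667: high
print(f"P(rain | do(wet)) = {p_rain_do_wet():.3f}")     # 0.100: equals P(rain), low
```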


Pearl’s do-calculus

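Given the causal structure (a DAG over X, Y and Z), the rules of the do-calculus transform, where possible, an expression with “do” into an equivalent expression without “do”. The latter can be evaluated from the observational distribution.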

Judea Pearl, “Causality”, 2010, Cambridge University Press
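For reference, here are the three rules of the do-calculus as stated by Pearl. The notation is as follows:

- G with an overbar on X is the graph with all edges into X removed.
- An underbar on Z removes all edges out of Z.
- Z(W) is the set of Z-nodes that are not ancestors of any W-node in the graph with the overbar on X.
- ⊥ denotes d-separation in the modified graph.

```latex
\begin{align*}
\text{Rule 1: } & P(y \mid do(x), z, w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2: } & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3: } & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```

Rule 1 inserts or deletes observations. Rule 2 exchanges an action do(z) for an observation z. Rule 3 inserts or deletes actions.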


Example: Back-door Adjustment

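Causal structure: a DAG over X, Y and Z in which Z satisfies the back-door criterion relative to (X, Y), e.g., Z is a common cause of X and Y. Assume Z is binary (0/1). The rules turn the expression with “do” into one without “do”:

P(Y=y | do(X=x)) = P(Y=y | X=x, Z=0) * P(Z=0) + P(Y=y | X=x, Z=1) * P(Z=1)

The right-hand side contains no “do” and can therefore be estimated from observational data.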
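The adjustment formula can be checked numerically. The sketch assumes the graph Z → X, Z → Y, X → Y with binary variables and invented probabilities. It compares three quantities:

- the naive conditional probability P(Y=1 | X=1),
- the back-door formula above,
- a simulation in which X is set by intervention.

```python
import random

# Hypothetical conditional probability tables (not from the slides) for Z -> X, Z -> Y, X -> Y.
P_Z1 = 0.3                                   # P(Z=1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.3,   # P(Y=1 | X=x, Z=z)
                 (0, 1): 0.4, (1, 1): 0.6}

def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1

def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]

def observational(x):
    """P(Y=1 | X=x): weights P(Y=1 | x, z) by P(z | X=x)."""
    num = sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * p_z(z) for z in (0, 1))
    den = sum(p_x_given_z(x, z) * p_z(z) for z in (0, 1))
    return num / den

def backdoor(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) * P(z): weights by the marginal P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_z(z) for z in (0, 1))

def simulate_do(x, n=200_000, seed=1):
    """Monte Carlo check: draw Z as usual, but force X=x instead of drawing it from P(X | Z)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z = 1 if rng.random() < P_Z1 else 0
        hits += rng.random() < P_Y1_GIVEN_XZ[(x, z)]
    return hits / n

print(f"P(Y=1 | X=1)      = {observational(1):.3f}")  # 0.489
print(f"P(Y=1 | do(X=1))  = {backdoor(1):.3f}")       # 0.390
print(f"simulated do(X=1) = {simulate_do(1):.3f}")
```

The naive estimate is too high because units with X=1 are mostly units with Z=1, which also raises Y. The formula reweights the strata of Z by P(Z) instead of P(Z | X=1).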


Conclusion 1

If the causal structure is known, we can infer causal effects from observational data.


Estimate Causal Structure

Often, the causal structure is unknown, so we have to estimate it.

Causal Directed Acyclic Graph (DAG)

Figure: example DAG with nodes X, W, Z, Y.

Nodes are random variables; a directed edge points from a direct cause to its effect.

The DAG implies conditional independence relations among the variables.
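For Gaussian data, conditional independence is equivalent to zero partial correlation, and this is how such relations show up in data. Here is a minimal sketch, assuming a hypothetical chain X → Z → Y. The edges of the slide's example DAG are not recoverable here. In the chain, X and Y are correlated but become independent once we condition on Z.

```python
import math
import random

def corr(a, b):
    """Sample Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, c):
    """Partial correlation of a and b given a single variable c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / math.sqrt((1 - rac ** 2) * (1 - rbc ** 2))

rng = random.Random(0)
n = 20_000
# Hypothetical linear Gaussian DAG X -> Z -> Y (coefficients chosen for illustration).
X = [rng.gauss(0, 1) for _ in range(n)]
Z = [0.8 * x + rng.gauss(0, 1) for x in X]
Y = [0.8 * z + rng.gauss(0, 1) for z in Z]

print(f"corr(X, Y)      = {corr(X, Y):.2f}")             # clearly nonzero (about 0.45)
print(f"pcorr(X, Y | Z) = {partial_corr(X, Y, Z):.2f}")  # close to 0: X independent of Y given Z
```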


Estimate a DAG model

The DAG encodes independence information. Given the independencies among the variables (provided by the oracle), we reverse-engineer the DAG with the PC algorithm.

P. Spirtes, C. Glymour, R. Scheines, “Causation, Prediction, and Search”, 2000, MIT Press
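The core of the PC algorithm can be sketched compactly with an independence oracle. The assumptions are as follows:

- The oracle checks d-separation in a hypothetical DAG X → Z ← W, Z → Y.
- Only the skeleton phase and the orientation of v-structures are shown.

The full algorithm additionally applies further orientation rules, which here would orient Z → Y.

```python
from itertools import combinations

# Hypothetical example DAG (not the slide's graph): X -> Z <- W, Z -> Y.
PARENTS = {"X": set(), "W": set(), "Z": {"X", "W"}, "Y": {"Z"}}

def ancestors(nodes):
    """The given nodes together with all of their ancestors in PARENTS."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def oracle(a, b, s):
    """Independence oracle: True iff a and b are d-separated given s, checked as
    separation by s in the moralized graph of the ancestral set of {a, b} | s."""
    keep = ancestors({a, b} | set(s))
    nbrs = {v: set() for v in keep}
    for v in keep:
        for p in PARENTS[v]:                        # parent-child links
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(PARENTS[v], 2):    # "marry" co-parents
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {a}, [a]
    while stack:                                    # look for a path that avoids s
        v = stack.pop()
        if v == b:
            return False
        for w in nbrs[v] - set(s) - seen:
            seen.add(w)
            stack.append(w)
    return True

def pc_skeleton(nodes, indep):
    """Skeleton phase: start from the complete graph; delete edge a - b as soon as some
    subset of adj(a) minus b, of increasing size, makes a and b independent."""
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset, size = {}, 0
    while any(len(adj[v]) - 1 >= size for v in nodes):
        for a in nodes:
            for b in sorted(adj[a]):
                for s in combinations(sorted(adj[a] - {b}), size):
                    if indep(a, b, s):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[frozenset((a, b))] = set(s)
                        break
        size += 1
    return adj, sepset

def orient_v_structures(adj, sepset):
    """Orient a - k - b as a -> k <- b if a, b are non-adjacent and k is not in their sepset."""
    arrows = set()
    for k in adj:
        for a, b in combinations(sorted(adj[k]), 2):
            if b not in adj[a] and k not in sepset[frozenset((a, b))]:
                arrows |= {(a, k), (b, k)}
    return arrows

adj, sepset = pc_skeleton(["X", "W", "Z", "Y"], oracle)
print(sorted(tuple(sorted((a, b))) for a in adj for b in adj[a] if a < b))
print(sorted(orient_v_structures(adj, sepset)))  # [('W', 'Z'), ('X', 'Z')]
```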


Ambiguity: Equivalence class

Several DAGs describe exactly the same list of independence relations.

Figure: three DAGs over X, W, Z, Y that encode the same independence relations.

These DAGs form an equivalence class, represented by a PARTIALLY Directed Acyclic Graph (PDAG). The PC algorithm finds this equivalence class, not a single DAG.
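Whether two DAGs belong to the same equivalence class can be checked with the criterion of Verma and Pearl. Two DAGs are Markov equivalent exactly when they have the same skeleton and the same v-structures, i.e., colliders a → c ← b with a and b non-adjacent. Here is a sketch on three hypothetical variables.

```python
from itertools import combinations

def skeleton(parents):
    """Undirected edges of a DAG given as {child: set_of_parents}."""
    return {frozenset((p, c)) for c, ps in parents.items() for p in ps}

def v_structures(parents):
    """Colliders a -> c <- b whose endpoints a and b are not adjacent."""
    skel = skeleton(parents)
    return {(frozenset((a, b)), c)
            for c, ps in parents.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in skel}

def markov_equivalent(g1, g2):
    """Verma-Pearl criterion: same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain    = {"X": set(), "Z": {"X"}, "Y": {"Z"}}        # X -> Z -> Y
reverse  = {"Y": set(), "Z": {"Y"}, "X": {"Z"}}        # X <- Z <- Y
fork     = {"Z": set(), "X": {"Z"}, "Y": {"Z"}}        # X <- Z -> Y
collider = {"X": set(), "Y": set(), "Z": {"X", "Y"}}   # X -> Z <- Y

for name, g in [("reverse", reverse), ("fork", fork), ("collider", collider)]:
    print(name, markov_equivalent(chain, g))  # True, True, False
```

The chain, the reversed chain and the fork all imply only "X independent of Y given Z", so their PDAG is X - Z - Y with both edges undirected. The collider implies a different list of independencies.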

Outline in theory

Distribution oracle → causal structure, up to equivalence class → causal effects (do-calculus with known causal structure). IDA covers the whole path.

Putting everything together

Distribution oracle → (PC algorithm) → PDAG → DAG 1, …, DAG n (the DAGs in the equivalence class) → (do-calculus) → Effect 1, …, Effect n → set of causal effects.

Bounds on the set are useful, e.g., its minimum absolute value.
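The idea behind IDA for one pair (X, Y) in the linear Gaussian case can be sketched as follows. For each parent set of X that is compatible with the PDAG, the possible effect is the coefficient of X when Y is regressed on X and that parent set. The example is invented: four variables, with exact covariances instead of data. This is a simplified sketch of the idea, not the pcalg implementation.

```python
from itertools import combinations

# Hypothetical example (not from the slides): linear SEM with standard normal errors
#   Z := e_Z,  W := e_W,  X := 0.7*Z + e_X,  Y := 0.5*X + 0.6*Z + 0.4*W + e_Y.
# Its PDAG has X - Z undirected and X -> Y <- Z, W -> Y directed, so the parents of X
# are ambiguous. True effect of X on Y: 0.5. Only covariances used below are listed.
COV = {("X", "X"): 1.49, ("X", "Z"): 0.7, ("Z", "Z"): 1.0,
       ("X", "Y"): 1.165, ("Z", "Y"): 0.95}
ADJACENT = {frozenset(e) for e in [("X", "Z"), ("X", "Y"), ("Z", "Y"), ("W", "Y")]}

def cov(a, b):
    return COV[(a, b)] if (a, b) in COV else COV[(b, a)]

def solve(a, b):
    """Solve the small linear system a @ beta = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def effect(x, y, pa):
    """Coefficient of x in the population regression of y on x and pa."""
    regs = [x] + sorted(pa)
    a = [[cov(r, s) for s in regs] for r in regs]
    b = [cov(r, y) for r in regs]
    return solve(a, b)[0]

def ida_local(x, y, parents, siblings):
    """Local step for one pair (x, y): any subset of the undirected neighbours
    ("siblings") of x that forms a clique can become extra parents of x without a new
    v-structure (in a completed PDAG, siblings are already adjacent to x's parents).
    Each admissible choice yields one possible causal effect."""
    effects = []
    for k in range(len(siblings) + 1):
        for extra in combinations(sorted(siblings), k):
            if all(frozenset(pair) in ADJACENT for pair in combinations(extra, 2)):
                pa = set(parents) | set(extra)
                # If y were a parent of x, x could not affect y: effect 0.
                effects.append(0.0 if y in pa else effect(x, y, pa))
    return effects

effects = ida_local("X", "Y", parents=set(), siblings={"Z"})
print([round(e, 3) for e in effects])                          # [0.782, 0.5]
print(f"minimum absolute value: {min(abs(e) for e in effects):.3f}")  # 0.500
```

One element of the set (0.5) is the true effect. The other (0.782) is what the DAG with X → Z would imply for the same covariances. The minimum absolute value gives a conservative lower bound on the size of the effect.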

Outline in theory

Distribution oracle → equivalence class of causal structure → (do-calculus with known causal structure) → set of causal effects. IDA covers the whole path.

“I’m busy! Find your own information on the distribution…”

Outline in theory and in practice

In practice there is no oracle, only observational data. Conditional independence tests on the data give estimated properties of the distribution. From these, IDA estimates the equivalence class of the causal structure and then, via the do-calculus with known causal structure, the set of causal effects.
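For Gaussian data, a standard conditional independence test (used, e.g., for the PC algorithm in the Gaussian case) is Fisher's z-test on a sample partial correlation. Here is a minimal sketch. The significance level alpha acts as the tuning parameter, and the example correlations are made up.

```python
import math
from statistics import NormalDist

def fisher_z_independent(r, n, k, alpha=0.01):
    """Fisher's z-test of H0: 'partial correlation is zero'.
    r: sample partial correlation, n: number of observations,
    k: size of the conditioning set, alpha: significance level (tuning parameter).
    Returns True if H0, i.e. conditional independence, is NOT rejected."""
    z = 0.5 * math.log((1 + r) / (1 - r))             # Fisher's z-transform of r
    statistic = math.sqrt(n - k - 3) * abs(z)         # approx. N(0, 1) under H0
    return statistic <= NormalDist().inv_cdf(1 - alpha / 2)

# Made-up correlation, with the sample size of the yeast data (63 observations):
print(fisher_z_independent(0.3, n=63, k=0, alpha=0.01))  # True: not rejected
print(fisher_z_independent(0.3, n=63, k=0, alpha=0.05))  # False: rejected
```

With few observations, the choice of alpha decides whether a moderate correlation counts as an edge. This is one reason why the tuning parameter matters in high dimensions.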

Consistency in high dimensions: Gaussian case

Estimating graphical models with the PC algorithm:

M. Kalisch, P. Bühlmann, “Estimating high-dimensional DAGs with the PC algorithm”, 2007, JMLR 8, 613-636

Do-calculus in high dimensions (IDA: Intervention effects if the DAG is Absent):

M.H. Maathuis, M. Kalisch, P. Bühlmann, “Estimating high-dimensional intervention effects from observational data”, 2009, Annals of Statistics 37, 3133-3164

Main assumptions & requirements


• Gaussian data from unknown causal DAG

• Faithfulness to this DAG

• No hidden or selection variables

• Involves a tuning parameter

Experimental validation

Complex system → experiment → top causal effects

Complex system → observational data → IDA → top causal effects

Do the two lists agree?

Back to the beer: experimental validation of IDA in Saccharomyces cerevisiae (brewer's yeast)

Setting

• 5361 observed genes

• Experiments: 234 single-gene deletion mutants

• Observational data: 63 wild-type cultures

• Very high-dimensional: 5361 variables, but only 63 observations

Figure: grid of all 234 × 5360 effects, i.e., each of the 234 deleted genes on each of the other 5360 genes. The top 10% of the effects measured in the experiments form the target set. Two selections are compared against it:

• the top 5000 effects estimated with IDA,

• the top 5000 effects estimated with other methods.

Estimated effects inside the target set are true positives; those outside are false positives.

Figure: true positives (y-axis, 0 to 1000) against false positives (x-axis, 0 to 4000) for IDA, Lasso, Elastic net and random guessing. For the top 1000 estimated effects, the highlighted splits are 100 true / 900 false positives, 130 / 870, and 400 / 600. The 100 / 900 split is what random guessing yields on average when the target set is the top 10%.

M.H. Maathuis, D. Colombo, M. Kalisch, P. Bühlmann, “Predicting causal effects in large-scale systems from observational data”, 2010, Nature Methods 7, 247-248
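The true and false positive counts in this comparison can be computed as sketched below. The assumptions are:

- Effects are ranked by absolute value.
- The target set is the top 10% of the experimentally measured effects.
- The small example data are invented.

The random-guessing run shows why guessing yields about 10% true positives.

```python
import random

def top_keys(scores, k):
    """The k keys with the largest absolute score."""
    return set(sorted(scores, key=lambda key: abs(scores[key]), reverse=True)[:k])

def true_false_positives(true_effects, estimated, k, target_fraction=0.10):
    """How many of the top-k estimated effects fall into the top fraction of the
    experimentally measured effects (true positives), and how many do not."""
    target = top_keys(true_effects, int(len(true_effects) * target_fraction))
    tp = len(top_keys(estimated, k) & target)
    return tp, k - tp

# Tiny hand-made check: the target set (top 20% of 10 effects) is {"a", "b"}.
true_effects = {"a": 5.0, "b": -4.0, "c": 1.0, "d": 0.9, "e": 0.8,
                "f": 0.7, "g": 0.6, "h": 0.5, "i": 0.4, "j": 0.3}
estimated = {"a": 3.0, "c": 2.0, "d": 1.5, "b": 0.1}
print(true_false_positives(true_effects, estimated, k=3, target_fraction=0.2))  # (1, 2)

# Random guessing on a hypothetical 20 x 500 grid: about 10% of the top k are hits.
rng = random.Random(0)
grid = [(i, j) for i in range(20) for j in range(500)]
truth = {key: rng.gauss(0, 1) for key in grid}
guess = {key: rng.gauss(0, 1) for key in grid}
print(true_false_positives(truth, guess, k=1000))
```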


Summary of assumptions

• The data are faithful to an underlying causal DAG

• No hidden or selection variables

• Consistent in high dimensions if
  - the data are multivariate normal,
  - some regularity conditions on the partial correlations hold, and
  - the underlying DAG is sparse

• For IDA also: all conditional expectations are linear

R

• Function “ida” in package “pcalg”
