
Historical start: Microarray data (Golub et al., 1999)


Academic year: 2022


(1)

Regression and classification

Let X be a p-dimensional predictor variable and Y the target variable of interest. Assume a linear model in which

Regression: Y ∈ ℝ,

Y = Xβ + ε,

Classification: Y ∈ {0,1} or {−1,1},

P(Y = 1) = f(Xβ), where f(x) = 1/(1 + exp(−x)),

for some (sparse) vector β ∈ ℝ^p and noise ε ∈ ℝ.

Regression (or classification) is high-dimensional if p ≫ n.
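As a concrete illustration, a small simulation of both models with p ≫ n and a sparse β (a sketch; the dimensions, sparsity level, and noise scale are arbitrary choices, not taken from the slides):

```python
import numpy as np

# Toy instance of the linear model Y = X beta + eps with p >> n
rng = np.random.default_rng(0)
n, p, s = 50, 1000, 5                    # n observations, p predictors, s active
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0                           # sparse coefficient vector

# Regression: Y = X beta + eps
eps = 0.1 * rng.standard_normal(n)
Y = X @ beta + eps

# Classification: P(Y = 1) = f(X beta) with the logistic f
f = lambda t: 1.0 / (1.0 + np.exp(-t))
Y_class = rng.binomial(1, f(X @ beta))
```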

(2)

Historical start: Microarray data (Golub et al., 1999)

Gene expression levels of more than 3000 genes are measured for n = 72 patients, either suffering from acute lymphoblastic leukemia (“X”, 47 cases) or acute myeloid leukemia (“O”, 25 cases). Obtained from Affymetrix oligonucleotide microarrays.

(3)

A look at (a binary version of) the data for a subset of patients and genes.

Gene 1 is here either modelled as on (above-average activity; filled green square) or off (below-average activity; empty square).

[Figure: people (columns grouped as AML | ALL | ?) × activity of gene 1]

(4)

[Figure: people (AML | ALL | ?) × activity of gene 2]

(5)

[Figure: people (AML | ALL | ?) × activity of gene 20]

(6)

[Figure: people (AML | ALL | ?) × activity of gene 60]

(7)

We have more variables (genes) than observations (patients):

high-dimensional data

(8)

[Figure: people (AML | ALL | ?)]

Red bars show three types of people:

AML: known to have acute myeloid leukemia
ALL: known to have acute lymphocytic leukemia
?: we don't know which subtype it is

(9)

Select the first gene 8 times... (non-integer values are also allowed)

[Figure: people (AML | ALL | ?)]

(10)

Select the second gene 9 times...

[Figure: people (AML | ALL | ?)]

(11)

Select the third gene once...

[Figure: people (AML | ALL | ?)]

(12)

Select the fourth gene 4 times...

[Figure: people (AML | ALL | ?)]

(13)

Select the fifth gene not at all, the sixth gene 7 times...

[Figure: people (AML | ALL | ?)]

[Animation, frames (14)–(63): the weighted combination of selected genes is built up step by step across the AML, ALL, and ? groups. In the final frames, people with known type (AML, ALL) separate into two clear groups, while the person with unknown type (?) matches the ALL profile; predicted type "ALL".]

(64)

Selecting a small subset of variables

How do we get the best set of 10 genes out of all available variables?

- If we check all possible combinations to find the best set of 10 genes out of 60 genes in total, with a computer that checks a million sets per second, it takes about 20.9 hours ≈ 1 day.

- If we have to select the best set of 10 genes out of 3000 genes, and have a thousand such machines, it takes about 500 × the estimated time since the Big Bang.
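The first back-of-envelope number above is easy to verify with a short computation (the machine speed of a million sets per second is the slides' assumption):

```python
from math import comb

# Exhaustive search over all 10-gene subsets
sets_60 = comb(60, 10)            # all 10-gene subsets of 60 genes
hours = sets_60 / 1e6 / 3600      # at a million sets per second
# sets_60 = 75,394,027,566, i.e. roughly 20.9 hours of search

sets_3000 = comb(3000, 10)        # all 10-gene subsets of 3000 genes: ~1.6e28
```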


(66)

Basis Pursuit (Chen et al., 1999) and Lasso (Tibshirani, 1996)

Let Y be the n-dimensional response vector and X the n×p-dimensional design.

Basis Pursuit:

β̂ = argmin ‖β‖₁ such that Y = Xβ.

Lasso:

β̂_τ = argmin ‖β‖₁ such that ‖Y − Xβ‖₂ ≤ τ.

Equivalent to

β̂_λ = argmin ‖Y − Xβ‖₂² + λ‖β‖₁.

Combines sparsity (some components of β̂ are 0) and convexity.
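The penalized form can be solved by simple iterative soft-thresholding (ISTA); a minimal numpy sketch with illustrative dimensions and λ, not code from the lecture:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=2000):
    # ISTA for the Lasso objective 0.5*||y - X b||_2^2 + lam*||b||_1
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - y) / L          # gradient step on the quadratic part
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return b

# Sparse high-dimensional example: p = 200 predictors, n = 100 observations
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0                                 # 5 active coefficients
y = X @ beta + 0.1 * rng.standard_normal(n)

beta_hat = lasso_ista(X, y, lam=1.0)
# beta_hat is sparse: the 5 active coefficients are recovered
# (slightly shrunk towards zero), the rest stay near zero
```

In practice one would use a dedicated solver (e.g. coordinate descent); ISTA is shown here only because it makes the sparsity mechanism, the soft-thresholding step, explicit.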

(67)
(68)
(69)

When does it work?

For prediction, oracle inequalities in the sense that

‖X(β̂ − β)‖₂²/n ≤ c σ² s log(p)/n

for some constant c > 0 and noise variance σ² > 0 need the Restricted Isometry Property (Candès, 2006) or the weaker compatibility condition (van de Geer, 2008). Slower convergence rates are possible under weaker assumptions (Greenshtein and Ritov, 2004).

For correct variable selection in the sense that

P( ∃λ : {k : β̂_k^λ ≠ 0} = {k : β_k ≠ 0} ) ≈ 1,

need the strong irrepresentable condition (Zhao and Yu, 2006) or the neighbourhood stability condition (NM and Bühlmann, 2006).


(71)

Compatibility condition

The usual minimal eigenvalue of the design, min{‖Xβ‖₂² : ‖β‖₂ = 1}, always vanishes for high-dimensional data with p > n.

Let φ be the (L, S)-restricted eigenvalue (van de Geer, 2007):

φ²(L, S) = min{ s‖Xβ‖₂² : ‖β_S‖₁ = 1 and ‖β_{S^c}‖₁ ≤ L },

where S = {k : β_k ≠ 0}, s = |S|, and (β_S)_k = β_k 1{k ∈ S}.


(73)

If φ(L, S) > c > 0 for some L > 1, then we get oracle rates for prediction and convergence of ‖β − β̂_λ‖₁.

If φ(1, S) > 0, then the following two are identical:

argmin ‖β̃‖₀ such that Xβ̃ = Xβ
argmin ‖β̃‖₁ such that Xβ̃ = Xβ.

The latter equivalence otherwise requires the stronger Restricted Isometry Property, which implies that ∃δ < 1 such that

∀b with ‖b‖₀ ≤ s: (1 − δ)‖b‖₂² ≤ ‖Xb‖₂² ≤ (1 + δ)‖b‖₂²,

which can be a useful assumption for random designs X, as in compressed sensing.


(75)

Applications of linear models


(78)

Medical data

OMOP: Observational Medical Outcomes Partnership (omop.org)

1. Collect medical information (drugs taken, symptoms diagnosed) for 100,000 patients.

2. In total, about 15,000 drugs and 15,000 distinct symptoms are encoded.

(79)

Try to detect drug–drug interactions or make risk assessments based on medical data:

Is drug A changing the risk of a heart attack if taken together with drug B for patients with a symptom S?

Can generate very high-dimensional data quickly if expanding interactions as new dummy variables (more than 10¹² interactions of third order).
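The >10¹² figure follows directly from the counts above (treating the ~15,000 drugs and ~15,000 symptoms as one pool of base variables, an illustrative simplification):

```python
from math import comb

# ~15,000 drugs plus ~15,000 symptoms as base variables: the number of
# third-order interaction dummies already exceeds 10^12
base_vars = 15_000 + 15_000
third_order = comb(base_vars, 3)   # ~4.5e12 third-order interactions
```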


(81)

Compressed sensing: one-pixel camera

Images are often sparse after taking a wavelet transformation X:

u = Xw, where

w ∈ ℝⁿ: original image as an n-dimensional vector
X ∈ ℝ^{n×n}: wavelet transformation
u ∈ ℝⁿ: vector of wavelet coefficients

(82)

Original wavelet transformation:

u = Xw.

The wavelet coefficients u are often sparse in the sense that u has only a few large entries. Keeping just a few of them allows a very good reconstruction of the original image w.

Let ũ = u·1{|u| ≥ τ} be the hard-thresholded coefficients (easy to store).

Then reconstruct the image as w̃ = X⁻¹ũ.
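A sketch of this store-and-reconstruct step; a random orthonormal matrix stands in for a real wavelet transform (an illustrative assumption, so X⁻¹ = Xᵀ):

```python
import numpy as np

# Hard-thresholding reconstruction with a toy orthonormal "wavelet" basis
rng = np.random.default_rng(0)
n = 64
X, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal stand-in basis

u_true = np.zeros(n)
u_true[:5] = [12.0, -9.0, 7.0, 5.0, -3.0]   # few large coefficients
w = X.T @ u_true                            # image whose coefficients are sparse
u = X @ w                                   # forward transform: u = X w

tau = 1.0
u_thr = u * (np.abs(u) >= tau)              # keep only the large coefficients
w_rec = X.T @ u_thr                         # reconstruct: w~ = X^{-1} u~
```

Here the reconstruction is essentially exact because u is exactly sparse; for real images, thresholding keeps most of the visual content while discarding most coefficients.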

(83)

Conventional way:

- measure image w with 16 million pixels
- convert to wavelet coefficients u = Xw
- throw away most of u by keeping just the largest coefficients

This is efficient as long as pixels are cheap.

(84)

For situations where pixels are expensive (different wavelengths, MRI), one can do compressed sensing: observe only

y = Φu = Φ(Xw),

where, for q ≪ n, the matrix Φ ∈ ℝ^{q×n} has iid entries drawn from N(0,1).

One entry of the q-dimensional vector y is thus an observation of a random transformation of the original image.

(Pseudo)random optical projections:

• Binary patterns are loaded into the mirror array:

– light reflected towards the lens/photodiode (1)
– light reflected elsewhere (0)
– pixel-wise products are summed by the lens

• A pseudorandom number generator outputs the measurement basis vectors.

Each random mask corresponds to one row of Φ.

Reconstruct u by Basis Pursuit:

û = argmin ‖ũ‖₁ such that Φũ = y.

(85)

Observe

y = Φu = Φ(Xw),

where, for q ≪ n, the matrix Φ ∈ ℝ^{q×n} has iid entries drawn from N(0,1).

Reconstruct the wavelet coefficients u by Basis Pursuit:

û = argmin ‖ũ‖₁ such that Φũ = y.

For q ≳ s log(n/s), the matrix Φ satisfies with high probability the Restricted Isometry Property (Candès, 2006), including the existence of a δ < 1 such that for all s-sparse vectors b

(1 − δ)‖b‖₂² ≤ ‖Φb‖₂² ≤ (1 + δ)‖b‖₂².

Hence, if the original wavelet coefficients are s-sparse, we only need to make of order s log(n/s) measurements to recover u exactly (with high probability)!
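Basis Pursuit itself requires a linear-programming solver; as a self-contained stand-in, the sketch below recovers the sparse u from y = Φu with orthogonal matching pursuit, a greedy method that also succeeds under RIP-type conditions (an illustrative substitution, not the slides' algorithm; all dimensions are made up):

```python
import numpy as np

def omp(Phi, y, s):
    # Orthogonal matching pursuit: greedily build the support of an
    # s-sparse u with y = Phi u, re-fitting by least squares each step
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(s):
        j = int(np.argmax(np.abs(Phi.T @ residual)))  # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    u_hat = np.zeros(Phi.shape[1])
    u_hat[support] = coef
    return u_hat

rng = np.random.default_rng(1)
q, n, s = 80, 256, 4                    # q << n random measurements
Phi = rng.standard_normal((q, n))       # iid N(0,1) measurement matrix
u = np.zeros(n)
u[[3, 40, 77, 120]] = [1000.0, 100.0, 10.0, 1.0]   # s-sparse coefficients
y = Phi @ u                             # q-dimensional observation
u_hat = omp(Phi, y, s)                  # recovers u from only 80 numbers
```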


(87)

Rice CI Camera

[Figure: object → light → Lens 1 → DMD+ALP board → Lens 2 → photodiode circuit]

dsp.rice.edu/cs/camera

(88)

Image Acquisition

dsp.rice.edu/cs/camera

(89)

Mind reading

Can use Lasso-type inference to infer, for a single voxel in the early visual cortex, which stimuli lead to neuronal activity, using fMRI measurements (Nishimoto et al., 2011, at the Gallant Lab, UC Berkeley).

Voxel A

Show movies and detect which parts of the image a particular voxel of 100k neurons is sensitive to.

(90)

page 22, December 10, 2012

Back to the fMRI problem: Spatial Locations of Selected Features

[Figure: selected features for Voxel A, Voxel B, Voxel C; one row for CV, one for ES-CV]

Prediction on Voxels A–C: CV 0.72, ES-CV 0.7

Dots indicate large regression coefficients and thus important regions for a region/voxel in the brain:

- Voxel A is stimulated by activity in the centre-left of the visual field
- Voxel B is stimulated by activity in the top right of the visual field
- Voxel C is stimulated by activity in the very centre of the visual field

(91)

This allows us to forecast the brain activity at all voxels, given an image.

(92)

Given only brain activity, we can reverse the process and ask which image best explains the neuronal activity (given the learned regressions).

(93)

Top: seen image/movie

Bottom: image reconstructed from brain activity

(94)
