Wissenschaftliches Rechnen II/Scientific Computing II

Summer semester 2016
Prof. Dr. Jochen Garcke, Dipl.-Math. Sebastian Mayer

Exercise sheet 6 To be handed in on Thursday, 02.06.2016

1 Some very basic probability theory

Let $(X, Y)$ be a tuple of random variables, each taking values in $\mathbb{R}$, with joint probability density $p(x, y)$, that is, $P[(X, Y) \le (x_0, y_0)] = \int_{-\infty}^{x_0} \int_{-\infty}^{y_0} p(x, y) \, dx \, dy$. The marginal density of $X$ is given by $p_X(x) = \int_{\mathbb{R}} p(x, y) \, dy$. The expectation of $X$ is given by $E[X] = \int_{\mathbb{R}} x \, p_X(x) \, dx$. The covariance of $X, Y$ is defined as $\operatorname{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.

The conditional density of $X$ given we have observed $Y = y_0$ (which can happen if $p_Y(y_0) > 0$) is defined by $p(x \mid y_0) = p(x, y_0) / p_Y(y_0)$. The random variables $X, Y$ are said to be independent if $p(x, y) = p_X(x) \, p_Y(y)$. Bayes' rule states

$$p(x \mid y) = \frac{p(y \mid x) \, p(x)}{p(y)}.$$

A multivariate Gaussian random vector $X$ with mean $\mu \in \mathbb{R}^d$ and symmetric, positive definite covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ has the probability density

$$p(x) = (2\pi)^{-d/2} \, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right).$$

We write $X \sim N(\mu, \Sigma)$.
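As a quick numerical illustration of the density formula, the following sketch (a minimal example; the dimension, mean, and covariance below are arbitrary illustrative choices, and the use of NumPy/SciPy is an assumption) evaluates the closed-form expression directly and compares it with SciPy's implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 3
mu = np.array([1.0, -0.5, 2.0])                      # illustrative mean
A = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])
Sigma = A @ A.T + np.eye(d)                          # symmetric, positive definite covariance

x = np.array([0.5, 0.0, 1.5])
diff = x - mu
# p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-1/2 (x - mu)^T Sigma^{-1} (x - mu))
p = ((2 * np.pi) ** (-d / 2)
     * np.linalg.det(Sigma) ** (-0.5)
     * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)))

print(p, multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # the two values agree
```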

2 Group exercises

G 1. (Bayesian analysis of linear regression) Consider the standard linear regression model

$$y_i = x_i^T w + \varepsilon_i, \qquad i = 1, \dots, n,$$

where $X = (x_1, \dots, x_n) \in \mathbb{R}^{d \times n}$ is the matrix of given input vectors, $w \in \mathbb{R}^d$ the unknown weight vector, and the $\varepsilon_i$ are i.i.d. with $\varepsilon_i \sim N(0, \sigma_n^2)$.

a) Determine the probability density of $y = (Y_1, \dots, Y_n)$.

b) The Bayesian approach is to specify a prior distribution over $w$, which expresses the belief about the value of $w$ before observing the data. Assume $w \sim N(0, \Sigma_p)$ with covariance matrix $\Sigma_p \in \mathbb{R}^{d \times d}$. Derive via Bayes' rule the posterior density of $W$, which expresses our beliefs about the value of $w$ after observing the concrete data $y = (y_1, \dots, y_n)$. Determine also the posterior density $p(y_* \mid y)$ of the predicted value $y_* = x_*^T w$ given a new data point $x_*$.

c) Show that $E[y_*] = x_*^T \Sigma_p X (K + \sigma_n^2 I)^{-1} y$, where $K = X^T \Sigma_p X$. Make a connection between the Bayesian approach and regularization.
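The identity in c) can be checked numerically before attempting the proof. The following sketch is a minimal illustration under assumed choices (random data, an arbitrary positive definite $\Sigma_p$, and NumPy as the tool); it compares the posterior-mean prediction, written as the regularized least-squares solution, with the kernel-style formula from c):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, sigma_n = 4, 25, 0.3

X = rng.standard_normal((d, n))          # columns are the input vectors x_1, ..., x_n
w_true = rng.standard_normal(d)
y = X.T @ w_true + sigma_n * rng.standard_normal(n)
x_star = rng.standard_normal(d)          # new data point

B = rng.standard_normal((d, d))
Sigma_p = B @ B.T + np.eye(d)            # prior covariance, symmetric positive definite

# Posterior mean of w; equivalently the Tikhonov-regularized solution of
#   min_w ||X^T w - y||^2 + sigma_n^2 * w^T Sigma_p^{-1} w
w_bar = np.linalg.solve(X @ X.T + sigma_n**2 * np.linalg.inv(Sigma_p), X @ y)

# Kernel-style prediction from part c), with K = X^T Sigma_p X
K = X.T @ Sigma_p @ X
pred = x_star @ Sigma_p @ X @ np.linalg.solve(K + sigma_n**2 * np.eye(n), y)

print(x_star @ w_bar, pred)              # the two predictions coincide
```

The comparison also hints at the connection asked for in c): the posterior mean agrees with a Tikhonov-regularized least-squares estimate whose penalty is determined by the prior covariance.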


G 2. You are given a random vector $U \sim N(0, I_d)$, that is, $U$ is standard normally distributed and takes values in $\mathbb{R}^d$. For given mean $m \in \mathbb{R}^d$ and covariance $K \in \mathbb{R}^{d \times d}$, find a transformation $\varphi : \mathbb{R}^d \to \mathbb{R}^d$ such that $\varphi(U) \sim N(m, K)$.

G 3. (Lemma 46 revisited)

Assume we are given data $(x_1, y_1), \dots, (x_n, y_n)$ and a Hilbert space $H$ with kernel $k$. Let $f_{(x_n, y_n)}$ be the solution of

$$\min_{f \in H} \sum_{i=1}^{n-1} (f(x_i) - y_i)^2 + \lambda \|f\|_k^2.$$

Let $\tilde{y}_n = f_{(x_n, y_n)}(x_n)$. Give an alternative proof of Lemma 46 based on the representer theorem. To this end, consider the system of linear equations $(K + \lambda I_n) \tilde{\alpha} = \tilde{y}$, where $\tilde{y}_i = y_i$ for $i < n$, and show that $\tilde{\alpha}_n = 0$.

3 Homework

H 1. (Smoothing spline)

For given data $(x_1, y_1), \dots, (x_n, y_n)$ with $x_0 = 0 < x_1 < x_2 < \dots < x_n < 1$, $x_i \in [0, 1]$, and regularization parameter $\lambda > 0$, consider the problem

$$\min_{f \in W_2([0,1])} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int_0^1 (f''(x))^2 \, dx.$$

a) Give an explicit formula for the kernel $R_1(x, y) = \int_0^1 G_2(x, z) G_2(y, z) \, dz$, where $G_2$ is the Green's function computed in Exercise G3 on Sheet 5.

b) Show that the optimal solution $\hat{f}_\lambda$ has a representation $\hat{f}_\lambda(x) = \beta_0 \phi_0(x) + \beta_1 \phi_1(x) + \sum_{i=1}^{n} \alpha_i R_1(x_i, x)$. Specify $\phi_0, \phi_1$ and show that $\beta_0, \beta_1$ are unique.

c) Show that $\hat{f}_\lambda$ is a polynomial of degree 3 on every interval $[x_i, x_{i+1}]$ for $i = 0, \dots, n-2$ and a polynomial of degree 1 on $[x_n, 1]$.

d) What does the solution $\hat{f}_\lambda$ reduce to in the limits $\lambda \to 0$ and $\lambda \to \infty$? You don't have to provide a proof; just give some plausible arguments.

(6 points)

H 2. (Cross-validation)

Provide a proof for Theorem 47 presented in the lecture.

Hint 1: According to Lemma 46, we know that we obtain $f^{D_v}$ by learning on the modified data vector $\tilde{y}^{D_v} \in \mathbb{R}^N$ given by

$$\tilde{y}^{D_v} = y - I_{D_v} y + I_{D_v} y^{D_v},$$

where $I_{D_v}$ denotes the diagonal 0/1 matrix selecting the entries belonging to the validation fold $D_v$. Use $\tilde{y}^{D_v}$ and the linearity of the smoothing matrix $K_G$, which maps training values $y$ to fitted values $\hat{y}$, to prove Theorem 47.

Hint 2: Every positive definite $m \times m$ matrix $M$ defines a positive definite kernel on $\mathbb{R}^m$ via $k_M(x, y) = x^T M y$.

(4 points)
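As a plausibility check of Hint 1 (the statement of Theorem 47 is not reproduced here), the following sketch assumes a kernel ridge regression setting with a Gaussian kernel, both of which are illustrative choices rather than the setting fixed by the lecture. It verifies numerically that leave-one-out predictions obtained by explicit refitting agree with the shortcut $(\hat{y}_i - S_{ii} y_i)/(1 - S_{ii})$ computed from the full-data smoothing matrix $S$ alone, which is the kind of identity the linearity argument yields:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, ell = 30, 1e-1, 0.2
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

def k(a, b):
    # Gaussian kernel (illustrative choice)
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell**2))

K = k(x, x)
S = K @ np.linalg.inv(K + lam * np.eye(n))   # smoothing matrix: y_hat = S y
y_hat = S @ y

# Leave-one-out predictions by explicit refitting on n - 1 points ...
loo = np.empty(n)
for i in range(n):
    m = np.delete(np.arange(n), i)
    alpha = np.linalg.solve(K[np.ix_(m, m)] + lam * np.eye(n - 1), y[m])
    loo[i] = (k(x[i:i + 1], x[m]) @ alpha)[0]

# ... agree with the shortcut computed from the full-data smoother alone.
shortcut = (y_hat - np.diag(S) * y) / (1.0 - np.diag(S))
print(np.max(np.abs(loo - shortcut)))        # close to machine precision
```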

H 3. (Programming exercise: Cross-validation)

See the accompanying notebook.

(5 points)

H 4. (Programming exercise: Gaussian processes)

See the accompanying notebook.

(5 points)

