Wissenschaftliches Rechnen II/Scientific Computing II
Summer semester 2016
Prof. Dr. Jochen Garcke, Dipl.-Math. Sebastian Mayer
Exercise sheet 6
To be handed in on Thursday, 02.06.2016
1 Some very basic probability theory
Let $(X, Y)$ be a tuple of random variables, each taking values in $\mathbb{R}$, with joint probability density $p(x, y)$, that is,
$$P[(X, Y) \le (x_0, y_0)] = \int_{-\infty}^{x_0} \int_{-\infty}^{y_0} p(x, y)\, dy\, dx.$$
The marginal density of $X$ is given by $p_X(x) = \int_{\mathbb{R}} p(x, y)\, dy$. The expectation of $X$ is given by $E[X] = \int_{\mathbb{R}} x\, p_X(x)\, dx$. The covariance of $X, Y$ is defined as $\mathrm{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.
The conditional density of $X$ given that we have observed $Y = y_0$ (which can happen if $p_Y(y_0) > 0$) is defined by
$$p(x \mid y_0) = \frac{p(x, y_0)}{p_Y(y_0)}.$$
The random variables $X, Y$ are said to be independent if $p(x, y) = p_X(x)\, p_Y(y)$. Bayes' rule states
$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}.$$
A multivariate Gaussian random vector $X$ with mean $\mu \in \mathbb{R}^d$ and symmetric, positive definite covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ has the probability density
$$p(x) = (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right).$$
We write $X \sim \mathcal{N}(\mu, \Sigma)$.
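The density formula can be sanity-checked numerically. The following sketch (not part of the sheet) evaluates it with NumPy and compares against `scipy.stats.multivariate_normal`; the concrete values of $\mu$ and $\Sigma$ are arbitrary illustration choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_density(x, mu, Sigma):
    """Evaluate the N(mu, Sigma) density at x via the formula above."""
    d = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([1.0, -2.0])                    # illustration values
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])    # symmetric, positive definite
x = np.array([0.3, -1.5])

print(gaussian_density(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))  # should agree
```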
2 Group exercises
G 1. (Bayesian analysis of linear regression) Consider the standard linear regression model
$$y_i = x_i^T w + \varepsilon_i, \qquad i = 1, \ldots, n,$$
where $X = (x_1, \ldots, x_n) \in \mathbb{R}^{d \times n}$ is the matrix of given input vectors, $w \in \mathbb{R}^d$ the unknown weight vector, and the $\varepsilon_i$ are i.i.d. with $\varepsilon_i \sim \mathcal{N}(0, \sigma_n^2)$.
a) Determine the probability density of $y = (Y_1, \ldots, Y_n)$.
b) The Bayesian approach is to specify a prior distribution over $w$, which expresses the belief about the value of $w$ before observing the data. Assume $w \sim \mathcal{N}(0, \Sigma_p)$ with covariance matrix $\Sigma_p \in \mathbb{R}^{d \times d}$. Derive via Bayes' rule the posterior density of $W$, which expresses our beliefs about the value of $w$ after observing the concrete data $y = (y_1, \ldots, y_n)$. Determine also the posterior density $p(y_* \mid y)$ of the predicted value $y_* = x_*^T w$ given a new data point $x_*$.
c) Show that $E[y_*] = x_*^T \Sigma_p X (K + \sigma_n^2 I_n)^{-1} y$, where $K = X^T \Sigma_p X$. Make a connection between the Bayesian approach and regularization.
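Not part of the exercise, but the identity in part c) can be checked numerically. The sketch below assumes the prior $\Sigma_p = \tau^2 I_d$ (an illustration choice) and computes the predictive mean once via the formula from part c) and once as a ridge-regression prediction with $\lambda = \sigma_n^2 / \tau^2$, which is one way to see the connection to regularization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 20
X = rng.normal(size=(d, n))          # columns are the inputs x_i
w_true = rng.normal(size=d)
sigma_n = 0.1                        # noise std (sigma_n on the sheet)
y = X.T @ w_true + sigma_n * rng.normal(size=n)

tau2 = 1.0                           # assumed prior: Sigma_p = tau2 * I_d
Sigma_p = tau2 * np.eye(d)
x_star = rng.normal(size=d)

# Part c): E[y_*] = x_*^T Sigma_p X (K + sigma_n^2 I)^{-1} y, K = X^T Sigma_p X
K = X.T @ Sigma_p @ X
mean_bayes = x_star @ Sigma_p @ X @ np.linalg.solve(K + sigma_n**2 * np.eye(n), y)

# Ridge regression with lambda = sigma_n^2 / tau2 gives the same prediction.
lam = sigma_n**2 / tau2
w_ridge = np.linalg.solve(X @ X.T + lam * np.eye(d), X @ y)
print(mean_bayes, x_star @ w_ridge)  # the two numbers should agree
```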
G 2. You are given a random vector $U \sim \mathcal{N}(0, I_d)$, that is, $U$ is standard normally distributed and takes values in $\mathbb{R}^d$. For given mean $m \in \mathbb{R}^d$ and covariance $K \in \mathbb{R}^{d \times d}$, find a transformation $\varphi : \mathbb{R}^d \to \mathbb{R}^d$ such that $\varphi(U) \sim \mathcal{N}(m, K)$.
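One standard construction for such a $\varphi$ uses a Cholesky factor of $K$; the sketch below (the values of $m$ and $K$ are arbitrary illustration choices) checks it by sampling. Any matrix $L$ with $LL^T = K$ would work in place of the Cholesky factor.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2
m = np.array([1.0, -1.0])                # target mean (illustration values)
K = np.array([[2.0, 0.8], [0.8, 1.0]])   # target covariance, s.p.d.

L = np.linalg.cholesky(K)                # K = L L^T
U = rng.normal(size=(d, 100_000))        # samples of U ~ N(0, I_d)
samples = m[:, None] + L @ U             # phi(u) = m + L u

print(samples.mean(axis=1))              # ~ m
print(np.cov(samples))                   # ~ K
```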
G 3. (Lemma 46 revisited)
Assume to be given data $(x_1, y_1), \ldots, (x_n, y_n)$ and a Hilbert space $H$ with kernel $k$. Let $f_{(x_n, y_n)}$ be the solution of
$$\min_{f \in H} \sum_{i=1}^{n-1} (f(x_i) - y_i)^2 + \lambda \|f\|_k^2.$$
Let $\tilde{y}_n = f_{(x_n, y_n)}(x_n)$. Give an alternative proof of Lemma 46 based on the representer theorem. To this end, consider the system of linear equations $(K + \lambda I_n)\tilde{\alpha} = \tilde{y}$, where $\tilde{y}_i = y_i$ for $i < n$, and show that $\tilde{\alpha}_n = 0$.
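The claim can be tested numerically before proving it. The sketch below uses kernel ridge regression with a Gaussian kernel and random data; the kernel, bandwidth, and data are arbitrary choices, since the sheet does not fix them.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 10, 0.1
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)

def gram(a, b):
    """Gaussian kernel k(s, t) = exp(-(s - t)^2 / 0.1), an arbitrary choice."""
    return np.exp(-np.subtract.outer(a, b) ** 2 / 0.1)

# Fit on the first n-1 points only.
K1 = gram(x[:-1], x[:-1])
alpha1 = np.linalg.solve(K1 + lam * np.eye(n - 1), y[:-1])
y_tilde_n = gram(x[-1:], x[:-1]) @ alpha1   # prediction y~_n at x_n

# Full n-point system with the modified label y~_n in the last slot.
y_tilde = np.append(y[:-1], y_tilde_n)
K = gram(x, x)
alpha_tilde = np.linalg.solve(K + lam * np.eye(n), y_tilde)
print(alpha_tilde[-1])                      # ~ 0, as the exercise claims
```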
3 Homework
H 1. (Smoothing spline)
For given data $(x_1, y_1), \ldots, (x_n, y_n)$ with $x_0 = 0 < x_1 < x_2 < \cdots < x_n < 1$ and $x_i \in [0, 1]$, and regularization parameter $\lambda > 0$, consider the problem
$$\min_{f \in W_2([0,1])} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int_0^1 (f''(x))^2\, dx.$$
a) Give an explicit formula for the kernel $R_1(x, y) = \int_0^1 G_2(x, z)\, G_2(y, z)\, dz$, where $G_2$ is the Green's function computed in Exercise G3 on Sheet 5. (A numerical sanity check is sketched after this exercise.)
b) Show that the optimal solution $\hat{f}_\lambda$ has a representation $\hat{f}_\lambda(x) = \beta_0 \phi_0(x) + \beta_1 \phi_1(x) + \sum_{i=1}^{n} \alpha_i R_1(x_i, x)$. Specify $\phi_0, \phi_1$ and show that $\beta_0, \beta_1$ are unique.
c) Show that $\hat{f}_\lambda$ is a polynomial of degree 3 on every interval $[x_i, x_{i+1}]$ for $i = 0, \ldots, n-1$ and a polynomial of degree 1 on $[x_n, 1]$.
d) To what does the solution $\hat{f}_\lambda$ reduce in the limits $\lambda \to 0$ and $\lambda \to \infty$? You do not have to provide a proof; just give some plausible arguments.

(6 points)
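As a complement to part a), $R_1$ can be evaluated by numerical quadrature and used to check any explicit formula pointwise. The sketch below assumes $G_2(x, z) = (x - z)_+$, the Green's function typically obtained for $u'' = f$ with $u(0) = u'(0) = 0$; since Sheet 5 is not reproduced here, treat this choice as an assumption.

```python
import numpy as np
from scipy.integrate import quad

def G2(x, z):
    """Assumed Green's function from Sheet 5: G2(x, z) = max(x - z, 0)."""
    return np.maximum(x - z, 0.0)

def R1(x, y):
    """R_1(x, y) = integral over [0, 1] of G2(x, z) G2(y, z) dz, by quadrature."""
    val, _ = quad(lambda z: G2(x, z) * G2(y, z), 0.0, 1.0)
    return val

# Generic sanity checks that any kernel must pass: R_1 is symmetric
# and its Gram matrix is positive semidefinite.
xs = np.linspace(0.1, 0.9, 5)
gram = np.array([[R1(a, b) for b in xs] for a in xs])
print(np.allclose(gram, gram.T))                  # symmetry
print(np.linalg.eigvalsh(gram).min() >= -1e-12)   # positive semidefinite
```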
H 2. (Cross-validation)
Provide a proof for Theorem 47 presented in the lecture. Hint 1: According to Lemma 46, we know that we obtain $f_{D_v}$ by learning on the modified data vector $\tilde{y}_{D_v} \in \mathbb{R}^N$ given by
$$\tilde{y}_{D_v} = y - I_{D_v} y + I_{D_v} f_{D_v},$$
where $f_{D_v}$ denotes the vector of predictions $(f_{D_v}(x_1), \ldots, f_{D_v}(x_N))^T$.
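Not part of the proof, but the hint can be verified numerically for kernel ridge regression; the Gaussian kernel, $\lambda$, and the validation index set $D_v$ below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, lam = 12, 0.1
x = rng.uniform(size=N)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=N)

def gram(a, b):
    """Gaussian kernel, an arbitrary choice for illustration."""
    return np.exp(-np.subtract.outer(a, b) ** 2 / 0.1)

Dv = np.array([2, 5, 9])                  # validation indices (arbitrary)
train = np.setdiff1d(np.arange(N), Dv)

# f_{D_v}: learn on the training part only, predict at all N points.
a_tr = np.linalg.solve(gram(x[train], x[train]) + lam * np.eye(len(train)), y[train])
f_Dv = gram(x, x[train]) @ a_tr

# Modified labels: keep y outside D_v, replace it by f_{D_v} on D_v.
y_tilde = y.copy()
y_tilde[Dv] = f_Dv[Dv]

# Learning on all N points with the modified labels reproduces f_{D_v}.
a_full = np.linalg.solve(gram(x, x) + lam * np.eye(N), y_tilde)
print(np.allclose(gram(x, x) @ a_full, f_Dv))   # True
```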