MACHINE LEARNING II
11th SEMINAR – DISCRIMINATIVE LEARNING
Exercise 1. Consider the following probability distribution for two classes and a real-valued observation x ∈ R. Let the prior probabilities for the classes p(k), k = 1, 2, be given. The conditional probability distribution is

    p(x|k) = (τ/2) · exp(−τ · |x − µ_k|),

where τ > 0 is equal for both classes. Derive the posterior probability distribution p(k|x).
Hint: Note that the conditional probability distribution is not everywhere differentiable. Hence, perform a case-by-case analysis.
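Not part of the exercise, but the resulting posterior can be checked numerically. A minimal sketch via Bayes' rule (function and variable names are illustrative, not from the sheet):

```python
import numpy as np

def posterior(x, mu, prior, tau):
    # Bayes' rule with the Laplace class-conditionals
    # p(x|k) = tau/2 * exp(-tau*|x - mu_k|); the factor tau/2 cancels
    # in the normalisation, so only the priors and exponents remain.
    log_unnorm = np.log(prior) - tau * np.abs(x - mu)
    log_unnorm -= log_unnorm.max()  # numerical stability
    p = np.exp(log_unnorm)
    return p / p.sum()

# Between mu_1 and mu_2 the log-odds are linear in x; to the left and
# right of both means they are constant -- the case analysis from the hint.
p = posterior(x=0.0, mu=np.array([-1.0, 1.0]),
              prior=np.array([0.5, 0.5]), tau=1.0)
```

For symmetric means and equal priors the posterior at x = 0 is 1/2 for each class, which is a quick sanity check of any closed-form derivation.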
Exercise 2. Consider a quadratic classifier f : R^n → {0,1} for inputs x ∈ R^n:

    f(x) = 1 if x^T·A·x + ⟨x,b⟩ + c < 0, and 0 otherwise,

with an n×n matrix A, a vector b ∈ R^n and a constant c ∈ R. Show how to learn the unknown parameters of the classifier by the Perceptron algorithm.
Hint: Transform the input space R^n into an appropriately chosen space of higher dimension, in which the considered classifier is a linear one.
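The lifting the hint describes can be sketched as follows (a possible solution outline, not the sheet's reference solution; names are mine):

```python
import numpy as np

def quad_features(x):
    # Lift x in R^n into the higher-dimensional space where the quadratic
    # classifier becomes linear: all products x_i*x_j (carrying A), the raw
    # components (carrying b), and a constant 1 (carrying c).
    return np.concatenate([np.outer(x, x).ravel(), x, [1.0]])

def train_perceptron(X, y, epochs=200):
    # Ordinary perceptron in the lifted space. The sheet's convention is
    # f(x) = 1 iff the lifted score is negative, so class-1 points are
    # sign-flipped to obtain the usual "<v, s> > 0 for all s" form.
    v = np.zeros(quad_features(X[0]).size)
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            s = -quad_features(x) if label == 1 else quad_features(x)
            if v @ s <= 0:
                v += s
                mistakes += 1
        if mistakes == 0:
            break
    return v

# Toy data separable by a circle (class 1 inside, class 0 outside):
X = np.array([[0.1, 0.0], [0.0, 0.2], [2.0, 0.0], [0.0, 2.0], [-2.0, 1.0]])
y = np.array([1, 1, 0, 0, 0])
v = train_perceptron(X, y)
preds = [1 if quad_features(x) @ v < 0 else 0 for x in X]
```

The learned vector v encodes the entries of A, b and c, which can be read back off from the corresponding blocks of the feature vector.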
Exercise 3. Consider the "usual" Perceptron task, i.e. learning the unknown parameters of a linear classifier ⟨x,w⟩ ≷ b from a given training set. In addition, let it be required that certain parameters of the classifier are positive. For instance, w_{i*} > 0 should hold for a particular component i* of the weight vector w. How can the Perceptron algorithm be modified so that it allows only those classifiers that fulfil these additional constraints?
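One candidate modification (a sketch of one possible approach, not necessarily the intended solution): the constraint w_{i*} > 0 is itself a linear inequality of Perceptron form, ⟨w, e_{i*}⟩ > 0, so it can be treated like an additional training example.

```python
import numpy as np

def perceptron_positive(S, i_star, epochs=100):
    # S: training vectors with labels already folded into the sign, so a
    # correct w satisfies <w, s> > 0 for every s in S. The requirement
    # w[i_star] > 0 has exactly the same form, <w, e_{i_star}> > 0, so the
    # unit vector e_{i_star} is simply appended as one more "example".
    n = S[0].size
    e = np.zeros(n)
    e[i_star] = 1.0
    data = list(S) + [e]
    w = np.zeros(n)
    for _ in range(epochs):
        mistakes = 0
        for s in data:
            if w @ s <= 0:
                w += s
                mistakes += 1
        if mistakes == 0:
            break
    return w

w = perceptron_positive([np.array([1.0, -0.2])], i_star=1)
```

Note that the strict inequality w_{i*} > 0 and the Perceptron's strict correctness condition have the same form, which is why no separate projection step is needed in this sketch.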
Exercise 4. Prove the correctness of the multi-class Perceptron algorithm considered in the lecture (slide 20).
Show that it is a Perceptron algorithm (slide 14) in an appropriately chosen space.
Hint: Represent the constraints of the multi-class problem ⟨x_l, w_{y_l}⟩ > ⟨x_l, w_k⟩, for all l and all k ≠ y_l, as scalar products ⟨x̃, w̃⟩ > 0 using suitably defined x̃ and w̃.
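The construction from the hint can be illustrated in code: stacking all K weight vectors into one w̃ and placing x_l at block y_l and −x_l at block k yields exactly the familiar multi-class updates. A sketch (variable names are mine):

```python
import numpy as np

def multiclass_perceptron(X, y, K, epochs=100):
    # The constraint <x_l, w_{y_l}> > <x_l, w_k> equals <x_tilde, w_tilde> > 0,
    # where w_tilde stacks all K weight vectors and x_tilde has x_l in block
    # y_l, -x_l in block k, and zeros elsewhere. Running the plain Perceptron
    # on these x_tilde amounts to the updates below: on a violated pair,
    # w_{y_l} += x_l and w_k -= x_l.
    n = X.shape[1]
    W = np.zeros((K, n))
    for _ in range(epochs):
        mistakes = 0
        for x, yl in zip(X, y):
            for k in range(K):
                if k != yl and W[yl] @ x <= W[k] @ x:
                    W[yl] += x
                    W[k] -= x
                    mistakes += 1
        if mistakes == 0:
            break
    return W

# Three linearly separable points, one per class:
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
y = np.array([0, 1, 2])
W = multiclass_perceptron(X, y, K=3)
preds = [int(np.argmax(W @ x)) for x in X]
```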
Exercise 5. Definition: A classifier family shatters a set of data points if, for every labelling of these points, there exists a classifier in the family that makes no errors when evaluated on that set of data points.
Let a set of data points (x_1, x_2, …, x_L), x_l ∈ R^n, be given. Give a transformation φ : R^n → R^d such that the corresponding family of generalized linear classifiers ⟨φ(x), w⟩ ≷ 0 shatters this training set.
Hint: Define a transformation φ : R^n → R^L so that each component of the vector φ(x) "is responsible" for one example of the training set.