Intelligent Systems

Academic year: 2022


(1)

Intelligent Systems

Decision Making:

1. Bayesian Decision Theory

2. Non-Bayesian Formulations

(2)

Recognition (recap)

The model

Let two random variables be given:

• The first one (𝑘 ∈ 𝐾) is typically discrete and is called “class”

• The second one (𝑥 ∈ 𝑋) is general (often continuous) and is called “observation”

Let the joint probability distribution 𝑝(𝑥, 𝑘) be “given”.

As 𝑘 is discrete, it is often specified as 𝑝(𝑥, 𝑘) = 𝑝(𝑘) ⋅ 𝑝(𝑥|𝑘).

The recognition task: given 𝑥, estimate 𝑘.

Usual problems (questions):

• How to estimate 𝑘 from 𝑥 (today)?

• The joint probability is not always explicitly specified.

• The set 𝐾 is sometimes huge.

(3)

Idea – a game

Somebody samples a pair (𝑥, 𝑘) according to the joint probability distribution 𝑝(𝑥, 𝑘).

He keeps 𝑘 hidden and presents 𝑥 to you.

You decide for some 𝑑 ∈ 𝐷 according to a chosen decision strategy.

Somebody penalizes your decision according to a loss-function 𝐶(𝑑, 𝑘), i.e. he compares your decision 𝑑 to the “true” hidden 𝑘.

You know both 𝑝(𝑥, 𝑘) and the loss-function (how he compares).

Your goal is to design the decision strategy so as to pay as little as possible on average.

(4)

Bayesian Risk

Notations:

The decision set 𝐷. Note: it need not coincide with 𝐾!

Examples of additional decisions: “I don’t know”, “surely not this class”, etc.

Decision strategy (mapping): 𝑑: 𝑋 → 𝐷. Loss-function: 𝐶: 𝐷 × 𝐾 → ℝ.

The Bayesian risk is the expected loss

𝑅(𝑑) = ∑_{𝑥} ∑_{𝑘} 𝑝(𝑥, 𝑘) ⋅ 𝐶(𝑑(𝑥), 𝑘)

(it should be minimized with respect to the decision strategy 𝑑).

(5)

Some special cases

General:

𝑅(𝑑) = ∑_{𝑥} ∑_{𝑘} 𝑝(𝑥, 𝑘) ⋅ 𝐶(𝑑(𝑥), 𝑘)

Almost always: decisions can be made for different 𝑥 independently (the set of decision strategies is not restricted). Then:

𝑑(𝑥) = arg min_{𝑑∈𝐷} ∑_{𝑘} 𝑝(𝑘|𝑥) ⋅ 𝐶(𝑑, 𝑘)

Very often: the decision set coincides with the set of classes, i.e. 𝐷 = 𝐾.
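The per-observation minimization can be sketched numerically. Below is a minimal example with made-up tables for 𝑝(𝑥, 𝑘) and a 0/1 loss (all numbers hypothetical, not from the lecture):

```python
import numpy as np

# Hypothetical toy problem: |X| = 3 observations, |K| = 2 classes, D = K.
p_xk = np.array([[0.30, 0.10],   # joint table p(x, k); rows: x, cols: k
                 [0.15, 0.15],
                 [0.05, 0.25]])
C = np.array([[0.0, 1.0],        # loss C[d, k]; here the 0/1 loss
              [1.0, 0.0]])

# Risk contribution of deciding d at x is sum_k p(x, k) * C[d, k].
# Since the strategy is unrestricted, we minimize independently for each x.
risk_per_decision = p_xk @ C.T            # shape (|X|, |D|)
strategy = risk_per_decision.argmin(axis=1)
bayes_risk = risk_per_decision.min(axis=1).sum()
print(strategy, bayes_risk)               # optimal decision per x, total risk 0.3
```

Minimizing each row separately is valid exactly because the risk is a sum of independent per-𝑥 terms.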

(6)

Maximum A-posteriori Decision (MAP)

The loss is the simplest one:

𝐶(𝑑, 𝑘) = 0 if 𝑑 = 𝑘, and 1 otherwise,

i.e. we pay 1 if the answer is not the true class, no matter which error we make.

From that follows:

𝑑(𝑥) = arg max_{𝑘} 𝑝(𝑘|𝑥) = arg max_{𝑘} 𝑝(𝑘) ⋅ 𝑝(𝑥|𝑘)
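A tiny numerical illustration of the MAP rule (prior and likelihood values are hypothetical); note that the evidence 𝑝(𝑥) cancels in the argmax:

```python
import numpy as np

# Hypothetical numbers: priors p(k) and likelihoods p(x|k) at one observed x.
prior = np.array([0.7, 0.3])
lik_x = np.array([0.2, 0.8])

# MAP: argmax_k p(k|x) = argmax_k p(k) * p(x|k); p(x) is the same for all k.
posterior_unnorm = prior * lik_x          # [0.14, 0.24]
k_map = int(posterior_unnorm.argmax())
print(k_map)                              # class 1 wins despite the smaller prior
```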

(7)

A MAP example

Let the priors 𝑝(𝑘) for two classes 𝑘 ∈ {1, 2} be given. The conditional probability distributions for observations given classes are Gaussians:

𝑝(𝑥|𝑘) = 𝒩(𝑥; 𝜇𝑘, Σ𝑘)

The loss-function is the 0/1-loss, i.e. we want MAP.

The decision strategy (the mapping 𝑑: 𝑋 → {1, 2}) partitions the input space into two regions: one corresponding to the first class and one corresponding to the second. What does this partition look like?

(8)

A MAP example

For a particular 𝑥 we decide for 1 if

𝑝(1) ⋅ 𝑝(𝑥|1) > 𝑝(2) ⋅ 𝑝(𝑥|2)

Special case (for simplicity): equal priors and Σ1 = Σ2 = 𝜎²𝐼

→ the decision strategy compares distances to the class means (derivation on the board)

→ a linear classifier – the separating hyperplane is orthogonal to 𝜇1 − 𝜇2

More classes, equal priors and covariances → Voronoi diagram

More classes, equal covariances and different priors → Fisher classifier

Two classes, different covariances → a quadratic decision boundary

etc.
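The equal-covariance special case can be sketched as follows (means, 𝜎 and test points are made up, and equal priors are assumed); the MAP rule reduces to the sign of a linear function:

```python
import numpy as np

# Two Gaussian classes with equal priors and equal isotropic covariance
# sigma^2 * I (hypothetical means). MAP then reduces to a linear rule:
# decide 1 iff w.x + b > 0 with w = (mu1 - mu2) / sigma^2.
mu1, mu2, sigma = np.array([2.0, 0.0]), np.array([-2.0, 0.0]), 1.0

def map_decide(x):
    # From log p(x|k) = -||x - mu_k||^2 / (2 sigma^2) + const, compare k = 1, 2.
    w = (mu1 - mu2) / sigma**2
    b = (mu2 @ mu2 - mu1 @ mu1) / (2 * sigma**2)
    return 1 if w @ x + b > 0 else 2

print(map_decide(np.array([1.0, 3.0])))   # closer to mu1 -> class 1
print(map_decide(np.array([-0.5, 0.0])))  # closer to mu2 -> class 2
```

The weight vector w is parallel to 𝜇1 − 𝜇2, which is why the separating hyperplane is orthogonal to the line connecting the means.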

(9)

Decision with rejection

The decision set is 𝐷 = 𝐾 ∪ {𝑟}, i.e. extended by a special decision 𝑟: “I don’t know”. The loss-function is

𝐶(𝑑, 𝑘) = 0 if 𝑑 = 𝑘, 1 if 𝑑 ∈ 𝐾 and 𝑑 ≠ 𝑘, 𝜀 if 𝑑 = 𝑟

− we pay a (reasonable) penalty 𝜀 if we are too lazy to decide.

Case-by-case analysis:

1. We decide for a class: the best such decision is MAP, and its expected loss is 1 − max_{𝑘} 𝑝(𝑘|𝑥).

2. We decide to reject: the loss for this is 𝜀.

→ Compare 1 − max_{𝑘} 𝑝(𝑘|𝑥) with 𝜀 and decide for the variant with the smaller expected loss.

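The comparison behind the rejection rule can be written as a small helper; posteriors are hypothetical, and `eps` stands for the rejection penalty 𝜀:

```python
import numpy as np

# Rejection sketch: take the MAP class if its expected 0/1 loss
# 1 - max_k p(k|x) is at most the rejection penalty eps, else reject.
def decide_with_reject(posterior, eps):
    k_map = int(np.argmax(posterior))
    err_map = 1.0 - posterior[k_map]      # expected loss of answering k_map
    return k_map if err_map <= eps else "reject"

print(decide_with_reject(np.array([0.9, 0.1]), eps=0.2))    # confident -> 0
print(decide_with_reject(np.array([0.55, 0.45]), eps=0.2))  # unsure -> reject
```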
(10)

Other simple loss-functions

Let the set of hidden values be structured. Example: a continuous hidden value 𝑘 ∈ ℝ.

The probability distribution is 𝑝(𝑥, 𝑘) with observations 𝑥 and continuous hidden value 𝑘. Suppose we know 𝑝(𝑥, 𝑘); for a given 𝑥 we would like to infer 𝑘.

The Bayesian risk reads:

𝑅(𝑑) = ∑_{𝑥} 𝑝(𝑥) ∫ 𝑝(𝑘|𝑥) ⋅ 𝐶(𝑑(𝑥), 𝑘) d𝑘

(11)

Other simple loss-functions

The simple delta-loss-function → MAP (not interesting anymore).

The loss may instead account for the difference between the decision and the “true” hidden value, for instance

𝐶(𝑑, 𝑘) = (𝑑 − 𝑘)²

i.e. we pay depending on the distance. Then (see board again) the optimal decision is the posterior mean:

𝑑(𝑥) = ∫ 𝑘 ⋅ 𝑝(𝑘|𝑥) d𝑘

Other choices: e.g. |𝑑 − 𝑘|, or combinations with the delta-loss.
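That the squared loss is minimized by the posterior mean can be checked numerically on a discretized posterior (the Gaussian posterior and the comparison points are hypothetical):

```python
import numpy as np

# Check that C(d, k) = (d - k)^2 is minimized by the posterior mean.
# Posterior p(k|x): discretized Gaussian at 0.3 with std 0.1 (made up).
k_grid = np.linspace(0.0, 1.0, 1001)
post = np.exp(-0.5 * ((k_grid - 0.3) / 0.1) ** 2)
post /= post.sum()                                  # normalize on the grid

d_best = float((k_grid * post).sum())               # posterior mean, ~0.3
risks = [float(((d - k_grid) ** 2 * post).sum()) for d in (0.1, d_best, 0.5)]
print(d_best, risks)   # the risk at the posterior mean is the smallest of the three
```

This matches the decomposition E[(d − k)²|x] = (d − E[k|x])² + Var[k|x]: only the first term depends on d.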

(12)

Non-Bayesian decision making

Despite the generality of the Bayesian approach, there are many tasks that cannot be expressed within the Bayesian framework:

• It is difficult to establish a penalty function, e.g. when its values do not come from a totally ordered set.

• A priori probabilities 𝑝(𝑘) are not known or cannot be known, because 𝑘 is not a random event.

An example – Russian fairy tales hero

When he turns to the left, he loses his horse, when he turns to the right, he loses his sword, and if he turns back, he loses his beloved girl.

Is the sum of 𝑝1 horses and 𝑝2 swords less or more than 𝑝3 beloved girls?

(13)

Example: decisions while curing a patient

We have:

𝑥 ∈ 𝑋 – observations (features) measured on a patient

𝑘 ∈ 𝐾 = {healthy, seriously sick} – hidden states

𝑑 ∈ 𝐷 = {do not cure, apply a drug} – decisions

Penalty function 𝐶: 𝐾 × 𝐷 → ?

Penalty problem – how to assign a real number to each penalty?

𝐾\𝐷              do not cure         apply a drug
healthy           correct decision    small health damage
seriously sick    death possible      correct decision

(14)

An example – enemy or allied airplane?

Observation 𝑥 describes the observed airplane.

Two hidden states:

𝑘 = 1 allied airplane

𝑘 = 2 enemy airplane

The conditional probability 𝑝(𝑥|𝑘) can depend on the observation 𝑥 in a complicated manner, but it exists and correctly describes the dependence of the observation 𝑥 on the situation 𝑘.

A priori probabilities 𝑝(𝑘) are not known and even cannot be known in principle

→ the hidden state 𝑘 is not a random event.

(15)

Neyman-Pearson task

Observation 𝑥 ∈ 𝑋, two states:

𝑘 = 1 normal

𝑘 = 2 dangerous

The probability distribution of the observation 𝑥 depends on the state 𝑘 to which the object belongs; the conditional distributions 𝑝(𝑥|𝑘) are known.

Given an observation 𝑥, the task is to decide whether the object is in the normal or in the dangerous state.

The set 𝑋 is to be partitioned into two subsets 𝑋1 (decide “normal”) and 𝑋2 (decide “dangerous”), 𝑋 = 𝑋1 ∪ 𝑋2, 𝑋1 ∩ 𝑋2 = ∅.

Note: an observation 𝑥 can occur under both states → there is no faultless strategy.

(16)

Neyman-Pearson task

The strategy is characterized by two numbers:

1. “Probability” of the false positive (false alarm):

𝜔(1) = ∑_{𝑥∈𝑋2} 𝑝(𝑥|1)

2. “Probability” of the false negative (overlooked danger):

𝜔(2) = ∑_{𝑥∈𝑋1} 𝑝(𝑥|2)

→ minimize the conditional probability of the false positive subject to the condition that the false negative is bounded:

∑_{𝑥∈𝑋2} 𝑝(𝑥|1) → min over 𝑋1, 𝑋2   s.t.   ∑_{𝑥∈𝑋1} 𝑝(𝑥|2) ≤ 𝜀

(17)

Neyman-Pearson task

Solution: Neyman–Pearson (1928, 1933)

The optimal strategy separates the observation sets 𝑋1 and 𝑋2 according to the likelihood ratio and a threshold value 𝜃:

𝑄*(𝑥): decide 𝑘 = 1 if 𝑝(𝑥|1) / 𝑝(𝑥|2) > 𝜃, and 𝑘 = 2 otherwise

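On a finite 𝑋 the construction can be sketched greedily (likelihood tables are hypothetical): adding observations to 𝑋1 in order of decreasing likelihood ratio, while the overlooked-danger mass stays below 𝜀, yields exactly a likelihood-ratio threshold set:

```python
import numpy as np

# Discrete Neyman-Pearson sketch (hypothetical likelihood tables).
# Greedily put x into X1 ("normal") in order of decreasing likelihood
# ratio p(x|1)/p(x|2) while the overlooked danger stays <= eps.
p1 = np.array([0.5, 0.3, 0.15, 0.05])    # p(x|1) for x = 0..3
p2 = np.array([0.05, 0.15, 0.3, 0.5])    # p(x|2)
eps = 0.25

order = np.argsort(-(p1 / p2))           # decreasing likelihood ratio
X1, overlooked = [], 0.0
for x in order:
    if overlooked + p2[x] <= eps:
        X1.append(int(x))
        overlooked += p2[x]              # false-negative mass collected so far
false_alarm = 1.0 - p1[X1].sum()         # p(.|1) mass left in X2
print(sorted(X1), overlooked, false_alarm)
```

The x accepted into X1 are precisely those with likelihood ratio above some threshold 𝜃, which is the form of the optimal strategy above (ties and randomization at the threshold are ignored in this sketch).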
Other interesting non-Bayesian formulations:

1. Generalised Neyman-Pearson task for two dangerous states

2. Minimax task

3. Wald task

4. Linnik tasks

(18)

Conclusion and Outlook

Before:

1. Probability Theory

2. Decision making:

   1. Bayesian Decision Theory: loss, risk, …

   2. Non-Bayesian formulations: Neyman-Pearson task, …

Next topics:

1. Probabilistic and discriminative learning (till 15.01)

2. Undirected graphical models (after 22.01)

Merry Christmas and happy New Year !!!
