
(1)

An intuitive introduction to information theory

Ivo Grosse

Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben
Bioinformatics Centre Gatersleben-Halle

(2)

Outline

Why information theory?

An intuitive introduction

(3)


History of biology

St. Thomas Monastery, Brno

(4)

Genetics

Gregor Mendel 1822 – 1884

1866: Mendel's laws

Foundation of Genetics

Ca. 1900: Biology becomes a quantitative science

(5)


50 years later … 1953

James Watson & Francis Crick

(6)

50 years later … 1953

(7)


(8)

DNA

Watson & Crick 1953

Double helix structure of DNA

1953: Biology becomes a molecular science

(9)


1953 – 2003 … 50 years of revolutionary discoveries

(10)

1989

(11)


1989

Goals:

- Identify all of the ca. 30,000 genes
- Identify all of the ca. 3,000,000,000 base pairs
- Store all information in databases
- Develop new software for data analysis

(12)

2003: Human Genome Project officially finished

(13)


2003 – 2053 … biology = information science

(14)

2003 – 2053 … biology = information science

Systems Biology

(15)


What is information?

Many intuitive definitions

Most of them wrong

One clean definition since 1948

Requires 3 steps

- Entropy

- Conditional entropy
- Mutual information

(16)

Before starting with entropy …

Who is the father of information theory?

Who is this?

Claude Shannon 1916 – 2001

A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379–423 & 623–656, 1948

(17)


Before starting with entropy …

Who is the grandfather of information theory?

Simon bar Kochba, ca. 100 – 135

Jewish guerrilla fighter against the Roman Empire (132 – 135)

(18)

Entropy

Given a text composed from an alphabet of 32 letters (each letter equally probable)

Person A chooses a letter X (randomly)

Person B wants to know this letter

B may ask only binary questions

Question: how many binary questions must B ask in order to learn which letter X was chosen by A?

Answer: entropy H(X)

Here: H(X) = 5 bit
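In formulas (an addition to the slide, using Shannon's standard definition): each binary question can at best halve the set of remaining candidates, so 32 equally probable letters require log2 32 = 5 questions. In general,

$$H(X) = -\sum_x p(x)\, \log_2 p(x),$$

which reduces to 5 bit in the uniform case. A minimal Python sketch:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum_x p(x) * log2(p(x))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 32 equally probable letters, as on the slide
print(entropy([1 / 32] * 32))  # 5.0 bit
```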

(19)


Conditional entropy (1)

The sky is blu_

How many binary questions?

5?

No!

Why?

What’s wrong?

The context tells us “something” about the missing letter X

(20)

Conditional entropy (2)

Given a text composed from an alphabet of 32 letters (each letter equally probable)

Person A chooses a letter X (randomly)

Person B wants to know this letter

B may ask only binary questions

A may tell B the letter Y preceding X

E.g.

L_

Q_

Question: how many binary questions must B ask in order to learn which letter X was chosen by A?
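The quantity being asked for is the conditional entropy, which averages the remaining uncertainty about X over all possible contexts Y (the formula is an addition; the slide gives only the guessing-game picture):

$$H(X \mid Y) = -\sum_{x,y} p(x,y)\, \log_2 p(x \mid y)$$

Intuitively, for the L_ and Q_ examples: after Q nearly all probability sits on U, so few questions are needed; after L many letters remain plausible, so more questions are needed.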

(21)


Conditional entropy (3)

H(X|Y) ≤ H(X)

Clear!

In the worst case – namely if B ignores all “information” in Y about X – B needs H(X) binary questions

Under no circumstances should B need more than H(X) binary questions

Knowledge of Y cannot increase the number of binary questions

Knowledge can never harm! (a mathematical statement, perhaps not true in real life)
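A one-line justification (standard, not spelled out on the slide): the saving H(X) – H(X|Y) can be rewritten as a Kullback–Leibler divergence, which is never negative by Jensen's inequality,

$$H(X) - H(X \mid Y) \;=\; \sum_{x,y} p(x,y)\, \log_2 \frac{p(x,y)}{p(x)\, p(y)} \;\ge\; 0,$$

and this difference is exactly the mutual information introduced on the next slides.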

(22)

Mutual information (1)

Compare two situations:

I: learn X without knowing Y

II: learn X with knowing Y

How many binary questions in case I? → H(X)

How many binary questions in case II? → H(X|Y)

Question: How many binary questions could B save in case II?

Question: How many binary questions could B save by knowing Y?

Answer: I(X;Y) = H(X) – H(X|Y)
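A small Python sketch of this definition (my illustration; the toy joint distribution is hypothetical):

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), computed from a joint table joint[x][y]."""
    p_x = [sum(row) for row in joint]        # marginal p(x)
    p_y = [sum(col) for col in zip(*joint)]  # marginal p(y)
    # H(X|Y) = sum_y p(y) * H(X | Y = y)
    h_x_given_y = sum(
        p * entropy([row[j] / p for row in joint])
        for j, p in enumerate(p_y) if p > 0
    )
    return entropy(p_x) - h_x_given_y

# Hypothetical example: X and Y perfectly correlated -> knowing Y saves 1 bit
joint = [[0.5, 0.0],
         [0.0, 0.5]]
print(mutual_information(joint))  # 1.0
```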

(23)


Mutual information (2)

H(X|Y) ≤ H(X) ⇒ I(X;Y) ≥ 0

In the worst case – namely if B ignores all information in Y about X, or if there is no information in Y about X – then I(X;Y) = 0

Information in Y about X can never be negative

Knowledge can never harm! (a mathematical statement, perhaps not true in real life)

(24)

Mutual information (3)

Example 1: random sequence composed of A, C, G, T (equally probable)

I(X;Y) = ?

H(X) = 2 bit

H(X|Y) = 2 bit

I(X;Y) = H(X) – H(X|Y) = 0 bit

Example 2: deterministic sequence … ACGT ACGT ACGT ACGT …

I(X;Y) = ?

H(X) = 2 bit

H(X|Y) = 0 bit

I(X;Y) = H(X) – H(X|Y) = 2 bit
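Both examples can be checked empirically; the sketch below (my illustration; the sequence lengths and the use of the equivalent identity I(X;Y) = H(X) + H(Y) - H(X,Y) are my choices) estimates the information a letter Y carries about its successor X:

```python
import math, random
from collections import Counter

def H(counts):
    """Entropy in bits from a Counter of observed symbols."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information(seq):
    """Estimate I(X;Y) between each letter X and its predecessor Y,
    via the identity I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    pairs = list(zip(seq, seq[1:]))  # (Y, X) pairs
    return (H(Counter(y for y, _ in pairs))
            + H(Counter(x for _, x in pairs))
            - H(Counter(pairs)))

random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(100_000))
periodic_seq = "ACGT" * 25_000
print(mutual_information(random_seq))    # ~0 bit (small sampling bias)
print(mutual_information(periodic_seq))  # 2.0 bit
```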

(25)


Mutual information (4)

I(X;Y) = I(Y;X)

Always! For any X and any Y!

Information in Y about X = information in X about Y
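The symmetry follows in one line from the chain rule H(X,Y) = H(Y) + H(X|Y), a standard identity not derived on the slide:

$$I(X;Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y \mid X) = I(Y;X)$$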

Examples:

How much information is there in the amino acid sequence about the secondary structure? How much information is there in the secondary structure about the amino acid sequence?

How much information is there in the expression profile about the function of the gene? How much information is there in the function of the gene about the expression profile?


(26)

Summary

Entropy

Conditional entropy

Mutual information

There is no such thing as information content

Information not defined for a single variable

2 random variables needed to talk about information

Information in Y about X

I(X;Y) = I(Y;X) ⇒ info in Y about X = info in X about Y

I(X;Y) ≥ 0 ⇒ information never negative

⇒ knowledge cannot harm

I(X;Y) = 0 if and only if X and Y are statistically independent
