An intuitive introduction to information theory
Ivo Grosse
Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben
Bioinformatics Centre Gatersleben-Halle
Outline
Why information theory?
An intuitive introduction
History of biology
St. Thomas Monastery, Brno
Genetics
Gregor Mendel 1822 – 1884
1866: Mendel's laws
Foundation of Genetics
Ca. 1900: biology becomes a quantitative science
50 years later … 1953
James Watson & Francis Crick
DNA
Watson & Crick 1953
Double helix structure of DNA
1953: biology becomes a molecular science
1953 – 2003 … 50 years of revolutionary discoveries
1989
Goals:
Identify all of the ca. 30,000 genes
Identify all of the ca. 3,000,000,000 base pairs
Store all information in databases
Develop new software for data analysis
2003: Human Genome Project officially finished
2003 – 2053 … biology = information science
Systems Biology
What is information?
Many intuitive definitions
Most of them wrong
One clean definition since 1948
Requires 3 steps:
- Entropy
- Conditional entropy
- Mutual information
Before starting with entropy …
Who is the father of information theory?
Who is this?
Claude Shannon 1916 – 2001
A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379–423 & 623–656, 1948
Before starting with entropy …
Who is the grandfather of information theory?
Simon bar Kochba ca. 100 – 135
Jewish guerrilla fighter against the Roman Empire (132 – 135)
Entropy
Given a text composed from an alphabet of 32 letters (each letter equally probable)
Person A chooses a letter X (randomly)
Person B wants to know this letter
B may ask only binary questions
Question: how many binary questions must B ask in order to learn which letter X was chosen by A?
Answer: the entropy H(X)
Here: H(X) = log2(32) = 5 bit
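This five-question count can be reproduced numerically. A minimal Python sketch (the helper name `entropy` and the uniform distribution setup are mine, not from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum over x of p(x) * log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 32 equally probable letters -> H(X) = log2(32) = 5 bit
uniform = [1 / 32] * 32
print(entropy(uniform))  # 5.0
```

Five binary questions suffice because each answer can halve the set of candidate letters: 32 → 16 → 8 → 4 → 2 → 1.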
Conditional entropy (1)
The sky is blu_
How many binary questions?
5?
No!
Why?
What’s wrong?
The context tells us “something” about the missing letter X
Conditional entropy (2)
Given a text composed from an alphabet of 32 letters (each letter equally probable)
Person A chooses a letter X (randomly)
Person B wants to know this letter
B may ask only binary questions
A may tell B the letter Y preceding X
E.g.
L_
Q_
Question: how many binary questions must B ask in order to learn which letter X was chosen by A?
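With the preceding letter revealed, the answer is the conditional entropy H(X|Y), which can be estimated from bigram counts. A sketch under that assumption (the example text and the function name are illustrative):

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """H(X|Y) in bits from (y, x) samples: H(X|Y) = -sum p(y,x) * log2 p(x|y)."""
    n = len(pairs)
    joint = Counter(pairs)                   # counts of (y, x) bigrams
    y_counts = Counter(y for y, _ in pairs)  # counts of the preceding letter y
    return -sum((c / n) * math.log2(c / y_counts[y]) for (y, _), c in joint.items())

text = "the sky is blue and the sea is blue"
pairs = list(zip(text, text[1:]))  # (preceding letter Y, letter X)
print(conditional_entropy(pairs))  # fewer bits than the entropy of X alone
```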
Conditional entropy (3)
H(X|Y) <= H(X)
Clear!
In the worst case – namely if B ignores all “information” in Y about X – B needs H(X) binary questions
Under no circumstances should B need more than H(X) binary questions
Knowledge of Y cannot increase the number of binary questions
Knowledge can never harm! (a mathematical statement, perhaps not true in real life)
Mutual information (1)
Compare two situations:
I: learn X without knowing Y
II: learn X knowing Y
How many binary questions in case I? H(X)
How many binary questions in case II? H(X|Y)
Question: how many binary questions does B save in case II, i.e. by knowing Y?
Answer: I(X;Y) = H(X) – H(X|Y)
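This difference can be computed directly. A Python sketch using the equivalent closed form I(X;Y) = sum over (x,y) of p(x,y) * log2 [ p(x,y) / (p(x) p(y)) ] (the function name is mine):

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X;Y) = H(X) - H(X|Y), in bits, estimated from (x, y) samples.
    Computed via the equivalent form: sum of p(x,y) * log2 p(x,y) / (p(x) p(y))."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )
```

The two forms agree because H(X) - H(X|Y) = sum of p(x,y) * log2 [ p(x|y) / p(x) ], and p(x|y) / p(x) = p(x,y) / (p(x) p(y)).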
Mutual information (2)
H(X|Y) <= H(X), hence I(X;Y) >= 0
In the worst case – namely if B ignores all information in Y about X, or if there is no information in Y about X – I(X;Y) = 0
Information in Y about X can never be negative
Knowledge can never harm! (a mathematical statement, perhaps not true in real life)
Mutual information (3)
Example 1: random sequence composed of A, C, G, T (equally probable)
I(X;Y) = ?
H(X) = 2 bit
H(X|Y) = 2 bit
I(X;Y) = H(X) – H(X|Y) = 0 bit
Example 2: deterministic sequence … ACGT ACGT ACGT ACGT …
I(X;Y) = ?
H(X) = 2 bit
H(X|Y) = 0 bit
I(X;Y) = H(X) – H(X|Y) = 2 bit
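Both answers can be checked numerically with the mutual_information() sketch above; the sequence length and random seed are arbitrary choices:

```python
import random
# reuses mutual_information() from the sketch above

random.seed(0)
examples = {
    "random":   "".join(random.choice("ACGT") for _ in range(100_000)),
    "periodic": "ACGT" * 25_000,
}
for name, seq in examples.items():
    pairs = list(zip(seq, seq[1:]))  # (preceding letter Y, letter X)
    print(name, round(mutual_information(pairs), 3))
# random   -> close to 0 bit (a small positive bias remains from finite sampling)
# periodic -> 2.0 bit
```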
Mutual information (4)
I(X;Y) = I(Y;X)
Always! For any X and any Y!
Information in Y about X = information in X about Y
Examples:
How much information is there in the amino acid sequence about the secondary structure? How much information is there in the secondary structure about the amino acid sequence?
How much information is there in the expression profile about the function of the gene? How much information is there in the function of the gene about the expression profile?
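The symmetry is already visible in the closed form above, which is unchanged when x and y swap roles. A one-line numerical check (made-up sample pairs, reusing mutual_information()):

```python
import math
# reuses mutual_information() from the sketch above
pairs = [("A", "C"), ("A", "G"), ("C", "C"), ("G", "T"), ("A", "C")]  # made-up samples
swapped = [(y, x) for x, y in pairs]
print(math.isclose(mutual_information(pairs), mutual_information(swapped)))  # True: I(X;Y) = I(Y;X)
```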
Summary
Entropy
Conditional entropy
Mutual information
There is no such thing as information content
Information not defined for a single variable
2 random variables needed to talk about information
Information in Y about X
I(X;Y) = I(Y;X): info in Y about X = info in X about Y
I(X;Y) >= 0: information is never negative
knowledge cannot harm
I(X;Y) = 0 if and only if X and Y are statistically independent