An intuitive introduction to information theory
Ivo Grosse
Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben
Bioinformatics Centre Gatersleben-Halle
Outline
Why information theory?
An intuitive introduction
History of biology
St. Thomas Monastery, Brno
Genetics
Gregor Mendel 1822 – 1884
1866: Mendel's laws
Foundation of Genetics
Ca. 1900: biology becomes a quantitative science
50 years later … 1953
James Watson & Francis Crick
DNA
Watson & Crick 1953
Double helix structure of DNA
1953: biology becomes a molecular science
1953 – 2003 … 50 years of revolutionary discoveries
1989
Goals:
Identify all of the ca. 30,000 genes
Identify all of the ca. 3,000,000,000 base pairs
Store all information in databases
Develop new software for data analysis
2003: Human Genome Project officially finished
2003 – 2053 … biology = information science
Systems Biology
What is information?
Many intuitive definitions
Most of them wrong
One clean definition since 1948
Requires 3 steps:
- Entropy
- Conditional entropy
- Mutual information
Before starting with entropy …
Who is the father of information theory?
Who is this?
Claude Shannon 1916 – 2001
A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379–423 & 623–656, 1948
Before starting with entropy …
Who is the grandfather of information theory?
Simon bar Kochba ca. 100 – 135
Jewish guerrilla fighter against the Roman Empire (132 – 135)
Entropy
Given a text composed from an alphabet of 32 letters (each letter equally probable)
Person A chooses a letter X (randomly)
Person B wants to know this letter
B may ask only binary questions
Question: how many binary questions must B ask in order to learn which letter X was chosen by A?
Answer: the entropy H(X)
Here: H(X) = log2(32) = 5 bit
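This five-question count can be reproduced numerically. A minimal Python sketch (the helper name `entropy` and the uniform distribution setup are mine, not from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum over x of p(x) * log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 32 equally probable letters -> H(X) = log2(32) = 5 bit
uniform = [1 / 32] * 32
print(entropy(uniform))  # 5.0
```

Five binary questions suffice because each answer can halve the set of candidate letters: 32 → 16 → 8 → 4 → 2 → 1.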
Conditional entropy (1)
The sky is blu_
How many binary questions?
5?
No!
Why?
What’s wrong?
The context tells us “something” about the missing letter X
Conditional entropy (2)
Given a text composed from an alphabet of 32 letters (each letter equally probable)
Person A chooses a letter X (randomly)
Person B wants to know this letter
B may ask only binary questions
A may tell B the letter Y preceding X
E.g.
L_
Q_
Question: how many binary questions must B ask in order to learn which letter X was chosen by A?
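With the preceding letter revealed, the answer is the conditional entropy H(X|Y), which can be estimated from bigram counts. A sketch under that assumption (the example text and the function name are illustrative):

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """H(X|Y) in bits from (y, x) samples: H(X|Y) = -sum p(y,x) * log2 p(x|y)."""
    n = len(pairs)
    joint = Counter(pairs)                   # counts of (y, x) bigrams
    y_counts = Counter(y for y, _ in pairs)  # counts of the preceding letter y
    return -sum((c / n) * math.log2(c / y_counts[y]) for (y, _), c in joint.items())

text = "the sky is blue and the sea is blue"
pairs = list(zip(text, text[1:]))  # (preceding letter Y, letter X)
print(conditional_entropy(pairs))  # fewer bits than the entropy of X alone
```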
Conditional entropy (3)
H(X|Y) <= H(X)
Clear!
In the worst case – namely if B ignores all “information” in Y about X – B needs H(X) binary questions
Under no circumstances should B need more than H(X) binary questions
Knowledge of Y cannot increase the number of binary questions
Knowledge can never harm! (a mathematical statement, perhaps not true in real life)
Mutual information (1)
Compare two situations:
I: learn X without knowing Y
II: learn X knowing Y
How many binary questions in case I? H(X)
How many binary questions in case II? H(X|Y)
Question: how many binary questions does B save in case II, i.e. by knowing Y?
Answer: I(X;Y) = H(X) – H(X|Y)
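This difference can be computed directly. A Python sketch using the equivalent closed form I(X;Y) = sum over (x,y) of p(x,y) * log2 [ p(x,y) / (p(x) p(y)) ] (the function name is mine):

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X;Y) = H(X) - H(X|Y), in bits, estimated from (x, y) samples.
    Computed via the equivalent form: sum of p(x,y) * log2 p(x,y) / (p(x) p(y))."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )
```

The two forms agree because H(X) - H(X|Y) = sum of p(x,y) * log2 [ p(x|y) / p(x) ], and p(x|y) / p(x) = p(x,y) / (p(x) p(y)).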
Mutual information (2)
H(X|Y) <= H(X), hence I(X;Y) >= 0
In the worst case – namely if B ignores all information in Y about X, or if there is no information in Y about X – I(X;Y) = 0
Information in Y about X can never be negative
Knowledge can never harm! (a mathematical statement, perhaps not true in real life)
Mutual information (3)
Example 1: random sequence composed of A, C, G, T (equally probable)
I(X;Y) = ?
H(X) = 2 bit
H(X|Y) = 2 bit
I(X;Y) = H(X) – H(X|Y) = 0 bit
Example 2: deterministic sequence … ACGT ACGT ACGT ACGT …
I(X;Y) = ?
H(X) = 2 bit
H(X|Y) = 0 bit
I(X;Y) = H(X) – H(X|Y) = 2 bit
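Both answers can be checked numerically with the mutual_information() sketch above; the sequence length and random seed are arbitrary choices:

```python
import random
# reuses mutual_information() from the sketch above

random.seed(0)
examples = {
    "random":   "".join(random.choice("ACGT") for _ in range(100_000)),
    "periodic": "ACGT" * 25_000,
}
for name, seq in examples.items():
    pairs = list(zip(seq, seq[1:]))  # (preceding letter Y, letter X)
    print(name, round(mutual_information(pairs), 3))
# random   -> close to 0 bit (a small positive bias remains from finite sampling)
# periodic -> 2.0 bit
```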
Mutual information (4)
I(X;Y) = I(Y;X)
Always! For any X and any Y!
Information in Y about X = information in X about Y
Examples:
How much information is there in the amino acid sequence about the secondary structure? How much information is there in the secondary structure about the amino acid sequence?
How much information is there in the expression profile about the function of the gene? How much information is there in the function of the gene about the expression profile?
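The symmetry is already visible in the closed form above, which is unchanged when x and y swap roles. A one-line numerical check (made-up sample pairs, reusing mutual_information()):

```python
import math
# reuses mutual_information() from the sketch above
pairs = [("A", "C"), ("A", "G"), ("C", "C"), ("G", "T"), ("A", "C")]  # made-up samples
swapped = [(y, x) for x, y in pairs]
print(math.isclose(mutual_information(pairs), mutual_information(swapped)))  # True: I(X;Y) = I(Y;X)
```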
Summary
Entropy
Conditional entropy
Mutual information
There is no such thing as information content
Information not defined for a single variable
2 random variables needed to talk about information
Information in Y about X
I(X;Y) = I(Y;X): info in Y about X = info in X about Y
I(X;Y) >= 0: information is never negative
knowledge cannot harm
I(X;Y) = 0 if and only if X and Y are statistically independent