(1)

Some Foundations of Probability Theory

We are concerned with finite probability spaces only, which are specified by a finite set

ℰ = {e1, e2, . . . , ek}

of elementary events, each of which is assigned a probability pi = Pr(ei) with 0 ≤ pi ≤ 1, such that ∑_{i=1}^{k} pi = 1.

Each ei is one of the possible results of a stochastic experiment.

The assignment of probabilities to the elementary events is called a probability distribution.

If all events occur with the same probability (i.e., pi = 1/k for each i with 1 ≤ i ≤ k), we obtain the uniform distribution.


(2)

Some Foundations of Probability Theory

The assignment of probabilities can be extended from the elementary events ei to any subset E ⊆ ℰ by defining

Pr(E) = ∑_{ei ∈ E} pi.

Such a subset E is said to be an event. If we have the uniform distribution on ℰ, then Pr(E) = ‖E‖/‖ℰ‖ is simply the ratio of the number of “good” cases and the number of “all” cases.

The following basic properties of the probability function Pr(·) are easy to see:

1 0 ≤ Pr(E) ≤ 1, where Pr(∅) = 0 and Pr(ℰ) = 1.

2 Pr(Ē) = 1 − Pr(E), where Ē = ℰ \ E is the complementary event for E.

3 Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(E ∩ F).
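To make these properties concrete, here is a small Python sketch (illustrative only; the helper name prob is ours) that models the probability space of one fair die and checks the union formula from property 3:

from fractions import Fraction

# Uniform distribution on the elementary events of one fair die.
space = {e: Fraction(1, 6) for e in range(1, 7)}

def prob(event):
    # Pr(E) = sum of the probabilities of the elementary events in E
    return sum(space[e] for e in event)

E = {2, 4, 6}    # "the result is even"
F = {4, 5, 6}    # "the result is at least 4"

assert prob(set(space)) == 1     # Pr of the whole space is 1
assert prob(set()) == 0          # Pr of the empty event is 0
assert prob(E | F) == prob(E) + prob(F) - prob(E & F)
print(prob(E | F))               # 2/3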


(3)

Some Foundations of Probability Theory

Definition (conditional probability)

Let A and B be events with Pr(B) > 0.

The probability that A occurs under the condition that B occurs is defined by

Pr(A|B) = Pr(A ∩ B) / Pr(B).

A and B are said to be independent if

Pr(A ∩ B) = Pr(A)·Pr(B);

equivalently, if Pr(A|B) = Pr(A).


(4)

Some Foundations of Probability Theory

Example (conditional probability)

Let’s throw two (distinguishable) dice, one red and one blue.

Event A: “Number of pips of both dice is at least 10.”

Event B: “Number of pips of both dice is even.”

There are a total of 6·6 = 36 elementary events: e1,1, e1,2, . . . , e6,6. If B has occurred, only 18 elementary events are still possible for A.

For example, e1,1 is still possible, but e1,2 is not.

We have A ∩ B = {e4,6, e6,4, e5,5, e6,6}. It follows that

Pr(A|B) = Pr(A ∩ B)/Pr(B) = (4/36)/(1/2) = 2/9.
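The numbers in this example can be verified by brute-force enumeration. A small Python sketch (our own illustration; “number of pips” is read as the total of both dice, which matches the counts above):

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # the 36 elementary events e_{i,j}
pr = Fraction(1, 36)                              # uniform distribution

A = {o for o in outcomes if sum(o) >= 10}         # total of pips at least 10
B = {o for o in outcomes if sum(o) % 2 == 0}      # total of pips is even

def prob(event):
    return len(event) * pr

print(len(B))                 # 18 elementary events remain possible once B has occurred
print(prob(A & B) / prob(B))  # Pr(A|B) = 2/9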


(5)

Some Foundations of Probability Theory

Lemma (Bayes)

Let A and B be events with Pr(A) > 0 and Pr(B) > 0. Then, Pr(B)·Pr(A|B) = Pr(A)·Pr(B|A).

Proof: By definition, we have

Pr(B)·Pr(A|B) = Pr(A ∩ B) = Pr(B ∩ A) = Pr(A)·Pr(B|A),

which proves the lemma. ❑

Remark: Moreover, if A and B are independent, we have:

Pr(A) = Pr(A|B)·Pr(B) + Pr(A|B̄)·(1 − Pr(B)).


(6)

Some Foundations of Probability Theory

A random variable is a function mapping from ℰ to ℝ (or to ℤ).

For example, if every elementary event ei is the input of length n to a randomized algorithm A, then one might define the random variable X(ei) to be the running time of A on input ei.

If X: ℰ → ℝ is a random variable on a probability space ℰ, then

“X = x” denotes the event E that X takes on the value x ∈ ℝ, i.e., E = {ei ∈ ℰ | X(ei) = x}.


(7)

Some Foundations of Probability Theory

Definition (expectation value and variance)

The expectation value and the variance of a random variable X are defined by:

E(X) = ∑_{ei ∈ ℰ} X(ei)·Pr(ei);

V(X) = E((X − E(X))²) = E(X²) − (E(X))².

Intuitively,

the expectation value E(X) gives the value taken on by X on the average, with respect to the underlying probability distribution.

The variance is a measure of how far the values of X deviate from E(X).


(8)

Some Foundations of Probability Theory

Example (expectation value)

Let p denote the probability that a certain event E occurs, and let X be a random variable that gives the number of independent trials until E occurs for the first time. Let µ denote the number of independent trials on the average until E occurs for the first time, i.e., µ = E(X).

With probability p, E occurs with the first trial already (which is thus successful), and with probability 1 − p one still needs on the average µ trials after this first (unsuccessful) trial. Hence,

µ = p·1 + (1 − p)(1 + µ) = p + 1 − p + µ − pµ,

which implies E(X) = µ = 1/p. For example, if a randomized algorithm provides some desired result with probability p, then this algorithm has to be simulated 1/p times on the average until this desired result is obtained.
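The identity E(X) = 1/p is easy to check empirically. A small Python sketch (illustrative only; the success probability p = 0.2 is an arbitrary choice) averages the number of independent trials until the first success:

import random

def trials_until_success(p):
    # number of independent Bernoulli(p) trials until the first success
    n = 1
    while random.random() >= p:
        n += 1
    return n

p = 0.2
samples = [trials_until_success(p) for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to 1/p = 5 on average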


(9)

Shannon’s Theorem

Claude Shannon (1916 – 2001), “father of information theory”


(10)

Shannon’s Theorem: Assumptions and Preliminaries

Goal: Let S = (M, C, K, E, D) be a cryptosystem.

When does S guarantee “perfect secrecy”?

Assumptions and Preliminaries:

The messages are distributed on M according to a probability distribution PrM that may depend on the natural language used.

For each new message, Alice chooses a new key from K that is independent of the message to be encrypted.

This assumption makes sense, since Alice is choosing her key before she knows what the plaintext will be.

The keys are distributed according to a probability distribution PrK on K.


(11)

Shannon’s Theorem: Assumptions and Preliminaries

The distributions PrM and PrK induce a probability distribution PrM×K on M×K.

That is, dropping the subscript, for each message m ∈ M and for each key k ∈ K,

Pr(m,k) = PrM(m)·PrK(k)

is the probability that the message m is encrypted with the key k, where m and k are independent.


(12)

Shannon’s Theorem: Assumptions and Preliminaries

Notation:

For m ∈ M, let m denote the event {(m,k) | k ∈ K}.

Then, Pr(m) = PrM(m) is the probability that the message m will be encrypted.

For k ∈ K, let k denote the event {(m,k) | m ∈ M}.

Then, Pr(k) = PrK(k) is the probability that the key k will be used.

For c ∈ C, let c denote the event {(m,k) | Ek(m) = c}.

Then, Pr(m|c) is the probability that m is encrypted under the condition that c is received.


(13)

Shannon’s Theorem: Perfect Secrecy

Definition

A cryptosystem S = (M, C, K, E, D) is said to guarantee perfect secrecy if and only if

(∀m ∈ M) (∀c ∈ C) [Pr(m|c) = Pr(m)].

Remark: That is, a cryptosystem guarantees perfect secrecy if and only if the event that some message m is encrypted and

the event that some ciphertext c is received are independent:

Erich learns nothing about m from knowing c.


(14)

Shannon’s Theorem: Perfect Secrecy

Example (perfect secrecy)

Consider a cryptosystem S = (M, C, K, E, D) such that the plaintext space M, the ciphertext space C, and the key space K are given by:

M = {a, b}, where Pr(a) = 1/3 and Pr(b) = 2/3;
K = {$, #}, where Pr($) = 1/4 and Pr(#) = 3/4;
C = {x, y}.

The probability that a letter m ∈ M is encrypted with a key k ∈ K is:

Pr(a,$) = Pr(a)·Pr($) = 1/3 · 1/4 = 1/12;    Pr(a,#) = Pr(a)·Pr(#) = 1/3 · 3/4 = 1/4;
Pr(b,$) = Pr(b)·Pr($) = 2/3 · 1/4 = 1/6;     Pr(b,#) = Pr(b)·Pr(#) = 2/3 · 3/4 = 1/2.


(15)

Shannon’s Theorem: Perfect Secrecy

Example (perfect secrecy, continued)

Let the encryption functions be given by:

E$(a) = y;  E$(b) = x;  E#(a) = x;  E#(b) = y.

Hence, the probability that a ciphertext symbol c ∈ C occurs is:

Pr(x) = Pr(b,$) + Pr(a,#) = 1/6 + 1/4 = 5/12;    Pr(y) = Pr(a,$) + Pr(b,#) = 1/12 + 1/2 = 7/12.


(16)

Shannon’s Theorem: Perfect Secrecy

Example (perfect secrecy, continued)

Then, for each pair (m,c) ∈ M×C, the conditional probability Pr(m|c) is different from the probability Pr(m):

Pr(a|x) = Pr(a,#)/Pr(x) = (3/12)/(5/12) = 3/5 ≠ 1/3 = Pr(a);
Pr(a|y) = Pr(a,$)/Pr(y) = (1/12)/(7/12) = 1/7 ≠ 1/3 = Pr(a);
Pr(b|x) = Pr(b,$)/Pr(x) = (2/12)/(5/12) = 2/5 ≠ 2/3 = Pr(b);
Pr(b|y) = Pr(b,#)/Pr(y) = (6/12)/(7/12) = 6/7 ≠ 2/3 = Pr(b).

It follows that the given cryptosystem does not achieve perfect secrecy.

In particular, if Erich sees the ciphertext letter y, the odds are good that the encrypted plaintext letter was a b.
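These conditional probabilities can also be computed mechanically. A small Python sketch (our own illustration; dictionaries stand in for the distributions and encryption functions of the toy system):

from fractions import Fraction as F

Pr_M = {'a': F(1, 3), 'b': F(2, 3)}          # plaintext distribution
Pr_K = {'$': F(1, 4), '#': F(3, 4)}          # key distribution
E = {('$', 'a'): 'y', ('$', 'b'): 'x',       # encryption functions E_k(m)
     ('#', 'a'): 'x', ('#', 'b'): 'y'}

# Pr(c) = sum of Pr(m)*Pr(k) over all (m, k) with E_k(m) = c
Pr_C = {}
for (k, m), c in E.items():
    Pr_C[c] = Pr_C.get(c, 0) + Pr_M[m] * Pr_K[k]

def pr_m_given_c(m, c):
    joint = sum(Pr_M[m] * Pr_K[k] for k in Pr_K if E[(k, m)] == c)
    return joint / Pr_C[c]

for m in Pr_M:
    for c in Pr_C:
        print(m, c, pr_m_given_c(m, c), Pr_M[m])   # e.g. a x 3/5 1/3, so not perfectly secret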


(17)

Shannon’s Theorem

In 1948, Claude Shannon laid the foundations of information and coding theory, with applications to cryptography.

Theorem (Shannon)

Let S = (M, C, K, E, D) be a cryptosystem with ‖M‖ = ‖C‖ = ‖K‖ < ∞ and Pr(m) > 0 for each m ∈ M.

Then, S guarantees perfect secrecy if and only if

1 for each m ∈ M and for each c ∈ C, there exists a unique key k ∈ K with Ek(m) = c, and

2 the keys in K are uniformly distributed.


(18)

Shannon’s Theorem: Proof

Proof: Assume that S achieves perfect secrecy. We show that both conditions are valid.

To prove the first condition, fix an arbitrary message m ∈ M.

Suppose that there is a ciphertext c ∈ C such that no key encrypts m into c. That is, for all k ∈ K, we have Ek(m) ≠ c. Thus,

Pr(m) ≠ 0 = Pr(m|c).

So S does not guarantee perfect secrecy, a contradiction. It follows that (∀c ∈ C) (∃k ∈ K) [Ek(m) = c].

Now, the assumption ‖M‖ = ‖C‖ = ‖K‖ < ∞ implies that each ciphertext c ∈ C has a unique key k with Ek(m) = c.


(19)

Shannon’s Theorem: Proof

To prove the second condition, fix an arbitrary ciphertext c ∈ C. For any m ∈ M, let k(m) be the unique key k with Ek(m) = c. By the Lemma of Bayes, it follows that for each m ∈ M:

Pr(m|c) = Pr(c|m)·Pr(m)/Pr(c) = Pr(k(m))·Pr(m)/Pr(c).    (1)

Since S guarantees perfect secrecy, we have Pr(m|c) = Pr(m).

By (1), this implies Pr(k(m)) = Pr(c), and the latter equality holds independently of m.

Hence, the probabilities Pr(k) are equal for all k ∈ K, which implies that Pr(k) = 1/‖K‖. Thus, the keys in K are uniformly distributed.


(20)

Shannon’s Theorem: Proof

Conversely, suppose that both conditions are true.

We show that S achieves perfect secrecy.

Let k = k(m,c) be the unique key k with Ek(m) = c.

By the Lemma of Bayes, it follows that

Pr(m|c) = Pr(m)·Pr(c|m)/Pr(c) = Pr(m)·Pr(k(m,c)) / ∑_{q∈M} Pr(q)·Pr(k(q,c)).    (2)

Since all keys are uniformly distributed by the second condition, it follows that

Pr(k(m,c)) = 1/‖K‖.


(21)

Shannon’s Theorem: Proof

Proof: Moreover, we have that

∑_{q∈M} Pr(q)·Pr(k(q,c)) = (∑_{q∈M} Pr(q)) / ‖K‖ = 1/‖K‖.

Substituting this equality in (2) gives

Pr(m|c) = Pr(m).

Hence, S guarantees perfect secrecy. ❑


(22)

Shannon’s Theorem and Vernam’s One-Time Pad

The One-Time Pad was invented by Gilbert Vernam in 1917.

Fix the alphabet Σ = {0,1}, and define the plaintext space, the ciphertext space, and the key space by

M = C = K = {0,1}^n for some block length n ∈ ℕ.

The keys are uniformly distributed on {0,1}^n.


(23)

Shannon’s Theorem and Vernam’s One-Time Pad

For each key k ∈ {0,1}^n, define the encryption function Ek and the decryption function Dk, both mapping from {0,1}^n to {0,1}^n, by:

Ek(x) = x ⊕ k;
Dk(y) = y ⊕ k,

where ⊕ denotes bit-wise addition modulo 2.

By Shannon’s Theorem, the one-time pad achieves perfect secrecy, since for each plaintext x ∈ {0,1}^n and for each ciphertext y ∈ {0,1}^n, there exists a unique key k ∈ {0,1}^n with y = x ⊕ k, namely the key k = x ⊕ y.
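A minimal Python sketch of the one-time pad on bit strings (illustrative only; it represents blocks as integers and uses the secrets module for the uniformly distributed key):

import secrets

n = 16                                   # block length

def otp_encrypt(x, k):
    return x ^ k                         # bit-wise addition modulo 2

def otp_decrypt(y, k):
    return y ^ k                         # decryption is the same operation

x = 0b1010011010100101                   # some plaintext block
k = secrets.randbits(n)                  # fresh key, uniform on {0,1}^n
y = otp_encrypt(x, k)

assert otp_decrypt(y, k) == x            # Dk(Ek(x)) = x
assert x ^ y == k                        # x XOR y reveals the key (the known-plaintext attack discussed on the next slide)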


(24)

Disadvantages of Vernam’s One-Time Pad

The one-time pad has some severe disadvantages that make its usage impractical in most concrete scenarios:

To guarantee perfect secrecy, every key can be used only once, and it must be as long as the block of the message encrypted by it.

Since for every block of the message a new secret key must be exchanged, the one-time pad lacks efficient key management.

This is one reason why it has only limited use in commercial applications.

Even worse, the one-time pad is not secure against known-plaintext attacks if the same key is used twice:

Knowing a message x and a corresponding ciphertext y, Erich can determine the key used simply by computing

x ⊕ y = x ⊕ x ⊕ k = k.


(25)

Remarks on Vernam’s One-Time Pad

Remark:

Due to its perfect secrecy and despite its drawbacks, the one-time pad has been employed in real-world applications in a diplomatic and military context.

Allegedly, it has been used for the hotline between Moscow and Washington (as noted in a survey by Simmons, 1979).

Evidently, in situations where unconditional security does matter, cryptosystems that provably guarantee perfect secrecy—such as the one-time pad—are of great importance.

Other cryptosystems (even the shift cipher) also guarantee perfect secrecy, provided every plaintext letter is encrypted with a new, randomly chosen key (under the uniform distribution).


(26)

Entropy: A Measure of Information



(27)

Entropy in Physics

The notion of entropy originates from physics, where it is used to describe the degree of chaos (or disorder, the opposite of ordered structure) in the universe or in any closed system.

In the world of physics, entropy thus refers to the absence of manifest order or structure; even the regular, precise movement of microscopic particles is not considered manifest order.

The second law of thermodynamics says that the entropy of any closed system increases with the passing of time, unless it is a “reversible” system, in which case the entropy remains constant.


(28)

Entropy in Computer Science

In algorithmics: Entropy is a useful notion for analyzing the running time of randomized algorithms,

to estimate the average search time in the data structure of optimal search trees,

to estimate the average length of the linked lists for hash tables that resolve collisions by chaining, and so on.

In data compression: Entropy measures the randomness of strings.

In information and coding theory: Entropy is a measure of the information gained by knowing the result of a random experiment.

The other way round, entropy is a measure of the uncertainty removed by conducting this experiment.

Roughly speaking, uncertainty corresponds to chaos, whereas information corresponds to order.


(29)

Entropy in Cryptology

Entropy can be used to measure the amount of information about the key used that is revealed by the ciphertext observed (in a ciphertext-only attack), assuming the same key is used repeatedly.

Definition

Let X be a random variable that can take on n possible values from the set X = {x1, x2, . . . , xn}. For each i with 1 ≤ i ≤ n, let pi = Pr(X=xi) be the probability that X takes on the value xi. Define the entropy of X by

H(X) = −∑_{i=1}^{n} pi·log pi.    (3)

By convention, we set 0 log 0 = 0.
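Definition (3) translates directly into code. A small Python sketch (illustrative only; the function name entropy is ours), using the convention 0·log 0 = 0:

from math import log2

def entropy(probs):
    # H(X) = -sum_i p_i * log2(p_i), with 0*log 0 = 0 by convention
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # 1.0, the fair coin of the next slide
print(entropy([0.25, 0.75]))        # about 0.8113, the biased coin
print(entropy([0.5, 0.25, 0.25]))   # 1.5, the three-outcome example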


(30)

Entropy: Interpretation

Formally, the weighted sum in (3) has the form of an expectation value.

Entropy expresses the degree of uncertainty associated with the result of the given experiment before it has been conducted.

Conversely, it can be viewed as a measure of the average amount of information gained by conducting the experiment, i.e., after knowing its result.

In other words, if the ith event occurs in this experiment, the information gained amounts to

−log pi = log(1/pi).


(31)

Entropy: Interpretation

If events are independent, then their probabilities are multiplied.

Hence, by the logarithm laws, the corresponding amounts of information gained can be added up.

Note that all logarithms are base 2; as usual, this choice is arbitrary, because changing the base would change the value of the entropy only by a constant factor.


(32)

Entropy: Examples

Example (tossing a fair coin)

Tossing a coin is a random experiment with two possible outcomes: heads or tails. Let X be a random variable giving the result of the experiment:

X = 1 if the coin came down heads after being tossed, and X = 0 if it came down tails.

If it is a fair coin, both these events occur with probability p1 = p2 = 1/2. Since

H(X) = −(1/2·log(1/2) + 1/2·log(1/2)) = 1/2·log 2 + 1/2·log 2 = 1,

the random experiment of tossing a coin has an information content of precisely one bit, which is either 1 for heads or 0 for tails.


(33)

Entropy: Examples

Example (tossing a biased coin)

However, if the coin is biased, say with the probabilities p1 = 1/4 for heads and p2 = 3/4 for tails, the entropy of this experiment decreases to about 0.8113 bit because:

H(X) = −(1/4·log(1/4) + 3/4·log(3/4)) = 1/4·log 4 + 3/4·log(4/3) = 1/2 + 3/4·(2 − log 3) = 2 − 3/4·log 3 ≈ 0.8113.

Remark: If p1 = 0 and p2 = 1, we have

H(X) = −(0·log 0 + 1·log 1) = 0.


(34)

Entropy: Examples

Example (experiment with three possible outcomes)

Let X be a random variable with three possible outcomes:

x1 with probability p1 = 1/2, x2 with probability p2 = 1/4, and x3 with probability p3 = 1/4.

A “most efficient” encoding of the three outcomes is:

x1 by 0, x2 by 10, and x3 by 11.

The entropy is the average number of bits in this encoding of X:

H(X) = −(1/2·log(1/2) + 1/4·log(1/4) + 1/4·log(1/4)) = 1/2·1 + 1/4·2 + 1/4·2 = 3/2.


(35)

Entropy: Examples

Example (entropy of a cryptosystem)

Consider the cryptosystem S = (M, C, K, E, D) defined in our previous example:

M = {a, b}, where Pr(a) = 1/3 and Pr(b) = 2/3;
K = {$, #}, where Pr($) = 1/4 and Pr(#) = 3/4;
C = {x, y}, where Pr(x) = 5/12 and Pr(y) = 7/12.

One can think of a key as a random variable K that takes on values in K = {$, #}, and thus one can compute its entropy.

Similarly, let M and C be random variables that describe the plaintext and ciphertext, respectively, and take on values in M and in C.


(36)

Entropy: Examples

Example (entropy of a cryptosystem, continued)

The entropies of K, M, and C can be computed as follows:

H(K) = −(1/4·log(1/4) + 3/4·log(3/4)) = 1/4·log 4 + 3/4·log(4/3) = 2 − 3/4·log 3 ≈ 0.8113;

H(M) = −(1/3·log(1/3) + 2/3·log(2/3)) = 1/3·log 3 + 2/3·log(3/2) ≈ 0.9183;

H(C) = −(5/12·log(5/12) + 7/12·log(7/12)) = 5/12·log(12/5) + 7/12·log(12/7) ≈ 0.9799.
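These values are easy to reproduce numerically, e.g. with the following Python sketch (illustrative only; it simply evaluates definition (3)):

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

H_K = entropy([1/4, 3/4])
H_M = entropy([1/3, 2/3])
H_C = entropy([5/12, 7/12])
print(round(H_K, 4), round(H_M, 4), round(H_C, 4))   # 0.8113 0.9183 0.9799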


(37)

Entropy: Properties



(38)

Entropy: Properties

Theorem

Let X be a random variable that can take on n possible values from the set X = {x1, x2, . . . , xn}. For each i with 1 ≤ i ≤ n, let pi = Pr(X=xi) be the probability that X takes on the value xi.

1 H(X) ≤ log n, with equality if and only if (p1, p2, . . . , pn) = (1/n, 1/n, . . . , 1/n).

2 H(X) ≥ 0, with equality if and only if pi = 1 for some i, and pj = 0 for j ≠ i. Here, H(X) = 0 means that the experiment’s result is completely determined, i.e., there is no uncertainty about it and it does not provide any information.


(39)

Entropy: Properties

3 If Y is a random variable that can take on n + 1 possible values from the set Y = {y1, y2, . . . , yn+1} such that Pr(Y=yi) = pi for 1 ≤ i ≤ n and Pr(Y=yn+1) = 0, then H(Y) = H(X).

4 If π ∈ Sn is an arbitrary permutation of the set {1, 2, . . . , n} and if Y is a random variable with Pr(Y=xi) = pπ(i), 1 ≤ i ≤ n, then H(Y) = H(X).

5 Grouping Property: If Y and Z are random variables such that Y can take on n − 1 possible values with the probabilities p1+p2, p3, . . . , pn and Z can take on two possible values with the probabilities p1/(p1+p2) and p2/(p1+p2), then

H(X) = H(Y) + (p1+p2)·H(Z).


(40)

Entropy: Properties

6 Gibbs’ Lemma: Let Y be a random variable that can take on n possible values with the probabilities q1, q2, . . . , qn (i.e., 0 ≤ qi ≤ 1 and ∑_{i=1}^{n} qi = 1). Then

H(X) ≤ −∑_{i=1}^{n} pi·log qi,

with equality if and only if (p1, p2, . . . , pn) = (q1, q2, . . . , qn).

7 Subadditivity: Let Z = (X1, X2, . . . , Xn) be a random variable whose possible values are n-tuples of the form (x1, x2, . . . , xn), i.e., xi is the value of the random variable Xi. Then,

H(Z) ≤ H(X1) + H(X2) + · · · + H(Xn),

with equality if and only if the random variables Xi are independent, i.e., Pr(X1=x1, X2=x2, . . . , Xn=xn) = ∏_{i=1}^{n} Pr(Xi=xi).


(41)

Entropy: Proving Property 1 using Jensen’s Inequality

Proof: We restrict ourselves to proving the first statement.

Observe that the function log x is strictly concave on the interval (0, ∞).

Definition

A real-valued function f is said to be concave on the interval I ⊆ ℝ if and only if for all x, y ∈ I,

f((x + y)/2) ≥ (f(x) + f(y))/2.

The function f is said to be strictly concave on the interval I ⊆ ℝ if and only if for all x, y ∈ I with x ≠ y,

f((x + y)/2) > (f(x) + f(y))/2.


(42)

Entropy: Proving Property 1 using Jensen’s Inequality

Lemma (Jensen’s Inequality)

If f is a continuous function that is strictly concave on the real interval I, and if the positive real numbers p1, p2, . . . , pn satisfy ∑_{i=1}^{n} pi = 1, then

∑_{i=1}^{n} pi·f(xi) ≤ f(∑_{i=1}^{n} pi·xi).

without proof


(43)

Entropy: Proving Property 1 using Jensen’s Inequality

Observe that the function log x is strictly concave on the interval (0, ∞).

By Jensen’s inequality, it follows that

H(X) = −∑_{i=1}^{n} pi·log pi = ∑_{i=1}^{n} pi·log(1/pi) ≤ log(∑_{i=1}^{n} pi·(1/pi)) = log n.

Furthermore, we have equality if and only if pi = 1/n for each i with 1 ≤ i ≤ n. ❑


(44)

Conditional Entropy

Definition

Let X and Y be two random variables such that

X can take on n possible values from the set X = {x1, x2, . . . , xn} and Y can take on m possible values from the set Y = {y1, y2, . . . , ym}.

For each i and j with 1 ≤ i ≤ n and 1 ≤ j ≤ m, define the conditional probabilities pij and the probabilities qj by:

pij = Pr(X=xi | Y=yj);

qj = Pr(Y=yj).

For fixed j, 1 ≤ j ≤ m, let Xj denote the random variable on X that is distributed according to p1j, p2j, . . . , pnj. Clearly, H(Xj) = −∑_{i=1}^{n} pij·log pij.


(45)

Conditional Entropy

Definition (continued)

Define the conditional entropy by

H(X|Y) = ∑_{j=1}^{m} qj·H(Xj) = −∑_{j=1}^{m} ∑_{i=1}^{n} qj·pij·log pij.

Remark: Intuitively, the conditional entropy H(X|Y) is the weighted average (with respect to the probability distribution on Y) of the entropies H(Xj) for all j with 1 ≤ j ≤ m.
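A small Python sketch of this definition (illustrative only; the joint distribution is passed as a dictionary mapping pairs (x, y) to Pr(X=x, Y=y)):

from math import log2

def conditional_entropy(joint):
    # H(X|Y) = -sum_{i,j} q_j * p_ij * log2(p_ij), where q_j * p_ij = Pr(X=x_i, Y=y_j)
    q = {}                                    # marginal distribution of Y
    for (x, y), p in joint.items():
        q[y] = q.get(y, 0.0) + p
    h = 0.0
    for (x, y), p in joint.items():
        if p > 0:
            p_ij = p / q[y]                   # p_ij = Pr(X=x_i | Y=y_j)
            h -= q[y] * p_ij * log2(p_ij)
    return h

# If X is completely determined by Y, no uncertainty remains: H(X|Y) = 0.
print(conditional_entropy({(0, 0): 0.5, (1, 1): 0.5}))   # 0.0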


(46)

Conditional Entropy: Simple Properties

Theorem

H(X,Y) = H(Y) + H(X|Y).

Proof: For each i and j with 1 ≤ i ≤ n and 1 ≤ j ≤ m, let

pij = Pr(X=xi | Y=yj);

qij = Pr(X=xi, Y=yj).

H(X,Y) = −∑_{i=1}^{n} ∑_{j=1}^{m} qij·log qij

= −∑_{i=1}^{n} ∑_{j=1}^{m} qij·log(pij·Pr(Y=yj))


(47)

Conditional Entropy: Simple Properties

= −∑_{i=1}^{n} ∑_{j=1}^{m} qij·log pij − ∑_{i=1}^{n} ∑_{j=1}^{m} qij·log Pr(Y=yj)

= −∑_{i=1}^{n} ∑_{j=1}^{m} qij·log pij − ∑_{j=1}^{m} Pr(Y=yj)·log Pr(Y=yj)

= −∑_{i=1}^{n} ∑_{j=1}^{m} pij·Pr(Y=yj)·log pij + H(Y)

= −∑_{j=1}^{m} Pr(Y=yj) ∑_{i=1}^{n} pij·log pij + H(Y)

= H(X|Y) + H(Y). ❑


(48)

Conditional Entropy: Simple Properties

Corollary

H(X|Y) ≤ H(X), with equality if and only if X and Y are independent.

Proof: By subadditivity, we have

H(X,Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

From the previous theorem, it follows that

H(X|Y) = H(X,Y) − H(Y) ≤ H(X),

with equality if and only if X and Y are independent. ❑


(49)

Key Equivocation

Goal: Apply the properties of entropy and conditional entropy to cryptosystems.

In particular, what is its “key equivocation”?

Definition

Let S = (M, C, K, E, D) be a cryptosystem, and think of a key, a plaintext, and a ciphertext as random variables K, M, and C, respectively.

The key equivocation of S is defined as the conditional entropy H(K|C).

Remark: The key equivocation of a cryptosystem is interpreted as the amount of uncertainty about the key used that remains after the ciphertext has been observed. How can we compute it?


(50)

Key Equivocation

Theorem

Let S = (M, C, K, E, D) be a cryptosystem, and let K, M, and C be random variables corresponding to K, M, and C. Then,

H(K|C) = H(K) + H(M) − H(C).

Proof: By applying the previous theorem with X = C and Y = (K,M), we obtain

H(C,K,M) = H(K,M) + H(C|K,M).

Since S is a cryptosystem, a given key k ∈ K and a given plaintext m ∈ M uniquely determine the ciphertext c = Ek(m).


(51)

Key Equivocation

Hence, H(C|K,M) = 0. It follows that

H(C,K,M) = H(K,M).

However, since the random variables K and M are independent, the subadditivity of entropy implies that

H(K,M) = H(K) + H(M).

Thus,

H(C,K,M) = H(K,M) = H(K) + H(M).    (4)


(52)

Key Equivocation

Similarly, a given key k ∈ K and a given ciphertext c ∈ C uniquely determine the plaintext m = Dk(c). Hence, H(M|K,C) = 0.

It follows that

H(C,K,M) = H(K,C).    (5)

By the previous theorem, the equalities (4) and (5) imply that

H(K|C) = H(K,C) − H(C) = H(C,K,M) − H(C) = H(K) + H(M) − H(C),

which proves the theorem. ❑


(53)

Key Equivocation

Example (key equivocation in a cryptosystem)

Consider the cryptosystem S = (M, C, K, E, D) from our previous example:

M = {a, b}, where Pr(a) = 1/3 and Pr(b) = 2/3;
K = {$, #}, where Pr($) = 1/4 and Pr(#) = 3/4;
C = {x, y}, where Pr(x) = 5/12 and Pr(y) = 7/12.

We have already estimated the entropies of K, M, and C:

H(K) ≈ 0.8113;

H(M) ≈ 0.9183;

H(C) ≈ 0.9799.


(54)

Key Equivocation

Example (key equivocation in a cryptosystem, continued)

By the previous theorem, it follows that

H(K|C) = H(K) + H(M) − H(C) ≈ 0.8113 + 0.9183 − 0.9799 = 0.7497.    (6)

Check: The key equivocation of this cryptosystem can also be determined directly by applying the definition of conditional entropy.

Using the Lemma of Bayes, we first determine the conditional probabilities Pr(K=k | C=c) for each k ∈ K = {$, #} and c ∈ C = {x, y}:

Pr($|x) = Pr(b,$)/Pr(x) = 2/5;    Pr(#|x) = Pr(a,#)/Pr(x) = 3/5;
Pr($|y) = Pr(a,$)/Pr(y) = 1/7;    Pr(#|y) = Pr(b,#)/Pr(y) = 6/7.


(55)

Key Equivocation

Example (key equivocation in a cryptosystem, continued)

Note that it is a pure coincidence that our cryptosystem satisfies:

Pr($|x) = 2/5 = Pr(b|x), Pr($|y) = 1/7 = Pr(a|y),

Pr(#|x) = 3/5 = Pr(a|x), and Pr(#|y) = 6/7 = Pr(b|y).

In general, this is not the case. For example, it is not necessarily the case that the key space and the plaintext space are of the same size.

Now, we can compute the key equivocation according to the definition of conditional entropy:

H(K|C) = 5/12·(2/5·log(5/2) + 3/5·log(5/3)) + 7/12·(1/7·log 7 + 6/7·log(7/6)) ≈ 0.7497,

confirming the result in (6) obtained from the previous theorem.
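Both routes to H(K|C) ≈ 0.7497 can be checked in a few lines of Python (illustrative only; it reuses the joint probabilities Pr(k, c) of the example):

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint probabilities Pr(K=k, C=c) of the example cryptosystem.
joint = {('$', 'x'): 1/6, ('#', 'x'): 1/4,     # Pr(b,$) and Pr(a,#) yield x
         ('$', 'y'): 1/12, ('#', 'y'): 1/2}    # Pr(a,$) and Pr(b,#) yield y
pr_c = {'x': 5/12, 'y': 7/12}

# Route 1: H(K|C) = H(K) + H(M) - H(C)
print(entropy([1/4, 3/4]) + entropy([1/3, 2/3]) - entropy([5/12, 7/12]))   # about 0.7497

# Route 2: definition of conditional entropy, H(K|C) = -sum Pr(k,c) * log2(Pr(k|c))
print(-sum(p * log2(p / pr_c[c]) for (k, c), p in joint.items()))          # about 0.7497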


(56)

Spurious Keys and Language Redundancy

Let S = (M, C, K, E, D) be a cryptosystem, and let x1 x2 · · · xn be a plaintext that is encrypted as y1 y2 · · · yn.

Suppose attacker Erich mounts a ciphertext-only attack to determine the key used, and he knows that the underlying plaintext language is English.

Many keys can be outright eliminated, but some may remain possible.

A spurious key is an incorrect key that cannot be outright eliminated.

Goal: Estimate the expected number of spurious keys.

To this end, we consider the entropy per letter HL of a natural language L, and its redundancy RL.


(57)

Spurious Keys and Language Redundancy

Definition

Let L be a natural language over the alphabet Σ.

Define the entropy of L by

HL = lim_{n→∞} H(Mn)/n,

where Mn is some random variable that is distributed according to the probabilities with which the length-n strings (i.e., all n-grams) in the language L occur.

Define the redundancy of L by

RL = 1 − HL / log ‖Σ‖.


(58)

Spurious Keys and Language Redundancy

Theorem

Let S = (M, C, K, E, D) be a cryptosystem with ‖M‖ = ‖C‖ and keys in K uniformly distributed at random.

Let RL be the redundancy of the underlying natural language.

Let s̄n be the expected number of spurious keys, given a ciphertext of length n (for sufficiently large n).

Then

s̄n ≥ ‖K‖ / ‖M‖^(n·RL) − 1.


(59)

Spurious Keys and Language Redundancy

Proof: Probability distributions on K and M^n induce a probability distribution on C^n.

Again, we consider the corresponding random variables K, Mn, and Cn. For y ∈ C^n, let

K(y) = {k ∈ K | (∃x ∈ M^n) [Pr(x) > 0 and Ek(x) = y]}

be the set of keys for which some “reasonable” plaintext is encrypted into y. Then ‖K(y)‖ − 1 is the number of spurious keys.

We want to estimate s̄n, the expected number of spurious keys (over all possible y ∈ C^n).


(60)

Spurious Keys and Language Redundancy

s̄n = ∑_{y∈C^n} Pr(y)·(‖K(y)‖ − 1)

= ∑_{y∈C^n} Pr(y)·‖K(y)‖ − ∑_{y∈C^n} Pr(y)

= (∑_{y∈C^n} Pr(y)·‖K(y)‖) − 1.    (7)

By the theorem on slide 50, we know

H(K|Cn) = H(K) + H(Mn) − H(Cn).

According to the definitions of entropy and redundancy of a language, we use the approximation

H(Mn) ≈ n·HL = n·(1 − RL)·log ‖M‖ for sufficiently large n.


(61)

Spurious Keys and Language Redundancy

Obviously,

H(Cn) ≤ n·log ‖C‖.

Assuming ‖M‖ = ‖C‖, it follows that

H(K|Cn) ≥ H(K) − n·RL·log ‖M‖.    (8)

Now, we connect H(K|Cn) with s̄n:

H(K|Cn) = ∑_{y∈C^n} Pr(y)·H(K|y)      (by the definition of conditional entropy)

≤ ∑_{y∈C^n} Pr(y)·log ‖K(y)‖          (by Property 1 of entropy)

≤ log (∑_{y∈C^n} Pr(y)·‖K(y)‖)        (by Jensen’s inequality)

≤ log(s̄n + 1)                         (by (7)).


(62)

Spurious Keys and Language Redundancy

Summing up:

H(K|Cn) ≤ log(s̄n + 1).    (9)

(8) and (9) together imply:

log(s̄n + 1) ≥ H(K) − n·RL·log ‖M‖.

Note that H(K) is maximal with the keys being uniformly distributed; in this case, H(K) = log ‖K‖.

Rearranging the above inequality, we obtain

s̄n ≥ ‖K‖ / ‖M‖^(n·RL) − 1,

as desired. ❑


(63)

Unicity Distance

Definition

Let S = (M, C, K, E, D) be a cryptosystem with ‖M‖ = ‖C‖ and keys in K uniformly distributed at random.

Define the unicity distance of S to be the value n0 ∈ ℕ for which s̄_{n0} becomes zero, i.e., the average size of ciphertext required for uniquely determining the correct key, provided one has sufficient computation time.

Corollary

A cryptosystem S as above has unicity distance

n0 ≈ log ‖K‖ / (RL·log ‖M‖).
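As a worked illustration under standard textbook assumptions (our own numbers, not from the slide): for a substitution cipher over the 26-letter alphabet we have ‖K‖ = 26! and ‖M‖ = 26, and a commonly cited estimate for English is HL ≈ 1.25, hence RL ≈ 1 − 1.25/log 26 ≈ 0.73. A small Python sketch with these numbers:

from math import log2, factorial

H_L = 1.25                          # assumed entropy per letter of English (rough estimate)
R_L = 1 - H_L / log2(26)            # redundancy of English, roughly 0.73

K = factorial(26)                   # number of keys of the substitution cipher
M = 26                              # size of the plaintext alphabet

n0 = log2(K) / (R_L * log2(M))      # n0 ≈ log||K|| / (R_L * log||M||)
print(round(R_L, 2), round(n0, 1))  # about 0.73 and about 25.6

So, under these assumptions, roughly 25 ciphertext letters suffice on average to pin down the key of a substitution cipher.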

