Distance and rate of a code

(1)

Topics in Algebra: Cryptography

Univ.-Prof. Dr. Goulnara ARZHANTSEVA

WS 2019

(2)

Codes and Expanders

Cryptography:

Symmetric and asymmetric cryptosystems;

One-way functions, Hash functions;

Key management, Digital Signatures, Applications;

Pseudorandom generators.

Encoding, Error-correction.

(3)

Code

Let P be a finite set of possible messages.

Definition: Code

A code is a subset C ⊂ {0,1}ⁿ with |C| = |P|, and an encoding is given by a bijective map ψ: P → C.

Linear code

A linear code C is a code with P = {0,1}^k for some k < n, and encoding is done by a linear operator (a generating matrix) A_C ∈ F₂^{k×n}:

ψ(v) = v^T A_C.

A linear code C is a linear subspace of {0,1}ⁿ of dimension k, whose basis is given by the rows of the k×n matrix A_C.
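
The encoding ψ(v) = v^T A_C can be sketched in a few lines of Python. This is only an illustration: the function names `encode` and `codewords` and the sample matrix `A_C` (k = 2, n = 3) are assumptions made here, not part of the lecture.

```python
from itertools import product

def encode(v, A):
    """Return psi(v) = v^T A over F_2, where A is a k x n 0/1 matrix given as a list of rows."""
    k, n = len(A), len(A[0])
    return tuple(sum(v[i] * A[i][j] for i in range(k)) % 2 for j in range(n))

def codewords(A):
    """Return the set of all codewords, i.e. the row space of A over F_2."""
    k = len(A)
    return {encode(v, A) for v in product((0, 1), repeat=k)}

# Hypothetical generating matrix: two message bits followed by their parity.
A_C = [[1, 0, 1],
       [0, 1, 1]]

print(encode((1, 1), A_C))      # (1, 1, 0)
print(len(codewords(A_C)))      # 4 = 2^k codewords
```

Since the rows of this A_C are linearly independent, ψ is a bijection from {0,1}² onto a 2-dimensional subspace of {0,1}³.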

(5)

Distance and rate of a code

For x, y ∈ C, the Hamming distance d_H(x,y) = the number of bit positions in which x and y differ.

Definition: Distance and rate

$$d(C) = \min_{x,y \in C,\ x \neq y} d_H(x,y), \qquad r(C) = \frac{\log_2 |C|}{n}$$

The distance measures the ability to resolve corrupted bits.

The distance should be large: two codewords should be sufficiently dissimilar so that corruption of a single bit (or of a small number of bits) does not turn one codeword into another.
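
For small codes, both quantities can be computed by brute force. The sketch below assumes a code is given as a list of equal-length bit tuples; the helper names and the two sample codes are illustrative choices, not taken from the slides.

```python
from itertools import combinations
from math import log2

def hamming_distance(x, y):
    """Number of positions in which the words x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def distance(C):
    """d(C): minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(C, 2))

def rate(C):
    """r(C) = log2|C| / n, where n is the length of the codewords."""
    return log2(len(C)) / len(C[0])

repetition = [(0, 0, 0), (1, 1, 1)]                       # sparse: large distance, low rate
even_weight = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]  # denser: smaller distance, higher rate

print(distance(repetition), rate(repetition))    # distance 3, rate 1/3
print(distance(even_weight), rate(even_weight))  # distance 2, rate 2/3
```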

(7)

What code will correct t-bit errors?

If 2 bits are bad in a codeword a, the resulting (erroneous) word a′ is at distance 2 = d_H(a,a′).

Such errors can be corrected if d_H(a′,c) ≥ 3 for every codeword c ≠ a: the correct codeword a is then the closest to a′. Thus, d(C) ≥ 5 is required:

5 ≤ d(C) ≤ d_H(a,c) ≤ d_H(a,a′) + d_H(a′,c).

Similarly, we obtain the following result.

Observation

A code with d(C) ≥ t + (t+1) = 2t+1 can correct t-bit errors.
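
The observation corresponds to nearest-codeword decoding. A minimal brute-force sketch follows; the name `decode_nearest` and the 5-fold repetition code (d(C) = 5, so t = 2) are assumptions made for the example.

```python
def hamming_distance(x, y):
    """Number of positions in which the words x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def decode_nearest(word, C):
    """Return a codeword of C closest to `word` in Hamming distance."""
    return min(C, key=lambda c: hamming_distance(word, c))

# 5-fold repetition code: d(C) = 5 = 2*2 + 1, so 2-bit errors are correctable.
C = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]

print(decode_nearest((1, 1, 0, 0, 0), C))  # (0, 0, 0, 0, 0): two bad bits corrected
print(decode_nearest((1, 0, 1, 1, 1), C))  # (1, 1, 1, 1, 1): one bad bit corrected
```

This search takes time proportional to |C|, so it only illustrates the principle; it is not an efficient decoder.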

(8)

Distance and rate of a code

$$d(C) = \min_{x,y \in C,\ x \neq y} d_H(x,y), \qquad r(C) = \frac{\log_2 |C|}{n}$$

The rate measures the number of information-bits.

For a linear code of dimension k, the rate is log₂(2^k)/n = k/n, the amount of non-redundant information per bit.

The rate should be large.

(9)

Distance versus rate

A sparser code has larger distance (i.e. more errors can be corrected) but smaller rate (i.e. smaller information-density).

Theorem: Quantifying the distance-rate tradeoff Hamming’1950

Let C ⊂ {0,1}ⁿ be a code and t = ⌊(d(C)−1)/2⌋. Then

$$\frac{|C|}{2^n} \le \frac{1}{\sum_{i=0}^{t} \binom{n}{i}}$$

A code is perfect if it achieves the Hamming bound.
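
The bound is easy to evaluate numerically. The sketch below computes the largest |C| it allows for given n and d(C); the function name `hamming_bound` is an assumption made here.

```python
from math import comb

def hamming_bound(n, d):
    """Largest |C| allowed by the Hamming bound for length n and distance d."""
    t = (d - 1) // 2
    ball = sum(comb(n, i) for i in range(t + 1))  # number of words within distance t
    return 2**n // ball                            # |C| is an integer, so round down

print(hamming_bound(7, 3))  # 16
print(hamming_bound(5, 5))  # 2
```

The value 2 for n = 5, d = 5 is attained by the 5-fold repetition code {00000, 11111}, which is therefore perfect.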

(10)

Distance versus rate

Proof: For x ∈ {0,1}ⁿ, B(x,t) = {y ∈ {0,1}ⁿ | d_H(x,y) ≤ t} is the ball of radius t centered at x, with respect to the Hamming distance.

For all x, y ∈ C, x ≠ y, the sets B(x,t) and B(y,t) are disjoint.

Otherwise, d_H(x,y) ≤ 2t < d(C), contradicting the definition of d(C).

Each B(x,t) has size $\sum_{i=0}^{t} \binom{n}{i}$. Their union is contained in {0,1}ⁿ, so

$$|C| \left( \sum_{i=0}^{t} \binom{n}{i} \right) \le 2^n$$

(11)

Distance versus rate

Example: the Hamming bound

A linear code of length n, dimension k and distance 3 satisfies k ≤ n − log₂(n+1).

Example: a Hamming code of length 7, dimension 4 and distance 3

For (x₁,x₂,x₃,x₄) ∈ {0,1}⁴, we define

C_Ham(x₁,x₂,x₃,x₄) = (x₁, x₂, x₃, x₄, x₂⊕x₃⊕x₄, x₁⊕x₃⊕x₄, x₁⊕x₂⊕x₄).

4 = 7 − log₂(7+1), hence, C_Ham has the largest possible dimension for any binary code of length 7 and distance 3.
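
The claims about C_Ham can be checked by enumerating all 16 codewords. This is a brute-force sketch; the names `c_ham`, `ham_codewords`, `d` and `ball` are chosen here for illustration.

```python
from itertools import combinations, product

def c_ham(x1, x2, x3, x4):
    """Encode 4 message bits with the Hamming code of length 7 defined above."""
    return (x1, x2, x3, x4, x2 ^ x3 ^ x4, x1 ^ x3 ^ x4, x1 ^ x2 ^ x4)

ham_codewords = [c_ham(*x) for x in product((0, 1), repeat=4)]

# Minimum distance over all pairs of distinct codewords.
d = min(sum(a != b for a, b in zip(u, v))
        for u, v in combinations(ham_codewords, 2))

ball = 1 + 7  # |B(x, 1)| = C(7,0) + C(7,1)

print(d)                                   # 3
print(len(ham_codewords) * ball == 2**7)   # True: the Hamming bound holds with equality
```

Equality in the bound means that the 16 balls of radius 1 around the codewords tile {0,1}⁷, i.e. C_Ham is perfect.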

(12)

Distance of a linear code

For c ∈ F₂ⁿ, the support supp c = the set of positions with nonzero bits.

|supp c| = the number of nonzero bits, the weight of c.

Lemma: Distance of a linear code

For a linear code C, we have d(C) = min_{c ∈ C, c ≠ 0} |supp c|.

Proof: If c, c′ ∈ C, then c ⊕ c′ ∈ C, since C is linear. Then, d_H(c,c′) = d_H(0, c⊕c′) = |supp(c⊕c′)|.

(13)

Asymptotically good codes

Definition: Asymptotically good code

A family 𝒞 of codes C_n ⊂ {0,1}ⁿ, as n → ∞, is asymptotically good if there exist constants α, λ > 0 such that for all C_n ∈ 𝒞,

$$\frac{d(C_n)}{n} \ge \alpha \quad \text{and} \quad r(C_n) \ge \lambda.$$

We want both to correct a constant fraction of errors and to keep a constant rate.

We also want encoding and decoding to be in P, ideally in linear time.

(14)

Bipartite graphs

Definition: A bipartite graph

A graph is bipartite if there is a partition of its set of vertices into two (disjoint) subsets S and T such that every edge has one endpoint vertex in S and another one in T.

Definition: An (l,r)-regular graph

A bipartite graph is (l,r)-regular if all vertices in S have degree l, and all vertices in T have degree r.

The complete graph K_{3,3} is a bipartite graph. It is (3,3)-regular.

(15)

Bipartite expander graphs

Definition: Bipartite expander

A bipartite graph X is an (l,r,α,δ)-expander if it is (l,r)-regular, and for all sets U ⊂ S with |U| ≤ α|S|, we have |∂U| > δ|U|.

∂U denotes the external boundary of the set U = the set of vertices at distance 1 from U, in the edge-length distance on X.

Here, α, δ > 0 are real constants and l, r are positive integers.

Small subsets of S have big enough boundary: they are ‘expanding’.
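
On a small graph the expansion condition can be tested by brute force over all subsets U. The sketch below is illustrative only: it assumes the graph is given as a dictionary `adj` from vertices of S to their neighbour sets in T, it does not check (l,r)-regularity, and the (2,3)-regular toy graph is a hypothetical example. The running time is exponential in |S|.

```python
from itertools import combinations

def boundary(U, adj):
    """External boundary of U (a subset of S): all vertices of T adjacent to some vertex of U."""
    return set().union(*(adj[u] for u in U))

def expands(adj, alpha, delta):
    """Check |boundary(U)| > delta * |U| for every nonempty U in S with |U| <= alpha * |S|."""
    S = list(adj)
    for size in range(1, len(S) + 1):
        if size > alpha * len(S) + 1e-9:  # small tolerance for floating-point alpha
            break
        for U in combinations(S, size):
            if not len(boundary(U, adj)) > delta * size:
                return False
    return True

# Hypothetical (2,3)-regular bipartite graph: S = {0,...,5}, T = {'a','b','c','d'}.
adj = {0: {'a', 'c'}, 1: {'a', 'd'}, 2: {'a', 'd'},
       3: {'b', 'c'}, 4: {'b', 'c'}, 5: {'b', 'd'}}

print(expands(adj, 1 / 6, 1.5))  # True: every single vertex has 2 > 1.5 neighbours
print(expands(adj, 1 / 3, 1.5))  # False: U = {1, 2} has only 2 neighbours, and 2 is not > 3
```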

(16)

Parity check matrix

Let S ⊂ F₂^r be an r-bit linear code of dim. k with parity check matrix P_S: c ∈ S ⇐⇒ P_S c = 0.

The (r−k)×r matrix P_S describes linear relations that hold ∀ c ∈ S.

Rows of A_S span S and rows of P_S span S^⊥.

That is, P_S A_S^T = (0), the zero matrix of size (r−k)×k.

By a change of basis of F₂^r, we write A_S = (I_k M), where I_k is the k×k identity matrix.

Then, P_S = (M^T I_{r−k}).
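
The passage from A_S = (I_k M) to P_S = (M^T I_{r−k}) can be sketched as follows. The function names are assumptions; the sample matrix is the generating matrix of the length-7 Hamming code C_Ham from the earlier example, which is already in the form (I_4 M).

```python
def parity_check_from_systematic(A):
    """Given A_S = (I_k | M) as a k x r 0/1 matrix, return P_S = (M^T | I_{r-k})."""
    k, r = len(A), len(A[0])
    M = [row[k:] for row in A]  # the k x (r-k) block M
    P = []
    for j in range(r - k):
        row = [M[i][j] for i in range(k)]                    # j-th row of M^T
        row += [1 if t == j else 0 for t in range(r - k)]    # j-th row of I_{r-k}
        P.append(row)
    return P

def times_transpose_mod2(P, A):
    """Return the matrix product P A^T over F_2."""
    return [[sum(p * a for p, a in zip(prow, arow)) % 2 for arow in A] for prow in P]

A_ham = [[1, 0, 0, 0, 0, 1, 1],
         [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 1, 1, 0],
         [0, 0, 0, 1, 1, 1, 1]]

P_ham = parity_check_from_systematic(A_ham)
print(P_ham)                               # [[0,1,1,1,1,0,0], [1,0,1,1,0,1,0], [1,1,0,1,0,0,1]]
print(times_transpose_mod2(P_ham, A_ham))  # the 3 x 4 zero matrix
```

The product vanishes because P_S A_S^T = M^T + M^T = 0 over F₂.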

(18)

Towards expander codes

Let X be an (l,r)-regular expander whose l-degree side has n vertices and l < r.

We will extend an r-bit linear code S to an n-bit linear code C(X,S).

This will allow us to produce an asymptotically good family of codes.

(19)

Expander codes

Let {u₁, …, u_n} be the n vertices on the l-degree side of X.

Then the r-degree side has (l·n)/r vertices, say {v₁, …, v_{ln/r}}.

Let σ be a function such that for i = 1, …, ln/r, the neighbours of v_i are u_{σ(i,1)}, …, u_{σ(i,r)}.

Definition: C(X,S)

C(X,S) = {(x₁, …, x_n) ∈ F₂ⁿ | ∀ i we have (x_{σ(i,1)}, …, x_{σ(i,r)}) ∈ S}
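
Membership in C(X,S) amounts to checking one small code constraint per vertex v_i. A minimal sketch, under these assumptions: σ is stored as a list `sigma` of 0-based index lists, membership in S is given by a predicate, and the toy (2,3)-regular graph and the choice of S as the even-weight code of length 3 are hypothetical.

```python
from itertools import product

def in_expander_code(x, sigma, in_S):
    """True if, for every i, the bits of x at positions sigma[i] form a word of S."""
    return all(in_S(tuple(x[j] for j in nbrs)) for nbrs in sigma)

def in_even_weight_code(word):
    """Membership test for the even-weight code: the bits sum to 0 mod 2."""
    return sum(word) % 2 == 0

# Hypothetical graph with n = 6, l = 2, r = 3: sigma[i] lists the neighbours of v_i.
sigma = [[0, 1, 2], [3, 4, 5], [0, 3, 4], [1, 2, 5]]

print(in_expander_code((1, 1, 0, 1, 0, 1), sigma, in_even_weight_code))  # True
print(in_expander_code((1, 0, 0, 0, 0, 0), sigma, in_even_weight_code))  # False

code = [x for x in product((0, 1), repeat=6)
        if in_expander_code(x, sigma, in_even_weight_code)]
print(len(code))  # 8 codewords, i.e. dimension 3
```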

(20)

Expander codes

Lemma: Expander code is linear

C(X,S) is a linear code.

Proof: If B_{X,i} is the 0−1 matrix of size r×n that maps (x₁, …, x_n) to (x_{σ(i,1)}, …, x_{σ(i,r)}), then the parity check matrix P_{C(X,S)} is the matrix whose rows are the union of the rows of the matrices P_S B_{X,i}, each of size (r−k)×n, for i = 1, …, ln/r.

(21)

Expander code

Theorem: Expander code Sipser-Spielman’1994

Suppose that X is an (l, r, α, l/(rε))-expander, and S has rate R > 1−1/l and normalised distance d(S)/r = ε. Then C(X,S) has rate at least 1−l(1−R) and normalised distance at least α.

Proof: Each matrix P_S B_{X,i} has r−k = (1−R)·r rows. So, the parity check matrix of C(X,S) has

$$\frac{l \cdot n}{r}\,(1-R)\,r = ln(1-R)$$

rows. These rows span C(X,S)^⊥. Hence, the dimension of C(X,S) is at least n − ln(1−R), and its rate is at least (n − ln(1−R))/n = 1 − l(1−R).

Next we bound the normalised distance d(C(X,S))/n.

(23)

Expander code: Theorem (continued)

Suppose by contradiction that there is a nonzero c ∈ C(X,S) with |supp c| ≤ αn.

Let U be the vertices in X corresponding to the coordinates of supp c.

By the expansion of the graph, |∂U| > (l/(rε))|U|, where ε = d(S)/r is the normalised distance of S. There are l|U| edges from U to X∖U, so some v_i ∈ ∂U has < rε neighbours in U.

Then, the nonzero word (x_{σ(i,1)}, …, x_{σ(i,r)}) ∈ S, formed from the coordinates of c, has < rε 1-bits, contradicting the hypothesis that the normalised distance of S is ε.

(24)

Asymptotically good error-correcting codes

Corollary

If (X_i)_{i≥1} is a family of (l, r, α, l/(rε))-expanders, with ε = d(S)/r and n vertices on the l-degree side, as n → ∞, then (C(X_i,S))_{i≥1} are asymptotically good error-correcting codes.

(25)

Expander code: Example

Expander code from the even-weight code

Let S_even ⊂ F₂^r be the code consisting of all even-weight codewords.

Then P_{S_even} = (1 1 ··· 1), the normalised distance of S_even is 2/r and the rate R = 1−1/r.

If X is an (l,r,α,l/2)-expander, then, by the Theorem, C(X,S_even) has the normalised distance at least α and the rate at least 1−l/r.

(26)

Linear error-correcting (without proof)

Theorem: Linear decoding Sipser-Spielman’1994

If X is an (l, r, α, (3/4)l)-expander, then the code C(X,S_even) permits an α/2 fraction of errors to be corrected in linear time.

C(X,S_even) has normalised distance at least α and rate at least 1−l/r.

(27)

Linear error-correcting

There is a linear-time algorithm that will map to a codeword any word of relative distance at most α from that codeword, for some positive constant α.

Algorithm

While not all constraints are satisfied, find a variable x_i in more unsatisfied than satisfied constraints, and switch x_i.

C(X,S_even) has n variables and (l·n)/r constraints.

Given (x₁, …, x_n) ∈ {0,1}ⁿ, a constraint v_i is satisfied if (x_{σ(i,1)}, …, x_{σ(i,r)}) ∈ S_even, i.e. the mod 2 sum of the coordinates is zero.

Otherwise, it is unsatisfied.

One shows that the algorithm terminates after a linear number of switches and can be implemented in linear time.
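
The switching rule can be sketched directly. The version below is deliberately naive: it recomputes all constraints after every switch, so it is not the linear-time implementation referred to above. The name `flip_decode` and the toy (2,3)-regular graph are assumptions, and that graph is far too small to be a good expander.

```python
def flip_decode(word, sigma):
    """Repeatedly switch a variable lying in more unsatisfied than satisfied constraints.

    sigma[i] lists the (0-based) variables of constraint v_i.
    Returns a codeword, or None if no variable qualifies for switching.
    """
    x = list(word)
    n = len(x)
    cons_of = [[] for _ in range(n)]  # cons_of[j] = constraints containing variable j
    for i, nbrs in enumerate(sigma):
        for j in nbrs:
            cons_of[j].append(i)
    # Each switch strictly lowers the number of unsatisfied constraints, so the loop ends.
    while True:
        unsat = [sum(x[j] for j in nbrs) % 2 == 1 for nbrs in sigma]
        if not any(unsat):
            return tuple(x)
        for j in range(n):
            bad = sum(unsat[i] for i in cons_of[j])
            if 2 * bad > len(cons_of[j]):  # more unsatisfied than satisfied
                x[j] ^= 1
                break
        else:
            return None  # stuck: no variable can be switched

# Hypothetical graph with n = 6 variables and 4 constraints of size 3.
sigma = [[0, 1, 2], [3, 4, 5], [0, 3, 4], [1, 2, 5]]

print(flip_decode((0, 1, 0, 1, 0, 1), sigma))  # (1, 1, 0, 1, 0, 1): bit 0 is switched back
```

A linear-time version would instead maintain, for each variable, its current count of unsatisfied constraints and update only the affected counts after a switch.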

(29)

Asymptotically good linear time error-correcting codes

Corollary

If (X_i)_{i≥1} is a family of (l, r, α, (3/4)l)-expanders with n vertices on the l-degree side, as n → ∞, then (C(X_i,S_even))_{i≥1} are asymptotically good linear-time error-correcting codes.

(30)

Existence and constructions of expanders: Remarks

Theorem: Existence of expanders Kolmogorov-Barzdin’1968

A random (bipartite) graph is an expander.

The above definition of expander can be adapted to usual (not necessarily bipartite) graphs.

Examples of explicit (non-bipartite) expanders can be produced by taking box spaces of finitely generated residually finite groups with Kazhdan’s property (T).

SL₃(Z) is such a group and SL₃(Z/pZ), as prime p → ∞, is such an (explicit) expander.

(31)

Existence and constructions of expanders: Remarks

A usual expander gives a bipartite expander: take two copies of the vertex set for each finite graph and have an edge between vertices in different copies if and only if there is an edge between these vertices in the original graph.
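
The two-copies construction just described can be written down in a few lines. This is a sketch under assumptions: the graph is given as a list of undirected edges, the two copies of a vertex u are represented as the pairs (u, 0) and (u, 1), and the triangle is a hypothetical test graph.

```python
def bipartite_from_graph(edges):
    """Edges of the bipartite graph on two copies of the vertex set.

    For each undirected edge {u, v} of the original graph, join (u, 0) to (v, 1)
    and (v, 0) to (u, 1).
    """
    cover = set()
    for u, v in edges:
        cover.add(((u, 0), (v, 1)))
        cover.add(((v, 0), (u, 1)))
    return cover

triangle = [(0, 1), (1, 2), (0, 2)]
X_edges = bipartite_from_graph(triangle)

print(len(X_edges))  # 6: every edge of the original graph gives two edges
# Every edge joins copy 0 to copy 1, so the result is bipartite.
print(all(a[1] == 0 and b[1] == 1 for a, b in X_edges))  # True
```

Each vertex keeps its degree in both copies, so a d-regular graph yields a (d,d)-regular bipartite graph.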

Expander graphs are ubiquitous in mathematics and computer science!

(32)

Test questions

Question 27

Is the Hamming distance indeed a distance?

Question 28

Given a linear code C, is its generating matrix uniquely defined?

Question 29

Is the complete graph K_{3,3} a bipartite expander?

Question 30

Let Y be a non-bipartite expander with the expansion parameter λ.

What is the expansion parameter of the bipartite expander X constructed from Y as in the previous slide? What about the diameter and the girth of X (given the diameter and the girth of Y)?