[Figure: double-stranded DNA molecule with the base pairs A-T, C-G and G-C, phosphate groups (P), and the 5’ and 3’ ends of both strands marked]

(1)

Literature – DNA Computing

T. Head, Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviors. Bull. Math. Biology 49 (1987) 737–759.

L. M. Adleman, Molecular computation of solutions to combinatorial problems. Science 266 (1994) 1021–1024.

T. Head, Gh. Păun and D. Pixton, Language theory and molecular genetics. In: G. Rozenberg and A. Salomaa (eds.), Handbook of Formal Languages, Springer-Verlag, 1997, Vol. II, Chapter 7, 295–360.

Gh. Păun, G. Rozenberg and A. Salomaa, DNA Computing - New Computing Paradigms. Springer-Verlag, Berlin, 1998.

(2)

Molecule with Thymine Base

[Figure: structural formula of a nucleotide with a thymine base, showing the phosphate groups and the sugar carbons numbered 1’ to 5’]

(3)

Double Stranded DNA Molecule

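[Figure: two antiparallel strands, one running 5’ to 3’ and the other 3’ to 5’, joined by the base pairs A-T, C-G and G-C; phosphate groups (P) link the sugars, whose carbons are numbered 1’ to 5’]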

(4)

Measuring the Length of DNA Molecules by Gel Electrophoresis

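[Figure: gel with negative electrodes (–) on one side and positive electrodes (+) on the other. The fragments move toward the positive electrodes; large fragments stay closer to the negative electrodes, small fragments travel further.]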

(5)

Polymerase

5’-CGGA-3’
3’-GCCTCTACCT-5’

↓

5’-CGGAG-3’
3’-GCCTCTACCT-5’

↓

5’-CGGAGA-3’
3’-GCCTCTACCT-5’

↓ ... ↓

5’-CGGAGATGGA-3’
3’-GCCTCTACCT-5’
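The extension steps in the figure can be replayed in a few lines. This is a minimal sketch, not a model of the biochemistry: the function name `extend_primer` and the convention that the template is given in 3’→5’ direction (so that position i of the template pairs with position i of the growing strand) are assumptions made for illustration.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_primer(template_3to5: str, primer_5to3: str) -> list[str]:
    """Return the growing strand after each single-nucleotide extension.

    The template is read left to right in 3'->5' direction, the primer
    in 5'->3' direction, exactly as the two strands are drawn above.
    """
    # The primer must already pair with the left end of the template.
    for base, partner in zip(primer_5to3, template_3to5):
        if COMPLEMENT[partner] != base:
            raise ValueError("primer does not match the template")

    strand = primer_5to3
    steps = [strand]
    for partner in template_3to5[len(primer_5to3):]:
        strand += COMPLEMENT[partner]  # polymerase appends one nucleotide
        steps.append(strand)
    return steps


steps = extend_primer("GCCTCTACCT", "CGGA")
print(steps[1], steps[2], steps[-1])  # CGGAG CGGAGA CGGAGATGGA
```

The list `steps` contains the primer itself followed by one entry per added nucleotide, so the last entry is the full complement of the template.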

(6)

Polymerase Chain Reaction

[Figure: one cycle of the polymerase chain reaction on a double strand whose two ends are marked β and γ. Denaturation by heating separates the two strands; during annealing, a β-primer and a γ-primer attach to the single strands; polymerase then extends the primers, so that two copies of the original double strand result.]

(7)

Endonuclease

5’-CATATG-3’
3’-GTATAC-5’

↓ NdeI

5’-CA-3’ 5’-TATG-3’
3’-GTAT-5’ 3’-AC-5’

5’-GGCC-3’
3’-CCGG-5’

↓ HaeIII

5’-GG-3’ 5’-CC-3’
3’-CC-5’ 3’-GG-5’
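The two cuts can be described by a recognition site plus one cut offset per strand. The sketch below is only an illustration of the figure: the table `ENZYMES`, the function `cut` and the offset convention (counted from the left end of the site, lower strand written 3’→5’ directly under the upper strand) are assumptions introduced here; the offsets themselves are read off the two examples above.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# enzyme -> (recognition site, cut offset upper strand, cut offset lower strand)
ENZYMES = {
    "NdeI": ("CATATG", 2, 4),   # offsets differ: sticky ends
    "HaeIII": ("GGCC", 2, 2),   # offsets agree: blunt ends
}


def lower_strand(upper: str) -> str:
    """Position-wise complement, i.e. the lower strand read 3'->5'."""
    return "".join(COMPLEMENT[b] for b in upper)


def cut(upper: str, enzyme: str):
    """Cut at the first recognition site; return (left, right) fragments.

    Each fragment is a pair (upper part, lower part). Returns None if
    the site does not occur.
    """
    site, up_off, low_off = ENZYMES[enzyme]
    pos = upper.find(site)
    if pos < 0:
        return None
    lower = lower_strand(upper)
    left = (upper[:pos + up_off], lower[:pos + low_off])
    right = (upper[pos + up_off:], lower[pos + low_off:])
    return left, right


print(cut("CATATG", "NdeI"))   # (('CA', 'GTAT'), ('TATG', 'AC'))
print(cut("GGCC", "HaeIII"))   # (('GG', 'CC'), ('CC', 'GG'))
```

Unequal offsets leave single-stranded overhangs (sticky ends); equal offsets give blunt ends.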

(8)

Hydrogen Bonding and DNA Ligase

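[Figure: the fragments C-A and T-A-T-G (upper strand) and G-T-A-T and A-C (lower strand) first rejoin at their sticky ends by hydrogen bonding; DNA ligase then closes the remaining gaps, restoring the continuous upper strand C-A-T-A-T-G.]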

(9)

Splicing with Sticky Ends

Each site is written as (upper strand / lower strand).

α1 (TCGA / AGCT) β1 and α2 (GCGC / CGCG) β2

↓ TaqI and SciNI

α1 (T / AGC) + (CGA / T) β1 and α2 (G / CGC) + (CGC / G) β2

↓ exchange

α1 (T / AGC) + (CGC / G) β2 and α2 (G / CGC) + (CGA / T) β1

↓ hydrogen bonding and DNA ligase

α1 (TCGC / AGCG) β2 and α2 (GCGA / CGCT) β1

(10)

Splicing with Blunt Ends

Each site is written as (upper strand / lower strand).

α1 (AGCT / TCGA) β1 and α2 (GGCC / CCGG) β2

↓ AluI and HaeIII

α1 (AG / TC) + (CT / GA) β1 and α2 (GG / CC) + (CC / GG) β2

↓ exchange

α1 (AG / TC) + (CC / GG) β2 and α2 (GG / CC) + (CT / GA) β1

↓ hydrogen bonding and DNA ligase

α1 (AGCC / TCGG) β2 and α2 (GGCT / CCGA) β1

(11)

Adleman’s Experiment

[Figure: directed graph with the seven vertices 0, 1, ..., 6]

V(2) TATCGGATCGGTATATCCGA

E(2,3) CATATAGGCTCGATAAGCTC

V(3) GCTATTCGAGCTTAAAGCTA

E(3,4) GAATTTCGATCCGATCCATG
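The four strands listed above fit a simple pattern: E(i,j) is the position-wise Watson-Crick complement of the second half of V(i) followed by the first half of V(j). This pattern is read off the listed sequences and is not stated on the slide; the helper names `complement` and `edge_oligo` are chosen here for illustration only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Vertex and edge strands exactly as listed on the slide.
V = {2: "TATCGGATCGGTATATCCGA", 3: "GCTATTCGAGCTTAAAGCTA"}
E = {(2, 3): "CATATAGGCTCGATAAGCTC", (3, 4): "GAATTTCGATCCGATCCATG"}


def complement(strand: str) -> str:
    """Position-wise Watson-Crick complement (no reversal)."""
    return "".join(COMPLEMENT[b] for b in strand)


def edge_oligo(v_from: str, v_to: str) -> str:
    """Assumed pattern: complement of (2nd half of v_from + 1st half of v_to)."""
    half = len(v_from) // 2
    return complement(v_from[half:] + v_to[:half])


# E(2,3) is reproduced from V(2) and V(3).
print(edge_oligo(V[2], V[3]) == E[(2, 3)])          # True

# V(4) is not listed, so only the first half of E(3,4) can be checked.
print(complement(V[3][10:]) == E[(3, 4)][:10])      # True
```

Under this reading, an edge strand holds the end of one vertex strand and the start of the next side by side, which is what allows paths through the graph to assemble.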

(12)

Splicing Scheme and Splicing Operation I

Definition:

A splicing scheme is a pair (V, R), where

– V is an alphabet and

– R is a subset of V*#V*$V*#V*.

The elements of R are called splicing rules.

Definition:

We say that w ∈ V* and z ∈ V* are obtained from u ∈ V* and v ∈ V* by the splicing rule r = r1#r2$r3#r4, and write (u, v) ⊢_r w and (u, v) ⊢_r z, if the following conditions are satisfied:

– u = u1 r1 r2 u2 and v = v1 r3 r4 v2,

– w = u1 r1 r4 v2 and z = v1 r3 r2 u2.
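The definition translates directly into code. The following sketch represents a rule r1#r2$r3#r4 as a 4-tuple of strings and returns every pair (w, z) obtainable from (u, v); the names `Rule`, `occurrences` and `splice` are illustrative and not taken from the literature.

```python
from typing import Iterator, Set, Tuple

Rule = Tuple[str, str, str, str]  # (r1, r2, r3, r4) stands for r1#r2$r3#r4


def occurrences(word: str, pattern: str) -> Iterator[int]:
    """Yield every start index at which pattern occurs in word."""
    for i in range(len(word) - len(pattern) + 1):
        if word.startswith(pattern, i):
            yield i


def splice(u: str, v: str, rule: Rule) -> Set[Tuple[str, str]]:
    """All pairs (w, z) with (u, v) |-_r w and (u, v) |-_r z."""
    r1, r2, r3, r4 = rule
    results = set()
    for i in occurrences(u, r1 + r2):            # u = u1 r1 r2 u2
        u1, u2 = u[:i], u[i + len(r1) + len(r2):]
        for j in occurrences(v, r3 + r4):        # v = v1 r3 r4 v2
            v1, v2 = v[:j], v[j + len(r3) + len(r4):]
            w = u1 + r1 + r4 + v2
            z = v1 + r3 + r2 + u2
            results.add((w, z))
    return results


# u = a.a.b.b and v = a.b with the rule a#b$a#b
print(splice("aabb", "ab", ("a", "b", "a", "b")))  # {('aab', 'abb')}
```

Several decompositions of u and v may exist, which is why the function returns a set rather than a single pair.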

(13)

Splicing Scheme and Splicing Operation II

For a language L over V and a splicing scheme (V, R) we set spl(L, R) = {w | (u, v) ⊢_r w, u ∈ L, v ∈ L, r ∈ R}.

For two language families ℒ1 and ℒ2 we set

spl(ℒ1, ℒ2) = {L | L = spl(L1, R) for some L1 ∈ ℒ1 and some splicing scheme (V, R) with R ∈ ℒ2}.

(14)

Splicing Operation – Examples

L = {aⁿbⁿ | n ≥ 0} and R = {a#b$a#b}:

spl(L, R) = {aⁿbᵐ | n ≥ 1, m ≥ 1}

L ⊂ V* arbitrary, L′ ⊂ V* arbitrary, splicing scheme (V ∪ {c}, R) with R = {#xc$c# | x ∈ L′}:

spl(L{c}, R) = {w | wz ∈ L for some z ∈ L′}

{aⁿbⁿ} ∉ spl(ℒ(REG), ℒ(RE))
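Both example equalities can be checked on small finite samples. The sketch below assumes that spl(L, R) collects only the first result w = u1 r1 r4 v2 of each rule application, which is the reading the second example requires; the bound N and the sample languages for the second example are arbitrary choices made here.

```python
def spl(language, rules):
    """First results w = u1 r1 r4 v2 over all u, v in language and r in rules."""
    out = set()
    for r1, r2, r3, r4 in rules:
        for u in language:
            for i in range(len(u) + 1):
                if not u.startswith(r1 + r2, i):
                    continue
                for v in language:
                    for j in range(len(v) + 1):
                        if v.startswith(r3 + r4, j):
                            out.add(u[:i] + r1 + r4 + v[j + len(r3) + len(r4):])
    return out


# Example 1, restricted to n <= N.
N = 4
L = {"a" * n + "b" * n for n in range(N + 1)}
R = {("a", "b", "a", "b")}
expected = {"a" * n + "b" * m for n in range(1, N + 1) for m in range(1, N + 1)}
print(spl(L, R) == expected)  # True

# Example 2 with hypothetical sample languages L2 and L_prime.
L2 = {"abab", "ba"}
L_prime = {"ab", "a"}
R2 = {("", x + "c", "c", "") for x in L_prime}
quotient = {w[: len(w) - len(x)] for w in L2 for x in L_prime if w.endswith(x)}
print(spl({w + "c" for w in L2}, R2) == quotient)  # True
```

The second check shows that the rules #xc$c# cut a suffix x ∈ L′ off the words of L, which is why the result is the set {w | wz ∈ L for some z ∈ L′}.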

(15)

Generative Power of the Splicing Operation

Theorem:

The following table holds, where at the intersection of the row marked by X and the column marked by Y we give Z if ℒ(Z) = spl(ℒ(X), ℒ(Y)), and Z1/Z2 if ℒ(Z1) ⊂ spl(ℒ(X), ℒ(Y)) ⊂ ℒ(Z2).

      FIN   REG   CF       CS       RE
FIN   FIN   FIN   FIN      FIN      FIN
REG   REG   REG   REG/CF   REG/RE   REG/RE
CF    CF    CF    RE       RE       RE
CS    RE    RE    RE       RE       RE
RE    RE    RE    RE       RE       RE

(16)

Some Lemmas I

Lemma:

For any language families ℒ1, ℒ2, ℒ′1, ℒ′2 with ℒ1 ⊆ ℒ′1 and ℒ2 ⊆ ℒ′2, we have spl(ℒ1, ℒ2) ⊆ spl(ℒ′1, ℒ′2).

Lemma:

If ℒ1 is closed under concatenation with symbols, then ℒ1 ⊆ spl(ℒ1, ℒ2) for all language families ℒ2.

Lemma:

If ℒ is closed under concatenation, homomorphisms, inverse homomorphisms and intersections with regular sets, then spl(ℒ, ℒ(REG)) ⊆ ℒ.

(17)

Some Lemmas II

Lemma:

If ℒ is closed under homomorphisms, inverse homomorphisms and intersections with regular sets, then spl(ℒ(REG), ℒ) ⊆ ℒ.

Lemma:

For any recursively enumerable language L, there are context-free languages L1 and L2 such that L = {u | uv ∈ L1 for some v ∈ L2}.

Lemma:

For any recursively enumerable language L ⊂ V*, there are a context-sensitive language L′ and letters c1 and c2, which are not in V, such that L′ ⊆ L{c1}{c2}* holds, and for any w ∈ L there is a number i ≥ 1 such that w c1 c2ⁱ ∈ L′.

(18)

Splicing Systems

Definition:

A splicing system is a triple G = (V, R, A), where

– V is an alphabet,

– R is a subset of V*#V*$V*#V*, and

– A is a subset of V*.

Definition:

The language L(G) generated by a splicing system G is defined by the following settings:

– spl⁰(G) = A and splⁱ⁺¹(G) = spl(splⁱ(G), R) ∪ splⁱ(G) for i ≥ 0,

– L(G) = ⋃_{i ≥ 0} splⁱ(G).

Example:

G = ({a, b}, {a#b$a#b}, {(aⁿbⁿ)ᵐ | n ≥ 1, m ≥ 1})

L(G) = {a^(r_1) b^(s_1) a^(r_2) b^(s_2) ... a^(r_m) b^(s_m) | m ≥ 1, r_i ≥ 1, s_i ≥ 1 for 1 ≤ i ≤ m}
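For a finite axiom set, the iteration spl⁰(G), spl¹(G), ... can be computed until nothing new appears. The sketch below makes several assumptions: only first results w = u1 r1 r4 v2 are collected (for the symmetric rule a#b$a#b this loses nothing), words longer than `max_len` are discarded so that the loop always terminates, and the two axioms are a small hypothetical subset of the axiom set in the example.

```python
def first_results(u, v, rule):
    """Yield every w = u1 r1 r4 v2 obtained from (u, v) by rule."""
    r1, r2, r3, r4 = rule
    for i in range(len(u) + 1):
        if u.startswith(r1 + r2, i):
            for j in range(len(v) + 1):
                if v.startswith(r3 + r4, j):
                    yield u[:i] + r1 + r4 + v[j + len(r3) + len(r4):]


def generated_language(axioms, rules, max_len):
    """Union of spl^i(G) over all i, restricted to words of length <= max_len."""
    current = {w for w in axioms if len(w) <= max_len}
    while True:
        new = {
            w
            for rule in rules
            for u in current
            for v in current
            for w in first_results(u, v, rule)
            if len(w) <= max_len
        }
        if new <= current:          # fixed point reached
            return current
        current |= new


# Two of the axioms (a^n b^n)^m, namely m = 1 and n = 1, 2.
language = generated_language({"ab", "aabb"}, {("a", "b", "a", "b")}, max_len=10)
print(sorted(language))  # ['aab', 'aabb', 'ab', 'abb']
```

Already with these two axioms the exponents of a and b become independent of each other, as in the description of L(G) above.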

(19)

Extended Splicing Systems

Definition:

i) An extended splicing system is a quadruple G = (V, T, R, A), where

– H = (V, R, A) is a splicing system and

– T is a subset of V.

ii) The language generated by an extended splicing system G is defined as L(G) = L(H) ∩ T*.

Example:

G = ({a, b, c}, {a, b}, {#c$c#a}, {cᵐaⁿbⁿ | n ≥ 1})

L(G) = {aⁿbⁿ | n ≥ 1}
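The effect of the terminal alphabet T can be seen on a finite cut of this example. In the sketch below the axioms are restricted to m = 1 and n ≤ 3 (both choices are assumptions made here, since the slide leaves m open), only first results w = u1 r1 r4 v2 are collected, and `max_len` is a safety bound.

```python
def first_results(u, v, rule):
    """Yield every w = u1 r1 r4 v2 obtained from (u, v) by rule."""
    r1, r2, r3, r4 = rule
    for i in range(len(u) + 1):
        if u.startswith(r1 + r2, i):
            for j in range(len(v) + 1):
                if v.startswith(r3 + r4, j):
                    yield u[:i] + r1 + r4 + v[j + len(r3) + len(r4):]


def closure(axioms, rules, max_len):
    """L(H) for H = (V, R, A), restricted to words of length <= max_len."""
    current = set(axioms)
    while True:
        new = {
            w
            for rule in rules
            for u in current
            for v in current
            for w in first_results(u, v, rule)
            if len(w) <= max_len
        }
        if new <= current:
            return current
        current |= new


T = {"a", "b"}
A = {"c" + "a" * n + "b" * n for n in range(1, 4)}   # c a^n b^n, n = 1, 2, 3
R = {("", "c", "c", "a")}                            # the rule #c$c#a

L_H = closure(A, R, max_len=12)
L_G = {w for w in L_H if set(w) <= T}                # L(G) = L(H) ∩ T*
print(sorted(L_G, key=len))  # ['ab', 'aabb', 'aaabbb']
```

The rule removes the leading marker c; the axioms themselves still contain c and are filtered out by the intersection with T*.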

Definition:

For two language families ℒ1 and ℒ2, we define Spl(ℒ1, ℒ2) (respectively ESpl(ℒ1, ℒ2)) as the set of all languages L(G) generated by some splicing system G = (V, R, A) (respectively some extended splicing system G = (V, T, R, A)) with A ∈ ℒ1 and R ∈ ℒ2.

(20)

The Power of Splicing Systems

Theorem:

The following table holds, where at the intersection of the row marked by X and the column marked by Y we give Z if ℒ(Z) = Spl(ℒ(X), ℒ(Y)), and Z1/Z2 if ℒ(Z1) ⊂ Spl(ℒ(X), ℒ(Y)) ⊂ ℒ(Z2).

      FIN       REG      CF       CS       RE
FIN   FIN/REG   FIN/RE   FIN/RE   FIN/RE   FIN/RE
REG   REG       REG/RE   REG/RE   REG/RE   REG/RE
CF    CF        CF/RE    CF/RE    CF/RE    CF/RE
CS    CS/RE     CS/RE    CS/RE    CS/RE    CS/RE
RE    RE        RE       RE       RE       RE

(21)

The Power of Extended Splicing Systems

Theorem:

The following table holds, where at the intersection of the row marked by X and the column marked by Y we give Z if ℒ(Z) = ESpl(ℒ(X), ℒ(Y)).

      FIN   REG   CF   CS   RE
FIN   REG   RE    RE   RE   RE
REG   REG   RE    RE   RE   RE
CF    CF    RE    RE   RE   RE
CS    RE    RE    RE   RE   RE
RE    RE    RE    RE   RE   RE

(22)

Some Lemmas III

Lemma:

For any language families ℒ1, ℒ2, ℒ′1, ℒ′2 with ℒ1 ⊆ ℒ′1 and ℒ2 ⊆ ℒ′2, we have ESpl(ℒ1, ℒ2) ⊆ ESpl(ℒ′1, ℒ′2).

Lemma:

If a language family ℒ is closed under concatenation with symbols, then ℒ ⊆ ESpl(ℒ, ℒ(FIN)).

Lemma:

ℒ(REG) ⊆ ESpl(ℒ(FIN), ℒ(FIN)).

(23)

Some Lemmas IV

Lemma:

For any family ℒ which is closed under union, concatenation, Kleene closure, homomorphisms, inverse homomorphisms and intersections with regular sets, ESpl(ℒ, ℒ(FIN)) ⊆ ℒ.

Lemma:

For any recursively enumerable language L ⊆ T*, there is an extended splicing system G = (V, T, R, A) with a finite set A and a regular set R of splicing rules such that L(G) = L.

Lemma:

For any extended splicing system G = (V, T, R, A), L(G) is a recursively enumerable set.

(24)

Some Measures of Descriptional Complexity – Definitions

Definition:

i) For a splicing system G = (V, R, A) or an extended splicing system G = (V, T, R, A), we define the complexity measures r(G), a(G) and l(G) by

r(G) = max{|u| | u = u_i for some u1#u2$u3#u4 ∈ R, 1 ≤ i ≤ 4},

a(G) = #(A),

l(G) = max{|z| | z ∈ A}.

ii) For a language family ℒ, n ≥ 1 and m ∈ {a, l}, we define ℒ_n(r, ℒ) as the set of languages L(G), where G = (V, R, A) is a splicing system with r(G) ≤ n and A ∈ ℒ, and ℒ_n(m, ℒ) as the set of languages L(G), where G = (V, R, A) is a splicing system with m(G) ≤ n and R ∈ ℒ.

Analogously, for m ∈ {r, a, l} and extended splicing systems, we define the sets ℒ_n(em, ℒ).
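For systems with finite R and finite A the three measures are simple to evaluate. The sketch below represents a rule u1#u2$u3#u4 as a 4-tuple of strings; the function names and the value 0 for an empty rule or axiom set are conventions chosen here, not part of the definition.

```python
def r_measure(rules) -> int:
    """r(G): length of the longest component u_i of any rule u1#u2$u3#u4."""
    return max((len(part) for rule in rules for part in rule), default=0)


def a_measure(axioms) -> int:
    """a(G): number of axioms, #(A)."""
    return len(set(axioms))


def l_measure(axioms) -> int:
    """l(G): length of the longest axiom."""
    return max((len(z) for z in axioms), default=0)


# A small hypothetical system with two rules and two axioms.
rules = {("a", "b", "a", "b"), ("", "c", "c", "a")}
axioms = {"ab", "aabb"}
print(r_measure(rules), a_measure(axioms), l_measure(axioms))  # 1 2 4
```

The measure r(G) is often called the radius of the system; a(G) and l(G) are only meaningful when A is finite.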

(25)

Some Measures of Descriptional Complexity – Results

Theorem: For any n ≥ 1,

i) ℒ(FIN) ⊂ ℒ_n(r, ℒ(FIN)) ⊂ Spl(ℒ(FIN), ℒ(FIN)),

ii) ℒ_n(r, ℒ(FIN)) ⊂ ℒ_{n+1}(r, ℒ(FIN)).

Theorem: For ℒ ∈ {ℒ(REG), ℒ(CF), ℒ(RE)} and n ≥ 1, ℒ_n(r, ℒ) = ℒ.

ℒ_n(ea, ℒ(REG)) = ESpl(ℒ(FIN), ℒ(REG)).

ℒ_1(el, ℒ(REG)) ⊂ ℒ_n(el, ℒ(REG)) = ESpl(ℒ(FIN), ℒ(REG)).