Snapshots of DNA polymerase processing aberrant substrates:
Structural insights into abasic site bypass
&
polymerization of 5-alkynylated nucleotide analogs
Dissertation
submitted for the academic degree of Doctor of Natural Sciences
(Dr. rer. nat.)
at the Universität Konstanz
Faculty of Sciences, Department of Chemistry
submitted by
Samra Obeid
2011
Referee 1: Prof. Dr. Andreas Marx
Referee 2: Prof. Dr. Valentin Wittmann
Referee 3 and chair of the examination committee: Prof. Dr. Wolfram Welte
Date of the oral examination: 04.05.2012
Parts of this work have been published in:
Obeid S, Baccaro A, Welte W, Diederichs K, & Marx A; Proc. Natl. Acad. Sci. U. S. A., 2010, 107:21327-21331. “Structural basis for the synthesis of nucleobase modified DNA by Thermus aquaticus DNA polymerase.”
Obeid S, Blatter N, Kranaster R, Schnur A, Diederichs K, Welte W, & Marx A; EMBO J., 2010, 29:1738-1747. “Replication through an abasic DNA lesion: structural basis for adenine selectivity.”
Obeid S, Welte W, Diederichs K, & Marx A; J. Biol. Chem., 2012, 287:14099-14108. “Amino acid templating mechanisms in selection of nucleotides opposite abasic sites by a family A DNA polymerase.”
Obeid S, Bußkamp H, Welte W, Diederichs K, & Marx A; Chem. Commun., 2012, DOI: 10.1039/c2cc34181f. “Interactions of non-polar and ‘Click-able’ nucleotides in the confines of a DNA polymerase.”
Further publications:
Obeid S, Schnur A, Gloeckner C, Blatter N, Welte W, Diederichs K, & Marx A; ChemBioChem, 2011, 12:1574-1580. “Learning from Directed Evolution: Thermus aquaticus DNA Polymerase Mutants with Translesion Synthesis Activity.”
Obeid S, Yulikov M, Jeschke G, & Marx A; Angew. Chem., Int. Ed. Engl., 2008, 47:6782-6785. “Enzymatic synthesis of multiple spin-labeled DNA.”
Obeid S, Yulikov M, Jeschke G, & Marx A; Nucleic Acids Sym Ser (Oxf)., 2008, 52:373-374. “Enzymatic synthesis of multi spin-labeled DNA.”
Acknowledgments
This work was carried out between November 2007 and December 2011 in the research group of Prof. Dr. Andreas Marx at the Chair of Organic and Cellular Chemistry in the Department of Chemistry of the Universität Konstanz.
First and foremost, I would like to sincerely thank Prof. Dr. Andreas Marx for entrusting me with a very versatile and interesting doctoral project. In particular, I would like to thank him for his supervision and support, which were demanding but also very motivating. Above all, I would like to mention the trust placed in me and the freedom to work on and shape the topic independently.
I would also like to thank the research groups of Prof. Dr. Welte and Prof. Dr. Diederichs for the good collaboration. I was warmly welcomed into their group and actively supported.
I thank the entire Marx group for the great working atmosphere. I could discuss scientific questions with everyone, and all were very open and always helpful. Thank you for that!
My lab colleagues Sascha Keller, Anna Baccaro and Holger Bußkamp in particular always lent me a sympathetic ear. How difficult the question was, and whether it was scientific or not, was always secondary. I am endlessly grateful to you for that.
Special thanks also go to Bastian Holzberger, who has accompanied and supported me since my undergraduate studies. I will never forget studying for AC-III with Riccardo Behr :-)! Thank you also for proofreading numerous versions of my papers.
I would also like to thank Nina Blatter, Ramon Kranaster, Andreas Schnur, Christian Glöckner and Anna Baccaro for the productive collaboration, which was even crowned with publications.
To all my friends, whom I unfortunately cannot all mention by name here: thank you for your support and for the wonderful and fun times that let me forget the lab stress now and then :-) I am sure the right people will feel addressed!
I would like to thank my boyfriend Christoph and my family for their moral support and the distractions, both during my studies and during my doctorate. Without you this would never have been possible!
Table of Contents
1 Introduction
1.1 History of DNA
1.2 DNA structure and characteristics
1.3 DNA function
1.4 History of DNA polymerases
1.5 DNA polymerases features and function
1.5.1 Biological role of DNA polymerases
1.5.2 DNA polymerase catalysis – the two metal ion mechanism
1.5.3 DNA polymerase fidelity and selectivity: Watson Crick base pairing vs. active site tightness
1.5.4 DNA polymerase as tool for molecular biology, biotechnology and diagnostics
1.5.5 Model system for sequence family A DNA polymerases: KlenTaq DNA polymerase
1.6 DNA lesions
1.6.1 Overview
1.6.2 Abasic site
1.7 Template independent incorporation at a blunt-ended DNA duplex
1.8 Functionalized DNA
1.8.1 Solid support synthesis of modified DNA
1.8.2 Enzymatic Synthesis of modified nucleotides
1.9 Crystallography
1.9.1 Synchrotrons as X-ray source
1.9.2 Protein crystals
1.9.3 From data collection to refined model
1.10 Concepts and Objectives
1.10.1 Elucidation of lesion bypass mechanism
1.10.2 Elucidation of the mechanism by which blunt-ended DNA is elongated
1.10.3 Elucidation of process of functionalized nucleotides by DNA polymerases
2 Results and Discussion
2.1 Abasic site bypass
2.1.1 KlenTaq follows the ‘A-rule’
2.1.2 Crystal structure: KlenTaq bound to ddATP opposite an abasic site analog F (KlenTaqF-ddATP)
2.1.2.1 Overall KlenTaqF-ddATP structure
2.1.2.2 Active site arrangement of KlenTaqF-ddATP
2.1.2.3 Tyrosine 671 mimics the absent nucleobase in the template strand
2.1.3 Purine selection: preference of adenosine over guanosine
2.1.4 Crystal structure: KlenTaq bound to ddGTP opposite an abasic site analog F (KlenTaqF-ddGTP)
2.1.5 The unfavored cases: Pyrimidine incorporation opposite an abasic site
2.1.6 Crystal structure: KlenTaq bound to ddTTP opposite an abasic site analog F (KlenTaqF-ddTTP)
2.1.7 Crystal structure: KlenTaq in presence of ddCTP and an abasic site analog F (KlenTaqF-ddCTP-binary)
2.1.8 Crystal structure: binary complex of KlenTaq bound to primer/template construct containing an abasic site
2.1.9 Stacking probes: enhance the geometric fit to the active site
2.1.10 Crystal structure: KlenTaq bound to dNITP opposite an abasic site analog F (KlenTaqF-dNITP)
2.1.11 Discussion
2.1.11.1 Abasic site bypass by different DNA polymerases
2.1.11.2 Role of Tyr671 in abasic site bypass by KlenTaq DNA polymerase and transfer to other A-family DNA polymerases
2.1.11.3 Features of KlenTaqF-ddATP
2.1.11.4 Possible impact of the ‘A-rule’
2.1.11.5 Relevance of the obtained KlenTaq structures in presence of an abasic site lesion
2.1.11.6 Selectivity of adenosine over guanosine opposite abasic site lesions
2.1.11.7 Geometric constraints and distinct hydrogen bonding patterns account for the decrease in incorporation efficiency of pyrimidines opposite an abasic site lesion
2.1.11.8 Nucleobase analog dNITP lacking hydrogen bonding capability opposite an abasic site lesion
2.1.11.9 Relevance of the binary KlenTaq structure in presence of an abasic site
2.2 Template independent incorporation at a blunt-end DNA duplex
2.2.1 Nucleotide selectivity at a blunt-end DNA duplex
2.2.2 Crystal structure: KlenTaq bound to ddATP at a blunt-end DNA duplex
2.2.3 Discussion
2.3 Functionalized nucleotides enabling numerous biomolecular applications
2.4 Incorporation of modified nucleotide analogs
2.4.1 Single nucleotide incorporation of C5 modified dNTPs
2.4.2 Structure of KlenTaq in Complex with DNA and C5 Modified dNTP
2.4.3 Structure of KlenTaq in Complex with DNA and dTspinTP
2.4.4 Structure of KlenTaq in Complex with DNA and dTdendTP
2.4.5 Discussion
2.5 Elongation of modified nucleotide analogs
2.5.1 Acceptance of dTalkyneTP or dCalkyneTP by KlenTaq DNA polymerase
2.5.2 Structure of KlenTaq in complex with DNA and alkyne modified substrates
2.5.2.1 Structure of KlenTaq in complex with DNA and dTalkyneTP
2.5.2.2 Structure of KlenTaq in complex with DNA and dCalkyneTP
2.5.2.3 Structure of KlenTaq in complex with DNA and ddCalkyneTP
2.5.3 Increase in incorporation efficiency of the alkyne modified substrates by using a mutated KlenTaq variant
2.5.4 Discussion
3 Conclusive summary
4 Summary in German (Zusammenfassung)
5 Materials and methods
5.1 General
5.1.1 Chemicals and solvents
5.1.2 Chromatography
5.1.3 Instrumental and chemical analysis
5.1.4 Chemical DNA synthesis
5.1.5 Ethanol precipitation
5.1.6 Determination of the extinction coefficient ε of modified nucleotide analogs
5.2 Chemical synthesis
5.2.1 5-Nitro-1-indolyl-2’-deoxyriboside-5’-triphosphate (dNITP)
5.2.2 5’-Tert-butyldiphenylsilyl-2’-deoxyuridine
5.2.3 2’,3’-dideoxyuridine
5.2.4 5-Iodo-2’,3’-dideoxyuridine
5.2.5 5-(2-(4-ethynylphenyl)ethynyl)-2’,3’-dideoxyuridine (ddTalkyne)
5.2.6 5-(2-(4-ethynylphenyl)ethynyl)-2’,3’-dideoxycytidine (ddCalkyne)
5.2.7 5-(2-(4-ethynylphenyl)ethynyl)-2’,3’-dideoxyuridine-5’-triphosphate (ddTalkyneTP)
5.2.8 5-(2-(4-ethynylphenyl)ethynyl)-2’,3’-dideoxycytidine-5’-triphosphate (ddCalkyneTP)
5.2.9 3’-O-acetyl-5-(2-(4-ethynylphenyl)ethynyl)-2’-deoxyuridine (dTalkyne)
5.2.10 3’-O-acetyl-5-(2-(4-ethynylphenyl)ethynyl)-2’-deoxycytidine (dCalkyne)
5.2.11 5-(2-(4-ethynylphenyl)ethynyl)-2’-deoxyuridine-5’-triphosphate (dTalkyneTP)
5.2.12 5-(2-(4-ethynylphenyl)ethynyl)-2’-deoxycytidine-5’-triphosphate (dCalkyneTP)
5.3 Molecular biological Experiments
5.3.1 General procedures
5.3.1.1 Buffers and solutions
5.3.1.2 Gel electrophoresis
5.3.1.3 Protein concentration determination by Bradford
5.3.1.4 Protein concentration determination using SDS-PAGE analysis
5.3.1.5 Protein expression and purification
5.3.1.6 Radioactive-labeling of primers
5.3.2 Incorporation opposite an abasic site
5.3.2.1 Primer extension assay
5.3.2.2 Pre-steady state kinetics for incorporation of dNITP opposite an abasic site
5.3.3 Incorporation of functionalized nucleotides
5.3.3.1 Primer extension assay
5.3.3.2 Pre-steady state kinetics for incorporation of functionalized dTRMPs
5.4 Crystallization Experiments
5.4.1 General procedures
5.4.1.1 Buffers and solutions
5.4.1.2 Gene construct of KlenTaq DNA polymerase
5.4.1.3 Site-directed mutagenesis
5.4.1.4 Protein expression and purification
5.4.1.5 Protein crystallization
5.4.1.6 Data collection
5.4.2 Crystallization trials in the presence of an abasic site
5.4.3 Crystallization trials at the blunt-ended DNA
5.4.4 Crystallization trials in the presence of a functionalized nucleotide
5.4.5 Crystallization trials with the ethynylphenylethynyl modified pyrimidine analogs
6 Appendix
7 Abbreviation
1 Introduction
1.1 History of DNA
In 1944, Avery, MacLeod and McCarty isolated DNA and supported “the belief that a nucleic acid of the desoxyribose type is the fundamental unit of the transforming principle of Pneumococcus Type III”, thereby establishing DNA as the genetic material (1). Almost one decade later, in 1953, Watson and Crick postulated the three-dimensional structure of DNA (2, 3) (Figure 1), taking into account the X-ray data of Wilkins et al. (4) and Franklin et al. (5) as well as Chargaff’s observation that “the ratio of Adenine to Thymine and Guanine to Cytosine were nearly 1.0 in all species studied” (6). Based on chemical and stereochemical arguments, Watson and Crick disproved the three-chain model previously proposed by Pauling and Corey (7).
Whereas the X-ray data of Wilkins and Franklin only verified the helical structure and a repeat of the polynucleotide composition, nearly three decades later Wing et al. (1980) crystallized a self-complementary dodecamer and solved its structure (PDB-ID: 1BNA) (8, 9). These revolutionary findings added significantly to the understanding of DNA and of the possible copying mechanism of the genetic material.
1.2 DNA structure and characteristics
DNA (deoxyribonucleic acid) is a polymer consisting of four monomeric units (nucleotides). Each nucleotide is composed of a phosphate, a sugar and a base moiety, which is linked to the sugar by an N-glycosidic bond. According to their nucleobase, the nucleotides are classified into pyrimidines (thymidine (T) and cytidine (C)) and purines (adenosine (A) and guanosine (G)) (Figure 2A). Through hydrogen bonds the nucleotides form specific base pairs: A pairs with T and G with C (Watson-Crick base pairing), resulting in two base pairs of nearly the same size (Figure 2B). Thus the three-dimensional structure of DNA is a double helix with an alternating phosphate-sugar backbone and the base pairs in the core, perpendicular to the common helix axis. Watson and Crick correctly concluded “if only specific pairs of the bases can be formed, it follows that if the sequence of bases on one chain is given, then the sequence on the other chain is automatically determined.” (2). Besides the ability to predict the sequence of the complementary strand from a given sequence, DNA is also characterized by a geometrically well-defined duplex structure with a major and a minor groove (Figure 1). These advantageous characteristics, namely self-assembly, hybridization specificity and the formation of a geometrically well-defined helical structure, make DNA an interesting tool for various applications.
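The pairing rule quoted above can be illustrated with a minimal Python sketch. The names `PAIRING` and `complementary_strand` are hypothetical and chosen only for this illustration. The sketch additionally assumes the antiparallel orientation of the two strands, so the complement is read in reverse, and uses the well-known Dickerson-Drew sequence of PDB entry 1BNA as input.

```python
# Watson-Crick pairing rules as stated in the text: A pairs with T, G with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written in 5'->3' direction.

    Assumption for this sketch: the two strands run antiparallel, so each
    base is paired and the result is read in reverse order.
    """
    return "".join(PAIRING[base] for base in reversed(strand_5_to_3.upper()))


# Self-complementary dodecamer crystallized by Wing et al. (PDB-ID: 1BNA).
dodecamer = "CGCGAATTCGCG"
print(complementary_strand(dodecamer))
```

For a self-complementary sequence such as this dodecamer, the function returns the input sequence itself, which is why two identical strands can form the duplex.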
Figure 1 DNA double helix (PDB-ID: 1BNA)
1.3 DNA function
With the elucidation of the structure of DNA in 1953, its function became apparent at the same time. Since then, it has been established that every cell contains the complete genetic information required for its functioning. DNA serves as the carrier of genetic information and provides the information to construct RNA molecules and proteins. The sequence of the four nucleobases along a sugar phosphodiester backbone encodes this information, making DNA the “life molecule”.
1.4 History of DNA polymerases
When proposing the DNA double helix model, Watson and Crick immediately pointed out: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.” (2). The discovery of the DNA structure was the starting point for a new field in biology dedicated to elucidating the genetic code. In this context, understanding the copying mechanism played a central role. Watson and Crick had already noted: “Whether a special enzyme is required to carry out the polymerization, or whether the single helical chain already formed acts effectively as an enzyme, remains to be seen.” (3). However, Arthur Kornberg (1918-2007), perhaps the most famous enzymologist of that time, was convinced that an enzyme is responsible for copying DNA and started to search for it; this enzyme was then named DNA polymerase (DNA pol). After he had identified dNTPs as the substrates the enzyme uses for DNA synthesis, he was able to isolate, purify and characterize this enzyme (10, 11). Nearly one decade later, Kornberg succeeded in synthesizing a viral DNA and demonstrated that the fully synthetic circular DNA is still infectious (12). Building on the efforts of Kornberg and co-workers, other DNA polymerases were found another ten years later, and it became obvious that the DNA replication process requires several DNA polymerases. In 1985, Steitz and co-workers presented the first high-resolution structure of the Klenow fragment (or large fragment) of E. coli DNA pol I (13). Based on this crystal structure, the DNA polymerase architecture is likened to a cupped right hand with finger, thumb and palm sub-domains (Figure 3). The deep groove formed by the three sub-domains already suggested the DNA binding site. Crystal structures of DNA polymerases in ternary complexes, bound to a primer/template duplex and an incoming nucleotide, followed. Together with functional studies, these crystal structures helped to elucidate the basic mechanism of nucleotide incorporation by DNA pols.
Figure 2 (A) Chemical structure of DNA building blocks. (B) Watson-Crick base pairs. Dashed lines indicate hydrogen bonding interactions.
Figure 3 DNA pol (PDB-ID: 1DPI); shown are the finger, thumb and palm domains.
1.5 DNA polymerases features and function
1.5.1 Biological role of DNA polymerases
DNA polymerases catalyze all DNA synthesis occurring in nature, which can be categorized into three main processes: DNA replication, repair and recombination (14). Replicative polymerases, members of the sequence families A and B, copy a template strand by selectively incorporating nucleoside monophosphates at the 3’-primer terminus (15, 16). As mentioned before, the specific structure of complementary DNA double strands enables the copying of genetic information in a template-directed manner (Figure 4).
The replisome, a multimeric protein complex comprising enzymes and proteins with different functions, carries out replication in E. coli (17-19). Starting from oriC (the origin of replication), the genomic DNA is unwound by the ATP-dependent DnaB helicase. The resulting two single DNA strands are stabilized by SSB proteins (single strand binding proteins) (20). The unwinding of the helical structure requires the action of topoisomerases, which relieve the resulting torsional strain (21). Next, short RNA oligonucleotides (primers) of roughly 12 nucleotides complementary to the template strand are synthesized by DnaG, a specialized RNA polymerase; DnaG and DnaB form the so-called primosome (22). Then the DNA polymerase III (pol III) holoenzyme complex, which contains ten different proteins, is assembled at the replication fork. These proteins can be grouped into three major groups: (i) the catalytic core of pol III, (ii) the sliding β-clamp, and (iii) the γ-clamp loader (19). Synthesis of a new DNA strand always proceeds in the 5’→3’ direction: the 5’-end of the incoming nucleotide is coupled to the free 3’-hydroxyl group of the growing primer strand. The presence of two core units in the holoenzyme ensures simultaneous replication of both single strands.
Figure 4 Scheme of enzyme-catalyzed DNA synthesis in 5’→3’ direction
Binding of the β-clamp drastically increases the processivity, i.e. the number of nucleotides incorporated before dissociation, allowing the polymerization of more than 50 kb before the next dissociation event occurs (19, 20). The β-clamp is a ring-like protein dimer that encircles the DNA strand and tethers the core enzyme to the DNA. Since DNA synthesis can only take place in the 5’→3’ direction, the two parental template strands cannot both be copied continuously. While the replisome moves along the DNA, pushing the replication fork forward, the leading strand is copied continuously, whereas the lagging strand is copied in fragments of roughly 1000 nucleotides (Okazaki fragments) (23, 24). Synthesis of the lagging strand points away from the replication fork. After completion of an Okazaki fragment, pol III has to dissociate from its template and bind to a freshly unwound single-stranded template section. The γ-clamp loader is essential for the disassembly and reassembly of the β-clamp and in this way directs pol III to a new primer/template complex (19, 20). Each Okazaki fragment starts with an RNA primer, which is removed after replication by the intrinsic 5’→3’ exonuclease activity of E. coli DNA polymerase I (pol I). Upon binding of pol I to the 3’-end of an Okazaki fragment, the RNA primer is successively hydrolyzed while DNA is synthesized simultaneously. In this way RNA primers are replaced by DNA (nick translation). The remaining nicks are sealed by ligases, which catalyze the formation of phosphodiester bonds between the fragments.
Eukaryotes contain at least 15 different DNA pols, and replication proceeds in a similar manner. However, two different DNA polymerases, DNA polymerase δ (pol δ) and DNA polymerase ε (pol ε), are mainly responsible for replication (20). It remains to be elucidated which polymerase copies the leading and which the lagging strand. Several studies suggest that pol ε copies the leading strand in Saccharomyces cerevisiae (16, 25). Eukaryotic DNA polymerase α (pol α) is part of the primosome and essential for the initiation of replication, analogous to the primase DnaG in prokaryotes. Other important eukaryotic DNA polymerases are pol β, which closes gaps resulting from the repair of DNA lesions during base excision repair (BER) (26), and pol γ, which is responsible for mitochondrial replication (27, 28).
Further eukaryotic DNA polymerases discovered in recent years, such as pol ζ, η, θ, ι, κ, λ, μ, σ and φ, play a role in DNA repair and translesion synthesis (26, 29, 30).
1.5.2 DNA polymerase catalysis – the two metal ion mechanism
DNA polymerases incorporate nucleotides following a simplified five-step procedure. The first step is the binding of the primer/template duplex to the DNA polymerase, forming the binary complex (Figure 5, step 1). In the second step, the 2’-deoxynucleoside triphosphate (dNTP) binds to this complex, resulting in the ternary complex (step 2). Upon nucleotide binding, discrimination between canonical and mispaired nucleotides takes place: the correctly pairing nucleotide usually displays a higher affinity for the complex. Furthermore, nucleotide binding triggers a conformational change of the polymerase binding pocket (step 3). At this stage the so-called ‘induced fit’ mechanism (see chapter 1.5.3) provides further discrimination in favor of the correct Watson-Crick base pair. It leads to a tighter enclosure of the dNTP in a pocket shaped to fit the correct nucleotide, excluding water from the active site (31). This conformational change, which includes the closure of the finger domain, was discussed as the rate-limiting step in DNA synthesis. However, recent studies have shown that local reorganizations in the active site are the rate-limiting factor (32). The closed conformation of the DNA pol places the reactive 3’-hydroxyl group of the primer in an ideal position for a nucleophilic attack on the 5’-α-phosphate group of the incoming dNTP. Two divalent metal ions (usually Mg²⁺) are octahedrally coordinated by three highly conserved side chains in the active site and the triphosphate of the incoming nucleotide. Metal ion A facilitates the deprotonation of the 3’-OH group, promoting the nucleophilic attack of the 3’-O⁻ on the α-phosphate (step 4) (33). Metal ion B is mainly coordinated by the triphosphate moiety as well as the aspartic acid residues and assists the release of pyrophosphate (Figure 6). Both metal ions stabilize the structure and charge of the trigonal bipyramidal transition state at the α-phosphate during the nucleophilic substitution (SN2 reaction) (34-37). The mechanism is therefore known as the ‘two metal ion mechanism’. The phosphoryl transfer is completed by the release of pyrophosphate, which is subsequently hydrolyzed in aqueous solution, shifting the equilibrium to the product side. Then the polymerase switches back into the open conformation (step 5). Afterwards, the polymerase can translocate along the template strand and incorporate a further nucleotide, dissociate from the primer/template complex or, if an intrinsic or extrinsic exonuclease proofreading activity is present, edit the newly incorporated nucleotide. Adjacent steps are reversibly connected.
Figure 5 Kinetic model of nucleotide incorporation. The various complexes are indicated as mentioned in the text. The rate constant of the rate-limiting step, k3, is indicated. E = DNA polymerase. Graphic adapted from Rothwell and Waksman 2005 (32).
Figure 6 The two-metal-ion mechanism of polynucleotide polymerases adapted from Steitz, 1998 (34). Two divalent metal ions, A and B, are coordinated by two aspartic acid residues in the active site (here D705 and D882 for E. coli DNA polymerase I). Water molecules bound to metal ion A are shown as filled black circles.
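The five steps described in this section can be summarized as a reaction scheme. The following LaTeX fragment is only a sketch under stated assumptions: the notation E′ for the closed conformation and the subscripts n and n+1 for the primer length are introduced here for illustration, the placement of k_3 on step 3 follows the caption of Figure 5, and the order of pyrophosphate release and reopening within step 5 is lumped together because the text does not resolve it. The fragment assumes the amsmath package.

```latex
% Sketch of the five-step kinetic model (requires amsmath).
% E = DNA polymerase (open), E' = closed conformation (assumed notation).
\begin{align*}
\mathrm{E} + \mathrm{DNA}_{n}
  &\rightleftharpoons \mathrm{E{\cdot}DNA}_{n}
  && \text{step 1: binary complex} \\
\mathrm{E{\cdot}DNA}_{n} + \mathrm{dNTP}
  &\rightleftharpoons \mathrm{E{\cdot}DNA}_{n}{\cdot}\mathrm{dNTP}
  && \text{step 2: open ternary complex} \\
\mathrm{E{\cdot}DNA}_{n}{\cdot}\mathrm{dNTP}
  &\overset{k_{3}}{\rightleftharpoons} \mathrm{E'{\cdot}DNA}_{n}{\cdot}\mathrm{dNTP}
  && \text{step 3: conformational change} \\
\mathrm{E'{\cdot}DNA}_{n}{\cdot}\mathrm{dNTP}
  &\rightleftharpoons \mathrm{E'{\cdot}DNA}_{n+1}{\cdot}\mathrm{PP_{i}}
  && \text{step 4: phosphoryl transfer} \\
\mathrm{E'{\cdot}DNA}_{n+1}{\cdot}\mathrm{PP_{i}}
  &\rightleftharpoons \mathrm{E{\cdot}DNA}_{n+1} + \mathrm{PP_{i}}
  && \text{step 5: reopening and PP}_{\mathrm{i}}\text{ release}
\end{align*}
```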
1.5.3 DNA polymerase fidelity and selectivity: Watson Crick base pairing vs. active site tightness
The accuracy of DNA synthesis is crucial for maintaining genome stability. Therefore, replicative DNA polymerases are high-fidelity enzymes with low error rates. Studies found that replication in E. coli and bacteriophages displays a base substitution error rate of 10⁻⁷ to 10⁻⁸ per nucleotide in vivo in the absence of mismatch repair (38). In E. coli, the error rate is improved to 10⁻⁸ to 10⁻¹⁰ by proofreading, mismatch repair and numerous other factors (16, 39), making DNA synthesis a highly accurate process. In general, the fidelity and selectivity of DNA polymerases are linked to their biological function and the organism from which they are derived. Translesion synthesis DNA polymerases show relatively low fidelity; pol η (eta), for example, has a remarkable error rate of approximately one in ten (40). Even more remarkable is pol ι (iota), which in vitro inserts G opposite T rather than A opposite T (41). The difference in fidelity raises the question: what are the determinants of fidelity of these enzymes?
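To give a feeling for these error rates, the following minimal Python sketch converts a per-nucleotide error rate into the expected number of base substitutions per copied genome. The genome size of 4.6 million base pairs for E. coli is an assumption added for this illustration and is not taken from the text; the function name `expected_errors` is likewise hypothetical.

```python
# Assumed E. coli genome size in base pairs (illustrative, not from the text).
GENOME_SIZE_BP = 4.6e6


def expected_errors(error_rate_per_nt: float,
                    nucleotides: float = GENOME_SIZE_BP) -> float:
    """Expected number of misincorporations when copying `nucleotides` bases."""
    return error_rate_per_nt * nucleotides


# Error rates quoted in the text (per incorporated nucleotide).
rates = {
    "replication without mismatch repair (upper bound)": 1e-7,
    "replication without mismatch repair (lower bound)": 1e-8,
    "with proofreading and mismatch repair (lower bound)": 1e-10,
    "pol eta (translesion synthesis)": 1e-1,
}

for label, rate in rates.items():
    print(f"{label}: {expected_errors(rate):.3g} errors per genome copy")
```

Under this assumption, a replicative system makes well below one error per genome copy, whereas an enzyme with a one-in-ten error rate would be unsuitable for copying whole genomes.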
The structure of DNA elucidated by Watson and Crick immediately suggested a copying mechanism, since only specific base pairs are formed. This established the idea that the nucleotide selectivity of DNA polymerases rests on hydrogen bonding capacity and the formation of correct base pairs according to Watson and Crick (Figure 2). Thus the Watson-Crick hydrogen bonds between canonical base pairs were thought to account for the high accuracy of DNA synthesis (2, 3). However, the small free energy differences between matched and mismatched base pairs showed that the selectivity of DNA polymerases does not depend primarily on hydrogen bonding capability. Rather, the base pair geometry contributes to DNA polymerase selectivity. The active site of a DNA polymerase is designed in such a way that it accepts Watson-Crick base pairs or base pairs imitating the geometry of such a base pair.
For instance, extensive studies with designed isosteric nucleotide analogs that lack hydrogen bonding capability but show enhanced stacking capacity have shown that efficient and selective DNA synthesis is possible (37, 42-46). With these steric probes in hand, it could be shown that Watson-Crick hydrogen bonds are not the only important factor determining selectivity. In addition to hydrogen bonding interactions between the minor groove and the protein, base stacking and solvation effects, steric effects in particular are invoked to explain the highly accurate performance of DNA polymerases. Based on these and other (47) findings, Kool postulated the model of ‘active site tightness and substrate fit in DNA replication’ (37, 42, 43, 48). To this end, he first defined the active site binding pocket. The analysis of crystal structures suggests that the active site of selective polymerases forms a tight binding pocket whose geometry is complementary to the respective canonical dNTP (49-54), depending on the respective template base (42, 55). In addition, it could be shown that the canonical base pairs differ only slightly in shape and size (Figure 7 and Figure 8). Their shapes show minor variations at the minor and major groove sides, but no variation in overall length (48, 56) (Figure 8A), suggesting a size exclusion hypothesis according to which a base pair must fit into the consensus base pair shape. Finally, the incoming nucleotide placed opposite the templating nucleobase has to fit into the geometric constraints defined by the minor and major groove sides, as illustrated by the consensus base pair shape (Figure 8B). In the case of non-canonical base pairing, steric clashes can occur. Then either the incoming nucleotide cannot enter the binding pocket or, if it does insert partially, the triphosphate moiety cannot be aligned correctly for efficient phosphodiester bond formation.
However, in this scenario one should not forget that the tightness of the active site binding pocket is only defined in the closed conformation of the enzyme. Closure of the active site is likely only for correctly shaped pairs; thus the steric clashes resulting from non-canonical base pairing might prevent the closed conformation, rather than the reverse, in which a pre-formed active site prevents the steric clashes.
The theory of active site tightness does not exclude the contribution of further non-covalent interactions, such as base stacking; it only completes the picture of how DNA polymerases perform accurate DNA synthesis.
Figure 7 Schematic diagram illustrating the space-filling shapes of the four base pairs in isomorphous orientation. Graphic adapted from Kool 2002 (42).
Figure 8 (A) Schematic diagram showing the overlay of the four base pair shapes. The variable regions are marked by arrows at the sides of the major groove and in the center of the minor groove. R represents deoxyribose and phosphodiester backbone. (B) Overlay showing the consensus largest dimensions along the outer surface. Graphic adapted from Kool 2002 (42).
1.5.4 DNA polymerase as tool for molecular biology, biotechnology and diagnostics
Functional studies added significantly to the understanding of the DNA polymerase reaction and revealed a universal basic reaction mechanism (34). For efficient catalysis the DNA polymerase needs (i) the four natural nucleotides, (ii) a DNA template directing the incorporation events, (iii) a DNA primer hybridized to the template strand and carrying a 3’-OH group, (iv) a divalent metal ion, e.g. magnesium, as cofactor, and (v) in some cases auxiliary proteins such as PCNA (proliferating cell nuclear antigen). With the understanding of how DNA polymerases function, their applications in research areas such as genetics or diagnostics were explored. The major breakthrough was the concept of the polymerase chain reaction (PCR) proposed by Saiki and Mullis in 1985 (57-59). After several rounds of optimization, the method enables in vitro amplification of target DNA sequences using thermostable DNA polymerases, 2’-deoxyribonucleotides and short oligonucleotides as primers. Exponential amplification of the original genetic material is achieved by successive temperature cycles causing DNA denaturation, primer annealing and DNA polymerization (57, 58). Because of this application, DNA polymerases earned the title ‘The Molecule of the Year’ in 1989 (60, 61).
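The exponential character of PCR amplification can be made concrete with a short Python sketch. It uses the common idealized model in which each temperature cycle multiplies the number of target copies by (1 + efficiency); the function name `pcr_copies` and the `efficiency` parameter are assumptions introduced for this illustration and do not appear in the text.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles.

    efficiency = 1.0 means perfect doubling in every cycle (assumed ideal
    case); real reactions are less efficient and eventually plateau, which
    this simple model does not capture.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One template molecule, ideal doubling, typical cycle numbers.
for n in (10, 20, 30):
    print(f"{n} cycles: {pcr_copies(1, n):.3g} copies")
```

With perfect doubling, 30 cycles turn a single template molecule into roughly a billion copies, which illustrates why a few dozen temperature cycles suffice for detection.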
To date, numerous PCR-based methods cater for specialized needs. The possibility to amplify, modify, analyze or tag DNA with simple experimental set-ups has been of great benefit in the fields of genetics, medicine and diagnostics. Further developments in this direction include the time-resolved monitoring of the amplification of genetic material in real-time PCR, also known as quantitative PCR (62), and the use of error-prone PCR to create mutant protein libraries (63).
1.5.5 Model system for sequence family A DNA polymerases: KlenTaq DNA polymerase
The large fragment of Thermus aquaticus (Taq) DNA polymerase (KlenTaq for short, an N-terminally truncated form of Taq polymerase comprising aa 293-832) is the ortholog of DNA polymerase I from E. coli and belongs to the A-family of DNA polymerases. KlenTaq DNA polymerase shows the characteristic right-hand shaped structure consisting of finger, thumb and palm subdomains. DNA polymerase I enzymes are involved in nucleotide excision repair and in the processing of Okazaki fragments in prokaryotes. With a reaction temperature optimum of 75-80°C, KlenTaq DNA polymerase is applicable in various experiments such as PCR. Since this enzyme class is heavily employed and well characterized on a functional and structural level (32, 52, 54, 64-67), it is used in this study as a model system for DNA polymerases from sequence family A.
G. Waksman and coworkers were able to crystallize KlenTaq DNA polymerase in its apo form as well as in complex with suitable substrates (52, 54, 65). Based on these structures, the enzyme-substrate interactions in the different reaction states during nucleotidyl transfer are well known. Upon DNA binding, the thumb domain changes its conformation, forming a cylinder-like crevice that almost completely surrounds the primer/template complex (binary complex: pdb 4KTQ). This explains the very low DNA dissociation rate. Upon entry of a nucleotide, the finger domain switches from the open conformation (open ternary complex: pdb 2KTQ) to the closed form (closed ternary complex: pdb 3KTQ). Thereby the O helix of the finger domain packs against the nascent base pair and closes the active site. A tight binding pocket is formed, aligning all components required for catalysis. After the nucleotidyl transfer reaction, KlenTaq DNA polymerase changes back to the open conformation and releases the pyrophosphate, followed by translocation of the polymerase along the DNA.
1.6 DNA lesions
1.6.1 Overview
Endogenous and exogenous agents constantly damage DNA. For instance, exposure to UV radiation, alkylating agents and oxidative species leads to the formation of abasic sites, pyrimidine dimers, alkylated adducts and oxidative lesion products. To maintain genomic integrity and reduce the mutagenic potential, cells are equipped with multiple repair pathways and specialized enzymes. However, health statistics show that DNA lesions can be highly mutagenic and sometimes carcinogenic: in Europe in 2000, for example, ∼35 000 new cases of UV radiation damage-induced skin cancer were diagnosed (68). Further, the tobacco-derived nitrosamine NNK is associated with lung cancer, which resulted in ∼334 800 deaths in Europe in 2006 (69). Therefore the biological prevalence of DNA lesions and their chemical structures need to be determined. The main aims here are (i) to identify and quantify DNA lesions in model systems and in vivo, (ii) to assess the influence of lesions on physical properties of DNA, e.g. thermal stability, and (iii) to elucidate the impact of the lesions on DNA function, e.g. on enzyme-mediated processes such as replication.
Within the last decade, specialized DNA polymerases responsible for translesion DNA synthesis (TLS) were identified and characterized. W. Yang and R. Woodgate published a clear summary of this class of enzymes, emphasizing the relationship between bypass properties and structural features (29). In brief, many of the TLS enzymes are members of the Y-family of DNA polymerases and exhibit universal features to manage bypass of a variety of DNA lesions. In a simplified model, TLS polymerases can be categorized into two classes. Enzymes of the first class are highly specialized and responsible for bypassing a certain DNA lesion, e.g. human pol η is able to bypass the thymine-thymine cyclobutane dimer with high efficiency. Interestingly, patients showing mutations or defects in the human pol η gene suffer from the sunlight-sensitive and cancer-prone Xeroderma pigmentosum variant (XP-V) syndrome (70, 71). Enzymes of the second class are all-rounders with the ability to accommodate different DNA lesions, e.g. the archaeal Dpo4 DNA polymerase from the Y-family. A series of structural studies show this low-fidelity polymerase bound to damaged substrates such as oxidative damage (72, 73), UV cross-linking (74), the benzo[a]pyrene diol epoxide adduct (BPDE) (75), and abasic site lesions (76). However, efficient catalysis is mainly observed in the case of an abasic site lesion (76, 77).
1.6.2 Abasic site
Abasic sites are the most common DNA damage under physiological conditions and result mainly from spontaneous hydrolysis of the N-glycosidic bond between the sugar moiety and the nucleobase in DNA (78). Abasic sites also occur as intermediates during excision repair of damaged nucleotides (79) and can be manifested in several chemical structures, such as the C4’-oxidized abasic site (C4-AP) formed after treatment of DNA with antitumor antibiotics like bleomycin (80, 81). The abasic site L (2’-deoxyribonolactone) results from one-electron nucleotide oxidation (82, 83). In general, it has been estimated that 10 000 abasic sites are formed per human cell per day (78, 84, 85). Guanine and adenine nucleobases are cleaved most efficiently, resulting in the abasic sugar moiety (AP, Figure 9A). To investigate the biochemical impact of AP, a stabilized tetrahydrofuran analog is used as a model.
Since the genetic information is lost upon cleavage of the nucleobase, abasic sites bear a high mutagenic potential (85-87). To face this problem, nature offers a whole arsenal of enzymes and possible pathways. In most cases, the lesion is removed by DNA repair systems that use the sister strand as a guide for incorporation of the correct nucleotide. However, undetected lesions or those formed during S phase pose a challenge to DNA polymerases and block replication (26, 88). Additionally, it was found that the mutagenic potential of these lesions in translesion synthesis is more pronounced in animal cells than in bacterial cells, presumably because of higher translesion synthesis in eukaryotes (87, 89, 90).
A set of studies on the behavior of DNA polymerases belonging to different families showed that there are multiple mechanisms to overcome an abasic site. Most translesion DNA polymerases from family X and Y follow various loop-out mechanisms (76, 77, 91-94). Thereby, nucleotide selection is influenced by the subsequent upstream templating bases, resulting in deletions and complex mutation spectra. Recently, an amino acid templating mechanism was found for the “error-free” bypass of an abasic site by the yeast Rev1 DNA polymerase, which belongs to family Y (95). Since guanine is cleaved most efficiently (85), the preference of Rev1 for dCMP incorporation opposite an abasic site represents the “best guess”.
In contrast, in vitro and in vivo studies of the replicative DNA polymerases from family A (including human DNA polymerases γ and θ) and B (including human DNA polymerases α, ε and δ) in the presence of the stabilized tetrahydrofuran abasic site analog F (Figure 9D) have shown that purines, in particular adenosine and to a lesser extent guanosine, are most frequently incorporated opposite the lesion. The strong preference for adenosine incorporation opposite an abasic site has been termed the ‘A-rule’ (89, 91, 96-104). The apparent selectivity for incorporation of purines ultimately results in transversion mutations commonly found in human cancers (86).
Figure 9 Structures of different forms of abasic DNA lesions.
However, the determinants of the ‘A-rule’ are still a matter of debate. Structural and functional studies have added significantly to our understanding of the basic mechanisms of translesion synthesis by DNA polymerases (29, 105). In the canonical case, several factors account for the high selectivity of DNA polymerases: besides Watson-Crick base pairing, stacking interactions and solvation properties, the induced fit to the active site of the enzyme plays a paramount role. Since Watson-Crick recognition cannot take place in the presence of an abasic site, it seems obvious that the other properties account for the selectivity opposite an abasic site. Therefore, the superior stacking and solvation properties of adenine have been discussed as the driving force behind adenine selection (47, 99, 106, 107). Based on this assumption, numerous non-natural nucleotide analogs were studied regarding their behavior in the presence of an abasic site. If the induced fit model is taken as the selection criterion opposite abasic sites, a non-natural nucleotide analog nearly identical in size to a Watson-Crick base pair should show the highest incorporation efficiency. By steric examination, Matray and Kool identified the pyrene nucleoside triphosphate (dPTP) as a perfect match in the absence of a templating base (Figure 10) (47). Indeed, they could show that the pyrene-modified nucleotide is incorporated by DNA polymerase I from E. coli with higher efficiency than any natural nucleotide, demonstrating that a simple steric model is sufficient for efficient incorporation. Furthermore, the fluorescent nucleobase analog is used to identify and sequence abasic site lesions in DNA.
Studies of several nucleotide analogs identified 5-nitro-1-indolyl-nucleotide (dNITP) as the ‘specific partner’ opposite an abasic site, since dNIMP is incorporated with increased efficiency compared to dPMP by RB69 DNA polymerase, an α-like DNA polymerase (106). The structure of RB69 DNA polymerase capturing the artificial dNITP opposite an abasic site in the active site of the enzyme indicated that a dipole-induced dipole stacking interaction between the 5-nitro group and the base 3′ to the templating lesion might explain the increase in incorporation efficiency. These findings suggest that base stacking may play a paramount role in the selective incorporation of dAMP opposite abasic sites (106).
DNA polymerases from family A and B, which are involved in the majority of DNA synthesis in DNA replication and repair, follow the A-rule when bypassing abasic sites. Up to now, only one structure, that of RB69 DNA polymerase, a member of the B sequence family, has been reported showing an incoming guanosine opposite an abasic site lesion (97). This structure is a hybrid between the enzyme in the replicating mode and in the editing or apo mode.
The approach to crystallize RB69 DNA polymerase
Figure 10 (A) Chemical structures of dPTP and the abasic site analog F. (B) Space-filling models of the A-T (top) and the P-F (bottom) base pairs in B-form geometry, illustrating the steric fit of pyrene opposite an abasic site. The graphic is adapted from Matray and Kool 1999 (47).