
X-ray crystallographic analysis of the archaeal transcriptional regulator TrmB and development of a graphical user interface for the monochromatic diffraction data processing software XDS





Dissertation

submitted to the

Department of Biology, University of Konstanz, Germany for the degree of

Doctor of Natural Sciences

presented by

Dipl. Biol. Michael Krug

First referee: Prof. Wolfram Welte
Second referee: Prof. Winfried Boos

Date of oral examination: 25 May 2009

Konstanzer Online-Publikations-System (KOPS)


Contents

Zusammenfassung
Summary
1 Introduction to archaeal transcription and its regulators
1.1 Archaea
1.1.1 Informational Macromolecules: The birth of sequence alignments
1.1.2 The Methanogens
1.1.3 Discovery of the Archaea
1.1.4 Signatures of Archaea
1.1.5 The order Thermococcales
1.1.6 The archaeal transcription apparatus
1.1.6.1 The eukaryal RNAP II machinery
1.1.6.2 The archaeal RNAP machinery
1.1.7 Archaeal transcription regulators
1.1.7.1 Archaeal proteins of the Lrp family
1.2 DNA
1.2.1 The B-DNA
1.2.1.1 The B-DNA helix has major and minor grooves
1.2.1.2 Specifically recognizable base sequences in B-DNA
1.3 The helix-turn-helix motif
1.3.1 Structural scaffold of the HTH domain and its elaborations
1.3.1.1 The basic HTH domain consists of a three-helical bundle
1.3.1.2 The tetra-helical version of the HTH domain
1.3.1.3 The ribbon-helix-helix (RHH) fold of transcription factors
1.3.1.4 The winged HTH domain
1.4 Interaction between HTH proteins and DNA
1.4.1 Protein dimers bind to (pseudo)palindromic DNA sites
1.4.2 The DNA may become distorted upon binding
1.4.3 The wing of wHTH proteins is inserted into the minor groove (with exceptions)
1.4.4 Allosteric control of DNA binding
1.4.5 Kinetics of protein-DNA recognition
2 Introduction to the methods of crystallography
2.1 Protein crystals
2.1.1 Symmetry
2.1.2 Asymmetric unit
2.2 Basics of crystallography
2.2.1 X-ray diffraction by one electron
2.2.2 Scattering by a two electron system
2.2.3 Scattering by atoms
2.2.4 Scattering by a crystal
2.2.5 Thermal parameter B
2.2.6 Reciprocal space and the Ewald sphere
2.2.7 Mosaicity
2.2.8 Calculation of electron density
2.2.9 The Patterson function
2.2.10 Single or Multiple Isomorphous Replacement
2.2.10.1 Multiple Isomorphous Replacement (MIR)
2.2.10.2 Treatment of errors
2.2.11 Single Isomorphous Replacement with Anomalous Scattering (SIRAS)
2.2.12 Multiple wavelength Anomalous Dispersion (MAD)
2.2.13 Molecular Replacement
2.2.13.1 Patterson methods
2.2.13.2 Maximum likelihood method
2.2.13.3 Packing check
2.2.14 Quality criteria
2.2.14.1 Data quality
2.2.14.2 Model quality
2.3 A brief introduction to data integration by XDS
2.3.1 Localizing diffraction spots
2.3.2 Basis extraction
2.3.3 Indexing
2.3.4 Integration
2.3.4.1 Spot extraction and standard profiles
2.3.4.2 Background
2.3.4.3 Intensity estimation
3 Crystal Structure of the archaeal transcriptional regulator TrmB
3.1 Introduction
3.2 Crystal structure of the Sugar Binding domain of TrmB
3.2.1 Introduction
3.2.2 Material and Methods
3.2.2.1 Construction of truncated TrmBs
3.2.2.2 Site-directed mutation analysis in the sugar binding motif of TrmB
3.2.2.3 Protein purification and molecular sieve chromatography
3.2.2.4 Protein purification prior to crystallization experiments
3.2.3 Protein crystallization and data collection
3.2.3.1 Crystal structure determination and refinement
3.2.3.2 Sugar binding and inhibition assay of TrmB
3.2.4 Results
3.2.4.1 Purification of truncated TrmB
3.2.4.2 Sugar binding activity of TrmB∆2−109
3.2.4.3 Crystal structure of TrmB∆2−109
3.2.4.4 Maltose binding in TrmB∆2−109
3.2.5 Discussion
3.2.5.1 Comparison with eubacterial binding proteins and transcriptional repressors
3.2.5.2 The TrmB∆2−109 structure represents a novel sugar-binding structure
3.2.5.3 The TrmB∆2−109 structure in the light of the in vitro DNA binding and transcriptional function of TrmB
3.3 Structure of TrmB with bound sucrose
3.3.1 Introduction
3.3.2 Materials and methods
3.3.2.1 Protein purification
3.3.2.2 Dynamic light scattering (DLS)
3.3.2.3 Protein crystallization
3.3.2.4 Data collection and data analysis
3.3.2.5 Crystal structure determination and refinement
3.3.3 Results
3.3.3.1 Protein purification
3.3.3.2 Crystallization and data collection
3.3.3.3 Crystal structure of TrmB
3.3.4 Discussion
3.3.4.1 Possible DNA-binding modes
3.3.5 Outlook
4 XDSi - A Graphical User Interface for XDS and beyond
4.1 Introduction
4.1.1 XDS
4.1.1.1 Processing steps of XDS
4.1.2 Why XDSi?
4.1.3 Tcl and Tk
4.2 XDSi
4.2.1 Starting processing in XDSi
4.2.2 XDS Mode
4.2.3 XDS POINTLESS mode
4.2.4 XDS Fullauto mode
4.2.5 XDSi Plots
4.2.6 Additional functions of XDSi
4.2.6.1 Write XDS.INP
4.2.6.2 Plot statistics
4.2.6.3 Compare two CORRECT.LPs
4.2.6.4 Compare all CORRECT.LPs
4.2.7 Requirements for using XDSi
4.2.8 Errors
4.3 Outlook
References
A Amino acid nomenclature
B List of abbreviations
C XDSi example plots
D XDSi source code
E List of publications

List of Figures

1.1 Phylogenetic tree of life
1.2 Eukaryal RNAP II machinery
1.3 Structure of DNA with bound TBP and TFPc
1.4 Structure of B-DNA
1.5 Schematic diagram of major and minor groove of B-DNA
1.6 Major and minor groove of B-DNA
1.7 “Edges” of the DNA base pairs
1.8 Base pair recognition patterns of B-DNA
1.9 Sequence-specific DNA recognition sites for three restriction enzymes
1.10 The basic HTH motif
1.11 The RHH motif
1.12 The wHTH motif
1.13 HTH and its elaborations
1.14 Homodimer of Repressor of phage λ bound to DNA
1.15 Crystal structure of CAP-DNA complex
1.16 Crystal structure of Lac-DNA complex
1.17 Structure of RFX DNA binding domain in complex with DNA
2.1 Unit cell
2.2 A set of unit cells
2.3 Precision image
2.4 Fourier synthesis
2.5 X-ray diffraction by one electron
2.6 X-ray diffraction by two electrons
2.7 Vector S
2.8 Atomic scattering factor f for carbon
2.9 Bragg's law for scattering by a crystal
2.10 The reciprocal lattice
2.11 The Ewald construction
2.12 Argand diagram
2.13 Patterson function
2.14 Patterson-vector superposition
2.15 Harker construction for SIR
2.16 Harker construction for MIR
2.17 Treatment of phase errors
2.18 Anomalous scattering
2.19 Anomalous scattering breaks Friedel's law
2.20 Vector diagram illustrating anomalous scattering
2.21 Harker construction for SIRAS
2.22 Structure factor in a MAD experiment
3.1 Gene cluster encoding the binding protein-dependent ABC transporter for trehalose/maltose in T. litoralis and pyr.
3.2 Gene cluster encoding the maltodextrin ABC transporter in P. furiosus
3.3 Molecular sieve chromatography of TrmB∆2−109 and protein profile
3.4 Binding of maltose, sucrose, maltotriose and glucose by TrmB∆2−109
3.5 Crystal structure of TrmB∆2−109
3.6 Detail of the maltose binding site of TrmB∆2−109
3.7 Aromatic residues of the interdomain interface
3.8 Putative dimer structure of TrmB∆2−109
3.9 TrmB crystal
3.10 Packing of a TrmB crystal
3.11 Structure of TrmB
3.12 Superposition of the sugar binding domain of TrmB∆2−109 and TrmB
3.13 Dimerization of TrmB
3.14 Dimerization of TrmB
3.15 Sequence alignment of TrmB, Sto12a, Sso10a and 1SFX
3.16 TrmB-like proteins
3.17 Structure of BmrR with bound DNA
3.18 DNA binding part of BmrR
3.19 Model of possible TrmB-DNA interaction in case of the TM system
3.20 Model of a possible TrmB-DNA interaction in case of the TM system
3.21 Model of a possible TrmB-DNA interaction in case of the MD system
3.22 Tyr50 in case of the proposed binding mode for the TM system
3.23 Tyr50 in case of the proposed binding mode for the MD system
3.24 Residue 87 in the coiled coil
4.1 The XDSi window
4.2 Cutout of XDS.INP
4.3 Flowchart XDSi
4.4 The infobox
4.5 Infobox after BROWSE
4.6 Change XDS.INP
4.7 Infobox during an XDS POINTLESS run
4.8 Infobox displaying spacegroup and unit cell parameters
4.9 Flowchart XDS POINTLESS
4.10 Infobox during XDS Fullauto run
4.11 Rmeas of all data sets in Resultdir
4.12 I/Sigma of 2 CORRECT.LPs
4.13 Rmeas of 2 CORRECT.LPs
C.1 Example of a plot described as No. 1 in section 4.2.5
C.2 Example of a plot described as No. 2 in section 4.2.5
C.3 Example of a plot described as No. 3 in section 4.2.5
C.4 Example of a plot described as No. 4 in section 4.2.5
C.5 Example of a plot described as No. 5 in section 4.2.5
C.6 Example of a plot described as No. 6 in section 4.2.5
C.7 Example of a plot described as No. 7 in section 4.2.5
C.8 Example of a plot described as No. 8 in section 4.2.5
C.9 Example of a plot described as No. 9 in section 4.2.5
C.10 Example of a plot described as No. 10 in section 4.2.5
C.11 Example of a plot described as No. 11 in section 4.2.5
C.12 Example of a plot described as No. 12 in section 4.2.5
C.13 Example of a plot described as No. 13 in section 4.2.5
C.14 Example of a plot described as No. 14 in section 4.2.5
C.15 Example of a plot described as No. 15 in section 4.2.5
C.16 Example of a plot described as No. 16 in section 4.2.5
C.17 Example of a plot described as No. 17 in section 4.2.5
C.18 Example of a plot described as No. 18 in section 4.2.5
C.19 Example of a plot described as No. 19 in section 4.2.5
C.20 Example of a plot described as No. 20 in section 4.2.5
C.21 Example of a plot described as No. 21 in section 4.2.5

List of Tables

2.1 The seven crystal systems
3.1 Summary of data collection
3.2 Inhibition of binding of labeled substrate to TrmB∆2−109 by unlabeled sugars
3.3 Refinement statistics
3.4 Kd values for sugar binding
3.5 Top hits of a DALI search with the structure of TrmB∆2−109
3.6 Oligonucleotides used for cocrystallization experiments with TrmB
3.7 DLS measurements of TrmB
3.8 Summary of data collection
4.1 Information exchange between processing steps of XDS

There are many bonfires guiding us on our ways illuminating our steps

Some always stay, some slowly fade away

leaving us enough time to walk into the flare of another one
But sometimes one just suddenly disappears

leaving us in the dark
On our own

Suddenly everything becomes different and will never be the same

Little by little we stumble along

until we reach a new bonfire leading us out of the dark emblazing our further steps

But if we look back, there will always be a faint but never expiring sparkle - memory.

For Dad.


Zusammenfassung

In 1970, Francis Crick published his “Central Dogma of Molecular Biology”, a theoretical framework intended to explain the transfer of information between the information-carrying biological molecules DNA, RNA and proteins. The idea of such an information flow arose from the fact that all these biological molecules are linear polymers, so that the particular sequence of their individual building blocks can carry information.

Transcription plays an important role in the flow of biological information, because during transcription information is transferred from DNA into mRNA. Proteins can subsequently be synthesized from the mRNA, with the information stored in the mRNA serving as the template. This linear flow of information does not always have the same bandwidth, however: as in large technical control loops, various regulatory mechanisms control the flow of information, depending on information that is introduced into the system from diverse sources and “registered”. Transcriptional regulators are one such regulatory instance. They control the transfer of information from DNA into mRNA.

TrmB from Thermococcus litoralis and Pyrococcus furiosus, two archaea, is such a transcriptional regulator. Gene expression in archaea is based on a eukaryotic-like transcription apparatus and eukaryotic-like promoters but uses bacterial-like transcription factors. TrmB regulates the expression of two different ABC transporters in P. furiosus, depending on the presence within the cell of different sugars (the substrates of the two ABC transporters). The DNA binding sites of TrmB differ between the two promoters: one is palindromic, the other is not.

The poor solubility of TrmB was a major problem in the course of this work. Sung-Jae Lee therefore constructed a truncated version of the protein lacking the DNA binding domain: TrmB∆2−109. The structure of this sugar binding domain of TrmB with bound maltose was solved and provided insight into the sugar binding mode of TrmB. The sugar binding pocket of TrmB∆2−109 bears no resemblance to the canonical substrate binding pocket of eubacterial sugar-binding transcriptional regulators or periplasmic binding proteins. The maltose bound to TrmB∆2−109 protrudes to the surface, whereas in eubacterial proteins the sugars are bound deep inside the protein. Almost all hydrogen bonds between TrmB∆2−109 and maltose are formed with the nonreducing glucosyl residue of the sugar; only one hydrogen bond exists between TrmB∆2−109 and the reducing glucosyl moiety of maltose.

Extensive solubility tests finally identified buffer conditions in which TrmB could be concentrated enough to allow promising crystallization trials. In this way, TrmB could be crystallized with bound sucrose. Structurally, the DNA binding domain consists of a “winged helix-turn-helix motif” flanked by two helices. The helices following this motif in two monomers can form a “coiled coil” arrangement and thus generate a TrmB dimer whose DNA binding domain architecture resembles that of several putative archaeal transcriptional regulators. This structure also led to models of possible DNA binding modes for the two different DNA binding sites of TrmB.

Another part of this work was the development of a graphical user interface for the data integration software XDS: XDSi.

XDS is a text-based software for processing monochromatic diffraction data from protein crystals recorded with the rotation method. In a crystallographer's world there are two high hurdles: producing protein crystals, and determining the structure of a successfully crystallized protein from the recorded diffraction data. Just as protein crystallization trials depend to a large degree on a successful purification protocol, the successful solution of a protein structure stands or falls with the quality of data integration.

Initially, XDSi was meant to be a graphical user interface for XDS that carries out an XDS run for a given dataset and then graphically displays the most important data from the files generated by XDS. XDSi is based on the scripting language Tcl in combination with the graphical toolkit Tk, both implemented as C libraries, and the wish shell. Since Tcl/Tk and wish also allow other programs to be executed from within XDSi, XDSi evolved into a graphical user interface that enables the automatic processing and space group determination of multiple datasets with minimal user effort.


Summary

In 1970, Francis Crick published his “Central dogma of molecular biology”, a framework for understanding the transfer of information between sequential information-carrying biopolymers, namely DNA, RNA and proteins. The idea arose because all these biopolymers are linear, so the sequence of their monomers can encode information.

Transcription plays an important role in the flow of biological information: during transcription, the information in DNA is copied into mRNA, which then serves as the template for protein synthesis. This linear flow of information is not constant, however. As in large technical control loops, several regulatory mechanisms control the flow of information, depending on signals introduced into the system from diverse sources. Transcriptional regulators are one such regulatory instance: they control the transfer of information from DNA to mRNA.
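The flow of information from DNA via mRNA to protein can be illustrated with a minimal sketch. It assumes a DNA template strand written 5'→3' and contains only the few codons of the standard genetic code needed for the example; the sequence and the function names are hypothetical.

```python
# Minimal sketch: DNA template strand -> mRNA -> peptide.
# Assumption: only a small excerpt of the standard genetic code is included.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Excerpt of the standard genetic code ("*" marks a stop codon).
CODON_TABLE = {
    "AUG": "M",  # Met, start codon
    "UUU": "F",  # Phe
    "GGC": "G",  # Gly
    "UAA": "*",
    "UAG": "*",
    "UGA": "*",
}


def transcribe(template_5to3):
    """Return the mRNA (5'->3') synthesized on a DNA template strand given 5'->3'.

    The mRNA is complementary and antiparallel to the template, so it equals
    the reverse complement of the template with U in place of T.
    """
    coding = "".join(COMPLEMENT[base] for base in reversed(template_5to3))
    return coding.replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG and stop at the first stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside the excerpt
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


template = "TTAGCCAAACAT"  # hypothetical template strand, 5'->3'
mrna = transcribe(template)
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # MFG
```

Reading from the template strand makes the antiparallel, complementary relationship between DNA and mRNA explicit; a real translator would need all 64 codons.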

TrmB from Thermococcus litoralis and Pyrococcus furiosus, two archaeal organisms, is such a transcriptional regulator. Gene expression in archaea relies on a eukaryotic-like transcription machinery and eukaryotic-like promoter elements but on bacterial-like regulatory transcription factors. TrmB controls the expression of two different ABC transporters in P. furiosus, depending on the presence within the cell of different sugars, which are the substrates of the two transporters. The TrmB binding sites in the two cases differ: one is palindromic, whereas the other is nonpalindromic.
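For double-stranded DNA, “palindromic” means that a site reads the same 5'→3' on both strands, i.e. it equals its own reverse complement. The following minimal sketch tests this property; the EcoRI recognition site GAATTC serves only as an illustration (the TrmB binding sequences are not reproduced here), and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(site):
    """True if the site reads identically 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


def symmetry_mismatches(site):
    """Number of symmetry-related position pairs that break twofold symmetry.

    Intended for even-length sites: every asymmetric pair of positions shows up
    twice when the site is compared with its reverse complement.
    """
    rc = reverse_complement(site)
    return sum(a != b for a, b in zip(site.upper(), rc)) // 2


print(is_palindromic("GAATTC"))       # True (EcoRI site)
print(is_palindromic("GAATTG"))       # False
print(symmetry_mismatches("GAATTG"))  # 1
```

Counting mismatches rather than returning only True/False makes it possible to describe pseudopalindromic sites, which deviate from perfect twofold symmetry at a few positions.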

A major obstacle during this work was the low solubility of TrmB. Sung-Jae Lee therefore constructed a truncated version of the protein lacking the DNA binding domain: TrmB∆2−109. The structure of this sugar binding domain of TrmB with bound maltose was solved at 1.5 Å resolution and revealed the sugar binding mode of TrmB. The sugar binding pocket of TrmB∆2−109 does not resemble the canonical substrate binding pocket of eubacterial sugar-binding transcriptional regulators and periplasmic binding proteins: the maltose bound to TrmB∆2−109 sits at the protein surface, whereas in the eubacterial proteins the sugars are bound deep within the protein. Almost all hydrogen bonds between TrmB∆2−109 and maltose are formed with the nonreducing glucosyl residue, whereas only one hydrogen bond is formed with the reducing glucosyl moiety.

Extensive solubility tests with TrmB finally identified buffer conditions that allowed the protein to be concentrated to the levels necessary for successful crystallization. In this buffer, the protein could be crystallized with bound sucrose. The structure of TrmB revealed that the DNA binding domain is connected to the sugar binding domain via a short linker and consists of a winged helix-turn-helix motif preceded and followed by helices. The helices following the winged helix-turn-helix motifs of two monomers can form a coiled-coil arrangement, yielding a TrmB dimer whose DNA binding domain architecture resembles those of other (putative) archaeal transcriptional regulators. This structure suggests how TrmB could bind to its two different DNA binding sites.

Another part of this work was the development of a Graphical User Interface for the data integration software XDS: XDSi.

XDS is a text-based software for processing monochromatic diffraction data of protein crystals recorded by the rotation method. In the world of a crystallographer there are two major sticking points: producing diffracting protein crystals and determining the structure of the crystallized protein from the recorded diffraction data sets. Just as protein crystallization trials stand or fall with the purification protocol, determining the structure of a crystallized protein depends to a considerable extent on the data integration step. The initial idea of XDSi was a Graphical User Interface to XDS that automatically runs XDS for a given dataset and then produces plots of the most informative data in the output files generated by the different XDS steps. In this way it would make XDS easier to handle for inexperienced users and save time for experienced users: the visualized statistics would be easier to assess, and errors that might otherwise go unrecognized could be avoided. XDSi is based on the scripting language Tcl in combination with the graphical toolkit Tk (both implemented as C libraries) and the windowing shell wish. Since Tcl/Tk and wish provide generic programming facilities as well as the ability to execute other programs, XDSi evolved into a Graphical User Interface that allows automatic processing and space group assignment of one or multiple datasets, so that users can focus their fine-tuning efforts on the most promising datasets.
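One of the statistics XDSi plots from the XDS output is Rmeas (see Figures 4.11 and 4.13), the redundancy-independent merging R factor of Diederichs and Karplus: R_meas = Σ_hkl sqrt(n/(n−1)) Σ_j |I_hkl,j − ⟨I_hkl⟩| / Σ_hkl Σ_j I_hkl,j, where n is the number of measurements of reflection hkl. The minimal sketch below illustrates the definition only. It assumes that symmetry-equivalent observations have already been mapped to a common hkl index and that reflections measured only once are left out of both sums; it is not the code used by XDS or XDSi, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def r_meas(observations):
    """Redundancy-independent merging R factor (Rmeas).

    observations: iterable of (hkl, intensity) pairs in which symmetry-equivalent
    measurements already share the same hkl key (assumption of this sketch).
    Reflections measured only once are skipped in numerator and denominator.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        # sqrt(n/(n-1)) removes the dependence of the R factor on redundancy.
        numerator += math.sqrt(n / (n - 1)) * sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator if denominator else float("nan")


obs = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((0, 1, 0), 50.0), ((0, 1, 0), 50.0),
    ((0, 0, 1), 80.0),  # measured once, therefore ignored
]
print(round(r_meas(obs), 4))  # 0.0456
```

In contrast to the classical Rmerge, the sqrt(n/(n−1)) factor keeps Rmeas from decreasing artificially when fewer measurements per reflection are merged, which makes it more suitable for comparing datasets of different redundancy.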


1 Introduction to archaeal transcription and its regulators

1.1 Archaea

The archaea were discovered in 1977. Data supporting this discovery were presented in the Proceedings of the National Academy of Sciences in October 1977 [1]. The discovery resulted from the intersection of two independent lines of research: the "informational-macromolecule" line, which culminated in the research of Carl Woese, and the biochemistry of methanogenesis, pursued by Ralph Wolfe and his research group [2, chapter 1].

1.1.1 Informational Macromolecules: The birth of sequence alignments

This line of investigation began with the publications of Sanger and coworkers [3; 4], who showed that each amino acid in the two chains of insulin occupies a precise position in the protein molecule. This discovery had enormous implications for genetics and for the emerging area of molecular biology. By the 1960s, insulin molecules from various animal species had been sequenced, and it was apparent that insulin from different species possessed variations in the sequence of certain amino acids.

Zuckerkandl and Pauling pointed out [5] that these differences could be used to determine the relationship among the molecules and, hence, among the organisms from which they originated: macromolecules showing only minor sequence changes were closely related, whereas those with larger differences were more distantly related. Soon, however, the limitations of using protein molecules to study relatedness among organisms became apparent.
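The reasoning of Zuckerkandl and Pauling can be made concrete with the simplest distance measure, the proportion of differing positions (p-distance) between two aligned sequences. The sketch below uses p-distance purely for illustration, not as the specific measure those authors used; the sequences are hypothetical and not taken from any real protein.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


# Hypothetical aligned protein fragments (one-letter amino acid code).
sequences = {
    "species_1": "ACDEFGHIKL",
    "species_2": "ACDEFGHIKM",
    "species_3": "ACNEFGWIKM",
}
names = sorted(sequences)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, p_distance(sequences[first], sequences[second]))
```

Here species_1 and species_2 differ at one position in ten and would be considered the most closely related pair, while species_1 and species_3 differ at three positions.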

In the early 1960s, Carl Woese's study of the ribosome convinced him that this structure was highly conserved. He reasoned that because the ribosome is of ancient origin, is universally distributed, and is functionally equivalent in all living cells, it would be the ideal structure for the study of evolution. In addition, the ribosome has only one function, the translation of the genetic code into the amino acid sequence of a protein, and may therefore be somewhat "insulated" from the variables of phenotypic encounters.

Woese's seminal insight was to recognize that ribosomal RNA was the ideal molecule for following evolution back to very ancient events. He modified the Sanger RNA sequencing technique [6; 7] and explored the use of 5S, 16S and 23S rRNA. He chose 16S rRNA, a "statistical ensemble" of about 1540 nucleotides, as the molecule that would allow investigations of organisms to reach the very root of the tree of life.

1.1.2 The Methanogens

This line of research had an early beginning with the experiments of Alessandro Volta in 1776, which showed that the gas produced by decaying vegetable residues in sediments was combustible. Its identification as methane came a century later, and definitive experiments with methanogenic bacteria came with the isolation of Methanobacillus omelianskii [8]. Later, Methanobacillus omelianskii was found to be a symbiotic association of two organisms; in other words, interspecies hydrogen transfer had been discovered [9]. The methanogenic organism in the mixed culture oxidized hydrogen and reduced carbon dioxide to methane. In 1971, the first unique coenzyme of methanogenesis was discovered [10]; after exhaustive analysis, this coenzyme was found nowhere else in nature. A second unique coenzyme [11], the blue-green fluorescent deazaflavin F420, was found to be the coenzyme of formate dehydrogenase [12]. In 1976, Ralph Wolfe gave samples of several different methanogens to Carl Woese so that he could apply his 16S rRNA sequencing method to them and obtain a clue to the evolutionary origin of these methanogens.
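The methanogenic reaction described above, the reduction of carbon dioxide with hydrogen, can be summarized by its overall stoichiometry (a standard textbook equation, added here for illustration). Carbon goes from oxidation state +4 in CO2 to −4 in CH4, so the reduction requires eight electrons, supplied by four molecules of H2:

```latex
4\,\mathrm{H_2} + \mathrm{CO_2} \longrightarrow \mathrm{CH_4} + 2\,\mathrm{H_2O}
```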

1.1.3 Discovery of the Archaea

After analyzing the 16S rRNA of the first methanogen, Woese was puzzled. After carefully repeating the experiment and extending it to several other methanogens, he concluded that none of them was related to any bacterium he had ever investigated. By that time, Woese had 16S rRNA sequence data from 60 different bacteria.

In November 1977, Woese and Fox proposed that ribosomal RNA sequence characterization could be used to define three "aboriginal lines of descent" [13]. One of those lines, the methanogen line, was named "archaebacteria". Over a decade later, the name "archaea" was proposed for the archaebacteria [14] (see figure 1.1). By that time, much evidence had accumulated showing that the archaea were more closely related to the eukaryotic line of descent than to the bacteria, and that it would be less confusing if the word "bacteria" were dropped from "archaebacteria".

Figure 1.1: Universal phylogenetic tree in rooted form, showing the three domains (from Woese et al. [14]). The numbers on the branch tips correspond to the following groups of organisms. Bacteria: 1, the Thermotogales; 2, the flavobacteria and relatives; 3, the cyanobacteria; 4, the purple bacteria; 5, the Gram-positive bacteria; and 6, the green nonsulfur bacteria. Archaea: the kingdom Crenarchaeota: 7, the genus Pyrodictium; and 8, the genus Thermoproteus; and the kingdom Euryarchaeota: 9, the Thermococcales; 10, the Methanococcales; 11, the Methanobacteriales; 12, the Methanomicrobiales; and 13, the extreme halophiles. Eukarya: 14, the animals; 15, the ciliates; 16, the green plants; 17, the fungi; 18, the flagellates; and 19, the microsporidia.

1.1.4 Signatures of Archaea

Archaea lack peptidoglycan in their cell walls [15]. Various cell wall chemistries are known, from the peptidoglycan analog pseudopeptidoglycan to walls made of polysaccharide, protein, or glycoprotein.

Bacteria and Eukarya synthesize membrane lipids with a backbone of fatty acids bonded in ester linkage to glycerol. In contrast, archaeal membrane lipids consist of isoprenoid chains bonded in ether linkage to glycerol [16]. In 1978, Zillig and coworkers began to report the remarkably similar subunit patterns of the DNA-dependent RNA polymerases of archaea and eukaryotes [17; 18].

The exotoxin produced by the bacterium Corynebacterium diphtheriae is a potent inhibitor of eukaryotic protein synthesis because it ADP-ribosylates an elongation factor required to translocate the ribosome along the mRNA; the modified elongation factor is inactive [19]. Diphtheria toxin does not inhibit protein synthesis in bacteria. Kessel and Klink [20] found that in vitro protein synthesis by halobacterial preparations was inhibited by diphtheria toxin. These studies were extended to 18 archaea and summarized by Gehrmann et al. [21]:

The archaea have an elongation apparatus in the ribosome that is distinct from those of typical bacteria and eukaryotes. All archaea contain a structural domain in elongation factor-2 (EF-2) that renders it a substrate for diphtheria toxin. Archaeal EF-2 is highly specific for archaeal ribosomes and does not work with ribosomes from bacteria or eukaryotes [2, chapter 1]. In addition to ribosomal RNA, transfer RNAs of archaea were found to be unique. They do not contain the universal common arm sequence [22].

Although their transcription system is eukaryotic-like, many biosynthetic and metabolic processes in archaea are more similar to their bacterial counterparts than to their eukaryotic counterparts [23].

1.1.5 The order Thermococcales

The proteins studied in this dissertation originate from Thermococcus litoralis and Pyrococcus furiosus. Because both belong to the order Thermococcales, this order is described in some detail below. Among the hyperthermophilic archaea, representatives of the order Thermococcales form the most numerous group, and its members are the most frequently isolated hyperthermophiles. They are heterotrophic and are therefore regarded as major decomposers of organic matter in marine hot-water ecosystems [24]. They belong to the branch of the Euryarchaeota (see figure 1.1) that contains the methanogens, the genus Thermoplasma, and the extremely halophilic archaea. The order Thermococcales comprises three genera: Pyrococcus [25], Thermococcus [26] and the later described Palaeococcus [27]. In 2006, these three genera included 38 species: 2 in the genus Palaeococcus, 6 in the genus Pyrococcus and 30 in the genus Thermococcus.

The optimal growth temperature is 95-100 °C for members of the genus Pyrococcus and 80-90 °C for those of the genus Thermococcus. Pyrococcus strains have been isolated only from marine hydrothermal vents, whereas species of the genus Thermococcus have been isolated from terrestrial freshwater [28], marine solfataric ecosystems, deep-sea hydrothermal vents [29] and offshore oil wells [30].

Representatives of the order Thermococcales have coccoid cells with or without flagella; they are obligately anaerobic, organotrophic thermophiles with a fermentative metabolism, using peptides, polysaccharides, or other sugars as carbon sources. Elemental sulfur either stimulates their growth or is required for it [2, chapter 2].

Thermococcales were initially discovered in terrestrial and submarine hot vents and were later also found in deep subsurface environments. For example, Thermococcus celer [31], T. litoralis [32] and Pyrococcus sp. were discovered in an offshore oil production platform in the North Sea [29], and T. litoralis was also isolated from a continental oil well [33]. This probably indicates that hyperthermophilic archaea are indigenous to the deep subsurface biosphere. The sites where these microorganisms are found may appear unusual, but the organisms might have been deposited with the original sediment and survived over geologic time by metabolizing buried organic matter [34; 35].

Many Thermococcales show heterotrophic growth on a variety of carbohydrates [36–39]. This suggests that oligosaccharides with varying degrees of polymerization are transported into the cell and subsequently hydrolyzed to glucose. Various studies have focused both on the transport of the saccharides into the cell and on the pathways used to degrade the glucose [40]. For T. litoralis, a transport system for both maltose and trehalose has been described [41].

1.1.6 The archaeal transcription apparatus

Today, archaea are known from an astounding diversity of habitats, including the furthest extremes of temperature, pressure, salinity and acidity, which demonstrates their great ability to adapt to very different environments. Their transcription and gene regulatory systems are hybrids of bacterial and eukaryotic components [17; 18; 42; 43] (see also section 1.1.4) and are therefore highly relevant to our efforts to understand the universal aspects of transcriptional mechanisms.

Analyses of the archaeal transcription initiation machinery have revealed striking parallels with the eukaryal RNA polymerase II (RNAP II) transcription apparatus (see section 1.1.6.2).

To allow a comparison of the archaeal and eukaryal transcription mechanisms, the principal steps of transcription initiation by the RNAP II machinery are briefly summarized in the following section.

1.1.6.1 The eukaryal RNAP II machinery

The initiation process in eukaryotes starts with the recognition of a sequence element (the TATA box), located ∼30 bp upstream of the initiation site, by the highly conserved TATA-box-binding protein (TBP), a component of a large multisubunit complex called TFIID. Other subunits of TFIID may additionally contact other promoter elements. A second factor, TFIIA, can stabilize the binding of the TBP/TFIID complex. After binding, TBP recruits a further factor, TFIIB, which serves as a bridge between TBP and the incoming RNA polymerase II (RNAP II). However, the main contact between TFIIB and the polymerase appears to be mediated by TFIIF, a factor bound to the polymerase. The complex of TFIID, TFIIB, TFIIF and RNAP II binds tightly to the promoter. Efficient initiation requires two further factors, TFIIE and TFIIH. They facilitate the localized melting of DNA at the start site, thus forming the open complex; this process is ATP-driven [44]. They also help RNAP II to leave the promoter, thereby assisting its transition from an initiating polymerase to an elongation-competent polymerase. For an overview see figure 1.2.

1.1.6.2 The archaeal RNAP machinery

The analysis of the purified archaeal RNA polymerase (RNAP) from Sulfolobus acidocaldarius suggested that the archaeal transcription initiation machinery may be more closely related to that of the eukarya than to that of the bacteria, because S. acidocaldarius RNAP was shown to have at least ten subunits [18; 45], in contrast to the four-subunit bacterial core enzyme.

Subsequent identification of the genes for the majority of the subunits of S. acidocaldarius RNAP and determination of their sequences showed that the individual subunits are highly conserved in most cases and that the subunit composition is similar to that of eukaryal polymerases. For instance, the small subunits E, H, K, L and N of S. acidocaldarius RNAP have clear homologues in the eukaryal, but not the bacterial, enzymes [46].

Archaeal promoter elements consist of three parts. The typical archaeal promoter contains a TATA-like element (TATA box) ∼25-30 bp upstream of the site of transcription initiation [47–49]. The second part of the archaeal promoter is the BRE (transcription factor B recognition element), which is located immediately upstream of the TATA box and is important both for promoter strength and for the orientation of the transcription initiation complex [50–52]. The third part is an initiator element located at the transcription start site.
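The positional rule stated above (a TATA-like element roughly 25-30 bp upstream of the start site) can be expressed as a simple sequence scan. The following Python sketch is purely illustrative: the TATA-like pattern `TTTWWW` (W = A or T), the function name `find_tata_upstream` and the exact window boundaries are assumptions made for this example, not a published archaeal consensus or a method used in this work.

```python
import re

# Hypothetical TATA-like pattern "TTTWWW" (W = A or T), chosen only for
# illustration; it is NOT a published archaeal TATA-box consensus.
TATA_LIKE = re.compile(r"(?=(TTT[AT]{3}))")  # lookahead finds overlapping hits


def find_tata_upstream(seq: str, tss: int, near: int = 25, far: int = 30) -> list:
    """Start positions of TATA-like hits whose first base lies far..near bp upstream of the TSS.

    seq -- promoter sequence, 5'->3' on the coding strand (0-based indexing)
    tss -- 0-based index of the transcription start site within seq
    """
    hits = []
    for match in TATA_LIKE.finditer(seq.upper()):
        start = match.start(1)
        if tss - far <= start <= tss - near:
            hits.append(start)
    return hits


promoter = "G" * 10 + "TTTAAA" + "C" * 30  # made-up test sequence
print(find_tata_upstream(promoter, tss=38))  # [10]
```

Scanning real promoters would require an experimentally derived motif (for example a position weight matrix) instead of this fixed pattern.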

The presence of these elements, combined with the eukaryal-like composition of the polymerase, prompted the search for a factor homologous to eukaryal TBP, and TBP homologues have since been identified in a wide range of archaea. Both archaeal and eukaryotic TBP molecules consist of two repeats of about 90 amino acids and adopt a symmetrical saddle-shaped form [53; 54]. The similarity of the first to the second repeat is much higher in archaeal TBPs (36-53% identical amino acids) than in eukaryotic TBPs (22-26%) [55]. Archaeal TBPs are acidic proteins with isoelectric points (pI) in the range of 3.9-6.1, whereas their eukaryotic counterparts are basic, with pI values between 9.8 and 10.7 [56].

Figure 1.2: A: Eukaryal RNAP II machinery. The transcription initiation site is depicted by an arrow. The TFIID complex (blue) binds to the TATA box via its TATA box binding protein component TBP (light blue). TFIIA (pink) stabilizes the protein-DNA interaction. Subsequently TFIIB (green) binds to the DNA-bound TBP and recruits RNAP via interaction with TFIIF (purple). Binding of TFIIE (black) and TFIIH (mauve) to the RNAP leads to promoter melting and clearance. B: Model of the archaeal RNAP machinery. TBP (blue) binds to the TATA box, TFB (green) binds to the BRE (not shown) and the RNAP is shown in orange (adapted from Bell and Jackson [45]).

Figure 1.3: Complex of a DNA fragment containing the BRE (red) and the TATA box (magenta) with bound TBP (light blue) and the C-terminal part of TFB (yellow; its helix-turn-helix motif, see section 1.3, is colored orange). PDB code: 1D3U.
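The contrast between acidic archaeal and basic eukaryotic TBPs rests on isoelectric points, which can be estimated from the amino acid sequence alone. The sketch below finds, by bisection, the pH at which the summed Henderson-Hasselbalch charges of all ionizable groups vanish. The pKa values are approximate textbook numbers chosen as an assumption; published tools use different pKa sets and give somewhat different pI values, so this is not how the values cited above were obtained.

```python
# Approximate pKa values (assumed for illustration; other tables differ slightly).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.53, "R": 12.48, "H": 6.00}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH, treating all groups as independent."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-6) -> float:
    """pH at which net_charge is zero, found by bisection on [0, 14].

    The net charge decreases monotonically with pH, so bisection converges."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid  # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0
```

A sequence rich in Asp and Glu (such as the toy peptide `"DDDDEEEE"`) yields a pI in the acidic range, whereas one rich in Lys and Arg (`"KKKKRRRR"`) yields a basic pI, mirroring the archaeal versus eukaryotic TBP contrast.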

In addition to TBP, archaea possess a second general transcription factor, called TFB (transcription factor B). The identification of a partial open reading frame of 152 amino acids with homology to eukaryotic TFIIB in Pyrococcus woesei was the first indication that archaeal transcription factors are of the eukaryotic type [57]. Sequence analysis of the complete gene showed about 30% identity to eukaryotic tfIIb genes [58]. In addition, the sequence exhibits distinct structural motifs characteristic of eukaryotic TFIIB, such as an imperfect amino acid repeat and an N-terminal zinc ribbon. In Sulfolobus as well as in Methanococcus, the archaeal TFIIB homologue (now called TFB) was shown to be essential for directing the initiation of archaeal transcription [52; 59]. TFB binds to the BRE, and its N-terminal region is required for RNA polymerase recruitment [60; 61]. Transcription in the archaeal system can be fully reconstituted using just TBP, TFB and highly purified RNAP [52; 62].
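Identity values such as the ~30% between archaeal TFB and eukaryotic TFIIB, or the 36-53% between the two TBP repeats, are derived from a pairwise alignment. Given two sequences that are already aligned, the calculation can be sketched as follows. Excluding gapped columns from the denominator is only one of several conventions and is an assumption here; the sequences in the example are made up.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap are ignored (assumed convention)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


print(percent_identity("ACDE-F", "ACDQGF"))  # 80.0
```

Because the result depends on how gaps are treated, identity values from different studies are only comparable when the same convention is used.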

Archaeal genome sequencing projects have revealed that most likely all archaea have a transcription factor resembling eukaryal TFIIE, called TFE. The eukaryotic protein is a heterotetramer composed of two 57 kDa and two 34 kDa subunits [63; 64]. Purified TFIIE has been found to possess no enzymatic activity; it stabilizes the preinitiation complex by binding both to the complex and to the DNA, and it is involved in the transition from initiation to elongation [65; 66]. Mutational analysis revealed that the N-terminal half of TFIIEα is sufficient for both basal and activated transcription [67], and similar results were obtained in yeast using in vivo genetic experiments [68]. Interestingly, this part of the α subunit is also conserved in archaea. The crystal structure of the N-terminal domain of TFE from Sulfolobus solfataricus showed that this domain adopts an extended winged fold with unusual features that are consistent with a role as an adapter between RNA polymerase and general transcription factors [69]. In vitro transcription experiments indicate that TFE is not absolutely required for transcription in a reconstituted archaeal system; it nonetheless plays a stimulatory role on some promoters and under certain conditions [69].

Thus, the core of the archaeal transcription apparatus consists of a eukaryotic RNAP II-like polymerase and the two initiation factors TBP and TFB (see figure 1.2).

Archaea initiate transcription by the binding of TBP to the TATA box [59; 70]. Structural analyses of TBP-TATA box co-crystals revealed the binding mechanism in more detail [71; 72]. Like its eukaryotic counterparts, Pyrococcus TBP binds to the minor groove of the DNA and imposes a similarly severe distortion on it [72]. Several studies with eukaryotic TBPs suggest that TBP can bind in both orientations with only a minimal preference for the correct one [73; 74]. Because archaeal TBPs are even more symmetrical, they are most likely unable to select the correct orientation of binding to the TATA box. The polarity of the initiation complex is instead fixed in the next step by the binding of TFB [50], which recognizes the distorted DNA-TBP complex and interacts with the BRE through its TFIIB-related C-terminal domain [75; 76]. Bound TFB enables the recruitment of RNA polymerase and the formation of an initiation complex.

Notably, archaeal RNAP subunit A lacks the C-terminal repeat extension of eukaryal RNAP II subunit 1, which serves as the assembly platform for complexes that mediate transcriptional activation, chromatin modification, transcriptional elongation and termination as well as co-transcriptional RNA processing. Moreover, in contrast to eukaryal promoters, archaeal promoters are opened by their cognate core transcription machinery without ATP hydrolysis [77].

The fact that extrinsic initiation factors, TBP and TFB, are used in archaeal transcription creates the potential for stably marking a promoter for activity through TBP attachment to the TATA box. In contrast, eukaryotic transcription can use cis-acting mechanisms to generate persistent active states of promoters [78; 79]. This led to the statement that there is a "fundamentally different logic of eukaryotic and bacterial transcription" [80], because in order to establish stable states of actual or potential transcriptional activity, eukaryotes additionally have to use chromatin modification-driven mechanisms that are not part of the bacterial machinery (and almost certainly not part of the archaeal machinery) [77].

Since in bacteria σ leaves the promoter after each round of transcription, constitutive promoters are unmarked, so that, in principle, successive rounds of transcription at these promoters could be uncorrelated. Nevertheless, feedback mechanisms correlating successive rounds of transcription exist even in bacteria, as shown by the fact that active transcription entrains continuing transcription [81].

In archaea, the behaviour of TBP determines how activators and repressors of transcription can exert their regulatory effects. If TBP remains stably bound to the promoter, repressors must block the entry of TFB or RNAP to the promoter, and, to allow a fast response to environmental signals, the repressors themselves must not be excluded by prebound TBP. Conversely, if repressors are barred by prebound TBP, operators that overlap the transcription start site or the BRE will allow a faster response to external signals than operators overlapping the TATA box.

1.1.7 Archaeal transcription regulators

Although their transcription machinery resembles the eukaryal RNAP II apparatus, most specific transcriptional regulators identified in archaea so far appear to be bacterial-like, as a plethora of homologues of bacterial regulatory factors have been found in archaeal genomes [75; 77; 82]. The most widely represented archaeal DNA-binding proteins with known or surmised gene-regulatory potential are related to members of the bacterial Lrp/AsnC family [83], which influence cellular metabolism in both a global (Lrp) and a specific (AsnC) manner [84].

1.1.7.1 Archaeal proteins of the Lrp family

The best characterized representative of this family is Escherichia coli Lrp (leucine-responsive regulatory protein), an abundant protein that acts as a global regulator of amino acid biosynthesis, transport, protein degradation and intermediary metabolism, responding primarily to leucine [85].

Lrp family proteins consist of an N-terminal DNA-binding helix-turn-helix domain connected by a linker to a C-terminal effector domain. Ligands of the effector domain can modulate protein association and DNA binding and thus have a regulatory function [83].

The structures of two proteins of the Lrp family from Pyrococcus species (36% amino acid identity, 57% similarity) have been determined. Both structures are built from symmetric protein dimers. In one structure, four dimers are arranged as a disk, making contact through their effector domains in the center, with their helix-turn-helix DNA-binding domains facing out [86]. In the other structure, the protein dimers form a sixfold helix, again with the effector domains facing into the core and the DNA-binding domains facing out [87].

In solution, Lrp family proteins are present as dimers or oligomers of dimers, exhibiting a concentration-dependent equilibrium [83]. Their preferred DNA-binding sites are inverted repeats, as demonstrated by SELEX for E. coli Lrp and, more definitively, for two Lrp family proteins of Methanocaldococcus jannaschii, Ptr1 and Ptr2 (Ptr standing for putative transcriptional regulator) [88; 89]. For E. coli Lrp, cooperative DNA binding has been analysed quantitatively [90], and for archaeal Lrp family proteins there are qualitative indications of cooperative DNA binding [91; 92].


1.2 DNA

DNA replication, DNA transcription and the regulation of gene expression all depend on the recognition of DNA by proteins and, especially, on the differential recognition of distinct DNA segments. To understand how DNA-binding proteins interact with DNA and recognize specific sequences, one first needs to understand the structure of the double-stranded, base-paired helical DNA molecule on its own. Three types of DNA structure are known [93]: A-DNA, which is obtained under dehydrated, nonphysiological conditions; Z-DNA, a left-handed helix whose sugar-phosphate backbone follows a zigzag path and which can form in sequences with alternating G and C bases; and B-DNA. A-DNA is not found in vivo, and whether Z-DNA occurs in nature is a matter of controversy, but the structure formed by DNA under physiological conditions is that of B-DNA [94].

1.2.1 The B-DNA

B-DNA has the familiar shape of a right-handed helical staircase (see figure 1.4). The rails are two antiparallel sugar-phosphate chains, and the rungs are purine-pyrimidine base pairs, hydrogen-bonded to each other. In B-DNA there are on average about 10 base pairs per turn of the helix, which corresponds to an average helical twist angle of 35.9°. The spacing along the helix axis from one base pair to the next is approximately 3.4 Å [94].
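The helical parameters quoted above are related by simple arithmetic: 360° divided by the average twist gives the number of base pairs per turn, and multiplying that number by the rise gives the helical pitch. A minimal Python sketch using only the values from the text:

```python
TWIST_DEG = 35.9     # average helical twist per base-pair step (from the text)
RISE_ANGSTROM = 3.4  # average rise per base-pair step in Å (from the text)


def base_pairs_per_turn(twist_deg: float = TWIST_DEG) -> float:
    """Number of base-pair steps needed for one full 360° turn."""
    return 360.0 / twist_deg


def helical_pitch(twist_deg: float = TWIST_DEG, rise: float = RISE_ANGSTROM) -> float:
    """Height of one full helical turn in Å."""
    return base_pairs_per_turn(twist_deg) * rise


def bp_position(n: int, twist_deg: float = TWIST_DEG, rise: float = RISE_ANGSTROM):
    """Rotation (degrees, modulo 360) and height (Å) of base pair n relative to base pair 0."""
    return (n * twist_deg) % 360.0, n * rise


print(f"{base_pairs_per_turn():.2f} bp per turn, pitch {helical_pitch():.1f} Å")
# 10.03 bp per turn, pitch 34.1 Å
```

`bp_position(10)` returns a rotation of about 359° at a height of about 34 Å, so base pair 10 sits almost directly above base pair 0.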

1.2.1.1 The B-DNA helix has major and minor grooves

The sugar-phosphate backbones are bulky and form striations on the edges of the helix, with grooves in between in which the bases are exposed. These grooves have two different widths because of the asymmetrical attachment of the base pairs to the sugar rings of the backbone. In a regular helix, the distance between the attachment points of the rungs would be the same at the front and at the back of each step. In the DNA molecule, by contrast, each base-pair “rung” is effectively wider at one edge than at the other, resulting in a narrower groove, known as the minor groove, and a wider groove, the so-called major groove (see figures 1.5 and 1.6). The major groove, and in some cases also the minor groove, forms the main site of (specific) interactions between DNA and DNA-binding proteins (see section 1.3.1.1).


Figure 1.4: Structure of B-DNA. Left: Cartoon representation. Right: Stick representation. Phosphorus atoms are colored orange, carbon grey, oxygen red, nitrogen blue and hydrogen white.

In B-DNA the helical axis runs through the center of each base pair and the base pairs are stacked nearly perpendicular to the helical axis. This leads to the major and minor grooves being of similar depths. The edges of the base pairs form the floors of the two grooves. The edge of a base pair furthest from its attachment points to the sugar-phosphate backbones is the major groove edge, the one closest is the minor groove edge. These edges are accessible from outside and form the basis for the sequence-specific recognition of DNA by proteins [94].

1.2.1.2 Specifically recognizable base sequences in B-DNA

The bases are available for interactions with DNA-binding proteins only at the floor of the grooves. These regions are paved with nitrogen and oxygen atoms that can form hydrogen bonds with the side chains of a protein. The methyl group of thymine and the corresponding hydrogen in cytosine provide additional discriminatory recognition groups. These sites form patterns that are distinguishable for the four possible Watson/Crick base pairs (see figure 1.7).

These patterns of potential hydrogen-bond acceptors and donors are clearly quite different for the different base pairs in the major groove, so that they can easily be recognized and discriminated by a protein molecule. This is not the case in the minor groove: there, the only feature that differs between base pairs is the presence of a hydrogen-bond donor instead of a neutral hydrogen atom (see figures 1.7 and 1.8).

Figure 1.5: Left: In a helical staircase there are two similar grooves. Four rungs are viewed from the top of the staircase. Right: Asymmetric connections of the base pairs to the sugar rings of the backbone lead to the formation of a minor and a major groove. The sugar-phosphate backbone is represented by connected balls and the base pairs as blue planks. Four base pairs are shown from the top of the helix. The position of the helix axis is marked by a cross (adapted from [94]).

Figure 1.6: B-DNA. Atoms are represented as spheres. The sugar-phosphate backbone is colored red and the bases light blue. The minor and major grooves are clearly recognizable. The edges of the bases are accessible in the major groove.

Figure 1.7: The edges of the base pairs contain nitrogen and oxygen atoms that can participate in hydrogen bonds to protein side chains. An H atom in cytosine (C) and a methyl group in thymine (T) form additional sequence-specific recognition sites in DNA. W1, W2, W2′ and W1′ are the recognition sites at the edges of the base pairs in the major groove (W standing for wide) and S1, S2 and S1′ are those in the minor groove (S for small). The recognition sites are shown for all four base pairs (from [94]).

For this reason, and because it is wider, the major groove is much better suited for sequence-specific recognition, whereas the minor groove can contribute additional, non-specific interactions between DNA and protein (see section 1.3.1.1).
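The argument can be made concrete by writing the recognition patterns of figures 1.7 and 1.8 as strings and counting how many base pairs each groove can tell apart. The sketch uses the common A/D/M/H notation (hydrogen-bond acceptor, donor, thymine methyl group, non-polar hydrogen), with each pattern read across the base-pair edge in one fixed direction; this notation and reading order are assumptions of the example and need not match the order of the W and S sites in figure 1.7.

```python
# A = H-bond acceptor, D = H-bond donor, M = thymine methyl, H = non-polar hydrogen.
# Keys name the base pair as (base on reading strand)(partner base).
MAJOR_GROOVE = {"AT": "ADAM", "TA": "MADA", "GC": "AADH", "CG": "HDAA"}
MINOR_GROOVE = {"AT": "AHA", "TA": "AHA", "GC": "ADA", "CG": "ADA"}


def distinguishable_pairs(patterns: dict) -> int:
    """Number of base pairs that present a distinct pattern in this groove."""
    return len(set(patterns.values()))


def minor_groove_class(pair: str) -> str:
    """The minor groove only reveals whether a donor (G-C/C-G) or an H (A-T/T-A) is present."""
    return "G-C or C-G" if "D" in MINOR_GROOVE[pair] else "A-T or T-A"


print(distinguishable_pairs(MAJOR_GROOVE), distinguishable_pairs(MINOR_GROOVE))  # 4 2
```

All four base pairs give different major-groove patterns, while the minor groove only separates A-T-type from G-C-type pairs and cannot tell a pair from its flipped counterpart.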

In the major groove, only a small number of base pairs is needed to provide unique and distinguishable recognition sites, as illustrated in figure 1.9 by the color codes for the hexanucleotide recognition sites of three different restriction enzymes (Eco RI, Bal I and Sma I).
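Figure 1.9 can be mirrored in text form by translating each base of a recognition site into its major-groove pattern. The sketch uses the common A/D/M/H notation (hydrogen-bond acceptor, donor, thymine methyl group, non-polar hydrogen) with one fixed reading direction, which is an assumption of the example; GAATTC (EcoRI), TGGCCA (BalI) and CCCGGG (SmaI) are the recognition sequences of these enzymes.

```python
# Major-groove pattern of each base pair, keyed by the base on the reading strand.
MAJOR_GROOVE = {"A": "ADAM", "T": "MADA", "G": "AADH", "C": "HDAA"}

SITES = {"EcoRI": "GAATTC", "BalI": "TGGCCA", "SmaI": "CCCGGG"}


def major_groove_code(site: str) -> str:
    """Concatenate the major-groove patterns of all base pairs in a site."""
    return " ".join(MAJOR_GROOVE[base] for base in site.upper())


for enzyme, site in SITES.items():
    print(f"{enzyme:6s} {site}  {major_groove_code(site)}")
```

Each six-base-pair site yields a distinct string of 24 recognition features, which illustrates why a few base pairs suffice for unique recognition in the major groove.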

1.3 The helix-turn-helix motif

In 1982, several pioneering investigations shed light on the features unifying diverse transcriptional regulators [95–100]. Their results showed that the phage λ transcription regulators Cro and the cI repressor, and LacI, the lactose operon repressor, shared a similar DNA-binding domain consisting of three helices. The second and the third helices of this tri-helical domain formed a helix-turn-helix motif, and this motif was shown to be mainly responsible for the interaction of these proteins with DNA. Thus, this DNA-binding domain came to be referred to as the helix-turn-helix (HTH) domain.

Figure 1.8: Color codes for the recognition patterns at the edges of the base pairs in the major and minor grooves of B-DNA. Hydrogen-bond acceptors are red; hydrogen-bond donors are blue. The methyl group of thymine is yellow, while the corresponding H atom of cytosine is white. As can be seen, the pattern of H-bond donors and acceptors looks different for each base pair in the major groove but not in the minor groove (adapted from [94]).

Figure 1.9: Sequence-specific recognition sites in the major groove of DNA for three restriction enzymes: Eco RI, Bal I and Sma I. The DNA sequences that are recognized by these enzymes are represented by the color code defined in figure 1.8 (from [94]).

Sequence analysis after this discovery revealed that DNA recognition by sigma factors was also accomplished by helix-turn-helix domains [101–103]. These and further investigations led to the conjecture that the HTH domain plays a significant role in DNA-protein interactions across a wide phylogenetic spectrum [104–107].

The discovery, by sequence and structural analysis, of HTH modules in several specific eukaryotic transcription factors, in chromatin proteins such as histone H1 and in the basal transcription factors TFIIB and TFIIE [42; 108–110] led to the idea that the helix-turn-helix domain is probably one of the most ancient conserved features of the transcription apparatus and was already present in the last universal common ancestor of all extant life forms (LUCA).

Comparative analysis identified several major monophyletic assemblages of HTH transcription factors, each distinguished by its own sequence and structure features [111–115] but all featuring the HTH motif. In these classes, the HTH domain is often fused to additional globular domains in the same polypeptide. These globular domains are remarkably diverse, reflecting the immense variety of functional contexts in which the HTH domain is used.

Our understanding of prokaryotic gene regulation has been aided by the NMR and X-ray crystal structures of a variety of prokaryotic transcription factors. These structures revealed that most prokaryotic transcription factors are homodimers that bind to palindromic or pseudopalindromic cognate DNA sites [116]. Among these transcription factors, three recurrent DNA-binding motifs have been described: the HTH, the winged HTH and the β ribbon, which consists of two β strands that lie within the major groove of the DNA [117; 118].
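Whether a binding site is palindromic, as expected for a homodimer binding with two-fold symmetry, can be checked by comparing the site with its own reverse complement. The sketch below scores the fraction of positions that agree; the function names and this scoring scheme are illustrative assumptions, not a method from the cited work. A score of 1.0 indicates a perfect palindrome, while pseudopalindromic sites score below 1.0.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def palindromicity(site: str) -> float:
    """Fraction of positions at which a site equals its own reverse complement.

    Each mismatched position is counted twice (at i and at n-1-i), and the
    central base of an odd-length site can never match its complement."""
    site = site.upper()
    rc = reverse_complement(site)
    return sum(a == b for a, b in zip(site, rc)) / len(site)


print(palindromicity("GAATTC"))  # 1.0
```

Changing the last base of the example (`"GAATTA"`) lowers the score to about 0.67, because the mismatch breaks the symmetry of one half-site pair.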

1.3.1 Structural scaffold of the HTH domain and its elaborations

1.3.1.1 The basic HTH domain consists of a three-helical bundle

The basic HTH domain is a fold consisting of three core helices forming a right-handed helical bundle that has a partly open configuration. When the domain is displayed by placing the third helix in front and in horizontal orientation, the three helices form a triangular outline (see figure 1.10).

A characteristic sharp four-residue turn, in which a glycine is usually found in the second position [116], is situated between the second and the third helix; it typically does not tolerate insertions or distortions. This turn is a defining feature of the helix-turn-helix domain. In contrast, the loop between helix-1 and helix-2 is much more variable and is the preferred site of modifications in the different classes of HTH domains (see section 1.3.1.4).

Figure 1.10: Left: The basic HTH motif consists of two helices, packed at angles of ∼120°, joined by a tight four-residue turn in which a glycine is usually found in the second position. The HTH motif alone is apparently insufficient for independent folding, and a third α helix stabilizes the motif as a compact, globular domain [119]. The picture shows the HTH motif of phage 434 repressor (PDB code: 1PER). Helices 2 and 3 and their connecting turn form the helix-turn-helix motif. The recognition helix is colored orange; the glycine of the four-residue turn is shown as a ball-and-stick model. Right: View rotated by 90° counterclockwise around the z-axis.

There is a shallow cleft between helix-3 and helix-1 on the side opposite to helix-2, which leads to the open configuration of the bundle. Most of the extensions to the basic HTH domain are structural elements that appear to have evolved to generate a more closed configuration by interacting with this cleft. The shallow cleft thus seems to have acted as a kind of structural niche that favored the evolution of additional structural elements packing into it through hydrophobic interactions (see figure 1.13).

The third helix inserts into the major groove of the DNA [105; 109; 120] and thereby constitutes the main DNA-protein interface, which is why it is called the recognition helix. In general, however, there are several additional secondary contacts with the DNA that may be widely distributed across the fold or may even lie in extensions outside the core HTH domain, such as a basic patch at the N-terminus of helix 1 [120].

Overall, the helix-turn-helix fold shows great sequence diversity. Nevertheless, a few sequence elements are widely conserved among members of the fold. The most characteristic of these is a pattern called “shs” (where “s” is a small residue, most often glycine in the first position, and “h” is a hydrophobic residue), located in the turn between helix-2 and helix-3 of the core helix-turn-helix structure. Another well-conserved pattern, called “phs” (where “p” is a charged residue, most frequently glutamate), is located in helix-2. The conserved hydrophobic residues in these motifs, complemented by at least two further conserved hydrophobic residues in helix-1 and helix-3, are located in the interior and thus form the characteristic hydrophobic core that stabilizes the domain.
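Short patterns such as “shs” and “phs” can be searched for with regular expressions once residue classes are fixed. In the sketch below, the residue sets `SMALL`, `HYDROPHOBIC` and `CHARGED` are assumptions chosen for illustration (published class definitions differ), and the example sequence is a made-up toy. Because such three-residue patterns also occur frequently by chance, a hit alone does not identify an HTH domain.

```python
import re

# Assumed residue classes for illustration only; published definitions differ.
SMALL = "GASCTNDP"         # "s": small residues
HYDROPHOBIC = "AVLIMFWYC"  # "h": hydrophobic residues
CHARGED = "DEKR"           # "p": charged residues

MOTIFS = {
    "shs": f"[{SMALL}][{HYDROPHOBIC}][{SMALL}]",
    "phs": f"[{CHARGED}][{HYDROPHOBIC}][{SMALL}]",
}


def find_motif(seq: str, name: str) -> list:
    """Return 0-based start positions of (possibly overlapping) matches of a motif."""
    pattern = re.compile(f"(?=({MOTIFS[name]}))")  # lookahead allows overlapping hits
    return [m.start() for m in pattern.finditer(seq.upper())]


toy = "QEVGLSRR"  # made-up sequence, not from a real HTH protein
print(find_motif(toy, "shs"), find_motif(toy, "phs"))  # [3] [1]
```

In practice, such patterns are evaluated in the context of a multiple sequence alignment and predicted secondary structure rather than on single sequences.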

The conserved structure-function associations of the helix-turn-helix fold across the three superkingdoms of life, together with the conservation of the sequence elements mentioned above, support a monophyletic origin of HTH domains from a common ancestor that bore these sequence features [96; 98].

1.3.1.2 The tetra-helical version of the HTH domain

The tetra-helical version of the helix-turn-helix domain is characterized by an additional C-terminal helix that packs against the shallow cleft responsible for the open configuration of the tri-helical core. This elaboration of the basic tri-helical version is found in several major families of prokaryotic transcription factors [121].

1.3.1.3 The ribbon-helix-helix (RHH) fold of transcription factors

The RHH family of transcription factors is, to date, known only from prokaryotes [121]. Its members are obligate dimers, pairing through a single N-terminal strand. The sheet formed by the N-terminal strands of the dimer is inserted into the major groove of DNA [122]. Mutagenesis experiments have shown that even single mutations in the N-terminal strand can convert the strand of the RHH domain into a helix, resulting in a structural packing that is closer to the canonical HTH domain [123]. This result, together with the notable structural and sequence similarities with HTH domains, suggests that the RHH domain was derived from the HTH domain through conversion of the N-terminal helix into a strand.

1.3.1.4 The winged HTH domain

One form of extension to the basic HTH domain is a C-terminal β strand hairpin unit, called the wing, that packs against the shallow cleft of the partially open tri-helical core [109; 124]. The simplest versions of this winged HTH (wHTH) domain comprise a helical core similar to the basic three-helical version, followed by the two-stranded hairpin (see figure 1.12). The loop connecting the two β strands of the hairpin has been shown to interact with DNA [108; 124; 125].


Figure 1.11: The ribbon-helix-helix motif: structure showing the N-terminal part of a homodimer of the bacterial antitoxin CcdA with the N-terminal strands of the RHH motif inserted into the major groove of DNA (PDB code: 2H3A).

The canonical wHTH motif contains two wings, which are extended loop structures, three α helices and three β strands. The topological order is H1-B1-H2-T-H3-B2-W1-B3-W2 (where H represents a helix, B a β strand, T a turn and W a wing) [126]. H2, H3 and the sequence between them make up the HTH motif, with H3 as the recognition helix. H3 is physically flanked by the wings, and in toto the wHTH motif resembles a butterfly.

There are, however, several further elaborations of the β sheet. In some cases, the loop between helix 1 and helix 2 of the helix-turn-helix adopts an extended configuration and is incorporated into the sheet via hydrogen bonding with the basic C-terminal hairpin, thus forming a three-stranded version of the wHTH.

Winged HTH proteins have been structurally classified by the orientation of the three helices with respect to each other, or more specifically by the angle formed between helices H2 and H3 (the HTH angle) [127; 128]. With respect to this angle, winged HTH proteins can be subdivided into three groups: histone H5 and HNF-3γ have HTH angles of approximately 50°; CAP, LexA (a bacterial repressor involved in the SOS response) and BirA have angles between 84° and 92°; HSF and MuA have angles of about 105° to 110° [129].
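The HTH angle is the angle between the axes of helices H2 and H3. Assuming the axis directions are already available as 3D vectors (estimating them from atomic coordinates is not shown here), the angle and the three groups listed above can be sketched as follows. The group boundaries at 70° and 100° are assumptions placed between the quoted ranges, and the angle convention used here (0-180° between N-to-C axis vectors) may differ from the one used in the cited classification.

```python
import math


def interhelix_angle(u, v) -> float:
    """Angle in degrees between two helix axis vectors (N- to C-terminal direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding errors
    return math.degrees(math.acos(cosine))


def hth_angle_group(angle: float) -> int:
    """Assign an HTH angle to group 1 (~50°), 2 (84-92°) or 3 (105-110°).

    The cut-offs at 70° and 100° are assumed, not taken from the cited work."""
    if angle < 70.0:
        return 1
    if angle < 100.0:
        return 2
    return 3


angle = interhelix_angle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(round(angle, 1), hth_angle_group(angle))  # 90.0 2
```

Two perpendicular axes give 90°, which falls into the CAP/LexA/BirA group; real structures would first require a robust fit of each helix axis.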

In the four-stranded version, the linker between helix-1 and helix-2 also extends into a hairpin of two β strands and forms, together with the C-terminal wing, an extended β sheet (see figure 1.12). In the binding of nucleic acids, the wing often provides an additional interface for substrate contact. It typically interacts with the minor groove of DNA through charged residues in the hairpin [109; 124].

Figure 1.12: The most common designs of the winged helix-turn-helix (wHTH) motif. Left: wHTH domain of the DNA-binding part of the LexA repressor of E. coli. The C-terminal extension following the recognition helix (colored orange) forms a β strand hairpin consisting of two β strands (PDB code: 1LEA). Middle: wHTH of the biotin repressor from E. coli, comprising one wing with two β strands and an additional strand formed by the region between helix 1 and helix 2 (PDB code: 1HXD). Right: wHTH of CRP from E. coli, forming an additional hairpin between helices 1 and 2 (PDB code: 1I5Z).

The phyletic wHTH superclass includes the majority of prokaryotic transcription factors. Thirteen major families of prokaryotic wHTH domains, namely the BirA, ArsR, GntR, DtxR-FurR, CitB, LysR, ModE, MarR, PadR, YtcD, Rrf2, ScpB and HrcA-RuvB families, are unified by the presence of a characteristic helix after the wing. Together they form the largest monophyletic assemblage within the wHTH superclass. Some of their representatives (ArsR, MarR, YtcD, GntR, HrcA-RuvB) are found in both archaea and bacteria. The MarR family is widespread among the archaea and includes most of the major archaea-specific wHTH transcription factors [121, figure 5].

The next major monophyletic cluster of the wHTH superclass includes the DeoR, ArgR, LevR, YitL, Lrp-AsnC, ZBD (Z-DNA binding domain) and RNase R families. Its members share overall sequence similarity and a conserved pattern that includes a glutamine or arginine residue between helix 1 and helix 2 of the HTH domain. Of these, the Lrp-AsnC family is widely conserved in both bacteria and archaea [121]. The wHTH domain is not restricted to prokaryotes, however; it is also found in eukaryal DNA-binding domains.


Figure 1.13: A pathway showing the structural elaboration of the simple HTH domain into its diverse versions. Strands are shown as yellow arrows with the arrowheads at the C-terminus; helices are shown as blue cylinders. The orange arrows show the probable routes of transformation of the HTH fold. The topologies were constructed using the following PDB entries: simple tri-helical bundle: 2HDD; FF domain: 1H40; tetra-helical bundle: 1D5Y; multihelical bundle: 1AIS; MetJ/Arc: 2CPG; MerR: 1JBG; L11: 1MMS; CRP-like: 1CPG; T. vaginalis initiator: 1PP7; 3-stranded wHTH: 2DTR; methionine aminopeptidase: 1XGS; winged HTH: 1I1G; wHTH with a C-terminal helix: 1JGS. From [121].


1.4 Interaction between HTH proteins and DNA

As mentioned before, the helix-turn-helix motif is the part of HTH proteins that interacts directly with DNA. Proteins such as transcriptional activators or repressors, however, do not merely bind to DNA; they bind selectively to specific nucleotide sequences spanning a few adjacent positions. Distinguishing between nucleotides is not straightforward for a protein. The bases of the two antiparallel strands are paired in the interior of the double helix, and the exterior surface of the double helix is almost independent of its nucleotide sequence, because it is composed primarily of the constant sugar-phosphate backbone. Only the edges of the bases are accessible to the solvent and to the protein, primarily in the major groove of the DNA double helix (see section 1.2.1.2).
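The standard hydrogen-bonding patterns of the four Watson-Crick base pairs show concretely why the major groove carries more sequence information than the minor groove. The Python sketch below encodes these textbook patterns (A = acceptor, D = donor, M = methyl group, H = non-polar hydrogen, each read in one fixed direction across the groove) and groups base pairs that present identical patterns. The dictionaries and function names are illustrative choices and are not part of any library.

```python
from typing import Dict, List

# Textbook hydrogen-bonding patterns of Watson-Crick base pairs, each read in one
# fixed direction across the groove.
# A = H-bond acceptor, D = H-bond donor, M = methyl group, H = non-polar hydrogen.
MAJOR_GROOVE: Dict[str, str] = {"A-T": "ADAM", "T-A": "MADA", "G-C": "AADH", "C-G": "HDAA"}
MINOR_GROOVE: Dict[str, str] = {"A-T": "AHA", "T-A": "AHA", "G-C": "ADA", "C-G": "ADA"}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def indistinguishable_classes(groove: Dict[str, str]) -> List[List[str]]:
    """Group base pairs that present the same pattern in a given groove."""
    classes: Dict[str, List[str]] = {}
    for pair, pattern in groove.items():
        classes.setdefault(pattern, []).append(pair)
    return sorted(sorted(pairs) for pairs in classes.values())


def read_groove(strand: str, groove: Dict[str, str]) -> List[str]:
    """Patterns presented by each base pair of a duplex, given one strand 5'->3'."""
    return [groove[f"{b}-{COMPLEMENT[b]}"] for b in strand.upper()]


if __name__ == "__main__":
    print(indistinguishable_classes(MAJOR_GROOVE))  # [['A-T'], ['C-G'], ['G-C'], ['T-A']]
    print(indistinguishable_classes(MINOR_GROOVE))  # [['A-T', 'T-A'], ['C-G', 'G-C']]
    print(read_groove("GATC", MAJOR_GROOVE))        # ['AADH', 'ADAM', 'MADA', 'HDAA']
```

The major groove yields four distinct classes, one per base pair. The minor groove yields only two, A·T/T·A and G·C/C·G. A protein reading hydrogen-bonding patterns only in the minor groove therefore cannot tell a base pair from its flipped counterpart.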

To discriminate among DNA base pairs by interacting with their edges in the major groove, a protein needs interacting groups that protrude substantially from its surface, so that they can reach the bases at the floor of the groove. The helix-turn-helix motif is the protruding structural element that most prokaryotic transcription factors use to meet this requirement.

When the protein binds DNA, the recognition helix inserts into the major groove, whereas the other α helix of the helix-turn-helix motif lies across the major groove and makes nonspecific contacts with the DNA [130]. The amino acid side chains of the recognition helix form hydrogen bonds and van der Waals contacts with the exposed edges of the bases in the major groove. Hydrogen bonding to the bases is especially important, and water molecules are frequently involved in the hydrogen-bond networks [131]. Because hydrogen bonding is so important for discriminating among nucleotides, the residues that interact with the DNA are primarily polar, especially those whose side chains can form multiple hydrogen bonds, such as Asn, Gln, Arg, Asp and Glu [131]. These direct interactions involve flexible side chains, although their flexibility is usually somewhat restricted by interactions with neighbouring residues. Consequently, different helix-turn-helix motifs interact with DNA in a variety of geometries [131], and there is no simple code relating the amino acid sequence of the protein to the nucleotide sequence of the DNA it recognizes.
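A simple tally can illustrate the observation that base-contacting residues are mostly polar. The Python sketch below flags the residues named in the text (Asn, Gln, Arg, Asp, Glu) in a recognition-helix sequence. The helix sequence in the usage lines is a hypothetical placeholder that is not taken from any structure. The tally only lists candidates: as noted above, whether a given side chain actually contacts a base depends on the docking geometry, so the flagged positions do not amount to a recognition code.

```python
from typing import List, Tuple

# One-letter codes of the side chains named in the text as able to form
# multiple hydrogen bonds: Asn (N), Gln (Q), Arg (R), Asp (D), Glu (E).
MULTI_HBOND = frozenset("NQRDE")


def candidate_base_contacts(helix: str) -> List[Tuple[int, str]]:
    """1-based positions and residues whose side chains can form several H-bonds."""
    return [(i, aa) for i, aa in enumerate(helix.upper(), start=1) if aa in MULTI_HBOND]


def candidate_fraction(helix: str) -> float:
    """Fraction of helix positions carrying one of these residues."""
    return len(candidate_base_contacts(helix)) / len(helix) if helix else 0.0


if __name__ == "__main__":
    helix = "QSRLAEKLGN"  # hypothetical recognition-helix sequence
    print(candidate_base_contacts(helix))  # [(1, 'Q'), (3, 'R'), (6, 'E'), (10, 'N')]
    print(candidate_fraction(helix))       # 0.4
```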

1.4.1 Protein dimers bind to (pseudo)palindromic DNA sites

As described in section 1.3, most prokaryotic transcription factors are homodimers that bind to palindromic or pseudopalindromic DNA sites. The protein homodimers are formed in a manner,
