Towards the creation of synthetic Escherichia coli via Tryptophan and Methionine substitutions


Academic year: 2022


submitted by Master of Science (M. Sc.)

Isabella Tolle

to Faculty II – Mathematics and Natural Sciences of the Technische Universität Berlin

in fulfilment of the requirements for the academic degree of

Doktor der Naturwissenschaften (Dr. rer. nat.)

approved dissertation

Doctoral committee:

Chair: Prof. Dr. Reinhard Schomäcker

Reviewer: Prof. Dr. Nediljko Budisa

Reviewer: Prof. Dr. Thomas Friedrich

Reviewer: Prof. Dr. Zoya Ignatova

Date of the scientific defense: 4 June 2021

Berlin 2021


Acknowledgments

My special thanks go to my supervisor, Prof. Dr. Nediljko Budisa, for his support and for entrusting me with this exciting doctoral topic. It fills me with pride that he entrusted me with this research question, and with even greater gratitude that he largely left me free to handle it independently. I felt this trust at all times and hope to have lived up to it.

I would also like to sincerely thank Prof. Dr. Thomas Friedrich and Prof. Dr. Zoya Ignatova for reviewing my dissertation, and Prof. Dr. Reinhard Schomäcker for chairing my scientific defense.

Furthermore, I would like to thank all members of the Biocatalysis group (Arbeitskreis Biokatalyse) for the excellent working atmosphere, which was characterized by mutual helpfulness, professional collegiality, and friendly interaction. In particular, I thank Dr. Stefan Oehm for the warm welcome into the group, for introducing me to my research topic and, not least, for making infantile humor socially acceptable in office L111. I thank my office companions Dr. Matthias Hauf, Dr. Jessica Nickling, Christin Treiber-Kleinke, and Maxi Marock for the many humorous conversations (and the occasionally peculiar acoustic accompaniment). I also thank Jessi for motivating me to do sports, which was an important counterbalance to everyday lab work. My special thanks go to Christin, and this extends far beyond her work on generating the Met-auxotrophic strain: she is a loyal and reliable colleague and proved this countless times through her selfless support, both in everyday lab work and beyond.

I warmly thank Dr. Tobias Schneider (aka Tobi II) for the synthesis of trifluoromethionine and for his helpful advice on all things chemical, which he always gave in his positive, humorous manner.

I also thank all members of the “Gourmet Lunch Group”, Dr. Ying Ma, Dr. Federica Agostini, Tuyet Mai Thi To, and Georg Johannes Freiherr von Sass, for the varied, international, and delicious meals that did the group's name full justice. Our stimulating and by no means exclusively intellectual conversations over good food made every lunch break a highlight.

Above all, however, I thank Mai, Hannes, Christin, Fede, and Dr. Tobias Baumann for the many productive scientific discussions and for their always valuable and insightful input on my research questions. These people in particular also provided the important emotional support that is at times indispensable over the course of a doctorate.

I especially want to thank Ying, Maxi, Tobi, Matze, Hannes, Fede, Mai, and Christin for their wonderful friendship. You made the office, and Berlin, even better in every respect!

Furthermore, I thank my best friend, Manni, and the canoeing crew with Max, Tabea, Tobi, and Verena for their long-standing friendship. I also thank my fellow students and friends Lena and Daniela, whose support during my bachelor's studies helped lay the foundation for my career and who still accompany me as friends after all these years.

I would especially like to thank my brother Fabian, whom I have looked up to with admiration all my life and whom I can always trust. Thank you very much for the critical […]

Last but not least, I would like to thank my loving parents and my equally loving partner Flo for their unconditional and tireless support in every situation. You are the reason I was able to lead an almost carefree life and concentrate fully on my doctorate.


Abstract

Billions of years of evolution have produced extant living organisms with a vast biodiversity and the ability to adapt to changing environments. At the foundation of it all lies the central dogma of molecular biology, according to which the information flows from the information storage polymer DNA to RNA (“informational polymers”), which is finally translated into proteins (“catalytic polymers”).

This fundamental translation process relies on the universal standard genetic code, which assigns nucleobase triplets to the 20 proteinogenic amino acids. However, even after some six decades of research and the formulation of various theories and models, the origin and evolution of the standard genetic code remain an enigma and a comprehensive and conclusive story has yet to be assembled.

The discovery of the 21st and 22nd amino acids, selenocysteine and pyrrolysine, implies a certain flexibility of the genetic code, which is further affirmed by the co-translational incorporation of over 200 noncanonical amino acids (ncAAs) into proteins over the last decades, culminating in the proteome-wide replacement of the latest addition to the genetic code, Trp, with its close structural analog L-β-(thieno[3,2-b]pyrrolyl)alanine ([3,2]Tpa). During this study, it was attempted to further alienate this strain, designated as TUB170, from life as we know it by turning a tolerance towards [3,2]Tpa into an addiction. In TUB170, [3,2]Tpa incorporation relies on the catalytic promiscuity of the endogenous tryptophanyl-tRNA synthetase, which, in addition to charging its cognate tRNA with the canonical amino acid Trp, also charges the analog [3,2]Tpa to said tRNA. By replacing this enzyme with an enzyme capable of discriminating between these two amino acids, a biocontained organism dependent on a synthetic nutrient might be engineered. To this end, multiple enzyme libraries from different organisms were designed and assembled via site-saturation mutagenesis as well as error-prone PCR. They were screened employing diverse experimental parameters with varying stringencies. Nevertheless, an aminoacyl-tRNA synthetase that exclusively incorporates [3,2]Tpa could not be selected, which might be attributed to the close structural resemblance of the analog to its counterpart Trp, as this resemblance is what drove the choice of analog for the adaptation experiment of TUB170.

Another approach towards the creation of synthetic life might be through the replacement of further canonical amino acids, thereby advancing our understanding of the genetic code and its flexibility, as well as the interplay of diverse cellular processes. For the adaptation of an E. coli strain towards utilization of Met analogs, a Met-auxotrophy robust under all cultivation conditions was established in the laboratory wildtype strain MG1655. Furthermore, this strain was optimized for ethionine (Eth) turnover to S-adenosyl ethionine to promote transethylation reactions as a substitute for transmethylation, as Met functions as the precursor for the major cellular methyl donor. After 31 passages of continuous cultivation, an increase in general fitness could be observed, as evinced by a more stable number of colony forming units compared to those produced prior to the adaptation.

These results suggest that adaptation of a strain tolerant towards the replacement of methionine with its synthetic counterpart ethionine might be feasible.


Summary

Billions of years of evolution have produced living beings with astonishing variation and the ability to adapt to constantly changing environments and living conditions. The foundation of this life is the central dogma of molecular biology, according to which information flows from the information-storing polymer DNA via RNA (“informational polymers”) and is ultimately translated into a protein (“catalytic polymer”). This fundamental translation process, protein translation, is based on the universal genetic code, which assigns base triplets to the 20 proteinogenic amino acids.

However, despite six decades of research and the formulation of numerous theories and models, the origin and evolution of the genetic code remain a mystery.

The discoveries of the 21st and 22nd amino acids, selenocysteine and pyrrolysine, imply a certain flexibility of the genetic code, which is further confirmed by the co-translational incorporation of more than 200 noncanonical amino acids (ncAAs) into proteins and culminates in the proteome-wide replacement of the canonical amino acid Trp with its structural analog L-β-(thieno[3,2-b]pyrrolyl)alanine ([3,2]Tpa).

In this doctoral thesis, an attempt was made to turn the tolerance of this strain towards the synthetic amino acid [3,2]Tpa into a dependence, thereby further alienating the strain from a natural towards a synthetic organism. In this strain, designated TUB170, incorporation of the noncanonical amino acid relies on the fact that the endogenous enzyme tryptophanyl-tRNA synthetase charges not only its natural substrate Trp but also the analog [3,2]Tpa onto the cognate tRNA. Replacing this enzyme with one that can discriminate between the two amino acids could create an organism that depends entirely on a synthetic substrate for survival. To this end, several enzyme libraries from different organisms were designed and generated using various mutagenesis techniques. These libraries were then screened while varying diverse experimental parameters. However, no aminoacyl-tRNA synthetase that exclusively incorporates [3,2]Tpa could be selected. This may be attributable to the close structural similarity of the analog to its counterpart Trp, which guided the choice of this specific analog for the adaptation experiment.

Another approach to the creation of synthetic life could be the substitution of further canonical amino acids, which could advance our understanding of the genetic code and its flexibility, as well as of the interplay of various cellular processes.

For the adaptation of E. coli to methionine analogs, a Met auxotrophy that is robust under all cultivation conditions was established in the wild-type strain MG1655. Furthermore, this strain was optimized for the conversion of ethionine into S-adenosylethionine in order to support the substitution of transmethylation reactions by transethylations, since Met serves as the precursor of the main methyl donor. After 31 passages of continuous cultivation, an improvement in general fitness was observed in the form of a more stable number of colony-forming units in the presence of ethionine. These results suggest that adaptation of E. coli to the synthetic substrate ethionine might be possible.

I Table of Contents

II Abbreviations
1 Introduction
1.1 Genetic Code Origin and Evolution
1.1.1 Trp and Met: Two of the Latest Additions to the Genetic Code
1.1.2 S-Adenosylmethionine (SAM)
1.2 Xenobiology
1.2.1 Genetic Code Engineering and Expansion
1.2.2 Biocontainment
1.3 Adaptive Laboratory Evolution (ALE)
1.3.1 Adaptation towards [3,2]Tp usage
2 Aim of this Study
2.1 Evolution of bacterial strains toward methionine analog usage
2.2 Biocontainment of TUB170
3 Results and Discussion
3.1 Evolution of bacterial strains toward methionine analog usage
3.1.1 Establishing Met-auxotrophy
3.1.2 Choice of analogs
3.1.3 ALE starting conditions
3.1.4 Adaptive Laboratory Evolution (ALE)
3.1.5 Plug and play with metK
3.1.6 ALE 2.0
3.2 Biocontainment of TUB170
3.2.1 M. mazei PylRS Library
3.2.2 E. coli TrpRS Library
3.2.3 M. jannaschii TyrRS Library
4 Conclusion and Outlook
4.1 Evolution of bacterial strains toward methionine analog usage
4.2 Biocontainment of TUB170
5 Materials and Methods
5.1 Materials
5.1.1 Chemicals
5.1.2 Media and supplements
5.1.3 Strains
5.1.4 Plasmids
5.1.5 Primers
5.1.6 Biomolecular reagents, enzymes, and kits
5.1.7 Buffers and Solutions
5.1.8 Miscellaneous
5.1.9 Technical equipment
5.2 Methods
5.2.1 Polymerase chain reaction (PCR)
5.2.2 DNA purification and Gel extraction
5.2.3 Protein expression
5.2.4 Protein purification
5.2.5 Agarose gel electrophoresis
5.2.6 Polyacrylamide gel electrophoresis
5.2.7 Restriction digest
5.2.8 Ligation
5.2.9 Assembly of aaRS Libraries
5.2.10 Double-sieve selection
5.2.11 Fluorescence readout
5.2.12 Expression of tryptophan synthase
5.2.13 Enzymatic [3,2]Tpa synthesis
5.2.14 Production of Competent Cells
5.2.15 Bacterial transformation
5.2.16 Isolation of plasmid DNA
5.2.17 Genome engineering
5.2.18 DNA concentration measurements
5.2.19 Protein concentration measurements
5.2.20 Sequencing
5.2.21 Mass spectrometry (MS)
6 Bibliography
7 List of Figures
8 List of Tables
9 Appendix
9.1 Heterologous MAT expression
9.2 ALE 2.0
9.3 M. mazei PylRS Library
9.4 SCS of sfGFP(R2TAG)
9.5 Enzymatic [3,2]Tpa synthesis

II Abbreviations

[3,2]Tp    thieno[3,2-b]pyrrole
[3,2]Tpa    L-β-(thieno[3,2-b]pyrrolyl)alanine
A    absorption
AA    amino acid
aaRS    aminoacyl-tRNA synthetase
ALE    adaptive laboratory evolution
Amp    ampicillin
AMP    adenosine monophosphate
ATP    adenosine triphosphate
bp    base pair
c    concentration
cAA    canonical amino acid
CAGO    CRISPR/Cas9-assisted gRNA-free one-step
Cas    CRISPR-associated
cat    chloramphenicol resistance gene
CAT    chloramphenicol acetyl transferase
CDS    coding DNA sequence
CFU    colony forming unit
Cm    chloramphenicol
CP1    connective peptide 1
CRISPR    Clustered Regularly Interspaced Short Palindromic Repeats
°C    degree Celsius
Da    Dalton (1.66054 × 10⁻²⁴ g)
dATP    deoxyadenosine triphosphate
dCTP    deoxycytidine triphosphate
ddNTP    dideoxyribonucleotide triphosphate
dGTP    deoxyguanosine triphosphate
dH2O    autoclaved distilled water
DMSO    dimethyl sulfoxide
DNA    deoxyribonucleic acid
dNTP    deoxyribonucleotide triphosphate
ds    double-stranded
DTT    dithiothreitol
dTTP    deoxythymidine triphosphate
E. coli    Escherichia coli
Ec    Escherichia coli
EDTA    ethylene-diamine-tetraacetic acid
EF-Tu    elongation factor Tu
em    emission
EP-PCR    error-prone polymerase chain reaction
ESI    electrospray ionization
et al.    et alii
EtBr    ethidium bromide
EtOH    ethanol
ε280    molar extinction coefficient at λ = 280 nm
fwd    forward
g    gram
·g    multiples of the standard gravity
GMO    genetically modified organism
h    hour
HCl    hydrochloric acid
HGT    horizontal gene transfer
IPTG    isopropyl-beta-D-thiogalactopyranoside
K    kilo
Kan    kanamycin
kb    kilobase pairs
L    liter
LTEE    long-term evolution experiment
LUCA    last universal common ancestor
λ    wavelength
M    molar
m    milli
MAT    methionine adenosyl transferase
max    maximum, maximal
MetO    methionine sulfoxide
MetRS    methionyl-tRNA synthetase
min    minute
MS    mass spectrometry
MTase    methyltransferase
µ    micro
M. barkeri    Methanosarcina barkeri
Mb    Methanosarcina barkeri
M. jannaschii    Methanocaldococcus jannaschii
Mj    Methanocaldococcus jannaschii
M. mazei    Methanosarcina mazei
Mm    Methanosarcina mazei
n    nano
NaCl    sodium chloride
NaOH    sodium hydroxide
ncAA    non-canonical amino acid
nt    nucleotide
OD    optical density
OD600    optical density at λ = 600 nm
ori    origin of replication
OTS    orthogonal translation system
PAGE    polyacrylamide gel electrophoresis
PCR    polymerase chain reaction
PDB    protein data bank
PLP    pyridoxal phosphate
PMSF    phenylmethylsulfonyl fluoride
PPi    pyrophosphate
Pyl    pyrrolysine
PylRS    pyrrolysyl-tRNA synthetase
pylT    pyrrolysyl-tRNA gene
pylS    pyrrolysyl-tRNA synthetase gene
R    resistant, resistance
RBS    ribosome binding site
REase    restriction endonuclease
rev    reverse
RF1    release factor 1
RNA    ribonucleic acid
RT    room temperature
s    second
SAM    S-adenosyl methionine
SCS    stop codon suppression
sfGFP    superfolder green fluorescent protein
SGC    standard genetic code
SPI    selective pressure incorporation
SSM    site-saturation mutagenesis
T    temperature
TfMet    trifluoromethionine
Tris    tris(hydroxymethyl)aminomethane
Trp    tryptophan
TrpRS    tryptophanyl-tRNA synthetase
tRNA    transfer ribonucleic acid
TyrRS    tyrosyl-tRNA synthetase
UV    ultraviolet
V    volume
v/v    volume per volume
w/v    weight per volume
w/w    weight per weight
wt    wild-type
XNA    xeno nucleic acid

1 Introduction

„Wir wollen nicht nur wissen, wie die Natur ist (und wie ihre Vorgänge ablaufen), sondern wir wollen auch nach Möglichkeit das vielleicht utopisch und anmassend erscheinende Ziel erreichen, zu wissen, warum die Natur so und nicht anders ist.“

“We not only want to know how nature is organized (and how natural phenomena proceed), but also as far as possible to gain the aim, which may look Utopian and impudent, to find out why the nature is just such and not another.”

Albert Einstein, Über den gegenwärtigen Stand der Feld-Theorie (in: „Festschrift Prof. Dr. A. Stodola zum 70. Geburtstag“)1

1.1 Genetic Code Origin and Evolution

The mechanisms of evolution yielded extant living organisms with immense biodiversity and the potential to adapt and evolve further in continuously varying environments. At the basis of it all lies protein translation according to the genetic code, with its components: mRNA, tRNA, aminoacyl-tRNA synthetases (aaRSs), and the ribosome. A closer look at the genetic code prompts several questions: Why are amino acids encoded by triplets, and why are the codon assignments what they are today? What is the mechanism behind these assignments? Why are there 20 universal proteinogenic amino acids, and why are these specific amino acids used? Why is the standard genetic code (SGC) universal?

These musings have occupied scientists for the past six decades and have led to numerous hypotheses regarding the origin as well as the evolution of the genetic code. Some of the most common theories are outlined below.

The stereochemical theory, first proposed by Gamow in 1954, aims to explain the assignment of amino acids to their cognate triplets2. It was further developed by Woese in 1966 upon the observation that the amino acids exhibit different mobility in paper chromatography with pyridine as the mobile phase. It was proposed that distinct amino acids might also display differing affinities towards the four nucleotides and that amino acids might therefore have been assigned to their cognate triplet (codon or anticodon) through direct interactions3,4. The theory claims support from experiments conducted with random aptamer libraries. After selection for amino acid binding, enrichment of cognate triplets was observed for some amino acids5–7. However, in some cases the codon and in other cases the anticodon was enriched, and the amino acids with the strongest affinities are those commonly thought to be late additions to the genetic code due to their complex biosyntheses. While the larger and more complex side chains of these amino acids offer wonderful handles for interactions with nucleic acids, this finding contradicts the notion that amino acids first entered the genetic code because of their affinity to their cognate triplets8.

In his seminal paper from 1968, Crick presents his “frozen accident” theory, wherein he proposes that once the genetic code had been defined, even a single codon reassignment would have been deleterious to the translation process and thereby to the fidelity and viability of the organism9. This notion is supported by the fact that the SGC has barely changed since the emergence of the last universal common ancestor (LUCA) at least 3 billion years ago10,11. But why was there only one frozen accident? Why are there not multiple codes of high fitness, separated by valleys of low fitness12? The answer likely lies in horizontal gene transfer (HGT), i.e. the lateral, inter-organismal propagation of genetic information via plasmids, transposons, or viruses. Transferred genes are only useful to the recipient if donor and recipient read them with the same code. While HGT confers vulnerability towards parasitic genetic elements, it provides an enormous advantage in changing environments, thereby promoting evolution13,14. The benefits seem to outweigh the disadvantages: slight variations in the SGC occur only through random drift in isolated microbial populations such as parasitic and endosymbiotic bacteria, as well as organelles, where no HGT can take place15–17. The “frozen” part of the theory attempts to explain why there are (only) 20 common proteinogenic amino acids; however, the “accident” part is challenged by the observation that the genetic code is clearly nonrandom18.

In 1975, Wong proposed the coevolution theory, arguing that the genetic code evolved in parallel with the amino acid metabolic pathways19–22. Accordingly, simple, abiogenic amino acids entered the code first, and as biosynthetic pathways for more complex amino acids evolved, these amino acids became available and were added to the code. Abiogenic amino acids are those that can be produced from inorganic material under plausible prebiotic conditions in prebiotic experiments, such as the famous Miller experiment23, and that can also be found in meteorites24. Their order of abundance in both meteorites and prebiotic experiments corresponds to their chemical complexity and thermodynamic stability: Gly, Ala, Asp, Glu, Val, Ser, Ile, Leu, Pro, Thr25–28. The initial (few) amino acids were encoded by large blocks of codons, which split to accommodate new additions. Thus, precursors made room for their products, which were likely only one catalytic step away19.

In this context, it is worth considering that the biosynthesis of polymers, such as proteins and nucleic acids, requires a continuous supply of monomers, which can be more reliably provided by a stable metabolism rather than abiogenic sources such as meteorites. Thus, the role of the core metabolism, which is universal to life on earth, should not be underestimated when reflecting on the emergence of life.

The coding coenzyme handle theory attempts to illustrate the advantage of amino acid acquisition in the RNA world leading to the origin of the genetic code. Amino acids, with their diverse chemical functionalities, might have served as catalysts at the catalytic core of ribozymes, thereby fulfilling the role of coenzymes. An amino acid on a nucleotide-based cofactor (e.g. a handle’s triplet end loop) might have been chemically attached to ribozymes as a sort of proto-tRNA handle29,30. There are 123 peptides spanning all main enzymatic classes that have only a single amino acid at their catalytic core, demonstrating that single amino acids are indeed capable of catalysis. While histidine is generally the most abundant amino acid at catalytic sites, aspartic acid and glutamic acid, which are thought to be among the earliest additions to the code, more frequently act alone18. Furthermore, the cofactor S-adenosylmethionine (SAM), composed of an amino acid attached to an adenosine moiety, might very well be a relic from that time supporting the coenzyme theory31,32. On a different note, experiments with the dipeptide PheLeu show that upon attachment to vesicle membranes, the dipeptide recruits fatty acids, promoting vesicle growth and thereby demonstrating another adaptive virtue of amino acids during early life evolution33. Nonetheless, the coding coenzyme handle theory fails to provide a sound reason for why amino acids became associated with triplets18.

When the GC content at the first, second, and third codon positions is plotted against the overall GC content of a variety of organisms, it becomes apparent that the extent of variation between organisms differs markedly between the codon positions. While at position one there is a 31% variation in GC content between organisms, the variation is only 12% at position two and 80% at position three34,35. As advantageous mutations are selected for during evolution, while disadvantageous mutations are selected against, the low variation at position two might indicate mutational constraints and thereby the importance of this position35,36. While the second position specifies the type of amino acid (hydrophobic, hydrophilic, semipolar), the first position indicates a specific amino acid and the third position is often redundant (Figure 1) and allows for wobble base pairing, where only the type of nucleobase (purine or pyrimidine) is important37. These observations gave rise to the 2-1-3 model38 and the related 4-column theory39. According to these models, at first, only the middle codon position specified the amino acid. With G at position one stabilizing codon-anticodon interactions with its three hydrogen bonds, the primordial code could have looked like this: GGN = Gly, GAN = Glu/Asp, GCN = Ala, and GUN = Val. These amino acids are the five most abundant in prebiotic soups and meteorites. As more amino acids were added to the code, position one became coding, followed by position three8,37.
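The primordial code suggested by the 2-1-3 model can be written down as a small lookup on the middle base. The following is only an illustrative sketch: the names `decode_primordial` and `agrees_with_standard_code` are hypothetical, and the standard code is rebuilt from the conventional UCAG-ordered one-letter string purely to check that the four primordial blocks match today's assignments for G-initial codons.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids in the conventional UCAG codon ordering; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed primordial code of the 2-1-3 model: only the middle base is informative.
PRIMORDIAL_BY_MIDDLE_BASE = {"G": "Gly", "A": "Glu/Asp", "C": "Ala", "U": "Val"}

# One-letter equivalents used to compare against the standard genetic code.
MODERN_EQUIVALENT = {"G": {"G"}, "A": {"D", "E"}, "C": {"A"}, "U": {"V"}}


def decode_primordial(rna):
    """Translate an RNA string codon by codon, reading only codon position two."""
    rna = rna.upper()
    usable = len(rna) - len(rna) % 3  # ignore an incomplete trailing codon
    return [PRIMORDIAL_BY_MIDDLE_BASE[rna[i + 1]] for i in range(0, usable, 3)]


def agrees_with_standard_code():
    """Check that every modern GNN codon still falls into its primordial block."""
    return all(
        STANDARD_CODE["G" + middle + third] in MODERN_EQUIVALENT[middle]
        for middle in BASES
        for third in BASES
    )


print(decode_primordial("GGAGCUGAGGUC"))
print(agrees_with_standard_code())
```

The second check makes the point of the model concrete: all sixteen G-initial codons of the modern code still encode one of the five amino acids named above.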

Figure 1 | Radial representation of the genetic code in mRNA format. The primary importance of the second codon position in determining the type of amino acid is emphasized. The first position determines the specific amino acid and the third (wobble) position demonstrates the degeneracy of the genetic code. The natural expansion of the genetic code at opal (Sec) and amber (Pyl) is illustrated in pink.

Perusal of the codon table (Figure 1) shows that the codons of similar amino acids often differ in only one position. For example, all aliphatic hydrophobic amino acids are encoded by codons with U at position two, and the codons of the two acidic amino acids Asp and Glu differ only by having either a pyrimidine or a purine at position three. Consequently, point mutations often have only a minimal effect on the physicochemical properties of the encoded amino acid, making the code very robust against errors in replication, transcription, and translation. The error minimization theory attributes these properties to evolution under selection for maximum robustness40. Critics of this theory point out that, in this case, error minimization should be the only observable pattern of the SGC; however, other patterns, such as biosynthetically close amino acids having similar codons, are also present10,41,42. Numerous cost functions to assess the robustness of the SGC and compare it to other codes have been devised and indicate that, while the SGC is well adapted to mitigate errors, many even more robust versions are conceivable12,36,43–50. In another study, evolution simulations of random codes compare evolution according to three different models: the 2-1-3 model, the precursor-product expansion model (coevolution), and the ambiguity reduction model. The latter model operates under the assumption that initially groups of amino acids were encoded ambiguously and specificity was increased over the course of evolution51. The codes resulting from the simulations of the 2-1-3 and the ambiguity reduction model were more robust than the standard genetic code, while the one resulting from the coevolution model was inferior45–47. Although the robustness of the SGC is undeniable, these findings suggest that it might have arisen as a selectively neutral by-product of evolution, as opposed to being the driving force behind the evolution8.
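As a rough illustration of this robustness, one can count how many single-base substitutions in sense codons leave the encoded amino acid unchanged, separately for each codon position. This is a deliberately crude proxy and not one of the published cost functions, which weigh physicochemical distances; the function name `synonymous_fraction_by_position` is an assumption made for this sketch.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids in the conventional UCAG codon ordering; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction_by_position(table):
    """Fraction of single-base changes in sense codons that keep the amino acid.

    Returns a list with one value per codon position (index 0, 1, 2).
    """
    unchanged = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in table.items():
        if aa == "*":  # only mutate sense codons
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                unchanged[pos] += table[mutant] == aa
    return [u / t for u, t in zip(unchanged, total)]


def codons_for(table, amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return {codon for codon, aa in table.items() if aa == amino_acid}


fractions = synonymous_fraction_by_position(STANDARD_CODE)
for position, value in enumerate(fractions, start=1):
    print(f"position {position}: {value:.3f} of substitutions are synonymous")

# Asp and Glu share 'GA' and differ only in pyrimidine vs. purine at position three.
print(sorted(codons_for(STANDARD_CODE, "D")), sorted(codons_for(STANDARD_CODE, "E")))
```

Under this simple measure, the third position absorbs most silent changes, the first position a few, and the second position none, which mirrors the positional hierarchy described in the text.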

As all of these theories capture some, but not all aspects of the SGC, they are not mutually exclusive, and newer theories draw on them, offering slight variations and piecing them together to supply a broader picture. Koonin and Novozhilov, for example, propose grouping the amino acids according to the free energy of their codon-anticodon interactions (which naturally correlates to the GC content): a) strong (−3.1 kcal/mol mean free energy): Gly, Ala, Pro, Arg; b) intermediate (−2.2 kcal/mol): Asp, Glu, Cys, Ser, Gln, His, Lys, Thr, Trp; and c) weak (−1.0 kcal/mol): Asn, Ile, Leu, Met, Phe, Tyr8. The interactions of the latter group are so weak that an extended anticodon loop with a modified tRNA base is required for stable codon-anticodon pairing, further supporting the idea of these amino acids being late additions to the code52. Grosjean and Westhof also observe a correlation between the increase of complexity of the modifications for the anticodon bases 34 and 37 and the AU-content of codon-anticodon pairs53. This order of appearance is well in line with the coevolution theory.

Phylogenetic analyses of the Rossmann fold and biotin synthase superfamilies assert that their members had already evolved by the time the aaRS classes evolved their specificities54–56, suggesting high-fidelity translation prior to the introduction of aaRSs. Therefore, early mRNA decoding might have been handled by proto-tRNAs bearing unique pockets for amino acid attachment12, and the first anticodons could have been assigned randomly through a “frozen accident”. Driven by diversification of the repertoire with error minimization as a by-product, the code expanded via duplication of the proto-tRNAs, resulting in codon grouping of related amino acids8.

Borrowing from the coevolution theory as well, Hartman and Smith57 present a theory not too different from that of Koonin and Novozhilov. Looking at the ribosome and aminoacyl-tRNA synthetases and combining these observations with a metabolic metric, Hartman and Smith arrive at an amino acid order very similar to the one presented above. There are two classes of aaRSs defined by their catalytic domains with ten amino acids belonging to each class. The class II enzymes bear a fold found in Biotin Carboxylases and are mostly associated with prebiotic amino acids, which led to the proposition that it might be the older of the two classes, whereas class I aaRSs contain a Rossmann fold58. Furthermore, since not all aaRSs interact with the anticodon loop, but interaction with positions 1-2-3 in the acceptor arm is always necessary for the identification of cognate tRNAs, an ancient operational code consisting mostly of GC pairs was proposed59,60. Taken together with the observation that the oldest parts of ribosomal proteins, thought to be the most ancient peptides61, show a bias for the amino acids Gly, Arg, Pro, Lys, Ala and additionally taking into account the number of catalytic steps from the citric acid cycle, Hartman and Smith suggest a GC-GCA-GCAU scheme for the evolution of the genetic code. According to this scheme, the first amino acids, Gly, Ala, Pro, and a simpler Arg precursor, were encoded by triplets consisting solely of G and C. The resulting peptides were largely unstructured, but the positively charged arginine precursor allowed interaction with negatively charged RNA backbones. The addition of A to the triplets allowed the recruitment of catalytic and polar amino acids and led to the formation of α-helices. Finally, together with U, hydrophobic amino acids entered the SGC, enabling the formation of globular structures with a hydrophobic core, as well as interaction with membranes. This GC-GCA-GCAU model correlates not only with the metabolic significance of the amino acids but also with the hierarchy of protein folding57,62.
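The alphabet stages of the GC-GCA-GCAU scheme can be made concrete with a small enumeration. The following Python sketch is illustrative only and is not part of the cited model: it reads the modern standard codon table and lists which amino acids become reachable when codons are restricted to a given nucleotide alphabet. The helper name `amino_acids_for_alphabet` is hypothetical, and because the sketch uses present-day codon assignments it cannot represent the simpler Arg precursor proposed for the earliest stage.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def amino_acids_for_alphabet(alphabet):
    """Return the amino acids encoded by codons built only from `alphabet`.

    Hypothetical helper for illustration; stop codons are ignored.
    """
    allowed = set(alphabet)
    return {
        aa
        for codon, aa in CODON_TABLE.items()
        if set(codon) <= allowed and aa != "*"
    }


gc_stage = amino_acids_for_alphabet("GC")
gca_stage = amino_acids_for_alphabet("GCA")
gcau_stage = amino_acids_for_alphabet("GCAU")

print("GC only:     ", sorted(gc_stage))
print("added with A:", sorted(gca_stage - gc_stage))
print("added with U:", sorted(gcau_stage - gca_stage))
```

With the modern table, the codons built solely from G and C encode Gly, Ala, Pro, and Arg, which matches the first stage described above; the set differences show which residues depend on A- or U-containing codons.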

Further, Budisa and Kubyshkin observe a correlation between the predominant secondary structure elements (α-helices, β-sheets) and the identity of the monomers that compose our catalytic polymers.

Building on the model proposed by Hartman and Smith, they formulate the “alanine-world” hypothesis, in which they recognize that all amino acids added to the SGC after the proposed GC-phase are derivatives of alanine, the amino acid with the greatest α-helical propensity (31,32). Protein backbones are constituted by alanine moieties, whereby the attached side-chain defines the chemical function. Thus, point mutations do not impact the backbone fold and only an accumulation of mutations impairs the secondary structure63.

Even after some six decades of research and the formulation of various theories and models, the origin and evolution of the standard genetic code remain an enigma, and a comprehensive and conclusive story has yet to be assembled. This might well be attributed, at least partly, to the ancient “chicken or egg” paradox: a functional translation system depends on proteins, yet there are no proteins without a translation system8. Nevertheless, there are a few characteristics of the code that are widely agreed upon. Firstly, the code is degenerate, with several codons coding for a single amino acid, and there is a rough correlation between the number of codons for an amino acid and its frequency in proteins64. Secondly, the SGC is universal and nonrandom. Billions of years of evolution produced only slight variations in the code: codon assignments remain mostly the same throughout all three domains of life, and it is mainly the frequency of codon usage that differs from organism to organism. However, codon usage should not be neglected, as rare codons are involved in the regulation of co-translational protein folding, have an effect on covalent protein modifications during and after synthesis, and affect co- and posttranslational secretion65. Highly expressed genes contain many common codons with prevalent tRNAs and only few rare codons65,66. Lastly, the code is robust but not optimal, and it expanded from a smaller set of simple, primordial amino acids to the current set of 20, with methionine and tryptophan among the most recent acquisitions. It should be noted that some organisms are capable of incorporating an additional amino acid into their proteins, either selenocysteine or pyrrolysine, thereby expanding their repertoire67–71. Pyrrolysine will be discussed in more detail later.

1.1.1 Trp and Met: Two of the Latest Additions to the Genetic Code

The amino acids methionine and tryptophan are thought to be two of the latest additions to the genetic code and are both activated and charged onto cognate tRNAs by class I aaRSs. They are among the more complex proteinogenic amino acids and those with the greatest metabolic cost. They are the only amino acids with a single codon, breaking the code’s degeneracy57, and they are the rarest, with an abundance in proteins of 1.4% for Trp72 and 2% for Met73. In comparison, leucine has an abundance of 9.1% in proteins72.
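The single-codon status of Met and Trp can be checked directly against the standard codon table. The short Python sketch below is an illustration added for clarity, not an analysis from the cited references; it counts the sense codons per amino acid (one-letter codes) and reports those with exactly one codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of sense codons per amino acid; the three stop codons are excluded.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)

# Print amino acids from least to most degenerate.
for aa, n in sorted(degeneracy.items(), key=lambda item: (item[1], item[0])):
    print(aa, n)
print("single-codon amino acids:", single_codon)
```

Only M (AUG) and W (UGG) appear with a count of one, whereas leucine, the abundant comparison case mentioned above, is served by six codons.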

Both amino acids are hydrophobic and contribute to protein stability. As the bulkiest amino acid, Trp affords a large surface for van der Waals interactions in protein hydrophobic cores72. Furthermore, it can participate in cation–π interactions (preferably with Arg, Figure 2 c), which are among the most stable non-covalent interactions74–76, and with its indole nitrogen Trp can act as a hydrogen-bond donor. Aromatic–aromatic interactions, in which three or more aromatic side chains often interact in the protein core, are a common feature of protein stabilization77. The most frequent conformation among aromatic side chains is the perpendicular edge-to-face conformation (Figure 2 a)78, followed by parallel-displaced (Figure 2 b) or offset-stacked interactions, whereas direct face-to-face aromatic stacking causes repulsion of the π electrons and is therefore rare79,80.

While Trp is considered to be hydrophobic, there are more than 40 hydrophobicity scales published with Trp in varying positions81,82. Therefore, together with Met, Trp shows no strict preferences for protein interiors or surfaces. Using the same interactions described above for protein stability, at the surface Trp plays a key role in enzyme-substrate binding, antigen-antibody recognition, receptor-ligand interactions, membrane anchoring, and due to its similarity with the nucleobases also DNA/RNA binding83,84.

Owing to the increased electron density in the π-system from the nitrogen lone-pair and the electron delocalization across this only polyaromatic amino acid, tryptophan is easily oxidized and susceptible to electrophilic substitutions, such as alkylation, nitration, or halogenation, resulting in vast chemical diversity85,86. It is therefore not surprising that Trp is not only used in protein biosynthesis but also serves as a precursor for many complex natural products such as alkaloids, hormones, antibiotics, anticancer agents, and antifungals87,88. These attributes make tryptophan an attractive target for drug discovery and development88, as well as for studying protein function, structure and dynamics89. With the unique biophysical properties of Trp and its analogs and as the rarest amino acid, Trp is the ideal candidate for site-specific, intrinsic probes in proteins. Furthermore, the tryptophanyl-tRNA synthetase (TrpRS) is fairly permissive towards Trp analogs, enabling the incorporation of a variety of non-canonical amino acids (ncAAs). Most aaRSs did not evolve any editing mechanisms against synthetic amino acids, likely because these were not present during evolution and therefore did not impose selection pressures against their incorporation into proteins90. Incorporation of ncAAs will be discussed in more detail further below.

Figure 2 | Common Trp orientations. a) Edge-to-face orientation in a β-hairpin peptide (1LE0) b) Parallel-displaced orientation in a parallel β-sheet (2KI0) c) Cation-π interaction between Arg and Trp (2B2U).

As mentioned above, Met contributes to protein stability, often via interactions with an aromatic side chain91. While sulfur–aromatic interactions are longer (5–7 Å) than salt bridges (<4 Å), both types of interaction have comparable energies92,93. In the TrpRS catalytic site, for example, Met129 plays a crucial role in Trp recognition with the sulfur pointing to the middle of the indole ring94. Moreover, due to its unbranched nature and the fact that the S-C bond affords little energetic difference between its rotamers, the Met side chain is very flexible and capable of molding itself to diverse sequences95. Additionally, the sulfur can be reversibly oxidized by either Mical or methionine sulfoxide reductase A (MsrA) and reduced back to methionine by MsrA or MsrB (in bacteria).

Sulfoxidation turns the apolar Met side chain into highly polar MetO, which in some instances causes drastic changes in the physicochemical properties of the entire protein and in other cases has little or no effect on the protein91.

In the latter instances, Met has been proposed to serve as an antioxidant by scavenging reactive oxygen species (ROS)96. It has been observed that cells bearing norleucine in their proteins instead of methionine exhibit lower fitness under oxidative stress97. Thus, the fact that norleucine lacks the sulfur and the associated scavenging properties, as well as the flexibility, might be the reason why methionine prevailed over norleucine in the genetic code over the course of evolution. In another study, where longevity was used as an inverse proxy for ROS production, it was observed that mitochondrial peptides from short-lived species contain more Met residues than their counterparts from long-lived species98. Indeed, in mammalian nuclear DNA, the Met codon appears with a 2% abundance, whereas in their mitochondria (where the respiratory activity is highest) the abundance rises to 6%, as there the AUA Ile codon is reprogrammed to code for Met73. In a phenomenon known as adaptive mistranslation, the MetRS is phosphorylated by ERK1/2 under oxidative stress, causing MetRS promiscuity and misacylation of non-methionyl tRNA with Met99. While under non-stress conditions about 1% of the Met used in protein synthesis is charged onto non-methionyl tRNAs, this fraction increases to 10% during oxidative stress100. Taken together, these observations suggest a protective role for methionine against ROS.

As indicated above, in other cases site-specific Met sulfoxidation provokes major changes in the affected protein and related pathways, suggesting a role as a posttranslational modification with concomitant cell signaling. For example, oxidation of actin Met44 by Mical results in depolymerization of F-actin; subsequent reduction of MetO44 by MsrB induces G-actin polymerization101–103. Further, there are examples of indirect signaling, where sulfoxidation within phosphorylation motifs impedes phosphorylation104,105, or where kinases are activated/inactivated upon sulfoxidation106,107.

A calculation on the probabilities of each amino acid being replaced by methionine over a certain evolutionary time period indicated Ile, Val, Leu, but also Gln, Lys, and Thr as the most probable amino acids108. Whereas Ile, Val, and Leu are the obvious candidates for Met substitution, Gln and Thr are great MetO mimics109,110, suggesting that these amino acids might have served as MetO predecessors and further hinting at a late arrival of Met in the genetic code, perhaps in response to local oxygenation111.

In contrast to Trp, which is the least abundant amino acid not only in proteins but also as a free amino acid in cells112, Met, while rather scarce in proteins (about 2%), serves in the form of S-adenosylmethionine as the main methyl donor throughout all kingdoms of life.

1.1.2 S-Adenosylmethionine (SAM)

S-adenosylmethionine is the second most widely used enzyme substrate after ATP113. It is synthesized from methionine and ATP by the enzyme methionine adenosyltransferase (MAT, Figure 3a) and serves as a substrate in a variety of different cellular processes114. As the major methyl-donor in all living organisms113,115, S-adenosylmethionine is involved in a plethora of transmethylation reactions with diverse substrates ranging from nucleic acids to hormones, neurotransmitters, phospholipids, and natural products (Figure 3e)116,117. Upon donation of its methyl group, methyltransferases convert SAM to S-adenosylhomocysteine (SAH), which is in turn hydrolyzed to adenosine and homocysteine by SAH hydrolase118. Homocysteine is then either turned into the antioxidant glutathione or recycled back to methionine119.

Methylation of DNA occurs at adenine and cytosine nucleotides, resulting in N6-methyladenine, C5-methylcytosine, and, in bacteria, also N4-methylcytosine120,121. In bacteria, DNA methylation is primarily carried out by restriction-modification systems (RMS), which serve as a primitive immune system122. They consist of a methyltransferase (MTase), as well as a restriction endonuclease (REase)123, and protect the cell from foreign DNA by digesting unmethylated sequences, such as those derived from bacteriophages. Self-recognition is achieved by methylation of specific sequence motifs, rendering those sequences resistant to REase activity124.

Furthermore, methylation plays an important role in DNA replication by modulating the affinity of replication-associated proteins to the chromosomal origin of replication (oriC)125, preventing re-initiation of replication, as well as assisting in correct chromosome distribution to the daughter cells via hemi-methylated chromosome binding to designated areas of the cell membrane126. During mismatch repair, methylation helps in distinguishing the correct template from the newly synthesized strand bearing the replication error127,128. Methylation in promoter regions and protein binding sites affects the affinity of RNA polymerase and transcription regulators129,130, allowing for quick adjustment to environmental changes, for example via the RpoS-mediated stress response131. Gene expression is additionally regulated through methylation of histones in eukaryotes and histone-like proteins in bacteria, which also aid in the minimization of chromosome length via supercoiling132,133. Furthermore, DNA methylation regulates the virulence of human pathogens, as well as motility and adhesion134,135.

There are 144 known types of RNA modification. Whereas the heavily modified tRNAs exhibit an extraordinary diversity of modifications, the most common modification in ribosomal RNA (rRNA) is methylation136. Methylation of rRNA occurs predominantly in the proximity of ribosome functional centers137, where it impacts rates and accuracy of translation138. For example, m2G966 in the 16S rRNA in bacteria interacts with the wobble base pair in the ribosomal P-site139, and loss of methylation at this position impairs translation initiation140,141. Further, rRNA methylation affects responses to metabolites142–144, and resistance of pathogenic bacteria to rRNA-targeting antibiotics is often achieved through methylation of key nucleotides145.

Figure 3 | Overview of some biological processes with SAM participation. a) SAM biosynthesis. b) Donation of the amino group for biotin biosynthesis. c) Donation of ribosyl group for tRNA modification. d) Donation of aminoalkyl group for tRNA modification. e) Methyl group donation in a range of biological reactions involving DNA, RNA, proteins, and natural products. f) Aminoalkyl group used in polyamine synthesis. g) Donation of the aminoalkyl group in the synthesis of the quorum-sensing molecule N-acylhomoserine lactone. h) SAM aminoalkyl group utilized in 1-aminocyclopropane-1-carboxylic acid (ACC) synthesis (precursor of the plant hormone ethylene). i) SAM as a source of methylene groups in cyclopropane fatty acid (CFA) synthesis.

However, SAM does not only donate its methyl group to miscellaneous cellular processes; rather, every single functional group of this versatile molecule is used116. The methylene group is utilized during cyclopropane fatty acid (CFA) biosynthesis (Figure 3i)146,147 and SAM donates its amino group to biotin biosynthesis (Figure 3b)148,149. One of the numerous tRNA modifications mentioned above is queuosine, whose ribosyl group stems from SAM (Figure 3c) and which can be found at position 34 in the anticodon loop of asparaginyl-, aspartyl-, histidyl-, and tyrosyl-tRNA150. SAM’s aminoalkyl group is employed in the modification of phenylalanyl-tRNA (Figure 3d)151, as well as in the synthesis of the quorum-sensing molecule N-acylhomoserine lactone in bacteria (Figure 3g)152. During polyamine biosynthesis, decarboxy-SAM donates its aminoalkyl group to the conversion of putrescine to spermidine (Figure 3f)153. Even 5’-deoxyadenosyl radicals are derived from SAM and help radical SAM enzymes to carry out a multitude of biological reactions, usually by the abstraction of a substrate hydrogen atom154. Figure 3 provides an overview of some of the many biological pathways influenced by SAM and illustrates the importance of this molecule. Alteration of methylation has been implicated in cancer155, inflammation156,157, neurodegenerative and neuropsychiatric disorders158–160, metabolic disorders161, and drug resistance162–164. However, studying methyltransferase spatial and temporal resolution, as well as specificity and function, is challenging165. Therefore, SAM analogs provide a valuable tool to investigate epigenetic regulation166–168 and track protein169–171 and RNA methylation172–174.

1.2 Xenobiology

For thousands of years, humankind has been using microorganisms for baking and brewing.

Scientific understanding of how microorganisms and cell extracts can be applied for useful biotransformations gave rise to the first wave of biocatalysis more than a century ago175. The second wave of biocatalysis (1970s-1980s) was shaped by emerging protein-engineering techniques, allowing for overexpression and isolation of enzymes176,177. With the inception of directed evolution in the 1990s, tailoring and optimization of protein activity, stability, and selectivity, as well as expansion of the substrate scope, became possible, characterizing the third wave of biocatalysis178–181. Nowadays, biocatalytic products can be found in a wide variety of goods ranging from medicines to vitamins, additives, biofuels, fragrances, and polymers178–186. However, although enzymatic transformations are regio-, enantio-, and chemoselective and achieve high rates and lifetimes, nature, employing mainly carbon, hydrogen, oxygen, nitrogen, sulfur, and phosphorus, is somewhat limited when compared to the vast richness of organic synthetic chemistry187,188. For example, boron, fluorine, and silicon, which are important in medicinal chemistry189,190, rarely occur in living organisms, and there is a plethora of industrial transformations that are not (yet) accessible through enzymatic reactions191. The chemical space of the macromolecules employed in nature is restricted by their building blocks, i.e. the four nucleotides adenosine, guanosine, cytidine, thymidine/uridine, and the 20 canonical amino acids.

Therefore, scientists have started to introduce new-to-nature building blocks and cofactors, heralding the fourth wave of biocatalysis and bringing about the field of xenobiology192,193.

All levels of the central dogma of molecular biology are subject to xenobiological research. At the informational level, unnatural base pairs offer the possibility of expanding the genetic alphabet, while xeno nucleic acids (XNA) can replace DNA or RNA in storage and propagation194. Nucleobase modifications can alter base pairing properties and modifications of the sugar or phosphate moiety can confer nuclease resistance195. Furthermore, at the interface of the informational and executional level, manipulation of nucleic acids can broaden the substrate scope of aptamers. Aptamers are short nucleic acids, which bind to their target molecule selectively and with high specificity196,197. For example, combining click chemistry with SELEX (systematic evolution of ligands by exponential enrichment) enables the selection of aptamers against previously inaccessible protein targets via the incorporation of an alkyne-modified nucleotide. Subsequent modification with an azide of choice offers the possibility of modularly choosing various modifications while retaining compatibility with the conventional steps of the selection procedure198,199. Finally, at the translational level, the incorporation of noncanonical amino acids into proteins is used to either expand or alter the genetic code.

Xenobiology can serve several purposes. The budding field of functional xenobiology aims at endowing the resulting macromolecule with new abilities, such as catalysis of chemical reactions that do not occur in nature. For example, incorporation of (2,2’-bipyridin-5-yl)alanine (BpyA) into the lactococcal multidrug resistance regulator LmrR facilitates site-specific Cu(II) binding. The enzyme thus modified is capable of catalyzing Friedel-Crafts alkylation200, as well as water addition to enones201.

Controlled insertion of bioorthogonal functional groups and markers provides valuable tools for the study of protein function and structure, while another field is concerned with biosafety via biocontainment of genetically modified organisms (GMO)202. Lastly, exploring the boundaries of the genetic code and experimentation with alternate building blocks can afford fundamental insights into the origin and evolution of life203.

1.2.1 Genetic Code Engineering and Expansion

The first experiments on the incorporation of noncanonical amino acids into proteins were conducted as early as the 1950s, when Levine and Tarver fed rats with the methionine analog ethionine, which bears an ethyl group instead of a methyl group at the sulfur atom. Labeling of the methylene group of the ethyl residue with 14C led to the observation that the ncAA ethionine is indeed incorporated into rat proteins204. By now, hundreds of different ncAAs have been incorporated into specific protein targets202,205, ranging from synthetic substrates that are structurally similar to their canonical counterparts, designated as analogs, to those that exhibit more diverse sidechains and are classified as surrogates206.

The introduction of cAA analogs enables precise manipulation of target proteins at the single-atom level207,208. For example, biologically abundant elements like sulfur and hydrogen can be substituted by elements that do not frequently appear in biological molecules, such as selenium, fluorine, or heavy atoms209–211; modifications that are helpful for the determination of protein structure via crystallography and 19F-NMR spectroscopy212–214. Further, redox reactions can be tuned by inserting electron-withdrawing/donating groups, such as nitro and methoxy groups215. Incorporation of ncAAs with chemoselective tags216,217 allows for site-specific protein labeling with fluorophores and probes, for example via click reactions218, thereby facilitating (among other things) protein localization studies219. UV-sensitive sidechains offer the possibility of spatio-temporally controlling functional moieties and inducing photoreactive groups, as well as using photoswitches to induce conformational changes219,220. Strategically placed spectroscopic probes also enable characterization of protein conformation by making use of Förster resonance energy transfer (FRET)221 or studying allosteric information transfer via vibrational energy transfer (VET)222. Furthermore, the role of amino acids in catalytic mechanisms can be deciphered by utilizing ncAAs as biophysical probes223–226.

In contrast to techniques involving chemical peptide synthesis, co-translational incorporation of ncAAs has the advantage that the resulting enzymes are already a part of living systems, which facilitates further manipulations via directed evolution or integration into existing biosynthetic pathways192. There are two major approaches for the co-translational incorporation of ncAAs, namely selective pressure incorporation (SPI)227 and stop-codon suppression (SCS)228. The suitability of each technique for a given project depends on the choice of ncAA (analog or surrogate), as well as the nature of the desired modification (global or site-specific). Both techniques, however, depend on cellular uptake of the ncAA, usually via amino acid transporters or diffusion. As charged molecules often cannot cross cell membranes, delivery in the form of dipeptides can improve ncAA transport229. Alternatively, non-charged, permeable precursors can be transformed into the amino acid intracellularly230,231.

1.2.1.1 Selective Pressure Incorporation (SPI)

Selective pressure incorporation is most famously used for the incorporation of selenomethionine or azidohomoalanine for X-ray crystallography applications213,214 and in vivo protein labeling232. In SPI, a canonical amino acid is replaced by a structural analog in a residue-specific manner. This technique (Figure 4, top) exploits the substrate tolerance of an endogenous aaRS and the translation machinery towards isostructural, synthetic amino acids, resulting in their incorporation into a target protein in response to their canonical counterpart’s codon233,234. Bacterial strains that are auxotrophic for the amino acid of interest, i.e. strains where the biosynthesis pathway of this amino acid has been deleted, can be driven toward ncAA incorporation upon cultivation in defined synthetic media deprived of the canonical amino acid and instead supplied with the non-canonical counterpart.

Typically, the cells are first cultivated in rich media until mid-log phase, allowing for the synthesis of essential cellular components under nutrient-rich conditions with the full set of canonical amino acids.

After washing and transferring the cells into minimal media containing the ncAA and only 19 cAA, production of the target protein is induced90. Deprived of one cAA and with no means of synthesizing that amino acid, the cells resort to incorporating the analog in the positions of the missing amino acid, culminating in the global replacement of this amino acid with its synthetic counterpart.

This method of protein modification is also known as genetic code engineering. While this technique is relatively simple and requires no additional translational components, it relies on the promiscuity of aaRSs and is therefore limited to structurally similar amino acids. As all sense codons of the replaced amino acid are suppressed, multiple incorporations of ncAAs are possible, but no specific position can be targeted209.
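The residue-specific character of SPI can be pictured as a changed reading of one sense codon. The Python sketch below is a toy model under stated assumptions, not a description of the experimental procedure: the function `translate`, the transcript `mrna`, and the label `"Aha"` (standing for azidohomoalanine, mentioned above as a Met analog) are illustrative choices, and the model ignores uptake, aaRS kinetics, and residual Met.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna, table):
    """Translate an mRNA codon by codon, stopping at the first stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":
            break
        residues.append(residue)
    return residues


# Assumption: in a Met-auxotrophic host grown without Met, every AUG codon
# is read with the supplied analog, so the codon's meaning changes globally.
spi_table = dict(CODON_TABLE)
spi_table["AUG"] = "Aha"

mrna = "AUGGCUAUGUGGUAA"  # hypothetical toy transcript: Met-Ala-Met-Trp-stop
standard_product = translate(mrna, CODON_TABLE)
spi_product = translate(mrna, spi_table)

print(standard_product)
print(spi_product)
```

In this toy model every AUG position carries the analog, including the first codon, and there is no way to address a single position, which mirrors the limitation stated above.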

Figure 4 | Schematic overview of the two main techniques for ncAA incorporation: selective pressure incorporation (SPI, top) and stop-codon suppression (SCS, bottom).

1.2.1.2 Stop-Codon Suppression (SCS)

The first site-specific incorporation of an ncAA into a protein was achieved in 1989 by the Schultz lab in a cell-free system. Phe analogs were incorporated into β-lactamase in response to an amber stop codon by supplying chemically acylated tRNAs to an in vitro transcription-translation system235. Twelve years later, the same group published the first orthogonal translation system (OTS) and reported in vivo incorporation of O-methyl-tyrosine228. To date, more than 200 ncAAs have been site-specifically incorporated in a variety of proteins205.

In contrast to SPI, where a canonical amino acid is replaced by an analog and the total number of amino acids stays the same, during stop-codon suppression an extra amino acid is added to the pool of 20 cAAs (Figure 4, bottom). Therefore, this method is also known as genetic code expansion and necessitates certain features: an unassigned or liberated codon that can be assigned to encode the non-canonical amino acid, an orthogonal aaRS/tRNA pair for the delivery of the ncAA to the ribosome, efficient transport of the non-canonical amino acid into the cell or its biosynthesis by the cell, as well as nontoxicity and metabolic stability of the ncAA. The orthogonal aaRS must not charge any endogenous tRNAs with the ncAA and should not charge its cognate tRNA with any canonical amino acids, while the orthogonal tRNA, as well as the ncAA, should not serve as a substrate for any endogenous aaRS. However, the orthogonal aaRS/tRNA pair still needs to be compatible with the ribosome and its elongation factors. As the name of the technique already indicates, stop codons are

[Figure 4 labels: ribosome; mRNA (5’ to 3’); nonsense codon; nonsense anticodon; ncAAs; endogenous aaRS/tRNA; orthogonal aaRS/tRNA; 19 cAAs; 20 cAAs; defined, synthetic media; rich media; SPI; SCS.]
