
Analytical development, biochemical and biomedical applications of high resolution mass spectrometric proteome analysis


Dissertation

submitted for the academic degree of Doctor of Natural Sciences (Doktor der Naturwissenschaften)

at the University of Konstanz

by

Iuliana Şuşnea

Konstanz 2010

Date of oral examination: Thursday, 17 February 2011

1st referee: Prof. Dr. Dr. h.c. Michael Przybylski

2nd referee: Prof. Dr. Jörg Hartig

3rd referee: Prof. Dr. Valentin Wittmann

Chair of the examination committee: Prof. Dr. Gerhard Müller

Konstanzer Online-Publikations-System (KOPS) URN: http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-131629

URL: http://kops.ub.uni-konstanz.de/volltexte/2011/13162/


"Şi chiar dacă aş avea darul prorociei, şi aş cunoaşte toate tainele şi toată ştiinţa;

chiar dacă aş avea toată credinţa aşa încît să mut şi munţii şi n-aş avea dragoste, nu sînt nimic."

1 CORINTENI 13, 2

"And if I have prophetic powers, and understand all mysteries and all knowledge, and if I have all faith, so as to remove mountains, but have not love, I am nothing."

1 Corinthians 13, 2

Pentru familia mea

For my family


The present work has been performed in the time from January 2005 to December 2009 in the Laboratory of Analytical Chemistry and Biopolymer Structure Analysis, Department of Chemistry of the University of Konstanz, under the supervision of Prof. Dr. Dr. h. c. Michael Przybylski.

Special thanks go to:

Prof. Dr. Dr. h. c. Michael Przybylski for the very interesting research topic and discussions concerning my work and for his entire support;

Prof. Dr. Jörg Hartig and Prof. Dr. Valentin Wittmann for writing the second and the third evaluation of the dissertation;

Prof. Dr. Bernhard Schink, Laboratory of Microbial Ecology, Department of Biology, University of Konstanz, for the collaboration on the ″Unknown genome proteomics″ project;

PD Dr. Corinna Hermann, Laboratory of Biochemical Pharmacology, Department of Biology, University of Konstanz, for the collaboration on the Chlamydia pneumoniae project;

Prof. Dr. Iwona Adamska, Laboratory of Physiology and Plant Biochemistry, Department of Biology, University of Konstanz, for the collaboration on the Arabidopsis Thaliana project;

Prof. Dr. Reinhard Zeidler, Helmholtz Zentrum München, for the collaboration on the ″Exosomes proteomics″ project;

All collaborators and co-authors for their valuable contributions to this work: Dr. Diliana Dancheva Simeonova, Dr. Marilena Manea, Adrian Moise, Dr. Sebastian Bunk, Dr. Irina Perdivara, Dr. Verena Reiser, Dmitry Galetskiy, Eliska Svobodova, Rodrigo Villaseñor Molina, Bogdan Bernevic, Dr. Christina Battke, Dr. Andreas Marquardt, Kathrin Lindner;

All members of the group for the inspiring atmosphere;

Last but not least I wish to thank my family.

Special thanks to those who have always stood by me: my parents, Elisabeta and Petru Şuşnea; my sister and brother-in-law, Carmen and Tudor Schipor; and my niece and nephew, Maria-Eliza and Ştefan-Teodor Schipor.


The work described in this dissertation has been published in the following peer-reviewed journals and presented at the following international conferences.

Publications

[1] Simeonova, D. D.*; Susnea, I.*; Moise, A.; Schink, B.; Przybylski, M. (2009) "Unknown genome" proteomics: a new NADP-dependent epimerase/dehydratase revealed by N-terminal sequencing, inverted PCR, and high resolution mass spectrometry. Mol Cell Proteomics 8, 122-131.

[2] Bunk, S.*; Susnea, I.*; Rupp, J.; Summersgill, J. T.; Maass, M.; Stegmann, W.; Schrattenholz, A.; Wendel, A.; Przybylski, M.; Hermann, C. (2008) Immunoproteomic identification and serological responses to novel Chlamydia pneumoniae antigens that are associated with persistent C. pneumoniae infections. J. Immunol. 180, 5490-5498.

[3] Galetskiy, D.; Susnea, I.; Reiser, V.; Adamska, I.; Przybylski, M. (2008) Structure and dynamics of photosystem II light-harvesting complex revealed by high-resolution FTICR mass spectrometric proteome analysis. J Am Soc Mass Spectrom. 19, 1004-1013.

[4] Susnea, I.; Bernevic, B.; Svobodova, E.; Simeonova, D. D.; Wicke, M.; Werner, C.; Schink, B.; Przybylski, M. (2010) Mass spectrometric protein identification from two-dimensional gel separation with stain-free detection and visualization using native fluorescence. Int J Mass Spec., in press (doi: 10.1016/j.ijms.2010.06.003).

[5] Susnea, I.*; Bunk, S.*; Wendel, A.; Hermann, C.; Przybylski, M. (2011) Biomarker candidates of Chlamydophila pneumoniae proteins and protein fragments identified by affinity-proteomics using FTICR-MS and LC-MS/MS. J Am Soc Mass Spectrom., in press (doi: 10.1007/s13361-011-0082-3).

[6] Wu, B.; Susnea, I.; Chen, Y.; Przybylski, M.; Becker, J. S. (2011) Study of metal-containing proteins in the roots of Elsholtzia splendens using LA-ICP-MS and LC-tandem mass spectrometry. Int J Mass Spec., in press (doi: 10.1016/j.ijms.2011.01.018).

*: both authors contributed equally

Conference presentations

Oral presentations

[1] Susnea, I.; Simeonova, D. D.; Moise, A.; Schink, B.; Przybylski, M. (2009, 26th - 28th of August) "Unknown-genome"-proteomics: A new NAD(P)-dependent epimerase/dehydratase revealed by N-terminal sequencing, inverted PCR and high resolution mass spectrometry. International Workshop "High performance mass spectrometry – new methods & applications in life science", Konstanz, Germany

[2] Susnea, I.; Bunk, S.; Hermann, C.; Przybylski, M. (2009, 27th of September - 1st of October) Identification of Chlamydia pneumoniae antigens by immuno-proteomics using FTICR-MS and LC-tandem mass spectrometry. 1st World Conference on Physico-Chemical Methods in Drug Discovery and Development, Rovinj, Croatia

[3] Susnea, I.; Bunk, S.; Hermann, C.; Przybylski, M. (2009, 13th – 15th of December) Immuno-proteomics identification of Chlamydophila pneumoniae antigens using high resolution mass spectrometry. Workshop "Bioaffinity - Mass Spectrometry in Life Science and Biomedical Analysis", Konstanz, Germany


Poster presentations

[1] Susnea, I.; Bunk, S.; Hermann, C.; Przybylski, M. (2006) Identification of Chlamydia pneumoniae antigens via immuno-proteomics using high resolution FTICR mass spectrometry, 39th Annual Meeting of the German Society for Mass Spectrometry, Mainz, Germany.

[2] Susnea, I.; Bunk, S.; Hermann, C.; Przybylski, M. (2006) Identification of Chlamydia pneumoniae antigens via immuno-proteomics using high resolution FTICR mass spectrometry, 54th ASMS Conference on Mass Spectrometry, Seattle, USA.

[3] Susnea, I.; Bunk, S.; Hermann, C.; Przybylski, M. (2007) Identification of Chlamydia pneumoniae antigens via immuno-proteomics using high resolution FTICR mass spectrometry, Chemical Genomics Workshop, Braunschweig, Germany.

[4] Susnea, I.; Galetskiy, D.; Reiser, V.; Adamska, I.; Przybylski, M. (2007) Elucidation of structure and dynamics of photosystem II light-harvesting complex assemblies in Arabidopsis thaliana by high-resolution FTICR mass spectrometry, 8th European Fourier Transform Mass Spectrometry Meeting (EFTMS), Moscow, Russia.

[5] Simeonova, D. D.; Susnea, I.; Moise, A.; Przybylski, M.; Schink, B. (2007) Proteins involved in anaerobic phosphite oxidation: combined proteomic and genetic approach, 6th Symposium on remediation focusing on "Microbe-mineral interfaces at heavy metal polluted sites", Jena, Germany.


[6] Susnea, I.; Simeonova, D. D.; Moise, A.; Schink, B.; Przybylski, M. (2007) Proteomic and genomic elucidation of a new protein involved in anaerobic phosphite oxidation, Congress of the Swiss Proteomics Society, Lausanne, Switzerland.

[7] Moise, A.; Susnea, I.; Simeonova, D. D.; Schink, B.; Przybylski, M. (2008) "Unknown-genome" proteomics-based identification of a new NAD(P)-epimerase/dehydratase from Desulfotignum phosphitoxidans by inverted-PCR, Edman-sequencing and high resolution mass spectrometry, 30th European Peptide Symposium, Helsinki, Finland.

[8] Susnea, I.; Simeonova, D. D.; Moise, A.; Schink, B.; Przybylski, M. (2008) "Unknown-genome" proteomics identification of a new NAD(P)-dependent epimerase/dehydratase from Desulfotignum phosphitoxidans bacterium, 3rd ESF Conference on Functional Genomics and Disease, Innsbruck, Austria.

[9] Susnea, I.; Simeonova, D. D.; Moise, A.; Schink, B.; Przybylski, M. (2009) "Unknown-genome" proteomics identification of a new NAD(P)-dependent epimerase/dehydratase from Desulfotignum phosphitoxidans bacterium, 42nd Annual Meeting of the German Society for Mass Spectrometry, Konstanz, Germany.

[10] Susnea, I.; Svobodova, E.; Bernevic, B.; Simeonova, D. D.; Schink, B.; Przybylski, M. (2009) Native fluorescence detection enables stain free protein identification from one- and two-dimensional gel separations, 42nd Annual Meeting of the German Society for Mass Spectrometry, Konstanz, Germany.

[11] Susnea, I.; Simeonova, D. D.; Moise, A.; Schink, B.; Przybylski, M. (2009) "Unknown-genome" proteomics identification of a new NAD(P)-dependent epimerase/dehydratase from Desulfotignum phosphitoxidans bacterium, 18th International Mass Spectrometry Conference, Bremen, Germany.


TABLE OF CONTENTS

1 INTRODUCTION
1.1 Genomics and Proteomics
1.2 Methods and concepts of proteome analysis
1.2.1 Protein preparation and separation techniques
1.2.2 Protein detection in gel electrophoretic separations
1.2.3 Protein identification strategies
1.3 Mass spectrometric methods for proteome analysis
1.4 Analytical challenges in proteomics
1.5 Scientific aims of the dissertation

2 RESULTS AND DISCUSSION
2.1 Development of high resolution mass spectrometric methods in proteomics
2.1.1 High resolution Fourier transform-ion cyclotron resonance mass spectrometry and two-dimensional gel electrophoresis for proteome analysis
2.1.1.1 Principles of high resolution FT-ICR-MS
2.1.1.2 Two-dimensional gel electrophoresis for proteome analysis
2.1.2 Liquid chromatography – tandem mass spectrometry (LC-MS/MS) for proteome analysis
2.1.3 Stain-free protein detection and visualization using native fluorescence for mass spectrometric protein identification
2.1.3.1 Native fluorescence protein detection for proteome analysis
2.1.3.2 Comparison between conventional staining procedures and stain free visualization using native fluorescence
2.1.3.3 Application of native fluorescence detection to protein identification from two-dimensional gel separations
2.2 Biomedical applications of high resolution mass spectrometric proteome analysis
2.2.1 Immunoproteomics for identification of protein biomarkers of Chlamydia pneumoniae
2.2.1.1 Problems in the investigation of Chlamydia pneumoniae antigenic protein structures
2.2.1.2 Detection of antigenic proteins by two-dimensional gel electrophoresis and immunoblotting
2.2.1.3 Identification of Chlamydia pneumoniae antigens by high resolution FT-ICR-MS and LC-MS/MS
2.2.1.4 Identification of neo-antigenic protein fragments by high resolution mass spectrometry
2.2.1.5 Interpretation of the results obtained in the C. pneumoniae immunoproteomics study
2.2.2 Application of high resolution mass spectrometric methods to exosomes proteomics
2.2.2.1 Composition, localization and biological activity of human exosomes
2.2.2.2 Identification of antigenic proteins by 2D gel electrophoresis, immunoblotting and mass spectrometry
2.2.2.3 Affinity-mass spectrometry and two-dimensional gel electrophoresis for exosomal protein identification
2.3 Biochemical applications of high resolution mass spectrometric proteomics
2.3.1 "Unknown genome" proteomics by high resolution mass spectrometry and Inverted PCR – a novel approach for protein identification
2.3.1.1 Desulfotignum phosphitoxidans - a bacterium with unknown genomic background
2.3.1.2 Detection of phosphite-induced protein spots using two-dimensional gel electrophoresis and PDQuest software
2.3.1.3 Degenerate primers derived from N-terminal protein sequences for inverted PCR
2.3.1.4 High resolution mass spectrometric identification of a new protein, a NAD(P)-dependent epimerase/dehydratase
2.3.1.5 Protein structure confirmation by sequence determination of internal peptides
2.3.1.6 Identification of phosphorylations in NAD(P)-dependent epimerase/dehydratase by direct molecular weight determination of the intact protein and IMAC
2.3.1.7 Summary of the "unknown genome" proteomics results
2.3.2 High resolution mass spectrometric proteome analysis of photosystem II light-harvesting complex of Arabidopsis thaliana
2.3.2.1 Localization of photosystems I and II in the chloroplast of higher plants
2.3.2.2 Composition of PSII light-harvesting complex (LHCII) in Arabidopsis thaliana
2.3.2.3 2D gel electrophoretic separation and mass spectrometric identification of proteins in LHCII subcomplexes under light-stress conditions
2.3.2.4 Summary of the results from the A. thaliana proteomics study

3 EXPERIMENTAL PART
3.1 Materials and reagents
3.2 Sample preparation for proteome studies
3.2.1 Preparation of Chlamydia pneumoniae lysates
3.2.1.1 Cultivation and preparation of Chlamydia pneumoniae protein extracts
3.2.1.2 Microimmunofluorescence characterization of human serum samples
3.2.2 Protein preparation from Desulfotignum phosphitoxidans
3.2.3 Preparation of muscle proteomics samples
3.2.4 Preparation of exosomal antibody and cell lysates
3.2.5 Acetone precipitation for removal of contaminants and protein concentration
3.2.6 Plant growth under stress conditions and isolation of monomeric and trimeric LHCII associated with PSII in Arabidopsis thaliana
3.2.7 Isolation of trimeric LHCII subcomplexes associated with PSI attributed to state transition in Arabidopsis thaliana
3.3 Chromatographic and electrophoretic separation methods
3.3.1 High Performance Liquid Chromatography (RP-HPLC)
3.3.2 Gel electrophoresis
3.3.2.1 Sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE)
3.3.2.2 Two-dimensional gel electrophoresis
3.3.2.3 Sensitive colloidal Coomassie staining
3.3.2.4 Silver staining
3.3.2.5 BioAnalyzer Gel reader for native fluorescence detection
3.4 Immuno-analytical methods
3.4.1 Dot blot assay
3.4.2 Western blotting
3.4.3 Immobilised metal ion affinity chromatography (IMAC)
3.4.4 Immuno-affinity chromatography
3.4.4.1 Preparation of antibody columns for affinity-proteomics of exosomes
3.4.4.2 Affinity experiments with human cell lysate
3.4.5 Polymerase chain reaction (PCR) and inverted PCR (IPCR)
3.5 Enzymatic fragmentation of proteins
3.5.1 Tryptic in-gel-digestion for Coomassie stained proteins
3.5.2 Tryptic in-gel-digestion for Silver stained proteins
3.5.3 Lys-C in-gel-digestion for Coomassie stained proteins
3.6 Passive elution of intact proteins
3.7 ZipTip procedure for sample concentration and purification
3.8 Mass spectrometric methods
3.8.1 MALDI-TOF mass spectrometry
3.8.2 MALDI-FT-ICR mass spectrometry
3.8.3 Liquid chromatographic-ESI-Ion Trap tandem mass spectrometry
3.8.4 Linear trap quadrupole (LTQ) orbitrap mass spectrometry
3.9 Edman sequence determination
3.10 Bioinformatic tools for mass spectrometry
3.10.1 GPMAW
3.10.2 PDQuest software
3.11 Search engines for protein identification using MS and MS/MS data

4 SUMMARY

5 ZUSAMMENFASSUNG

6 REFERENCES

7 APPENDIX
7.1 Appendix 1
7.2 Appendix 2

1 INTRODUCTION

1.1 Genomics and Proteomics

Every organism has a genome that encompasses the entire biological information needed to construct and maintain a living example of that organism.

First introduced by H. Winkler in 1920 [1], the term genome comes from a combination of two other terms: gene and chromosome. Thus, genome can be defined as a complete set of chromosomes and their corresponding genes. Most genomes, including the human genome, are made of deoxyribonucleic acid (DNA), but a few viruses have ribonucleic acid (RNA) genomes. DNA and RNA are polymeric molecules made up of chains of monomeric subunits called nucleotides [2].

DNA is a linear, unbranched polymer in which the monomeric subunits are four nucleotides that can be linked together in any order, forming chains that range from hundreds to thousands, or even millions, of units. Each nucleotide in DNA has three major components: (i) 2'-deoxyribose, a pentose (a sugar with five carbon atoms numbered from 1' to 5'), (ii) a nitrogenous base, one of adenine (A), cytosine (C), guanine (G) or thymine (T), and (iii) a phosphate group, comprising one, two or three linked phosphate units attached to the 5'-carbon of the sugar [2-3]. In a DNA polynucleotide, nucleotide units are linked together by phosphodiester bonds between the 5'- and 3'-carbons of their sugar components.

RNA is also a polynucleotide and has a similar structure to DNA. It differs from DNA in two respects: (i) its sugar is ribose, and (ii) it contains uracil (U) instead of thymine (T) [4]. RNA polynucleotides contain 3'-5' phosphodiester bonds, which are less stable than the phosphodiester bonds in DNA polynucleotides.


The double-helix model of DNA structure was first published in 1953 in Nature [5-6]. For its discovery Watson, Crick and Wilkins received the Nobel Prize in 1962. The most common form of DNA double-helix, shared by the majority of living cells, is called the B-form of DNA, where the double-helix is right-handed with about 10 – 10.5 nucleotides per turn [7]. The helix is stabilized by two types of chemical interaction: (i) base-pairing between the two strands: hydrogen bonds between an A of one strand and a T on the other strand, or between a C and a G (the only two base-pair combinations possible are A with T, and G with C [8-9]), and (ii) base-stacking, also known as π-π interactions: hydrophobic interactions between adjacent base pairs (bp). These interactions increase the stability of the double-helix once the DNA strands have been brought together by base-pairing [2].
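The base-pairing rule described above (A with T, G with C) can be illustrated with a minimal Python sketch. The function name and the example sequence are hypothetical and serve only as an illustration; the sketch additionally relies on the well-established fact that the two strands of the double helix run antiparallel, so the complement is reversed to be written 5' to 3'.

```python
# Watson-Crick pairing rules stated in the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The two strands of the double helix are antiparallel, so the input is
    read in reverse and each base is replaced by its pairing partner.
    """
    return "".join(PAIRING[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    # Hypothetical example sequence.
    print(complementary_strand("ATGCGT"))
```

Applying the function twice returns the original sequence, which reflects the fact that each strand fully specifies the other and can therefore serve as a template during replication.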

In eukaryotic cells most DNA is located in the cell nucleus (where it is called nuclear DNA), but a small amount of DNA can also be found in the mitochondria (where it is called mitochondrial DNA or mtDNA). An important property of DNA is that it can replicate. Each DNA strand in the double-helix can serve as a template for duplicating the sequence of bases. DNA replication requires several enzymes, such as helicase (separates the two strands of the double-helix), topoisomerase (relieves the torsional strain that builds up as the helix unwinds), DNA polymerase (copies sequences from nucleic acid templates, but needs a primer to begin attaching the nucleotides on the strand) and ligase (joins the newly synthesized DNA fragments).

A chromosome is a long, continuous piece of DNA, which contains many genes, regulatory elements and other intervening nucleotide sequences. A gene is the basic physical and functional unit of heredity. Genes encode information essential for the construction and regulation of polypeptides, proteins and other molecules that determine the growth and functioning of the organism. In molecular biology, genes are DNA segments which cells transcribe into RNA and translate, at least in part, into proteins (Figure 1.1). As a result of transcription three types of RNA are produced: (i) messenger RNA (mRNA), which encodes the amino acid sequence of one or more polypeptides specified by a gene or a set of genes, (ii) transfer RNA (tRNA), which reads the information encoded in the mRNA and transfers the appropriate amino acid to a growing polypeptide chain during protein synthesis, and (iii) ribosomal RNA (rRNA), which is the central component of the ribosome's protein manufacturing machinery. Transcription is followed by translation, in which the mRNA produced by transcription is decoded by the ribosome to produce a specific amino acid chain, or polypeptide, that will later fold into an active protein. Translation occurs in the cell's cytoplasm, where the large and small subunits of the ribosome are located and bind to the mRNA.

A single gene can give rise to multiple gene products. Multiple protein isoforms can be generated by RNA processing. The formation of mRNA is only the first step in a long sequence of events resulting in the synthesis of a protein (Figure 1.1) [10]. First, mRNA is subject to posttranscriptional control in the form of alternative splicing, polyadenylation, and mRNA editing [11]. At this step, many different protein isoforms can be generated from a single gene. Second, mRNA can then be regulated at the level of protein translation [12]. Third, once formed, proteins are subject to posttranslational modification. It is estimated that up to 200 different types of posttranslational protein modification exist [13].

Proteins can also be regulated by proteolysis and compartmentalization. The average number of protein forms per gene was predicted to be one or two in bacteria, three in yeast, and three or more in humans [14].


Figure 1.1: Schematic representation of the mechanisms by which a single gene can give rise to multiple gene products. Genes are segments of DNA which cells transcribe into RNA. RNA is processed into mRNA, which is further translated into proteins. Multiple protein isoforms can be generated by RNA processing when RNA is alternatively spliced or edited to form mature mRNA. mRNA can be regulated by stability and efficiency of translation. Proteins can be regulated by additional mechanisms, including posttranslational modification, proteolysis, or compartmentalization.

The information in the nucleic acid sequence of a gene is read in only one direction, from the 5' end of the coding sequence to the 3' end. It is therefore possible to describe gene structure in terms of upstream and downstream. In humans, genes vary in size from a few hundred DNA bases to more than 2 million bases. Completed in 2003, the Human Genome Project estimated that humans have between 20,000 and 25,000 genes [15]. Every person has two copies of each gene, one inherited from each parent. Most genes are the same in all people, but a small number of genes (< 1 %) differ slightly between people. Alleles are forms of the same gene with small differences in their sequence of DNA bases. These small differences contribute to each person's unique physical features.

The genetic code comprises the translation table between nucleic acid sequences of genes and amino acid sequences of proteins. The unit of coding is a triplet of nucleotides named a codon. The four nucleotides (in DNA: A, T, G, C; in RNA: A, U, G, C) can form a total of 64 possible triplets, or codons [16].
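The count of 64 codons follows directly from 4 x 4 x 4 combinations of the four nucleotides. The following Python sketch enumerates them and pairs them with the standard genetic code in order to transcribe and translate a short sequence. It is only an illustration: the function names and the example sequence are hypothetical, and the table is the commonly used standard code, from which some organisms and organelles deviate.

```python
from itertools import product

BASES = "TCAG"  # DNA alphabet; in RNA, T is replaced by U

# Standard genetic code, one letter per codon in TCAG order ('*' = stop).
# Organism-specific variants of this table exist.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All 4**3 = 64 triplets, in the same order as AMINO_ACIDS.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(coding_dna: str) -> str:
    """Write the coding strand as mRNA (T -> U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA read 5' to 3', stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODONS))                            # number of possible codons
    print(translate(transcribe("ATGGCCAAATAA")))  # hypothetical sequence
```

Because 61 sense codons encode only 20 amino acids, most amino acids are specified by more than one codon; this degeneracy is visible in the repeated letters of the table.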

The study of the global properties of genomes of related organisms is usually referred to as genomics, which distinguishes it from genetics, which generally studies the properties of single genes or groups of genes. Structural genomics aims at sequencing the total DNA and at mapping all genes of a given genome [17], while functional genomics describes gene functions and interactions, focusing on gene transcription, translation, and protein-protein interactions [18-19].

[Figure 1.1 diagram: DNA → RNA (transcription) → mRNA (processing) → protein (translation). Regulation points: transcriptional regulation; alternative splicing, mRNA editing and polyadenylation; translational regulation; post-translational modification, proteolysis and compartmentalization.]

While molecular biology focuses on individual genes, messenger RNA, and proteins, the more recently developed field of bioinformatics studies the complete collection of DNA (the genome), RNA (the transcriptome) and protein sequences (the proteome) using computer databases and algorithms [19]. A more precise definition is given by the National Institutes of Health (NIH): bioinformatics is the "research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, analyze or visualize such data" [20]. Three major and publicly accessible databases store large amounts of nucleotide and protein sequence data: (i) GenBank at the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH) [21-23], (ii) DNA Database of Japan (DDBJ) at the National Institute of Genetics [24-25], and (iii) European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database at the European Bioinformatics Institute (EBI) [26-27]. These three databases share their sequence data daily, coordinated by the International Nucleotide Sequence Database Collaboration (INSDC) [28]. In 2005, the INSDC announced that the total amount of sequenced DNA had reached 100 billion base pairs [19].

The proteome is the entire set of proteins expressed by a genome, cell, tissue or organism. This term was first introduced by M. Wilkins in 1994 [29]. The study of the proteome, called proteomics, covers not only all the proteins in any given cell, but also the set of all protein isoforms and modifications, the interactions between them, and the structural description of proteins and their higher-order complexes.

Proteomics would not be possible without the previous achievements of genomics, which provided the "blueprint" of possible gene products that are the focal point of proteomics studies [30].

1.2 Methods and concepts of proteome analysis

Unlike DNA analysis, where the polymerase chain reaction (PCR) has revolutionized the isolation and amplification of nucleic acids [31], proteomics is confronted with an entire range of problems, such as sample degradation, variable and limited sample material, vast dynamic range, a wide range of post-translational modifications, almost boundless tissue, developmental and temporal specificity, and disease and drug perturbations [30,32]. Proteomics had to evolve to be able to address these problems. The real starting point in the development of proteomics came when the classical separation method, two-dimensional gel electrophoresis (2D), was joined by the “soft” ionization techniques of mass spectrometry: matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI).

A classical proteomics experiment includes several steps such as (i) sample preparation, (ii) separation of proteins by gel electrophoretic methods, (iii) detection of proteins using different staining procedures, (iv) computer-assisted data analysis, and (v) protein identification and characterization [32]. Figure 1.2 shows a schematic representation of the major mass spectrometry (MS)-based techniques, which are used to investigate proteomics systems in the present thesis. The central pillar for all experiments is two-dimensional gel electrophoresis – a powerful and efficient tool used to reduce system complexity, coupled with high resolution mass spectrometry for the identification of specific proteins.


Figure 1.2: Schematic view of the analytical and immunological methods used in this thesis for protein identification and characterization. After 2D gel separation proteins are visualized by staining methods such as Coomassie and silver, or by "stain-free" native fluorescence. Protein spots are excised from gels, digested usually with trypsin, and digestion mixtures are analysed by mass spectrometry. Upon database search proteins are identified. Following 2D gel electrophoresis proteins can also be transferred onto membranes for Edman sequencing or for Western blotting experiments. For exact mass determinations proteins are eluted from gels, destained and measured by mass spectrometry.
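The digestion and database-search steps of this workflow can be sketched in Python as an in-silico tryptic digest followed by a simple mass comparison. This is a minimal illustration under stated assumptions, not the search procedure used in this thesis: the function names, the 10 ppm tolerance and the example sequences are hypothetical, trypsin is modelled by the common rule of cleavage C-terminal to lysine or arginine unless proline follows, and missed cleavages and modifications are ignored.

```python
# Monoisotopic residue masses in Da (standard values for the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O is added per free peptide


def tryptic_peptides(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def matches(observed_masses, sequence, tolerance_ppm=10.0):
    """Count observed masses that fit a theoretical tryptic peptide."""
    theoretical = [monoisotopic_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tolerance_ppm for theo in theoretical)
        for obs in observed_masses
    )


if __name__ == "__main__":
    print(tryptic_peptides("MKRPAGKLLR"))  # hypothetical sequence
```

In a real search, each database entry would be scored in this way against the measured peptide masses. The narrower the mass tolerance that the instrument allows, the fewer database sequences fit the data by chance.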

1.2.1 Protein preparation and separation techniques

Sample preparation is one of the most crucial processes in proteomics research. The results of the experiment depend on the condition of the starting material. Therefore, a proper experimental model and careful sample preparation are vital to obtain significant and reliable results, especially in comparative proteomics, where minor differences between experimental and control samples are investigated [33]. Sample preparation involves tissue and cell lysis, protein extraction, protein solubilization and protein quantitation. Each protein sample is unique, so conditions must be optimized to ensure that contaminants (e.g. nucleic acids, lipids, carbohydrates, phenolics, detergents, salts) are removed, all proteins are effectively solubilized, and proteins are analyzed at appropriate concentrations [34]. To remove contaminants, proteins can be precipitated using ammonium sulfate, acetone, trichloroacetic acid (TCA), TCA + acetone or ethanol [35]. Solubilizing difficult proteins such as membrane proteins may require strong detergents to make them soluble in aqueous solution [36]. To assess protein concentration, assays like Bradford or bicinchoninic acid (BCA) are usually employed.

[Figure 1.2 workflow: protein separation (IEF, then SDS-PAGE); protein visualization (Coomassie Blue, silver, native fluorescence); transfer onto membrane for Edman sequencing or Western blotting; protein elution; spot excision; in-gel digestion with different proteases; mass spectrometric evaluation; database search; protein identification.]

Protein separation can be performed using chromatography (as a non-gel-based approach) or gel electrophoresis [37]. With chromatography techniques (e.g. ion exchange, reverse-phase) high selectivities can be achieved, but from a purely proteomics point of view these methods present serious disadvantages, such as sample loss (strongly interacting proteins can be irreversibly adsorbed onto the chromatographic surface) or extended analysis times (pure fractions may need more than one chromatographic run). Therefore, chromatography may not be employed as the sole separation method in proteome analysis, but can be used as a complementary technique to electrophoresis, in order to reduce the complexity of the protein sample [32]. In this thesis reverse-phase high performance liquid chromatography (RP-HPLC) has been used in conjunction with two-dimensional gel electrophoresis (2D), mass spectrometry (MS) and Edman sequencing to obtain sequence information for a protein from an organism with unknown genomic background (see 2.3.1).

Two gel-separation methods are mostly used to separate protein mixtures. When working with simple protein mixtures (<100 components), one-dimensional gel electrophoresis (1D) is employed. Complex protein mixtures, such as total cell lysates, require the use of the highly resolving two-dimensional gel electrophoresis (2D). Introduced in 1975 by O'Farrell and Klose [38-39], 2D gel electrophoresis is able to separate more than 10,000 different compounds [37] by two distinct properties: isoelectric point (pI) in the first dimension (isoelectric focusing, IEF), and molecular weight in the second dimension (sodium dodecyl sulfate-polyacrylamide gel electrophoresis, SDS-PAGE) [40]. With the introduction of immobilized pH gradient (IPG) strips the first dimension became more straightforward [41]. Another great benefit of the immobiline technique is that gels with very narrow pH ranges can be prepared to separate proteins that differ by less than 0.01 pH unit in their pI [42]. In IEF proteins move to their isoelectric point in a pH gradient, where they lose their net charge and thus their electrophoretic mobility. In the second dimension (SDS-PAGE) all proteins are loaded with SDS, forming negatively charged complexes, and migrate toward the anode in an electric field, being separated in the polyacrylamide gel matrix according to their size [32]. With 2D, highly resolved protein patterns can be obtained, which can then be used for protein identification.
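The principle of isoelectric focusing, that a protein stops migrating at the pH where its net charge is zero, can be sketched numerically. The following Python example estimates the net charge of a sequence with the Henderson-Hasselbalch equation and finds the pI by bisection. It is a rough illustration only: the pKa values are approximate textbook numbers chosen as an assumption (published sets differ), the function names and sequences are hypothetical, and effects of protein folding and modifications are ignored.

```python
# Approximate pKa values of ionizable groups. Published sets differ, so these
# numbers are an assumption and the resulting pI is only an estimate.
PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "N_term" else sequence.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))   # protonated fraction
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "C_term" else sequence.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))   # deprotonated fraction
    return charge


def isoelectric_point(sequence, precision=0.001):
    """Bisect on pH 0-14 for the point where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > precision:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    # Hypothetical lysine-rich and acidic sequences.
    for seq in ("KKKKAG", "DDDEAG"):
        print(seq, round(isoelectric_point(seq), 2))
```

Bisection is sufficient here because the net charge decreases monotonically with increasing pH, so there is exactly one zero crossing between pH 0 and pH 14.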

One of the strengths of 2D gel electrophoresis is the ability to resolve proteins that have undergone post-translational modifications. This resolution is possible in 2D because many types of protein modifications confer a difference in charge as well as a change in mass on the protein. One such example is protein phosphorylation. Frequently, the phosphorylated form of a protein can be resolved from the nonphosphorylated form by 2D. In this case, a single phosphoprotein will appear as multiple spots on a 2D gel [43]. In addition, 2D can detect different forms of proteins that arise from alternative mRNA splicing or proteolytic processing. However, this technique has its limitations [44-46].

Membrane proteins are poorly soluble and are subject to interferences in 2D because of the high quantity of lipids that they contain. Also, large or low abundant proteins, and those of a highly acidic or basic nature may be absent on a 2D gel. Other methods like 1D or chromatography may be required to separate and identify these proteins [47].


1.2.2 Protein detection in gel electrophoretic separations

Gel-separated proteins can be visualized by different staining procedures such as Coomassie Brilliant Blue, silver or fluorescent staining, or, more recently, by the native fluorescence exhibited by the aromatic amino acids [48-51]. Post-translational modifications can also be targeted by special fluorescent dyes, for example Pro-Q® Diamond, which is used for specific detection of phosphorylated proteins, or Pro-Q® Emerald for detection of glycosylated proteins [52-53].

Though a wide range of organic dyes have been used to visualize proteins in polyacrylamide gels, Coomassie Blue dyes (R and G) are often employed in staining due to their low cost, ease of use and very good compatibility with microchemical characterization methods, such as mass spectrometry. During staining with sensitive colloidal Coomassie Blue an equilibrium is achieved between colloidal particles and freely dispersed dye in solution. The low concentration of free dye penetrates the gel matrix and preferentially stains the proteins, but the colloidal dye particles are excluded from the gel, thus preventing matrix staining. This dye provides a linear response with the protein amount over a 10- to 30-fold range. The principal limitations associated with Coomassie Blue-based staining methods are the poor detection sensitivity and the small linear dynamic range obtained with these dyes [54].

Much more sensitive than Coomassie staining, silver staining techniques are based upon saturating the gels with silver ions, washing the less tightly bound metal ions out of the gel matrix and reducing the protein-bound metal ions to form metallic silver [55-57]. Silver staining is accomplished using silver nitrate in combination with a formaldehyde developer in alkaline carbonate buffer, or using an ammonia–silver complex in combination with a formaldehyde developer in citrate buffer. Silver stain methods have a very low detection limit (around 1 nanogram of protein), but the linear dynamic range of the stain is restricted to a 10-fold range. The best silver stain methods use aldehyde-based fixatives prior to silver impregnation, but this prevents subsequent microchemical analysis by Edman sequencing or mass spectrometry. Therefore, alternative silver staining methods that omit aldehydes in the fixatives must be employed [58-59]. With the modified staining techniques, detection sensitivity is poorer and background staining less uniform. Additionally, destaining methods are often used to improve the compatibility of silver staining with peptide mass profiling methods [60].

An alternative to classical staining procedures such as Coomassie and silver is the fluorescent detection of proteins in gel electrophoretic separations. This type of detection has recently come into more frequent use, particularly in laboratories engaged in large-scale proteomics research. For example, SYPRO Red and SYPRO Orange dyes can detect proteins in SDS–polyacrylamide gels using a simple, one-step staining procedure that requires only 30 – 60 min to complete and does not involve a destaining step [61-63]. With detection limits around 5 - 10 ng, fluorescent staining rivals the sensitivity of silver staining. Unfortunately, both SYPRO Orange and SYPRO Red dyes require 7% acetic acid in the staining solution, which is problematic when electroblotting, electroelution or measurement of enzyme activity is intended. Moreover, these fluorescent dyes are expensive.

In addition to Coomassie and silver staining, the native fluorescence of aromatic amino acids was used in this thesis to visualize protein spots in 1D and 2D gel separations. The native fluorescence of a protein sums up the individual fluorescences of the aromatic amino acids. Therefore, proteins with low amounts of aromatic amino acids show low sensitivity in this procedure. Using the native fluorescence of aromatic protein amino acids with UV transmission at 343 nm as a fast gel imaging system [64], unstained visualized protein spots were localized within 1D and 2D gels. With special tools protein spots were excised and used for protein identification. This system proved to be a very efficient tool for MS-based proteome analysis.


1.2.3 Protein identification strategies

After protein separation using gel electrophoretic techniques, detection and localization by different staining procedures, PDQuest analysis can be employed in the comparison of different gel patterns. Using PDQuest software (Bio-Rad) removal of background noise, gel artifacts, and horizontal or vertical streaking from the gel electronic images can be achieved, as well as spot matching, spot counting and spot density quantification. With all these features it is possible to compare hundreds of gels in one analysis set. An example of such a comparison is shown in Figure 1.3. Two different 2D gels were compared by PDQuest. In a first step one of these gels was chosen as standard. In the analysis window the first gel which is shown is the standard (gel 33), followed by the gels which were compared (gel 11 and gel 33). Using the spot review tool it was possible to localize spot SSP 5301 only in gel 33, this spot being absent in gel 11.

Figure 1.3: Two different 2D gels prepared with cell lysates from Desulfotignum phosphitoxidans bacterium were compared using the PDQuest software. The first gel shown is the standard gel (gel 33), followed by the gels which were compared – gel 11 and gel 33. With the "Spot Review Tool" it was possible to detect spot SSP 5301 (the number was given by the software) in gel 33 only, the spot being absent from gel 11.


In the classical proteomics experimental set-up after proteins are detected, they are excised from gels, digested with different proteases (mostly trypsin) and enzymatic mixtures are further measured by mass spectrometry (MS) using two approaches for protein identification: (i) peptide mass fingerprinting (PMF), which involves determination of molecular weights of all peptides in the digest, and (ii) fragmentation of selected peptides (parent ions) inside the mass spectrometer into series of sequence-diagnostic ions (using tandem mass spectrometry). A general scheme of the experimental approach used for protein identification from gel separations is shown in Figure 1.4.

Figure 1.4: Schematic view of the experimental set-up used for protein identification from a 2D gel. Spots of interest are excised from gels and digested with trypsin. The resulting peptides are analyzed by MS peptide mass fingerprinting or MS/MS peptide sequence tagging, and obtained masses are used for database search, resulting in protein identification.

From the obtained product ions, a part of the amino acid sequence of the peptide ("sequence tag") can be obtained. All peptide-mass fingerprints, product-ion data or peptide-sequence tags are used to search a protein sequence database to identify the protein of interest [65]. The identification is made by comparing experimental mass spectrometric data with theoretical data calculated for each database entry. A list of candidate proteins that most closely match the input data is generated by the search, and the candidate proteins are ranked by various scoring algorithms. The search can be limited by different constraints (e.g. by choosing a certain taxonomy). Several protein sequence databases are publicly available. A very well annotated database is Swiss-Prot (maintained by The Swiss Institute of Bioinformatics and The European Bioinformatics Institute [66]).

The main advantages of Swiss-Prot are the high degree of annotation and the low redundancy. Several search engines can be accessed online free of charge, for example the ProFound program [67] at the PROWL server [68], MS-Fit and MS-Tag at the Protein Prospector server [69], or MASCOT at the Matrix Science server [70]. The number of peptides observed in the peptide mass fingerprint and the accuracy with which they are measured determine the confidence of the protein identification. Identification selectivity can be increased with the use of very accurate masses provided by Fourier transform-ion cyclotron resonance mass spectrometry (FT-ICR-MS).
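The comparison of experimental and theoretical peptide masses described above can be sketched in a few lines. This is a deliberately simplified illustration, not the scoring used by ProFound, MS-Fit or MASCOT: candidates are ranked only by the number of observed masses that match an in-silico tryptic digest within a ppm tolerance. The function names and the toy database used in the check below are hypothetical; missed cleavages and modifications are ignored.

```python
import re

# Monoisotopic amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum of a free peptide
PROTON = 1.00728   # charge carrier of a singly protonated ion


def tryptic_peptides(protein: str) -> list:
    """Cleave C-terminal to K or R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide: str) -> float:
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, protein: str, tol_ppm: float = 10.0) -> int:
    """Number of observed masses explained by the theoretical digest."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(protein)]
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed
    )


def rank_candidates(observed, database: dict, tol_ppm: float = 10.0) -> list:
    """Rank database entries (name -> sequence) by number of matched masses."""
    scores = {name: count_matches(observed, seq, tol_ppm)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Tightening `tol_ppm` reduces the number of random matches, which is one way to see why the high mass accuracy of FT-ICR-MS increases identification selectivity.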

1.3 Mass spectrometric methods for proteome analysis

Mass analysis requires that peptides are ionized so that they can travel according to their mass-to-charge ratio (m/z) in an electric or magnetic field. Hence, mass spectrometers consist of an ionization source, a mass analyzer and detectors which detect the ions that have traveled through the analyzer. A scheme of the main components of a mass spectrometer is shown in Figure 1.5.


Figure 1.5: Scheme showing the main parts of a mass spectrometer: (i) the ion source - matrix-assisted laser desorption/ionisation (MALDI) or electrospray (ESI), (ii) the mass analyzer - time-of-flight (TOF), Fourier transform-ion cyclotron resonance (FT-ICR), ion-trap, quadrupole, (iii) the detector, and (iv) the computer. MALDI is usually coupled to TOF analyzers, whereas ESI is mostly coupled to ion traps and triple quadrupole instruments and used to generate fragment ion spectra (collision induced dissociation (CID) spectra).

For most proteomics experiments, peptides are ionized either by electrospray ionization (ESI) [71] or by matrix-assisted laser desorption/ionization (MALDI) [72].

Because of the lack or minimal extent of analyte fragmentation during the ESI and MALDI processes, they are also referred to as "soft" ionization methods [73].

For ESI, an electric field at the sample exit causes a fine spray of droplets to form. Eventually, nanometer-sized droplets, often multiply charged, are produced for entry into the mass analyzer. For MALDI, the peptides are cocrystallized with an organic, light-absorbing matrix (e.g. α-cyano-4-hydroxy cinnamic acid or sinapinic acid) which, when activated by a laser, ionizes each peptide as it enters the gas phase. For the development of ESI and MALDI, Fenn and Tanaka received the Nobel Prize for Chemistry in 2002.
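Because ESI often produces multiply charged ions, one analyte appears at several m/z values. As a hedged illustration (assuming protonation as the only charging mechanism), the sketch below computes the m/z of an ion carrying n protons and recovers the charge state and neutral mass from two adjacent peaks of one charge series. The function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an ion [M + nH]^n+ formed by attaching `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def charge_and_mass(mz_lower_charge: float, mz_higher_charge: float):
    """Deconvolute two adjacent peaks of one charge series.

    mz_lower_charge  belongs to charge n     (the peak at higher m/z),
    mz_higher_charge belongs to charge n + 1 (the peak at lower m/z).
    Solving the two m/z equations gives n = (m2 - H) / (m1 - m2).
    """
    n = round((mz_higher_charge - PROTON) / (mz_lower_charge - mz_higher_charge))
    neutral_mass = n * (mz_lower_charge - PROTON)
    return n, neutral_mass
```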

The mass analyzer is of crucial importance in proteomics, its key parameters being sensitivity, resolution, mass accuracy and the ability to generate information-rich ion mass spectra from peptide fragments (tandem mass or MS/MS spectra) [73-75]. Four basic types of mass analyzer are currently used in proteomics research: ion trap, time-of-flight (TOF), quadrupole and Fourier transform-ion cyclotron resonance (FT-ICR). They differ in design and performance, each having its own strengths and weaknesses. In ion-trap analyzers, the ions are first trapped for a certain time interval and are then subjected to MS or MS/MS analysis. Ion traps are robust, sensitive and relatively inexpensive, but they also have drawbacks. One disadvantage of ion traps is their relatively low mass accuracy, due in part to the limited number of ions that can be accumulated at their point-like centre before space-charging distorts their distribution and thus the accuracy of the mass measurement. The linear or two-dimensional ion trap [76] is a more recent development, where ions are stored in a cylindrical volume that is considerably larger than that of the traditional, three-dimensional ion traps, allowing increased sensitivity, resolution and mass accuracy. The FT-MS instrument is also a trapping mass spectrometer, although it captures the ions under high vacuum in a high magnetic field. Its main advantages are high sensitivity, mass accuracy, resolution and dynamic range [77-80]. However, high maintenance costs, operational complexity and low peptide-fragmentation efficiency of FT-MS instruments have limited their routine use in proteomics research.

Usually, MALDI is coupled to TOF analyzers that measure the mass of intact peptides, whereas ESI has mostly been coupled to ion traps and triple quadrupole instruments and used to generate fragment ion spectra (collision induced dissociation (CID) spectra) of selected precursor ions [73].

Tandem mass spectrometers have the ability to fragment peptide ions and to record the resulting fragment ion spectra. For tandem mass spectrometers such as triple quadrupole, ion trap, or quadrupole/TOF instruments, fragment ion spectra are generated by collision induced dissociation (CID) in which the peptide ion to be analyzed is isolated and fragmented in a collision cell, and the fragment ion spectrum is recorded. Typically, these types of mass spectrometers are used in conjunction with ESI. For the most part, the low-energy CID spectra of peptides generated by ESI-MS/MS are of high quality and are sequence specific.

Each peptide tandem mass spectrum will contain mainly b and y ions, but also other fragment ions that can be used to interpret the amino acid sequence. These include diagnostic ions generated by the neutral loss of specific groups from amino acid side chains (e.g. the loss of ammonia (-17 u) from glutamine, lysine, and arginine or of water (-18 u) from serine, threonine, aspartic acid and glutamic acid) and low mass ions that result from the fragmentation of amino acids down to a basic unit consisting of the side chain residue and an immonium functionality. Due to the complexity of the resulting spectra, the time required for the analysis of a single peptide by CID becomes a limiting factor in proteome studies if each ion has to be manually identified and selected. Therefore, protocols for automated, instrument-controlled selection of precursor ions have been developed. In these methods, ion selection for CID is under computer control and based on signals observed in the full-scan mass spectrum [73,81-82].
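To make the b and y ion series concrete, the following sketch calculates the singly charged b and y fragment masses of an unmodified peptide from monoisotopic residue masses. It is a minimal illustration with a hypothetical function name; neutral losses, immonium ions and multiply charged fragments are not included.

```python
# Monoisotopic amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide: str):
    """Return (b, y): lists of singly charged b_1..b_(n-1) and y_1..y_(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b_i: the first i residues (N-terminal fragment) plus a proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y_i: the last i residues (C-terminal fragment) plus water and a proton.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions
```

Consecutive members of one series differ by exactly one residue mass, which is what allows a partial sequence to be read from the spectrum. Complementary fragments b_i and y_(n-i) add up to the peptide mass plus two protons, a useful consistency check when assigning peaks.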

1.4 Analytical challenges in proteomics

Identification of post-translational modifications (PTM), particularly of protein phosphorylation, is still one of the main challenges for researchers. Two counteracting enzyme systems, kinases and phosphatases, catalyze protein phosphorylation and dephosphorylation, respectively. There are assumed to be hundreds of protein kinases/phosphatases differing in their substrate specificities, tissue distribution, kinetic properties, and association with regulatory pathways.

Experiments reported in the literature that involve the identification of phosphorylation sites first attempt to purify the phosphorylated protein; the protein is then enzymatically fragmented, and the phosphopeptides are isolated and analyzed by mass spectrometry. Proteins are frequently phosphorylated at low stoichiometry (phosphopeptides are present in the sample in very small amounts as compared to the nonphosphorylated peptides with the same sequences) and at multiple sites (resulting in differentially phosphorylated forms of the same protein). Therefore, it is quite difficult to isolate sufficient material for analysis by MS methods. In such cases, selective enrichment of phosphopeptides by immobilized metal ion affinity chromatography (IMAC) can be employed [83]. This technique involves chelation of metal ions such as Fe³⁺ or Ga³⁺ onto a chromatographic support consisting of iminodiacetic acid (IDA) or nitrilotriacetic acid [84-85]. Phosphopeptides are acidic because of the phosphate group and bind with some selectivity over nonphosphopeptides. Fractions enriched in phosphopeptides are afterwards eluted with phosphate or at increased pH. While the method is somewhat selective for phosphopeptides, other peptides (particularly those containing strings of acidic amino acids or histidine) are also enriched, because such peptides bind to the IMAC column as well.

In this thesis the characterization of the phosphorylation pattern of a new protein - NAD(P) dependent epimerase/dehydratase from the bacterium Desulfotignum phosphitoxidans - has been accomplished using a combination of 2D, IMAC and high resolution mass spectrometry. Upon MALDI-FT-ICR-MS, masses corresponding to tryptic fragments of the protein with or without phosphopeptide enrichment were compared, and two peptides with different degrees of phosphorylation were identified. This explained the 2D pattern of the protein, which showed spots having the same molecular weight but different pI values.
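Comparing peptide masses with and without enrichment amounts to searching for signals shifted by multiples of the phosphate increment (HPO3, 79.96633 Da monoisotopic). The sketch below is an assumption-laden illustration of that search, not the procedure used in this thesis; the function name, the tolerance and the masses used in the check are hypothetical.

```python
HPO3 = 79.96633  # monoisotopic mass added per phosphate group, in Da


def phospho_states(unmodified_mass: float, observed, max_sites: int = 3,
                   tol_ppm: float = 5.0) -> dict:
    """Map number of phosphate groups -> observed mass that matches it.

    For each n from 0 to max_sites, the expected mass is the unmodified
    peptide mass plus n * HPO3; an observed mass is accepted if it lies
    within tol_ppm of that expected value.
    """
    hits = {}
    for n in range(max_sites + 1):
        expected = unmodified_mass + n * HPO3
        for mass in observed:
            if abs(mass - expected) / expected * 1e6 <= tol_ppm:
                hits[n] = mass
    return hits
```

A narrow tolerance is only meaningful with accurate mass measurement, which is why high resolution instruments are advantageous for this kind of assignment.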

Another analytical challenge in proteomics studies arises from uncharacterized genomes. Genomic DNA databases represent a key element in the identification of proteins. For unsequenced genomes, suitable derivatization approaches [86] and/or "de novo" identification [87] are typically required for protein identification. Only a few studies have been published on the issue of unknown genome proteomics.

In this thesis several techniques were used, in a novel approach, to tackle the problem of protein identification from Desulfotignum phosphitoxidans, a bacterium with unknown genome (see 2.3.1). Specific proteins separated by 2D gel electrophoresis and electroblotted onto PVDF membranes were used for Edman sequencing. Based on the N-terminal sequences, degenerate primers were designed and used for inverted polymerase chain reaction (IPCR) experiments. IPCR revealed several possible open reading frame (ORF) candidates coding for the protein under investigation. MALDI-FT-ICR-MS measurements of tryptic peptide fragments provided the unambiguous identification of a new protein, a NAD(P)-epimerase/dehydratase, by specific assignment of peptide masses to a single ORF, excluding other possible ORF candidates. The protein identification was ascertained by chromatographic separation (RP-HPLC) and sequencing of internal proteolytic peptides [88].
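The design of degenerate primers from an N-terminal amino acid sequence can be illustrated by back-translation with IUPAC ambiguity codes. The sketch below assumes the standard genetic code and is not the primer design procedure of this thesis. For the six-codon amino acids L, R and S the ambiguity codes shown are a common over-inclusive shorthand, and the peptide used in the check is hypothetical.

```python
# Amino acid -> (IUPAC-degenerate codon, number of codons in the standard code).
# IUPAC codes: R = A/G, Y = C/T, M = A/C, W = A/T, S = C/G, H = A/C/T, N = any.
# YTN (L), MGN (R) and WSN (S) cover all six codons but also some codons of
# other amino acids, so they are over-inclusive approximations.
CODONS = {
    "A": ("GCN", 4), "C": ("TGY", 2), "D": ("GAY", 2), "E": ("GAR", 2),
    "F": ("TTY", 2), "G": ("GGN", 4), "H": ("CAY", 2), "I": ("ATH", 3),
    "K": ("AAR", 2), "L": ("YTN", 6), "M": ("ATG", 1), "N": ("AAY", 2),
    "P": ("CCN", 4), "Q": ("CAR", 2), "R": ("MGN", 6), "S": ("WSN", 6),
    "T": ("ACN", 4), "V": ("GTN", 4), "W": ("TGG", 1), "Y": ("TAY", 2),
}


def degenerate_primer(peptide: str) -> str:
    """Back-translate a peptide into a degenerate DNA sequence (5' to 3')."""
    return "".join(CODONS[aa][0] for aa in peptide)


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    count = 1
    for aa in peptide:
        count *= CODONS[aa][1]
    return count
```

Stretches rich in M, W and two-codon residues give the lowest degeneracy, so they are generally the preferred regions for placing a primer.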

1.5 Scientific aims of the dissertation

Proteome analysis is most commonly accomplished by a combination of two-dimensional gel electrophoresis (2D) to separate and visualize proteins, and mass spectrometry (MS) for protein identification. In addition to these well-known techniques, several new techniques have been used in this thesis for protein identification and characterization.

The main goals of the dissertation are summarized as follows:

• Development of high resolution mass spectrometric methods in proteomics (subchapter 2.1.3)

A new "stain-free" approach based on the native fluorescence of aromatic amino acids present in proteins was successfully used for detection, localization and mass spectrometric identification of proteins from one- and two-dimensional gel electrophoretic separations.

• Application of high resolution mass spectrometry methods to Chlamydia pneumoniae immunoproteomics (subchapter 2.2.1)

Using a combination of 2D gel electrophoresis, immunoblotting and high resolution mass spectrometry, new antigenic protein structures were identified for Chlamydia pneumoniae. These protein biomarker candidates may be useful for serodiagnosis and future drug development.

• Application to exosome proteomics (subchapter 2.2.2)

For the identification of specific proteins recognized by an exosomal antibody, two different approaches were used. In a first approach 2D gel electrophoresis was combined with immunoblotting and mass spectrometry, while in a second experimental set-up affinity-mass spectrometry was successfully used together with 2D gel electrophoresis. With both approaches the same protein was identified: glyceraldehyde-3-phosphate dehydrogenase (GAPDH).

• "Unknown genome" proteomics (subchapter 2.3.1)

A new combination of methods has been developed for the identification of proteins with unknown genomic background. In this novel approach, 2D gel electrophoresis was successfully combined with Edman sequencing, inverted PCR (IPCR) and high resolution mass spectrometry, and led to the identification of a new protein, a NAD(P)-dependent epimerase/dehydratase from the bacterium Desulfotignum phosphitoxidans.

Direct molecular weight determination by passive elution and MALDI-TOF mass spectrometry confirmed the phosphorylated state of the protein.

• High resolution mass spectrometric proteome analysis of the photosystem II light-harvesting complex of Arabidopsis thaliana (subchapter 2.3.2)

Using 2D gel electrophoresis and MALDI-FT-ICR mass spectrometry proteins from three light-harvesting subcomplexes were successfully identified, and modifications by oxidation were found.


2 RESULTS AND DISCUSSION

2.1 Development of high resolution mass spectrometric methods in proteomics

2.1.1 High resolution Fourier transform-ion cyclotron resonance mass spectrometry and two-dimensional gel electrophoresis for proteome analysis

2.1.1.1 Principles of high resolution FT-ICR-MS

Fourier transform-ion cyclotron resonance (FT-ICR) mass spectrometry is probably the most complex method of mass analysis. First described by Comisarow and Marshall in 1974 [89-90], FT-MS was later reviewed by Marshall and Amster [77,91]. Interest in FT-MS for peptide and protein analysis has arisen only since electrospray (ESI) and matrix-assisted laser desorption/ionization (MALDI) have been used for ionization [92-93]. Typically, the FT-ICR-MS instrument consists of five major components: (i) an external ion source (e.g. electrospray - ESI, matrix-assisted laser desorption/ionization - MALDI), (ii) a magnet, (iii) an analyzer cell, (iv) an ultra-high vacuum system, and (v) a complex system for data acquisition and analysis. FT-ICR is a type of mass analyzer for determining the mass-to-charge ratio (m/z) of ions based on the cyclotron frequency of the ions in a fixed magnetic field. Ions are usually generated externally in a separate ion source and then injected into the cell, which is typically cubic or cylindrical in geometry. In the ICR cell (which is located in the homogeneous region of a magnet) ions are trapped and constrained to move in circular orbits (Figure 2.1).


Figure 2.1: Schematic representation of a cubic ICR cell. Coherent motion of ions in the cell induces an image current in the receiver plates. The time domain signal subjected to a Fourier transform algorithm yields a mass spectrum.

The ICR cell is placed in a strong magnetic field (typically 4.7 to 9 Tesla) and consists of three pairs of parallel plates arranged in a cube: trapping, excitation (transmitter) and receiver (detection) plates. In order to trap the ions in the ICR cell a small voltage (+1 to +2 V for positive ions, -1 to -2 V for negative ions) is applied to the trapping plates, which are oriented perpendicular to the magnetic field. When the ions enter the cell (ion trap), pressures are in the range of 10⁻¹⁰ to 10⁻¹¹ mbar (high vacuum is required for high resolution). When the ions pass into the strong magnetic field they are bent into a circular motion (cyclotron motion) in a plane perpendicular to the field by the Lorentz force, FL (see Equation 1). This force is counterbalanced by the centrifugal force FZ, which depends on the mass of the ion, its velocity and the radius of the cyclotron orbit (see Equation 2).

$$F_L = z\,v \times B \qquad (1)$$

where $F_L$ is the Lorentz force experienced by the ion when entering the magnetic field, $B$ is the magnetic field strength, $v$ is the incident velocity of the ion, and $z$ is the charge of the ion.

$$F_Z = \frac{m v^2}{r} \qquad (2)$$

where $F_Z$ is the centrifugal force, $m$ is the mass of the ion, $v$ is the ion velocity, and $r$ is the radius of the cyclotron orbit.

Ions are prevented from leaving the cell by the trapping plates at each end (see Figure 2.1). The frequency of rotation of the ions depends on their m/z ratio (see Equation 3).

$$\omega_c = \frac{zB}{m} \qquad (3)$$

where $\omega_c$ is the ion cyclotron frequency and $m$ represents the mass of the ion. All ions of a given mass-to-charge ratio (m/z) have the same ICR frequency independent of their velocity, which is one of the fundamental reasons why FT-ICR-MS is able to achieve ultra-high resolution [94]. When the magnetic field strength B is constant, all ions of the same m/z ratio have the same frequency, but may have different velocities. An ion with higher velocity has a cyclotron orbit with a larger radius than one with lower velocity. If the radius becomes larger than that of the cell, the ion is expelled.
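Equation 3 follows from setting the magnitudes of the two forces equal for motion perpendicular to the field: zvB = mv²/r gives v/r = zB/m, and v/r is the angular frequency. As a numerical illustration, the sketch below evaluates the cyclotron frequency and orbit radius in SI units for an ion of given m/z (in Da per elementary charge). The function names and the example values are illustrative assumptions.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg


def cyclotron_frequency_hz(mz: float, b_tesla: float) -> float:
    """Cyclotron frequency f = omega_c / (2*pi), with omega_c = z*B/m."""
    omega_c = ELEMENTARY_CHARGE * b_tesla / (mz * DALTON)
    return omega_c / (2.0 * math.pi)


def orbit_radius_m(mz: float, b_tesla: float, speed_m_per_s: float) -> float:
    """Orbit radius from z*v*B = m*v^2/r, i.e. r = m*v / (z*B)."""
    return mz * DALTON * speed_m_per_s / (ELEMENTARY_CHARGE * b_tesla)
```

For an ion of m/z 1000 in a 7 Tesla field this gives a frequency of about 107.5 kHz. The frequency does not depend on the ion velocity, whereas the orbit radius scales linearly with it, in agreement with the statements above.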

The movement of ions is influenced both by the magnetic field (cyclotron motion) and the electric field (trapping motion); the combination of these two fields gives rise to the magnetron motion of the ion. For ion excitation an external oscillating field (excite pulse) is applied, which is transmitted by a sine wave signal generator via the two transmitter plates (see Figure 2.1). If the frequency of the external oscillating field equals the cyclotron frequency, all ions with a particular m/z are accelerated to a larger orbit radius. After excitation these ions move as a single ion packet on an orbit with a radius that is independent of the original velocity of the ions. Each ion packet induces a signal (image current) in the receiver plates at its characteristic cyclotron frequency. This signal is amplified, converted to a digital signal, and added into the memory of the computer. A quench pulse at the end of each scan clears the ICR cell of previously detected ions. These events can be repeated as fast as 100 times per second. The resultant summed digital data (known as the time domain spectrum: the signal intensity recorded versus time) are then subjected to a mathematical algorithm, the Fourier transformation, which yields a frequency domain spectrum. This transformation is performed in order to obtain the frequency components representing each m/z value in the ICR cell. A frequency-to-mass conversion then yields the corresponding mass spectrum [95].
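The path from time domain signal to mass spectrum can be sketched with a simulated transient. The example below is a simplified illustration under stated assumptions: undamped cosine signals, a plain discrete Fourier transform written out explicitly instead of an optimized FFT, and the ideal frequency-to-mass relation of Equation 3 without calibration terms. All function names, the sampling parameters and the threshold are hypothetical.

```python
import cmath
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg


def simulate_transient(freqs_hz, amplitudes, sample_rate_hz, n_samples):
    """Sum of undamped cosines: one component per ion packet (idealized)."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * k / sample_rate_hz)
            for f, a in zip(freqs_hz, amplitudes))
        for k in range(n_samples)
    ]


def magnitude_spectrum(signal):
    """Discrete Fourier transform magnitudes up to the Nyquist frequency."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * j * k / n)
                for k, x in enumerate(signal)))
        for j in range(n // 2)
    ]


def peak_frequencies(spectrum, sample_rate_hz, n_samples, threshold):
    """Frequencies (Hz) of all bins whose magnitude exceeds the threshold."""
    return [j * sample_rate_hz / n_samples
            for j, magnitude in enumerate(spectrum) if magnitude > threshold]


def frequency_to_mz(freq_hz: float, b_tesla: float) -> float:
    """Ideal frequency-to-mass conversion: m/z = e*B / (2*pi*f), in Da."""
    return ELEMENTARY_CHARGE * b_tesla / (2.0 * math.pi * freq_hz * DALTON)
```

With 512 samples at 512 kHz, two simulated ion packets at 100 kHz and 150 kHz appear as two separate peaks in the frequency domain; in a 7 Tesla field the 100 kHz component corresponds to an m/z of about 1075. A longer transient gives finer frequency spacing, which is the basis of the high resolution of the method.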

In the present work MALDI was used as an external ionization source for FT-MS. MALDI-FT-ICR mass spectrometry has become a powerful tool for investigating complex protein mixtures. Coupling of two-dimensional gel electrophoresis (2D), as an efficient separation method for proteins, with MALDI-FT-ICR-MS or liquid chromatography – tandem mass spectrometric analysis has been shown to be a successful combination in the proteomic field, leading to fast and unambiguous protein identification [96-101].

2.1.1.2 Two-dimensional gel electrophoresis for proteome analysis

For protein identification high resolution mass spectrometry is usually used in combination with two-dimensional gel electrophoresis (2D). 2D is one of the most powerful and widely used methods for the analysis of complex protein mixtures extracted from tissues, cells, or other biological samples. 2D separates proteins according to two completely independent physico-chemical parameters, isoelectric point and size. Several steps are typical for a 2D experiment: (i) sample preparation, (ii) rehydration, (iii) first dimension - isoelectric focusing (IEF), (iv) equilibration, (v) second dimension - sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE), (vi) protein visualization, and (vii) gel analysis. Following protein separation, FT-ICR mass spectrometric analysis leads to protein identification only if all steps of the 2D experiment were performed correctly. Optimization therefore has to be carried out for each individual protein sample, for example to determine how much total protein is needed for a certain type of staining or which solution to use for the removal of contaminants.

For 2D separation, proteins have to be completely soluble under electrophoresis conditions, and contaminants must be removed with appropriate tools [102]. To remove contaminants (e.g. salts, nucleic acids, lipids) that might interfere with electrophoresis and lead to background streaking [34,50], techniques such as dialysis, gel filtration and protein precipitation have to be employed.

The classical acetone precipitation method and the 2D clean-up kit (Amersham Biosciences) were used in this thesis for sample preparation. When acetone is employed, contaminants such as detergents, salts and lipids remain in solution after the completion of the precipitation procedure [103-105]. In addition to the removal of contaminants, protein precipitation helps to inhibit protease activity (proteases may be liberated upon cell disruption and can complicate the 2D results) and to concentrate low-abundance proteins. Another precipitation method is the 2D Clean-up procedure. It is easy to perform and can be used with almost any protein sample to generate improved 2D results by reducing streaking, background staining, and other gel artifacts. Additionally, the kit concentrates proteins from samples that are too dilute, allowing for higher protein loads that can improve spot detection. Two solutions, “precipitant” and “co-precipitant”, are provided with the kit and are used to precipitate proteins from crude cell extracts. After precipitation the proteins are pelleted by centrifugation and the precipitate is washed to further remove nonprotein contaminants. The resultant pellet can be easily resuspended in a 2D sample solution. One
