
1.4 Mass spectrometry based proteomics

1.4.3 Protein identification and quantification

After data acquisition, peptide m/z ratios and fragment ions need to be identified and attributed to a protein sequence. In DDA shotgun measurements, mainly fragment spectra are used for identification. Several search algorithms have been developed; the most common ones are Mascot, Sequest and Andromeda181. They differ in how they assign a spectrum to a peptide sequence, but all are based on similar principles. First, several search parameters need to be specified. Spectra are searched against one or more protein databases of the organism from which the sample was derived. If the sample was digested with an enzyme, the protease is specified as well.

Furthermore, peptide modifications of interest, or those expected from the respective sample preparation, are included. They can be defined as fixed, meaning that every occurrence of the specified amino acid is assumed to carry the modification, so that its mass is added to that residue in every peptide. Alternatively, they can be defined as variable, which enlarges the search space, as each peptide is searched in its modified and unmodified versions. The database is then digested in silico to generate the peptides expected to occur in the sample, and theoretical fragment spectra calculated from these peptides are compared with the acquired data. Mascot and Andromeda compute the probability that the observed matches between theoretically calculated and experimentally determined fragment masses arose by chance. A peptide spectrum match (PSM) is then assigned to the protein(s) containing this peptide. Some peptides occur in several proteins, which makes protein identification difficult. A peptide occurring in only one protein is called unique. Shared peptides that are assigned to the protein group with the largest number of identified peptides are called razor peptides.
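The in silico digestion and the handling of fixed and variable modifications described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of any search engine. It assumes trypsin cleaving after K/R unless followed by P, carbamidomethylation of cysteine as an example fixed modification, methionine oxidation as an example variable modification, and monoisotopic masses; the function names are hypothetical.

```python
import re
from itertools import combinations

# Monoisotopic residue masses in Da
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # added once per peptide (free termini)
FIXED = {"C": 57.02146}     # assumed fixed mod: carbamidomethyl-Cys
VARIABLE = {"M": 15.99491}  # assumed variable mod: Met oxidation


def tryptic_digest(protein, missed_cleavages=1):
    """In silico trypsin digest: cleave after K/R unless followed by P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = set()
    for i in range(len(pieces)):
        # join up to `missed_cleavages` neighbouring cleavage products
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.add("".join(pieces[i:j + 1]))
    return sorted(peptides)


def candidate_masses(peptide):
    """Precursor mass of every modified form of a peptide.

    Fixed modifications are always added; each variable site is searched
    both with and without the modification, giving 2**k forms for k sites.
    """
    base = WATER + sum(RESIDUE[aa] + FIXED.get(aa, 0.0) for aa in peptide)
    sites = [i for i, aa in enumerate(peptide) if aa in VARIABLE]
    masses = []
    for n in range(len(sites) + 1):
        for chosen in combinations(sites, n):
            masses.append(base + sum(VARIABLE[peptide[i]] for i in chosen))
    return masses


print(tryptic_digest("MKPEPTIDERK", missed_cleavages=1))
# → ['K', 'MKPEPTIDER', 'MKPEPTIDERK']
print(len(candidate_masses("MAMK")))  # → 4
```

Each additional methionine doubles the number of modified forms that must be compared with a spectrum, which illustrates why variable modifications enlarge the search space while fixed modifications do not.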

As the whole data analysis is based on heuristics and probability scores, the quality of peptide and protein identifications is evaluated by the so-called false discovery rate (FDR), an estimate of the fraction of false matches among the accepted identifications. To estimate it, spectra are additionally searched against a decoy database, i.e. a database containing reversed or scrambled sequences. Any spectrum that matches a decoy sequence is, by definition, a false hit.

A common assumption is that random hits in the target database occur at a similar rate as decoy hits182. PSMs are therefore ranked by score, and the proportion of decoy hits among all PSMs is determined while moving down the list. This is continued until a defined fraction of decoy hits is reached (often 1%). All matches counted so far are accepted; all remaining PSMs are discarded as potential false positives. The same principle applied at the peptide and protein level allows local and global FDR calculation183-185. The classic approach in the community leads to an overestimation in large data sets, which is why efforts are currently being made to overcome this issue186, 187.
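A minimal Python sketch of the target-decoy filtering described above is given below. It assumes that the FDR at a given score threshold is estimated as the number of decoy PSMs divided by the number of target PSMs scoring at least that high, one common estimator; the `PSM` class and the function name are hypothetical, and ties in score are not treated specially.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float
    is_decoy: bool  # True if matched to a reversed/scrambled sequence


def filter_at_fdr(psms, max_fdr=0.01):
    """Return the target PSMs accepted at the given estimated FDR."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs that pass the threshold
    for rank, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # estimated FDR among all PSMs scoring at least this high
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            cutoff = rank
    # decoy hits are never reported, only used for the estimate
    return [p for p in ranked[:cutoff] if not p.is_decoy]


# 100 target PSMs (scores 100..1) and two decoys at 50.5 and 0.5
psms = [PSM(f"T{s}", float(s), False) for s in range(1, 101)]
psms += [PSM("D1", 50.5, True), PSM("D2", 0.5, True)]
print(len(filter_at_fdr(psms, max_fdr=0.01)))  # → 100
print(len(filter_at_fdr(psms, max_fdr=0.0)))   # → 50
```

The sketch keeps the deepest position in the ranked list at which the estimated FDR is still within the threshold, so that a single early decoy does not truncate the list prematurely.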

The search engine Andromeda, which is implemented in the MaxQuant software181, 185, 188, was used for this thesis. Its integration into MaxQuant combines easy and robust peptide and protein identification and quantification in one user interface and makes it readily applicable to large-scale datasets. Scoring is based on binomial distributions, which can also be applied to determine the most probable localization of a side-chain modification within a peptide. The more fragment masses of the acquired spectrum match the theoretically calculated masses, the lower the probability that the match is random189. The Andromeda score is defined as -10 × log10 of the probability of matching at least the observed number of theoretical fragment masses to experimental peaks by chance. Only the most intense peaks per 100 Da m/z window are considered, and their number determines the probability of a random match. The higher the score, the less likely it is that the peptide spectrum match occurred by chance. Moreover, Andromeda can distinguish co-fragmented peptides (e.g. one spectrum containing fragment series of two peptides) through a second peptide search option181.
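The binomial scoring principle can be illustrated with the following simplified Python sketch. It assumes that only the q most intense peaks per 100 Da window are retained, so that a theoretical fragment matches a random peak with probability p = q/100, and it scores a PSM as -10 × log10 of the probability of observing at least k matches out of n theoretical fragments by chance. The function name is hypothetical, and the actual Andromeda engine includes further refinements that are not reproduced here.

```python
from math import comb, log10


def binomial_psm_score(n_theoretical, k_matched, peaks_per_100da):
    """-10*log10 P(at least k of n theoretical fragments match by chance).

    Simplified Andromeda-style model: each theoretical fragment hits one of
    the retained peaks at random with probability p = q/100, where q is the
    number of most intense peaks kept per 100 Da window.
    """
    p = peaks_per_100da / 100.0
    # upper tail of the binomial distribution: k, k+1, ..., n random matches
    prob = sum(
        comb(n_theoretical, j) * p**j * (1 - p) ** (n_theoretical - j)
        for j in range(k_matched, n_theoretical + 1)
    )
    return -10.0 * log10(prob)


# More matched fragments -> lower chance probability -> higher score
print(round(binomial_psm_score(20, 5, 10), 1))
print(round(binomial_psm_score(20, 10, 10), 1))
```

With q = 10 and all 10 of 10 theoretical fragments matched, the chance probability is 0.1^10 and the score is 100.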

Quantification

Accurate quantification of peptides and proteins by mass spectrometry, although still facing challenges, has been implemented in many workflows133, 190. Proteolytic digestion and ionization influence the measured abundance of peptides and can therefore distort the true quantitative amounts in cells or organisms. Because these effects differ between peptides but are the same for identical peptides, comparing the same peptide between samples addresses this challenge. Various options exist in proteomic workflows (Figure 9). The metabolic labeling approach is very popular. In this approach, stable isotopes are incorporated into the proteome of cells or small organisms. Samples can then be combined at the earliest possible point in the workflow, so that errors and variations introduced downstream of cell lysis affect all samples equally, leading to high quantification accuracy and precision. Peptides differing only in their stable isotope label behave identically during chromatography and mass spectrometric analysis, but the mass analyzer can differentiate them by the mass difference between heavy and light amino acids. Peptides from proteins of similar abundance should show comparable intensities/peak areas, whereas peptides from proteins that change in abundance show different intensities, allowing relative quantification. Stable isotope labeling with amino acids in cell culture (SILAC) is the most widely applied metabolic labeling technique in proteomics. In a typical SILAC experiment, isotope-labeled (13C, 15N) arginine and lysine are used, so that each tryptic peptide carries at least one labeled amino acid191. As the number of useful isotope combinations is limited to three, the multiplexing capabilities of metabolic labeling strategies are narrow. Besides SILAC, 15N labeling can also be used, but the number of incorporated 15N atoms, and thus the mass shift, varies between peptides, making data analysis rather complicated133, 192. However, metabolic labeling strategies are only applicable to cell culture systems and some smaller organisms.
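The two quantities used in a SILAC experiment, the m/z spacing between the light and heavy form of a peptide and the resulting abundance ratio, can be sketched as follows. The sketch assumes the commonly used labels 13C6 15N2-lysine (+8.0142 Da) and 13C6 15N4-arginine (+10.0083 Da) and summarizes a protein as the median of its peptide heavy/light ratios, one common choice; the function names are hypothetical.

```python
from statistics import median

# Assumed heavy labels (mass shift versus the light form, Da)
LYS8 = 8.014199    # 13C6 15N2-lysine
ARG10 = 10.008269  # 13C6 15N4-arginine


def heavy_light_mz_shift(peptide, charge):
    """m/z spacing between the light and heavy form of a peptide."""
    shift = LYS8 * peptide.count("K") + ARG10 * peptide.count("R")
    return shift / charge


def protein_ratio(peptide_pairs):
    """Median heavy/light ratio from (light, heavy) intensity pairs."""
    ratios = [heavy / light for light, heavy in peptide_pairs
              if light > 0 and heavy > 0]  # skip pairs missing one form
    return median(ratios) if ratios else None


print(heavy_light_mz_shift("PEPTIDEK", charge=2))
print(protein_ratio([(100, 200), (50, 90), (10, 22)]))  # → 2.0
```

Using the median rather than the mean makes the protein ratio robust against single outlier peptides, for example those affected by co-eluting signals.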

Stable isotopes are also used in chemical labeling approaches. Proteins or peptides are labeled after lysis or digestion, respectively, then combined and subsequently processed as one sample. Quantification takes place either at the MS1 level or at the MS2 level.

For MS1-based quantification, many options exist, ranging from 18O incorporation during digestion193 to dimethyl labeling194. For MS2-based quantification, isobaric tags have been designed195. All tags of a set have the same total mass but a different distribution of heavy isotopes within their structure. The tags are coupled to free amines of lysine side chains and peptide/protein N-termini via NHS-ester chemistry. Samples are combined afterwards and analyzed by mass spectrometry. As each tag has the same mass (isobaric), the complexity of the MS1 spectrum is not increased, which is a problem in MS1-based labeling approaches.

Only upon fragmentation in MS2 do the tags release reporter ions of different masses, whose intensities reflect the quantity of the peptide in each condition. The two most popular isobaric tags are tandem mass tags (TMT) and isobaric tags for relative and absolute quantification (iTRAQ). TMT allows multiplexing of up to ten samples in one experiment and MS run, whereas iTRAQ is suited for up to eight conditions196-199. One major drawback of MS2-based methods is ratio compression, which leads to an underestimation of the true abundance differences of a peptide between conditions. It arises from co-isolation and co-fragmentation of other peptides as well as from overlapping isotopic patterns of the tags themselves200, 201. Further fragmentation of fragment ions and reporter ion readout in a third MS stage (MS3) can reduce this effect202.
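Reporter ion readout and the origin of ratio compression can be illustrated with a few lines of Python. The sketch is hypothetical and simplified: it assumes a single co-isolated interfering peptide present at a 1:1 ratio that contributes the same reporter intensity to both channels, and it ignores correction for isotopic impurities of the tags.

```python
def reporter_ratios(intensities, reference=0):
    """Abundance of each reporter channel relative to a reference channel."""
    ref = intensities[reference]
    return [value / ref for value in intensities]


def observed_ratio(target_a, target_b, interference):
    """A/B reporter ratio when a co-isolated peptide present at 1:1 adds
    the same reporter intensity to both channels (simplified model)."""
    return (target_a + interference) / (target_b + interference)


print(reporter_ratios([100.0, 200.0, 50.0]))  # → [1.0, 2.0, 0.5]
print(observed_ratio(400.0, 100.0, 0.0))      # → 4.0 (no interference)
print(observed_ratio(400.0, 100.0, 100.0))    # → 2.5 (ratio compressed)
```

For a true 4:1 ratio, an interfering signal of 100 in each channel yields an observed ratio of 500/200 = 2.5, i.e. the fold change is pulled toward 1.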


Figure 9: Overview of relative quantification strategies in proteomics. Blue and yellow boxes refer to different experimental conditions (taken from133).

Another option is the addition of spike-in peptides of known quantity, which can also serve as a reference for relative peptide abundance.

Label-free quantification is a simple and economical way to determine the relative abundance of peptides and proteins. Here, samples are compared only at the data analysis level, which assumes robust and reproducible sample processing and mass analysis. The approach is not limited by the sample type or by the number of conditions compared. With the routine use of high-resolution mass spectrometers, it has become possible to compare peptide intensities or areas under the curve directly between samples, similar to isotope-labeled quantification methods. For this purpose, the signal of a peptide is traced over its entire elution from the analytical column as an extracted ion chromatogram (XIC), and the peak area is integrated. This area correlates well with the concentration of the peptide and covers a dynamic range of at least four orders of magnitude203, 204. Only high mass resolution enables accurate determination of the XIC of a given peptide, as it separates its signal from co-eluting ions of similar m/z. A second approach, spectral counting, correlates the number of MS2 spectra per peptide with its abundance, based on the assumption that peptides of a more abundant protein are selected for sequencing more often.
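The extraction and integration of an XIC can be sketched as follows. The sketch assumes centroided MS1 scans given as (retention time, list of (m/z, intensity)) tuples, a hypothetical ±10 ppm extraction window and trapezoidal integration; these choices are illustrative and do not correspond to a particular software package.

```python
def extract_xic(scans, target_mz, tol_ppm=10.0):
    """Build an XIC: summed intensity within +/- tol_ppm of target_mz per scan."""
    tol = target_mz * tol_ppm * 1e-6
    xic = []
    for rt, peaks in scans:
        intensity = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, intensity))
    return xic


def peak_area(xic):
    """Integrate the XIC over retention time with the trapezoidal rule."""
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(xic, xic[1:]):
        area += (rt1 - rt0) * (i0 + i1) / 2.0
    return area


# Three centroided MS1 scans: (retention time in min, [(m/z, intensity), ...])
scans = [
    (10.0, [(500.0010, 0.0), (500.1000, 50.0)]),
    (10.5, [(500.0020, 1.0e6), (500.1000, 60.0)]),
    (11.0, [(499.9990, 2.0e5)]),
]
xic = extract_xic(scans, target_mz=500.0)
print(xic)             # the peaks at m/z 500.1 fall outside the window
print(peak_area(xic))  # → 550000.0
```

The narrow ppm window is what high mass resolution makes possible: at lower resolution, the neighbouring signal at m/z 500.1 could not be excluded and would inflate the area.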

However, today’s DDA instrument methods often apply a dynamic exclusion list: a peptide that has been sequenced once is excluded from fragmentation for a defined time, enabling the sequencing of low-intensity peptides and, thus, deeper proteome coverage. Because this caps the number of MS2 spectra acquired for abundant peptides, the MS1 intensity-based approach with high-resolution instruments is nowadays superior to spectral counting205. Most DDA methods strike a compromise between the number of MS1 spectra needed to properly determine chromatographic elution profiles for quantification and the number of MS2 spectra needed for deep proteome coverage133. The MaxQuant platform employs a sophisticated algorithm for label-free quantification termed MaxLFQ. It features ‘delayed normalization’ for samples that were fractionated up front, assuming that the abundance of most proteins does not change, and extracts the ‘maximum ratio information for peptide signals’ for accurate quantification. A protein is quantified by first determining pairwise peptide ratios between samples and then calculating, for each pair of samples, the median of the ratios of peptides present in both samples206. To increase coverage across samples, MaxQuant employs a ‘match between runs’ algorithm. If the information in a raw file is insufficient to identify a feature, because it was not fragmented or its intensity was too low, MaxQuant transfers the MS/MS and sequence information from other runs after aligning them within a tight retention time and mass window. The peptide is thus still identified when more samples are searched together, compared with a search of the single run alone185.
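The two MaxQuant features described above can be illustrated by the following simplified sketch. The first function computes a pairwise protein ratio as the median of the ratios of peptides quantified in both samples; the full MaxLFQ algorithm additionally combines all pairwise ratios into a single intensity profile, which is omitted here. The second function transfers identifications to unidentified features whose m/z and aligned retention time fall within tolerances. All function names, the minimum of two shared peptides and the tolerance values are assumptions chosen for illustration.

```python
from statistics import median


def pairwise_protein_ratio(sample_a, sample_b, min_shared=2):
    """Protein ratio B/A as the median of peptide ratios shared by both samples.

    sample_a, sample_b: dict peptide sequence -> intensity for one protein.
    """
    shared = [p for p in sample_a
              if p in sample_b and sample_a[p] > 0 and sample_b[p] > 0]
    if len(shared) < min_shared:
        return None  # too little shared evidence for a ratio
    return median([sample_b[p] / sample_a[p] for p in shared])


def match_between_runs(identified, features, mz_tol_ppm=7.0, rt_tol_min=0.7):
    """Transfer identifications to unidentified features of another run.

    identified: list of (peptide, m/z, aligned retention time) from other runs
    features:   list of (m/z, aligned retention time) without MS/MS identification
    Returns {feature index: peptide}, picking the closest match in retention time.
    """
    transferred = {}
    for idx, (mz, rt) in enumerate(features):
        candidates = [
            (abs(rt - id_rt), peptide)
            for peptide, id_mz, id_rt in identified
            if abs(mz - id_mz) <= id_mz * mz_tol_ppm * 1e-6
            and abs(rt - id_rt) <= rt_tol_min
        ]
        if candidates:
            transferred[idx] = min(candidates)[1]
    return transferred


a = {"P1": 100.0, "P2": 200.0, "P3": 50.0}
b = {"P1": 200.0, "P2": 400.0, "P4": 10.0}
print(pairwise_protein_ratio(a, b))  # → 2.0 (P3 and P4 are not shared)

library = [("PEPTIDEK", 500.0, 30.0)]
features = [(500.001, 30.2), (600.0, 30.0), (500.0, 35.0)]
print(match_between_runs(library, features))  # → {0: 'PEPTIDEK'}
```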

Besides relative quantification across samples, absolute quantification methods have been developed and applied to various biological questions. These approaches either use stable isotope-labeled standards (AQUA, QconCAT, or absolute SILAC) or are based on label-free methods with respective algorithms and scoring strategies (PAI, APEX, iBAQ) (reviewed in 133).
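As an example of a label-free absolute quantification score, iBAQ divides the summed peptide intensity of a protein by its number of theoretically observable peptides. The sketch below assumes fully tryptic peptides without missed cleavages and a length of 6 to 30 amino acids as the definition of "observable"; the function names are hypothetical.

```python
import re


def observable_peptides(protein, min_len=6, max_len=30):
    """Fully tryptic peptides (no missed cleavages) of observable length."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)  # cleave after K/R, not before P
    return [p for p in pieces if min_len <= len(p) <= max_len]


def ibaq(peptide_intensities, protein):
    """Summed peptide intensity divided by the number of observable peptides."""
    n_observable = len(observable_peptides(protein))
    if n_observable == 0:
        return 0.0  # no observable peptides: report zero instead of dividing by 0
    return sum(peptide_intensities) / n_observable


protein = "AAAAAAKGGGGGGRPEPTIDEK"
print(observable_peptides(protein))   # → ['AAAAAAK', 'GGGGGGRPEPTIDEK']
print(ibaq([100.0, 300.0], protein))  # → 200.0
```

Normalizing by the number of observable peptides corrects for the fact that larger proteins yield more peptides and therefore a larger summed intensity at the same molar amount.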