4. RESULTS AND DISCUSSION

4.5 Assessing the reproducibility of the improved BBM preparation

4.5.3 Estimation of experimental reproducibility

4.5.3.3 Findings and discussion

Many researchers in the field have claimed that the LC-MS/MS approach lacks the technical reproducibility to reliably identify proteins in a proteomics experiment. In this study, we aimed to examine this common belief by setting up an experiment in which we evaluated the reproducibility of the LC-MS measurements by counting how many proteins could reliably be identified in the analysis of triplicate technical BBM preparations.

The triplicate LC-MS/MS measurements of the 19 gel bands corresponding to each BBM preparation (see § 4.2) resulted in almost 90% of the proteins being commonly identified in at least two of the three preparations, with at least two different peptides in one of the replicates. This degree of reproducibility was much higher than even the optimistic expectations in our group and better than what other proteomics groups have openly claimed so far.
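The counting rule stated above can be sketched as follows. This is a minimal illustration, not the software used in this study: the function names and the toy data are hypothetical, and the denominator of the reported fraction is assumed here to be the union of all proteins identified in any replicate, which the text does not specify.

```python
from collections import defaultdict


def reproducible_proteins(replicates, min_replicates=2, min_peptides=2):
    """Return the proteins that satisfy the reproducibility rule.

    replicates: list of dicts mapping protein ID -> set of peptide sequences.
    A protein is kept when it is identified in at least `min_replicates`
    replicates and is supported by at least `min_peptides` distinct peptides
    in at least one of them.
    """
    seen_in = defaultdict(int)
    best_peptide_count = defaultdict(int)
    for rep in replicates:
        for protein, peptides in rep.items():
            if not peptides:
                continue  # an empty peptide set is not an identification
            seen_in[protein] += 1
            best_peptide_count[protein] = max(
                best_peptide_count[protein], len(set(peptides))
            )
    return {
        p
        for p in seen_in
        if seen_in[p] >= min_replicates and best_peptide_count[p] >= min_peptides
    }


def reproducibility_fraction(replicates, **kwargs):
    """Fraction of reproducible proteins among all identified proteins.

    ASSUMPTION: the denominator is the union over all replicates.
    """
    all_proteins = {p for rep in replicates for p, peps in rep.items() if peps}
    if not all_proteins:
        return 0.0
    return len(reproducible_proteins(replicates, **kwargs)) / len(all_proteins)


# Toy data (hypothetical protein IDs and peptide sequences).
prep1 = {"P1": {"AAK", "CCR"}, "P2": {"DDK"}, "P3": {"EEK"}}
prep2 = {"P1": {"AAK"}, "P2": {"DDK"}, "P4": {"FFK", "GGR"}}
prep3 = {"P1": {"AAK", "CCR", "HHK"}, "P3": {"EEK", "IIK"}}

print(sorted(reproducible_proteins([prep1, prep2, prep3])))
print(reproducibility_fraction([prep1, prep2, prep3]))
```

In the toy data, P2 is seen twice but never with two peptides, and P4 has two peptides but is seen only once, so neither passes the rule.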

This unexpected outcome led me to investigate in more detail whether the very high level of identification reproducibility achieved in this study was based on an over-optimistic interpretation of the data. I therefore decided to study systematically how much variability selected technical steps add to the overall analytical process. To this end, gel bands 2, 9 and 11 and a standard peptide mixture were used as references in an experiment designed to monitor the variability of the LC-MS(/MS) system (mass spectrometer performance, column history, and column and/or buffer change), of the gel band excision process, and of the BBM isolation protocol.

Data analysis and interpretation proved to be difficult and time-consuming, because no single value comprehensively reports on the simultaneous comparison of three (or more) samples. One of the main reasons for adopting the Venn diagram representation was that the proteins commonly identified within a triplicate analysis could be easily visualized. At the same time, we sought to take advantage of the high mass accuracy of the Orbitrap mass spectrometer and the reproducibility of the liquid chromatographic system to evaluate whether such a data analysis could also be performed using the precursor ion intensity.
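The seven regions of a three-set Venn diagram, as used for the triplicate comparisons, can be counted with plain set operations. The sketch below is illustrative only; the function name and the example identifiers are hypothetical.

```python
def venn3_counts(a, b, c):
    """Count the members of each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Hypothetical protein identifiers from three replicate analyses.
rep_a = {"P1", "P2", "P3", "P4"}
rep_b = {"P2", "P3", "P5"}
rep_c = {"P3", "P4", "P5", "P6"}

counts = venn3_counts(rep_a, rep_b, rep_c)
print(counts)
# The seven regions partition the union, so their counts sum to its size.
print(sum(counts.values()) == len(rep_a | rep_b | rep_c))
```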

Taken as a whole, the systematic evaluation of the variability added by selected technical steps to the overall analytical process unambiguously confirmed the high reproducibility achieved by the LC-MS/MS process at the protein identification level. As expected, sample comparison based on injection replicates showed the best reproducibility, underlining the stability and robustness of the LC-MS/MS system. The strategy used to excise bands from SDS-PAGE gels (horizontally in adjacent lanes of the same gel, or one lane cut after the other on different gels) was identified as a potentially major cause of variability if not appropriately controlled. An irreproducible gel excision pattern might cause considerable variability in the protein content of a given gel band, leading to inconsistent identifications when compared to the equivalent gel band of another sample.

The BBM isolation protocol itself was a potential source of variability because of its length and complexity and because it includes several steps that might be difficult to carry out quantitatively, such as the CaCl2 precipitation step. In addition, the residual proteolytic activity of the abundant BBM proteases, if unchecked, could contribute to sample degradation and add extensive variability to the protein identification process. The preparation variation, as mirrored in the comparison of the BBM technical triplicate analysis of bands 2, 9 and 11, showed a very high level of reproducibility in the number of commonly identified proteins (almost at the level of injection replicates), as long as all other technical steps (SDS-PAGE gel, band excision strategy, LC-MS/MS conditions) were kept constant.

The variability in the number of commonly identified proteins was significantly higher when the samples were analyzed without a design of experiment. Specifically, the number of proteins commonly identified in the three technical replicates of band 2 was rather similar in the preparation variation study (all technical steps controlled) and in the total variation study (analysis without design of experiment), whereas the number of commonly identified proteins in the technical replicates of bands 9 and 11 was significantly lower in the total variation study than in the preparation variation study. One of the main reasons for this difference is believed to be the column and buffer changes that occurred during the analysis of the band 9 and band 11 samples, whereas the analysis of the band 2 samples was performed using the same column and buffer batch.

The degree of variability added by each technical step and by the BBM preparation protocol was also investigated at the full MS signal level. The comparative analysis of the precursor ion signals might provide additional information on sample similarity, since data comparison at the protein identification level (using successful MS/MS signals) makes use of only 10% of the total available signals. Conversely, the sample similarity derived from the MS/MS-based comparison should also be detectable at the level of precursor ion signal intensity. Overall, the scatter plot representations of the total MS signals and of the subset of MS signals that were identified through a successful MS/MS analysis correlated well with each other across all compared samples. Moreover, the derived Spearman correlation value, which indicates the degree of similarity between two samples, showed the same trends as those observed using the Venn diagram representations. Thus, the scatter plot representations and the associated Spearman correlation values also confirmed the very high degree of reproducibility observed between sample injection replicates, as previously shown at the protein identification level. Likewise, the gel band excision scheme was again identified as a potential source of variability if gel bands could not be cut precisely from the gel, as for example in the case of band 11. Finally, the preparation variation study, as reflected in the scatter plots and the Spearman correlation values for all three bands, also showed a high degree of reproducibility, which is strong evidence that the BBM preparation protocol could be performed robustly and quantitatively if all technical steps were controlled.

An unexpected finding of this systematic study was the large impact of LC buffer change on the analytical variability. This effect was best illustrated in the total variation study of band 11, where one of the three replicates clearly deviated from the two other samples. The MS signal alignment of those three samples clearly showed the major shift in the peptides’ retention times caused by the LC buffer change. In contrast, column change did not appear to contribute significantly to the overall technical variability, as minor column backpressure heterogeneities were compensated by the HPLC system’s active flow splitter (the LC system measures the actual flow going through the column and adjusts the column pressure to keep the linear velocity through the column constant).
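For reference, a Spearman correlation between the precursor ion intensities of two samples can be computed as the Pearson correlation of their ranks. The sketch below is a generic implementation with average ranks for ties, not the tool used in this study; it assumes that the signals of the two runs have already been matched, so that both lists have the same order, and the intensity values are invented.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation of paired intensities."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented intensities of five matched precursor ion signals in two runs.
run_1 = [1.0e6, 2.5e5, 7.0e6, 4.0e4, 9.0e5]
run_2 = [1.2e6, 2.0e5, 6.5e6, 5.0e4, 1.1e6]
print(spearman(run_1, run_2))
```

Because only ranks enter the calculation, the value is insensitive to a global intensity scaling between runs, which is one reason a rank-based measure suits this kind of comparison.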

In summary, sample comparisons based on protein identification and on precursor ion signals gave inherently similar results. However, the optimal use of the mass spectrometric information will depend heavily on the generation of reproducible experimental data following a standard operating procedure. As a general rule, samples that are to be compared with each other should be analyzed with identical column and buffer batches. Where this is not feasible, it is very important that buffers and columns are prepared reproducibly and accurately. The repeated analysis of a standard complex peptide mixture throughout the experiment and after any change of buffer and/or column makes it possible to monitor the variation of the peptides’ retention times and to judge the “good health” of the chromatographic system. However, even in an optimal experimental setting, the comparison of serially acquired LC-MS(/MS) data will require the development of bioinformatics tools attuned to this specific data type. As highlighted in this study, one of the most immediate tasks in a differential study is to measure the level of variability present in a given analysis, which in turn defines the acceptance criteria for a reproducibly observed signal.
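The monitoring of a standard peptide mixture described above can be reduced to a simple retention-time drift check. The sketch below is only an illustration: the peptide names, the retention times and the acceptance threshold of 1.0 min are hypothetical and would have to be defined for the actual chromatographic setup.

```python
from statistics import median


def retention_time_shifts(reference, run):
    """Return peptide -> RT shift (run minus reference, in minutes).

    Both arguments map a standard peptide to its retention time; only
    peptides observed in both runs are compared.
    """
    return {p: run[p] - reference[p] for p in reference if p in run}


def check_chromatography(reference, run, max_median_shift=1.0):
    """Summarize the RT drift of a standard run against a reference run.

    ASSUMPTION: `max_median_shift` (minutes) is a placeholder threshold.
    """
    shifts = retention_time_shifts(reference, run)
    if not shifts:
        raise ValueError("no standard peptide is shared between the two runs")
    med = median(shifts.values())
    return {
        "median_shift": med,
        "max_abs_shift": max(abs(s) for s in shifts.values()),
        "ok": abs(med) <= max_median_shift,
    }


# Hypothetical retention times (minutes) before and after a buffer change.
reference_run = {"PEP_A": 12.0, "PEP_B": 25.5, "PEP_C": 40.0}
after_buffer_change = {"PEP_A": 14.5, "PEP_B": 28.0, "PEP_C": 43.0}

print(check_chromatography(reference_run, after_buffer_change))
```

A run flagged in this way would indicate that samples acquired before and after the change should not be compared without retention-time alignment.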

The assessment of how to define such a signal, including the noise and variability introduced by the various technical steps of the analytical protocol, has been one of the most challenging problems in the data analysis of proteomics experiments, but it is also central to determining to what degree a signal difference can be confidently interpreted as a real change.

This study constitutes the first attempt of our group to compare biological samples using the precursor ion intensity information. In due course, this first step was to be followed by a full-fledged proteomics study in which a much more comprehensive biological experiment could have been investigated, such as control vs. statin vs. ezetimibe treatment in mice, knowing in advance the level of technical reproducibility that could be expected. Besides the obvious biological interest of such a study (the impact of those drugs on the BBM has not been investigated in detail), a long-term goal of this experiment could have been to evaluate the feasibility of a label-free quantification scheme based on the common precursor ion signals.

In this approach, every peptide signal within the sensitivity range of the MS analyzer can be extracted and incorporated into the quantification process independently of an MS/MS acquisition. In a first step, data are acquired on a high-resolution, high-mass-accuracy mass spectrometer coupled to a stable chromatographic system, in order to generate the most stable <m/z;RT;ion intensity> triplets possible by applying the analytical principles that this initial study helped to uncover. In a second step, the differentially regulated proteins are identified by targeted tandem mass spectrometry, using for example inclusion lists in selected LC-MS runs (78, 117, 118).
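The two steps outlined above can be sketched on toy data as follows. This is a deliberately simple illustration and not an established label-free quantification tool: the greedy matching strategy, the tolerances (5 ppm, 1.0 min), the two-fold change threshold and all feature values are assumptions made for the example.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One <m/z; RT; ion intensity> triplet from a full MS scan."""
    mz: float
    rt: float          # retention time in minutes
    intensity: float


def match_features(run_a, run_b, ppm_tol=5.0, rt_tol=1.0):
    """Greedily pair features of two runs within m/z and RT tolerances.

    For each feature of run_a, the unused run_b feature with the smallest
    m/z deviation (in ppm) inside both tolerance windows is selected.
    """
    pairs = []
    used = set()
    for fa in run_a:
        best_idx, best_ppm = None, None
        for idx, fb in enumerate(run_b):
            if idx in used:
                continue
            ppm = abs(fa.mz - fb.mz) / fa.mz * 1e6
            if ppm <= ppm_tol and abs(fa.rt - fb.rt) <= rt_tol:
                if best_ppm is None or ppm < best_ppm:
                    best_idx, best_ppm = idx, ppm
        if best_idx is not None:
            used.add(best_idx)
            pairs.append((fa, run_b[best_idx]))
    return pairs


def inclusion_list(pairs, min_fold_change=2.0):
    """Return (m/z, RT) targets whose intensity ratio exceeds the threshold."""
    targets = []
    for fa, fb in pairs:
        if fa.intensity <= 0 or fb.intensity <= 0:
            continue  # a ratio is undefined without signal in both runs
        ratio = fb.intensity / fa.intensity
        if ratio >= min_fold_change or ratio <= 1 / min_fold_change:
            targets.append((fa.mz, fa.rt))
    return targets


# Invented features from a control and a treated sample.
control = [Feature(500.2500, 20.0, 1.0e6), Feature(650.3000, 35.0, 2.0e6),
           Feature(720.4000, 50.0, 5.0e5)]
treated = [Feature(500.2510, 20.2, 1.1e6), Feature(650.3010, 35.1, 8.0e6),
           Feature(800.0000, 10.0, 1.0e6)]

pairs = match_features(control, treated)
print(len(pairs))
print(inclusion_list(pairs))
```

In practice the fold-change threshold would be derived from the measured technical variability rather than fixed in advance, which is precisely the purpose of the reproducibility assessment reported here.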