
Materials and Methods

3.2 Mapping of nuclear proteins recognising ubiquitylated histones

3.2.1 Chromatin affinity purification - mass spectrometry

To isolate protein complexes that bind modified histones, an affinity purification approach was previously developed in our laboratory [132]. This technique uses biotinylated nucleosomal arrays as an affinity tag for nuclear proteins. After purification, the proteins that are recruited to chromatin are identified by mass spectrometry. Chromatin affinity purification - mass spectrometry (ChAP-MS) enriches for proteins from nuclear extracts (Figure 3.11B).

To separate false-positive from true-positive interactors, ChAP-MS is combined with stable isotope labeling of amino acids in cell culture (SILAC) and uses extracts prepared from human cells grown in media supplemented with heavy arginine (+6 Da, 13C) and lysine (+4 Da, 2H) isotopes.

A regular ChAP-MS scheme contains a forward and a reverse experiment (Figure 3.11A). In the forward experiment, unmodified chromatin is incubated with light nuclear extract and modified chromatin is incubated with heavy nuclear extract. In the reverse experiment, the labels are swapped such that the unmodified chromatin is incubated with heavy nuclear extract and the modified chromatin is incubated with light nuclear extract. Pooled eluates from the forward experiment are analysed by mass spectrometry. Peptide intensities from proteins enriched from the heavy or the light extract are measured and the resulting ratios between heavy and light protein groups (H/L) are sorted on a logarithmic scale (Figure 3.11C). In parallel, the eluates from the reverse experiment are also analysed by mass spectrometry.

The H/L ratios from the reverse experiment are inverted and sorted as in the forward experiment (Figure 3.11D). The distributions of H/L ratios identified in the two experiments closely follow a normal distribution with a mean of 0 and a standard deviation of 1 (Figure 3.11C, Figure 3.11D). While most measured protein groups fall within the boundaries of the normal distribution, several proteins are found on either side of the distribution mean. On the logarithmic scale, enriched outliers have a positive H/L ratio and excluded outliers have a negative H/L ratio.
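The ratio handling described above can be sketched in a few lines. This is only an illustration: the function names and the intensity values are hypothetical, and a base-2 logarithm is assumed because the thresholds later in the text are given as log2 (H/L).

```python
import math


def log2_ratio(heavy: float, light: float) -> float:
    """Return log2(H/L) for one protein group from its heavy and light intensities."""
    if heavy <= 0 or light <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(heavy / light)


def invert(log2_hl: float) -> float:
    """Invert a ratio on the log scale: log2(L/H) = -log2(H/L)."""
    return -log2_hl


# Hypothetical intensities for a single protein group.
# Forward: modified chromatin was incubated with the heavy extract.
forward = log2_ratio(heavy=8.0, light=2.0)
# Reverse: labels are swapped, so the H/L ratio is inverted before comparison.
reverse = invert(log2_ratio(heavy=1.0, light=4.0))
```

After inversion, a protein enriched on the modified chromatin has a positive value in both experiments, which is what makes the two distributions directly comparable.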

To separate false-positive from true-positive outliers, the two H/L distributions were plotted against each other (Figure 3.11E). Such an interactome plot separates all measured protein groups into four quadrants. The top left quadrant included ubiquitin, histones and proteins that bind strongly to chromatin regardless of the modification. The bottom left quadrant included proteins that are reproducibly excluded from modified chromatin. The bottom right quadrant included interactors that were enriched during the preparation of the heavy extract and bind chromatin regardless of the modification. The top right quadrant included protein groups that were reproducibly enriched on the modified chromatin.
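The quadrant assignment can be written as a small helper. The sketch below assumes that the x-axis of the interactome plot is the forward log2 (H/L) ratio and the y-axis is the inverted reverse ratio, log2 (L/H); this axis orientation is inferred from the quadrant descriptions and is not stated explicitly. The function name and the handling of values that are exactly zero are illustrative choices.

```python
def quadrant(fwd_log2_hl: float, rev_log2_lh: float) -> str:
    """Assign a protein group to a quadrant of the interactome plot.

    Assumed axes: x = forward log2(H/L), y = inverted reverse log2(L/H).
    Values of exactly zero are counted as "left" or "bottom" here.
    """
    right = fwd_log2_hl > 0
    top = rev_log2_lh > 0
    if top and right:
        return "top right: reproducibly enriched on modified chromatin"
    if not top and not right:
        return "bottom left: reproducibly excluded from modified chromatin"
    if top:
        return "top left: light in both experiments, modification-independent"
    return "bottom right: heavy in both experiments, modification-independent"
```

For example, `quadrant(2.0, 1.5)` falls in the top right quadrant, whereas `quadrant(2.0, -1.5)` falls in the bottom right quadrant because the protein follows the heavy label rather than the modification.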

Figure 3.11: Chromatin affinity purification - mass spectrometry. (A) Schematic representation of experiments performed for isolation of nuclear proteins that recognise modified chromatin. (B) Input nuclear extract and eluted proteins from forward (fwd) and reverse (rev) biochemical experiments. (C) Histogram of heavy/light (H/L) ratios for proteins identified in the forward experiment, with normal distribution plotted on top. (D) Histogram of L/H ratios for proteins identified in the reverse experiment. (E) Intersection of the H/L distributions from the forward and reverse experiments. (F) Highlighted enriched (cyan) and excluded (light green) proteins in the two H/L distributions and their intersection.

Figure 3.12: Statistical analysis of ChAP-MS datasets. (A) One-sample Student's t-test analysis of the grouped H/L ratios from the forward and reverse experiments. The dashed parabolas delineate the test's significance threshold. (B) Two-sample Student's t-test analysis of the separate mean H/L ratios from the forward and reverse experiments. The dashed parabolas represent the significance threshold. (C) Representation of the one-sample t-test significant outliers on the interactome plot. (D) Representation of the two-sample t-test significant outliers on the interactome plot. (E) Representation of significant reproducible outliers on an interactome plot. (F) Representation of significant reproducible outliers on a volcano plot. Enriched outliers are highlighted in cyan, excluded outliers are highlighted in light green.

Until recently, true-positive and true-negative outliers were selected from the top right and bottom left quadrants based on an arbitrary threshold or cutoff value which was applied to both the forward H/L and the reverse L/H ratio distributions (Figure 3.11F). Since the affinity purification experiments are measured thrice in the mass spectrometer, the reproducibility of the measurements can be statistically quantified [133]. To identify significant outliers, the triplicate measurements from the forward and the reverse experiments were subjected to Student's t-test analysis. Two types of analyses can be performed.

A one-sample t-test pools the information from the forward and the reverse experiments in one dataset. The test calculates the overall mean of the three H/L ratios from the forward distribution and the three inverse L/H ratios from the reverse distribution and compares it to the zero mean of the entire dataset. The one-sample t-test determines whether the pooled mean is significantly different from the zero mean, and the difference values are plotted against the corresponding p values (Figure 3.12A). The one-sample t-test is an indicator of measurement reproducibility across the forward and the reverse datasets.
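As an illustration of the pooling step, the one-sample t statistic for a single protein group can be computed as below. This is a minimal sketch using the textbook formula and the Python standard library. The six ratio values are hypothetical, and the conversion to a p value (which in the thesis relies on a permutation-based procedure) is left out.

```python
import math
import statistics


def one_sample_t(values: list[float], mu0: float = 0.0) -> float:
    """Textbook one-sample t statistic: (mean - mu0) / (sd / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / standard_error


# Hypothetical triplicate log2 ratios for one protein group.
forward_log2_hl = [2.0, 2.2, 1.8]   # forward experiment, log2(H/L)
reverse_log2_lh = [1.9, 2.1, 2.0]   # reverse experiment, already inverted to log2(L/H)

# Pool all six values and test the pooled mean against zero.
t_pooled = one_sample_t(forward_log2_hl + reverse_log2_lh)
```

Because the reverse ratios are inverted before pooling, a protein that behaves consistently in both experiments yields six values of the same sign and a large absolute t statistic.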

A two-sample Student's t-test determines whether the mean of the three H/L ratios of a protein identified in the forward experiment is significantly different from the mean of the H/L ratios of the same protein identified in the reverse experiment. The difference between the mean H/L ratios of each protein group measured in the two experiments is plotted against the corresponding t-test p value (Figure 3.12B). The two-sample t-test is an indicator of experimental difference between the forward and the reverse datasets.
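The comparison can be illustrated with the classical pooled-variance form of the two-sample t statistic. The sketch below assumes equal variances in both groups and uses hypothetical ratio values. Note that the reverse ratios are used here as measured (H/L, not inverted), so a genuine interactor shows a large difference between the two means.

```python
import math
import statistics


def two_sample_t(a: list[float], b: list[float]) -> float:
    """Student's two-sample t statistic with pooled variance (equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("need at least two measurements per group")
    pooled_variance = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical triplicate log2(H/L) ratios for one protein group.
forward_log2_hl = [2.0, 2.2, 1.8]      # enriched with the heavy label
reverse_log2_hl = [-1.9, -2.1, -2.0]   # labels swapped, so enriched with the light label

t_difference = two_sample_t(forward_log2_hl, reverse_log2_hl)
```

A protein that merely follows the heavy extract would have similar H/L ratios in both experiments and therefore a t statistic close to zero.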

The p values in the two t-test analyses were calculated using a permutation-based algorithm whose false discovery rate (FDR) value was set to 0.01. Additionally, a constant S0 = 2 was added to increase the pooled sample variance (decrease background noise) during the calculation of both t-test statistics. These parameters specified the significance threshold of the two analyses.
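The effect of the S0 constant can be illustrated as follows. The exact formula used in the thesis is not given in the text. The sketch assumes that S0 is added to the standard error in the denominator of the t statistic, as in the significance analysis of microarrays (SAM) approach, and the function name and input values are hypothetical.

```python
def s0_modified_statistic(mean_difference: float, standard_error: float, s0: float) -> float:
    """SAM-style statistic: d = difference / (standard_error + s0).

    Assumption: s0 is added to the denominator, which damps the statistic
    for proteins with a small difference but a very small variance.
    """
    if standard_error < 0 or s0 < 0:
        raise ValueError("standard_error and s0 must be non-negative")
    return mean_difference / (standard_error + s0)


# Hypothetical protein with a small but very reproducible difference.
plain_t = s0_modified_statistic(0.3, 0.01, s0=0.0)    # ordinary t statistic
damped_t = s0_modified_statistic(0.3, 0.01, s0=2.0)   # with S0 = 2 as in the text
```

Under this assumption, a protein needs both a reproducible measurement and a sufficiently large difference to reach a high test statistic, which gives the significance threshold its curved shape on a volcano plot.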

The significant outliers identified by the statistical analyses did not all agree with the previously set H/L cutoff values. Both t-test analyses were more permissive in the identification of enriched factors than the set H/L cutoff. In the one-sample t-test analysis, many interactors that defied the null hypothesis had low H/L enrichment or depletion ratios (Figure 3.12C). This selection included all enriched or depleted factors that were selected based on the H/L cutoff, but had a relatively high identification background. In the two-sample t-test, the comparison between the measurements in the forward and reverse experiments improved the identification confidence (Figure 3.12D). Nevertheless, some significantly enriched interactors were still enriched in only one of the H/L ratio distributions and some significantly excluded interactors were excluded from only one of the H/L ratio distributions.

This happened because the two-sample t-test evaluated whether the difference between the mean H/L ratios from the forward and the reverse experiments was statistically significant. The two-sample t-test did not check for strict reproducibility of the two experiments, that is, whether the inverse of the reverse experiment matched the forward experiment, which was controlled by the one-sample t-test. The two-sample t-test was superior in its identification confidence to the one-sample t-test, but had some limitations with regard to the reproducibility of the forward and reverse biochemical experiments.

The t-test analysis was complemented by the previously set H/L cutoff values (Figure 3.12E, Figure 3.12F). The intersection of the two-sample t-test with the H/L distribution thresholds was chosen to ensure that biochemically reproducible enriched or excluded outliers were also statistically significant. Throughout the thesis, the thresholds were set to a log2 (H/L) value

two-sample t-test FDR value of 0.01 and S0 value of 2. Two parameters were thus arbitrarily set for confident identification of statistically enriched interactors: the H/L cutoff and the S0 constant. The final interactome (H/L ratio distributions) and volcano (t-test statistics) plots presented in the thesis contained all measured protein groups and highlighted the significantly enriched outliers. To this end, the interactome plots focused on the representation of the top right quadrant and the volcano plots displayed only the positive t-test significance area.
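The combined selection rule, the intersection of the t-test result with the H/L cutoff in both experiments, can be sketched as a simple filter. The function name is hypothetical, the t-test outcome is passed in as a precomputed flag, and the cutoff is a required parameter. The value 1.0 in the usage lines is a placeholder for illustration only and is not the threshold used in the thesis.

```python
def classify_outlier(
    fwd_log2_hl: float,
    rev_log2_lh: float,
    ttest_significant: bool,
    cutoff: float,
) -> str:
    """Label a protein group as 'enriched', 'excluded' or 'background'.

    A hit must pass the two-sample t-test AND lie beyond the log2 cutoff
    in both the forward and the inverted reverse ratio distributions.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    if not ttest_significant:
        return "background"
    if fwd_log2_hl >= cutoff and rev_log2_lh >= cutoff:
        return "enriched"
    if fwd_log2_hl <= -cutoff and rev_log2_lh <= -cutoff:
        return "excluded"
    return "background"


# Placeholder cutoff for illustration; not the value used in the thesis.
example_cutoff = 1.0
label_a = classify_outlier(2.1, 1.8, True, example_cutoff)   # beyond cutoff in both
label_b = classify_outlier(2.1, 0.2, True, example_cutoff)   # significant, but only in one
label_c = classify_outlier(2.1, 1.8, False, example_cutoff)  # reproducible, not significant
```

Requiring both conditions removes proteins that are statistically significant but enriched in only one of the two ratio distributions, which is the limitation of the two-sample t-test described above.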