Identification of cross-linked peptides with OpenMS and OMSSA

2.2 Methods

2.2.10 MS data analysis

2.2.10.6 Identification of cross-linked peptides with OpenMS and OMSSA

Cross-linking experiments of ASH1, Cwc2 and yeast protein–mRNA complexes after TAP tag isolation were analyzed with OpenMS^{[99, 100]} and OMSSA^[34] as search engine. Data analysis workflows were developed in the course of this work and are explained in more detail in the results section. The workflows are based on existing OpenMS tools as well as tools written specifically for our purpose. Code was written and TOPPAS pipelines were assembled by Timo Sachsenberg (Prof. Oliver Kohlbacher, Applied Bioinformatics Group, Eberhard Karls University, Tübingen).

MS data in Thermo proprietary .raw format was converted into the open .mzML format^[101] with msconvert, part of the ProteoWizard^[102] software bundle. Q Exactive data was processed with the OpenMS tool FileFilter with the option "sort" for correct assignment of MS1 and MS2 spectra.
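The conversion and sorting steps can be sketched as command lines assembled in Python. This is a minimal sketch, not the pipeline used in this work: the file names are hypothetical, and only the msconvert options `--mzML` and `-o` and the FileFilter options `-in`, `-out` and `-sort` are assumed. The functions build the argument lists without running anything.

```python
def msconvert_command(raw_file, out_dir):
    """Build an msconvert call that writes an .mzML file into out_dir."""
    return ["msconvert", str(raw_file), "--mzML", "-o", str(out_dir)]


def file_filter_sort_command(mzml_in, mzml_out):
    """Build an OpenMS FileFilter call with the 'sort' option switched on."""
    return ["FileFilter", "-in", str(mzml_in), "-out", str(mzml_out), "-sort"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(msconvert_command("sample_UV.raw", "converted")))
    print(" ".join(file_filter_sort_command("converted/sample_UV.mzML",
                                            "converted/sample_UV_sorted.mzML")))
```

Each list could be passed to `subprocess.run` on a machine where ProteoWizard and OpenMS are installed.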

MS data recorded in profile mode, i.e. MS1 spectra of Velos measurements and both MS1 and MS2 of Q Exactive measurements, were centroided with the OpenMS tool PeakPickerHiRes. If automatic XIC filtering was desired later, an additional processing step was included: LC-MS data of control and UV irradiated sample were aligned to correct for small retention time shifts. The corresponding pipeline is shown in Figure 2.1. The pipeline requires the .mzML files of both control and UV irradiated sample as input. The output file is the control .mzML with transformed retention times.

After data processing and before creating precursor mass variants, the MS data was reduced by identification (ID) and extracted ion chromatogram (XIC) filters if desired. The ID filter pipeline (Figure 2.2) performed a standard database search with OMSSA to identify non-cross-linked peptides; the corresponding MS/MS fragment spectra were then removed from the MS data file. The database contained contaminant sequences (those distributed with MaxQuant^[103]) as well as decoy sequences.

The latter were used to determine a false discovery rate (FDR) and were created with the OpenMS DecoyDatabase tool by reversing the target sequences from the original database. A peptide hit was considered a confident match and subsequently used for filtering if the FDR was below 0.01.
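The target-decoy idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of DecoyDatabase or FalseDiscoveryRate. The `DECOY_` prefix is a hypothetical label, higher scores are assumed to be better, and the FDR at each score threshold is estimated as the number of decoys divided by the number of targets.

```python
def reverse_decoys(targets, prefix="DECOY_"):
    """Create one reversed-sequence decoy entry per target entry."""
    return {prefix + accession: sequence[::-1]
            for accession, sequence in targets.items()}


def q_values(hits):
    """hits: list of (score, is_decoy), higher score = better (assumption).

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    fdrs = []
    n_target = n_decoy = 0
    for _, is_decoy in ranked:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # Estimated FDR when accepting every hit down to this rank.
        fdrs.append(n_decoy / max(n_target, 1))
    # q-value: lowest FDR reachable at this rank or any rank below it.
    q_vals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_vals.append(running_min)
    q_vals.reverse()
    return [(score, is_decoy, q)
            for (score, is_decoy), q in zip(ranked, q_vals)]


def confident_targets(hits, max_fdr=0.01):
    """Scores of target hits whose q-value is below max_fdr."""
    return [score for score, is_decoy, q in q_values(hits)
            if not is_decoy and q < max_fdr]
```

With the hits `[(10, False), (9, False), (8, True), (7, False)]`, only the two best target hits pass the 0.01 threshold, because the first decoy raises the estimated FDR for everything ranked at or below it.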

Parameters for the OMSSA search are listed below. The input is an .mzML file; the outputs are an .idXML file containing the peptide matches used for filtering and a reduced .mzML file. The output .idXML can be annotated to the input .mzML to retrace the peptide identifications.

Figure 2.1: Pipeline for retention time alignment of LC-ESI-MS/MS data of control and UV irradiated sample (screenshot from TOPPAS). First, peptides (features) are identified in both measurements in the two-dimensional retention time versus m/z map by FeatureFinderCentroided. Based on the features, the maps of both measurements are aligned by MapAlignerPoseClustering and the retention time transformations are applied by MapRTTransformer. Importantly, the control is transformed relative to the UV irradiated sample and not vice versa.
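The effect of the alignment can be illustrated with a deliberately simplified stand-in. The sketch below does not reproduce the pose clustering algorithm; it assumes that pairs of matching features are already known and fits a linear retention time map by ordinary least squares. As in the pipeline, the control is mapped onto the UV irradiated sample. All function names are hypothetical.

```python
def fit_linear_rt_map(control_rts, uv_rts):
    """Least-squares fit of uv_rt = slope * control_rt + intercept.

    control_rts and uv_rts are retention times of already matched features.
    """
    n = len(control_rts)
    if n < 2 or n != len(uv_rts):
        raise ValueError("need at least two matched feature pairs")
    mean_x = sum(control_rts) / n
    mean_y = sum(uv_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in control_rts)
    if sxx == 0:
        raise ValueError("control retention times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(control_rts, uv_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def transform_rts(rts, slope, intercept):
    """Apply the fitted map to the control retention times."""
    return [slope * rt + intercept for rt in rts]
```

For example, matched features at 100, 200 and 300 s in the control and 105, 205 and 305 s in the UV irradiated sample give a slope of 1 and an intercept of 5 s, so a control spectrum at 150 s is moved to 155 s.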

OMSSA search parameters:

precursor mass tolerance: 10 ppm
fragment mass tolerance: 0.1 Da
min/max precursor charge: 2/5
precursor charge determination: believe input file
variable modifications: oxidation (M); carbamylation (K); carbamylation (N-term); phospho (S); phospho (T); phospho (Y)
enzyme: trypsin
max number missed cleavages: 2
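To make the relative precursor tolerance concrete, the following sketch converts a ppm tolerance into an absolute mass window. It only illustrates the arithmetic; the function names are hypothetical and nothing is implied about how OMSSA applies the tolerance internally.

```python
def ppm_window(mass, tolerance_ppm=10.0):
    """Return the (lower, upper) mass bounds for a relative ppm tolerance."""
    delta = mass * tolerance_ppm * 1e-6
    return mass - delta, mass + delta


def within_ppm(observed, theoretical, tolerance_ppm=10.0):
    """True if observed lies within tolerance_ppm of theoretical."""
    return abs(observed - theoretical) <= theoretical * tolerance_ppm * 1e-6
```

At a mass of 1000 Da, 10 ppm corresponds to ±0.01 Da, which is ten times narrower than the fixed 0.1 Da fragment tolerance.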

The XIC filter was applied to remove MS/MS spectra of precursors that appeared in both control and UV irradiated sample at comparable intensity (default: fold change less than two). This filtering step was done with the OpenMS tool RNP^{xl}XICFilter, which was created specifically for our purpose. The input consists of the .mzML files of both samples. The tool then calculates the intensity of a precursor in both control and UV irradiated sample in a small retention time window. If the intensity in the UV irradiated sample is less than twofold higher than in the control, the corresponding spectrum is filtered out and not written to the output, the reduced .mzML file of the UV irradiated sample.
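The decision rule of the XIC filter can be sketched as follows. This is a minimal illustration of the fold-change criterion described above, not the code of the actual tool. The peak representation, the 10 ppm m/z tolerance and the 30 s retention time window are assumptions chosen for the example.

```python
def xic_intensity(peaks, target_mz, target_rt, mz_tol_ppm=10.0, rt_window=30.0):
    """Sum the intensity of all MS1 peaks close to a precursor.

    peaks: iterable of (rt, mz, intensity) tuples.
    mz_tol_ppm and rt_window are hypothetical example values.
    """
    mz_tol = target_mz * mz_tol_ppm * 1e-6
    return sum(intensity for rt, mz, intensity in peaks
               if abs(rt - target_rt) <= rt_window
               and abs(mz - target_mz) <= mz_tol)


def keep_spectrum(uv_intensity, control_intensity, min_fold_change=2.0):
    """Keep an MS/MS spectrum only if its precursor is at least
    min_fold_change times more intense in the UV irradiated sample."""
    return uv_intensity >= min_fold_change * control_intensity
```

Writing the comparison as a product rather than a ratio avoids a division by zero when the precursor is absent from the control, in which case any precursor detected in the UV irradiated sample is kept.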


Figure 2.2: ID filter pipeline for removal of MS/MS spectra with confident peptide identification (screenshot from TOPPAS). OMSSAAdapter submits the OMSSA searches and retrieves the search results. PeptideIndexer determines whether identified peptides correspond to target or decoy sequences. FalseDiscoveryRate determines the false discovery rate for each identification. IDFilter then keeps only identifications below a certain false discovery rate, typically 0.01. Confident identifications that pass this criterion are reported in an .idXML output file. Finally, MS2FilterByPositionOverlap removes the MS/MS spectra that gave rise to the confident identifications from the .mzML file; the reduced .mzML is the output of the pipeline.
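The last step of the ID filter, removing spectra that already have a confident identification, can be sketched as a position match in retention time and precursor m/z. The data layout and both tolerances are assumptions made for illustration; the matching criteria of MS2FilterByPositionOverlap itself are not reproduced here.

```python
def remove_identified(spectra, identifications, rt_tol=0.01, mz_tol=0.01):
    """Return the spectra that do not coincide with any identification.

    spectra: list of dicts with the keys 'rt' and 'precursor_mz'.
    identifications: list of (rt, precursor_mz) of confident peptide hits.
    rt_tol and mz_tol are hypothetical absolute tolerances.
    """
    def is_identified(spectrum):
        return any(abs(spectrum["rt"] - rt) <= rt_tol
                   and abs(spectrum["precursor_mz"] - mz) <= mz_tol
                   for rt, mz in identifications)

    return [spectrum for spectrum in spectra if not is_identified(spectrum)]
```

Applied to two spectra of which one matches an identification, the function returns only the unidentified spectrum, which corresponds to the reduced .mzML described in the caption.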

The crucial steps of the data analysis, precursor mass variant generation and database searches, were performed with the RNP^xl tool, another OpenMS tool specifically created for our purpose.

The tool takes an .mzML file as input. This file can be a reduced .mzML from any of the filtering steps described above or the original .mzML containing all raw data. Output files are an .idXML and a .csv file, both containing the database search results and RNA marker ion intensities for all MS/MS spectra contained in the input .mzML. The .idXML file can be used to annotate the search results to the MS data in .mzML in TOPPView, while the .csv file can be opened in programs like Microsoft Excel, e.g. to add notes about manual validation. Parameters for the RNP^xl tool are shown in Figure 2.3; the values correspond to the optimized parameters for yeast protein–RNA complexes after TAP tag purification. OMSSA search parameters are essentially as described for the ID filter, with two important differences. First, the database is either a limited database or the proteome of the respective organism; it does not contain contaminant or decoy sequences, as those would increase analysis time and lead to false positive matches. Second, for similar reasons, phosphorylation is not considered as a variable peptide modification.

Figure 2.3: Parameters of the RNP^xl tool (screenshot from TOPPAS). length determines the maximum length of RNA combinations to be considered for precursor variant generation. sequence allows the input of a nucleotide sequence if only those combinations that appear in the sequence should be considered. When left empty, all combinations from the nucleotides defined below are calculated. target_nucleotides allows the definition of any nucleotide (RNA, DNA, substituted or labeled with stable isotopes) by its sum formula. The mapping option is used to define an input sequence that is randomly labeled; the labeled and the native nucleotide are then mapped to the same letter in the input sequence. restrictions are used to require a certain nucleotide in all sequences considered for precursor mass variants. The parameters shown here would only allow sequences that contain at least one uracil. In the modifications field, all modifications are listed that should be considered for each of the nucleotide combinations. The parameters shown here resemble a standard experiment where the 152 adduct is also expected. All modifications have to be given as sum formulas. precursor_mass_threshold sets the (uncharged) threshold for the low mass filter, while precursor_variant_m/z_threshold sets the m/z threshold for the precursor mass variants that are written to the output file. If CysteinAdduct is set to "true", 152 is considered as an adduct without any nucleotide. in_OMSSA_ini and in_fasta require the paths of the OMSSA parameter file and the database (in .fasta format), respectively. Finally, marker_ion_tolerance sets the mass tolerance for the determination of the presence and intensity of RNA marker ions.
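The roles of length, target_nucleotides and restrictions can be illustrated with a short sketch that enumerates nucleotide combinations and computes their monoisotopic masses from sum formulas. It is a sketch under stated assumptions and does not reproduce the RNP^xl tool. The sum formulas are those of the free nucleoside monophosphates, one water is subtracted per additional nucleotide to approximate phosphodiester linkage, and further modifications such as the 152 adduct are not considered. All names are hypothetical.

```python
import re
from itertools import combinations_with_replacement

# Monoisotopic element masses in Da.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163}

# Assumption: nucleotides given as nucleoside monophosphate sum formulas.
NUCLEOTIDES = {"A": "C10H14N5O7P", "C": "C9H14N3O8P",
               "G": "C10H14N5O8P", "U": "C9H13N2O9P"}

WATER = 2 * MONOISOTOPIC["H"] + MONOISOTOPIC["O"]


def formula_mass(formula):
    """Monoisotopic mass of a sum formula such as 'C9H13N2O9P'."""
    return sum(MONOISOTOPIC[element] * int(count or 1)
               for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def nucleotide_combinations(max_length, required="U", alphabet="ACGU"):
    """All unordered combinations up to max_length that contain `required`.

    Pass required=None to disable the restriction.
    """
    combos = []
    for n in range(1, max_length + 1):
        for combo in combinations_with_replacement(alphabet, n):
            if required is None or required in combo:
                combos.append("".join(combo))
    return combos


def combination_mass(combo):
    """Mass of a combination, minus one water per additional nucleotide."""
    total = sum(formula_mass(NUCLEOTIDES[n]) for n in combo)
    return total - (len(combo) - 1) * WATER
```

With a maximum length of 2 and uracil required, the enumeration yields U, AU, CU, GU and UU. Order is ignored because only the composition, not the sequence, determines the mass.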


In the document: Investigation of protein-RNA interactions by UV cross-linking and mass spectrometry: methodological improvements toward in vivo applications (pages 71-75)