
5.4 Proceedings of the EuBIC Developer’s


Journal of Proteomics

journal homepage: www.elsevier.com/locate/jprot

Proceedings of the EuBIC developer's meeting 2018

Sander Willemsa, David Bouyssiéb, Dieter Deforcea, Viktoria Dorferc, Vladimir Gorshkovd, Dominik Kopczynskie, Kris Laukensf, Marie Locard-Pauletb, Veit Schwämmled, Julian Uszkoreitg, Dirk Valkenborgh,i, Marc Vaudelj,k, Wout Bittremieuxf,l,⁎

aLaboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, Belgium

bInstitute of Pharmacology and Structural Biology, University of Toulouse, CNRS, UPS, Toulouse, France

cBioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg, Austria

dDepartment of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, Denmark

eLeibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany

fDepartment of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium

gMedizinisches Proteom-Center, Ruhr University Bochum, Bochum, Germany

hInteruniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Hasselt, Belgium

iCentre for Proteomics, University of Antwerp, Antwerp, Belgium

jKG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway

kCenter for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway

lDepartment of Genome Sciences, University of Washington, Seattle, WA, USA

ABSTRACT

The inaugural European Bioinformatics Community (EuBIC) developer's meeting was held from January 9th to January 12th 2018 in Ghent, Belgium. While the meeting kicked off with an interactive keynote session featuring four internationally renowned experts in the field of computational proteomics, its primary focus was the hands-on hackathon sessions, which featured six community-proposed projects revolving around three major topics:

1. quality control

2. workflows, protocols, and guidelines

3. quantification.

Here, we present an overview of the scientific program of the EuBIC developer's meeting and provide a starting point for follow-up on the covered projects.

1. Introduction

The European Bioinformatics Community (EuBIC) is an initiative of the European Proteomics Association (EuPA) to promote the use of bioinformatics for computational mass spectrometry (MS) and MS-based proteomics. Our goal is to bring together the European MS bioinformatics community, including students and early-career researchers as well as long-standing experts from both academia and industry. Through the setup of community-driven dynamics, EuBIC mainly focuses on improving education in computational methods, job and funding opportunities, international collaborations, publication of specialized studies, and training in software tools. To this end, EuBIC maintains several web resources that include educational videos, grant overviews, a job fair, and tutorials (https://www.proteomics-academy.org/). Besides these online resources, EuBIC regularly organizes workshops and hubs at the major international conferences on computational MS and proteomics. Additionally, an annual conference on computational MS-based proteomics is organized by EuBIC itself, forming an important community outreach effort to bring together bioinformatics researchers from all over Europe.

The first EuBIC conference took place in January 2017 in Semmering, Austria [13]. As this turned out to be an overwhelming success, we envisioned organizing the EuBIC conference as an annual series. However, although this event brought together the European proteomics community, we observed that not all computational expertise was utilized to its full potential in the typical conference setup consisting of presentations and workshops. Therefore, we decided to alternate the annual EuBIC conference between a Winter School targeting a broad end user-oriented audience and a developer's meeting for software developers.

https://doi.org/10.1016/j.jprot.2018.05.015 Received 15 May 2018; Accepted 27 May 2018

Corresponding author at: Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium.

E-mail address: wout.bittremieux@uantwerpen.be (W. Bittremieux).

Journal of Proteomics 187 (2018) 25–27

Available online 02 June 2018


The inaugural EuBIC developer's meeting was organized in Ghent, Belgium, from January 9th to January 12th 2018 (http://uahost.uantwerpen.be/eubic18/). A total of 43 participants (Fig. 1) from 14 different countries, including students, keynote speakers, and industry representatives, took part in the developer's meeting. To stimulate direct collaboration and the active development of bioinformatics applications, its main activity was a hackathon focusing on six important topics in computational proteomics, which were crowd-sourced from the community. Additionally, prior to these hackathon sessions the meeting participants engaged in an interactive keynote session led by four internationally renowned scientists with experience in tool development for MS-based proteomics.

2. Keynote presentations

The EuBIC developer's meeting kicked off with four keynote presentations illustrating some important current drawbacks of MS-based data analysis and the crucial role of bioinformatics in solving these outstanding issues.

Prof. dr. Lennart Martens of Ghent University, Belgium, opened the meeting by describing his vision on the role of a bioinformatics scientist as a researcher-developer. As life sciences research has accelerated enormously over the past two decades, nowadays it is heavily dominated by the huge amount of data that are generated and the advanced algorithmic techniques that are necessary to analyze these data. He outlined that the job of a researcher-developer is to use and develop sophisticated algorithms and powerful tools to increase our understanding of the sheer complexity of biological systems [5]. This was followed by an interactive discussion on career aspects and the growth path of bioinformatics researchers.

Next, dr. Frédérique Lisacek of the Swiss Institute of Bioinformatics (SIB), Switzerland, presented her work on bridging proteomics and glycomics. She described difficulties prohibiting the fully automated identification of glycoproteomics data and explained how her group has tackled some of these issues. By making use of open modification searching, peptides with previously unconsidered post-translational modifications (PTMs) could be successfully identified [4]. Next, she explained how new computational tools can be used for the analysis of glycoproteomics data [3].

The third keynote speaker was dr. Laurent Gatto of the University of Cambridge, England, who gave a presentation on the ecosystem of open-source tools in the R programming language for the analysis of MS data [2]. Dr. Gatto showed a historical perspective on how increasingly powerful and popular R packages for the analysis of proteomics data have been developed. Based on a few use cases he demonstrated how several popular packages are related to each other and reinforce each other, thereby illustrating the effectiveness of open-source.

The final keynote speaker was prof. dr. Lukas Käll of the KTH Royal Institute of Technology, Sweden. Prof. Käll explained that although the characterized analytes in an MS proteomics experiment are peptides, researchers are typically interested in their parent proteins instead. As a result, protein inference has to be performed to reassemble protein sequences from the measured peptide sequence data. Based on simulated data and a sample of known content, prof. Käll demonstrated the effect of different design choices of protein inference algorithms [9]. Furthermore, he discussed the protein summarization problem, which aims to recreate proteins' relative concentration from peptides' abundances, and his Diffacto algorithm [14].

In addition to these invited scientific keynotes, two sponsored presentations were given by company representatives. First, Adam Tenderholt from Veritomyx presented the PeakInvestigator software, which helps with deconvoluting and centroiding mass spectra. Second, Lyle Burton from SCIEX explained which application programming interfaces (APIs) they provide and how to use them. He also showed some examples of how these APIs are already used in open source and proprietary projects.

3. Hackathon

During the subsequent days of the EuBIC developer's meeting the participants split up into small groups to actively develop bioinformatics applications. Project proposals for the hackathon sessions were crowd-sourced in a transparent and open process. Prior to the developer's meeting community members could submit project proposals for inclusion in the hackathon, which were subsequently evaluated on scientific merit and community interest. This resulted in a hackathon program consisting of six different projects in three main tracks:

1. quality control

2. workflows, protocols, and guidelines

3. quantification.

3.1. Quality control

3.1.1. Dashboard for longitudinal QC monitoring

During this hackathon session the participants developed a web tool for the visualization and analysis of quality control (QC) metrics. Based on data in the qcML format [11], an interactive R/Shiny dashboard was developed using a microservice architecture. The dashboard includes functionality to visualize specific QC metrics longitudinally and perform a robust principal component analysis to detect low-performing experiments [12].

3.1.2. Data management and instrument performance monitoring

During this hackathon session the participants added novel functionality to assess the quality of an MS experiment to the Proline–MS-Angel proteomics management software system. First, an execution environment to run external scripts was added to MS-Angel to extract QC metrics from experimental raw files. Second, a semi-supervised approach to discriminate between high-quality and low-quality experiments was implemented in MS-Angel [8]. Third, the session participants established a roadmap to implement further QC features in Proline and MS-Angel.

Fig. 1. Participants of the EuBIC developer's meeting 2018.

3.2. Workflows, protocols, and guidelines

3.2.1. Implementation of software protocols in computational proteomics

During this hackathon session the participants created a framework to implement fully documented and interactive protocols describing how to successfully carry out popular workflows to analyze MS data. Controlled environments in which to perform specific tasks were created using Docker containers and Jupyter notebooks to allow the full reproducibility of analysis pipelines and workflows.

3.2.2. Third-party tool integration and method development in OpenMS

The participants of this session first got an introduction to the OpenMS software platform [7]. Afterwards they developed their own plugins under the guidance of experienced OpenMS maintainers. Examples of new OpenMS plugins that were developed include the MaRaCluster algorithm for spectral clustering [10].

3.3. Quantification

3.3.1. Statistical modelling to improve the quantitative analysis of post-translationally modified peptides

Using a recent phosphoproteomics dataset [6], the participants of this session evaluated three strategies for the differential analysis of PTMs:

1. based on modified peptides only

2. based on modified peptides and any unmodified peptides from the corresponding protein

3. based on modified peptides, their unmodified counterparts, and any other unmodified peptides from the corresponding protein.

For each of these three cases linear models were developed to describe the quantification of modified peptides under different conditions.
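As an illustration of how such strategies can differ, the following minimal sketch fits a one-covariate least-squares model to the log2 intensity of a modified peptide. It is not the model developed during the session: the function names, the two-condition design, and the choice to correct for protein abundance by subtracting the mean of the unmodified peptides are all assumptions made for this example.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def ptm_effect(condition, modified, unmodified=None):
    """Estimated condition effect on a modified peptide (log2 scale).

    condition  : 0/1 label per sample (hypothetical two-condition design)
    modified   : log2 intensity of the modified peptide per sample
    unmodified : optional per-sample lists with log2 intensities of
                 unmodified peptides from the same protein
    """
    if unmodified is None:
        # Strategy 1: modified peptide only.
        response = list(modified)
    else:
        # Strategies 2 and 3 (assumed form): remove the protein-level signal,
        # estimated here as the mean of the supplied unmodified peptides.
        response = [m - mean(u) for m, u in zip(modified, unmodified)]
    return ols_slope(condition, response)


# Toy data: four samples, two per condition.
condition = [0, 0, 1, 1]
modified = [10.0, 10.2, 12.0, 12.2]
unmodified = [[20.0, 21.0], [20.2, 21.2], [21.0, 22.0], [21.2, 22.2]]

raw_effect = ptm_effect(condition, modified)
adjusted_effect = ptm_effect(condition, modified, unmodified)
```

In this toy example part of the apparent change in the modified peptide is explained by a change in protein abundance, so the adjusted effect is smaller than the raw one. Which unmodified peptides are passed in (any from the protein, or the unmodified counterpart plus the others) is what would distinguish the second from the third strategy in this simplified setting.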

3.3.2. Novel algorithms for DIA-based label-free quantification

During this hackathon session the participants created new algorithms for label-free quantification of data-independent acquisition datasets to be included in IsoQuant [1]. A density-based clustering approach was developed to group corresponding features across the retention time, mass, and drift time dimensions.
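To make the idea of density-based feature grouping concrete, the sketch below applies a minimal DBSCAN-style procedure to features described by retention time, mass, and drift time. It is not the algorithm written for IsoQuant: the per-dimension tolerances, the scaling step, and all names are assumptions chosen for illustration.

```python
def scale(feature, tolerances):
    """Express each dimension in units of its tolerance (assumed values)."""
    return tuple(value / tol for value, tol in zip(feature, tolerances))


def dbscan(points, eps=1.0, min_pts=2):
    """Minimal DBSCAN; returns one cluster label per point, -1 marks noise."""

    def neighbours(i):
        # Indices of all points within Euclidean distance eps of point i
        # (the point itself is included).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical features: (retention time in min, mass in Da, drift time in bins).
features = [
    (20.00, 500.2500, 40.0),
    (20.05, 500.2510, 40.5),
    (19.95, 500.2495, 39.8),
    (35.00, 700.3500, 55.0),
    (35.10, 700.3510, 55.2),
    (50.00, 900.4000, 70.0),
]
tolerances = (0.2, 0.005, 1.0)  # assumed matching tolerances per dimension

labels = dbscan([scale(f, tolerances) for f in features])
```

Scaling each dimension by its tolerance lets a single radius act on all three dimensions at once. Features that end up with the same label would be treated as the same analyte observed in different runs, while the label -1 marks features without a counterpart.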

4. Conclusion and outlook

The inaugural edition of the EuBIC developer's meeting was a resounding success. In a follow-up survey all participants expressed their overall satisfaction with the meeting, with two thirds of the survey respondents giving it a perfect score. Participants especially indicated that they enjoyed the unique interactive nature of the hackathon sessions. As envisioned, the restricted number of attendees allowed many interactions and facilitated effective communication and collaboration.

Even though the EuBIC developer's meeting only ran for a few days, significant progress was made during the hackathon sessions on all projects. We are encouraged by the productivity of the participants, who started solving important problems in only a limited time. The hackathon groups have committed to continue their collaboration and complete their projects, which will hopefully lead to scientific publications and ultimately better software solutions for MS-based proteomics end users.

Encouraged by the enthusiastic support of the community we are already planning the next EuBIC Winter School, which will take place in January 2019 in Zakopane, Poland.

Conflict of interest

The authors declare no conflict of interest.

Acknowledgements

Funding for the EuBIC developer's meeting 2018 was provided by the Research Foundation - Flanders (FWO), Ghent University's Doctoral Schools, the University of Antwerp's Framework for Young Researchers (OJO), the Flemish government, and EuPA. Additional sponsoring was provided by Thermo Fisher Scientific and the Belgian Proteomics Association (BePA).

Special thanks go out to prof. dr. Lennart Martens for supporting the EuBIC developer's meeting through his FWO Scientific Research Network grant “Novel Knowledge from Public Life Sciences Data”.

Additionally, we would like to thank Inge Huyghe from the Laboratory of Pharmaceutical Biotechnology, Ghent University for her guidance through all the administration involved. Finally, we would like to thank all EuBIC members who volunteered to help on all fronts, as well as all keynote speakers and participants who contributed to the success of the developer's meeting.

References

[1] U. Distler, J. Kuharev, P. Navarro, Y. Levin, H. Schild, S. Tenzer, Drift time-specific collision energies enable deep-coverage data-independent acquisition proteomics, Nat. Methods 11 (2) (2014) 167–170, http://dx.doi.org/10.1038/nmeth.2767.

[2] L. Gatto, A. Christoforou, Using R and Bioconductor for proteomics data analysis, Biochim. Biophys. Acta 1844 (1) (2014) 42–51, http://dx.doi.org/10.1016/j.bbapap.2013.04.032.

[3] O. Horlacher, C. Jin, D. Alocci, J. Mariethoz, M. Müller, N.G. Karlsson, F. Lisacek, Glycoforest 1.0, Anal. Chem. 89 (20) (2017) 10932–10940, http://dx.doi.org/10.1021/acs.analchem.7b02754.

[4] O. Horlacher, F. Lisacek, M. Müller, Mining large scale tandem mass spectrometry data for protein modifications using spectral libraries, J. Proteome Res. 15 (3) (2016) 721–731, http://dx.doi.org/10.1021/acs.jproteome.5b00877.

[5] L. Martens, O. Kohlbacher, S.T. Weintraub, Managing expectations when publishing tools and methods for computational proteomics, J. Proteome Res. 14 (5) (2015) 2002–2004, http://dx.doi.org/10.1021/pr501318d.

[6] A. Rabiee, V. Schwämmle, S. Sidoli, J. Dai, A. Rogowska-Wrzesinska, S. Mandrup, O.N. Jensen, Nuclear phosphoproteome analysis of 3T3-L1 preadipocyte differentiation reveals system-wide phosphorylation of transcriptional regulators, PROTEOMICS 17 (6) (2017) 1600248, https://doi.org/10.1002/pmic.201600248.

[7] H.L. Röst, T. Sachsenberg, S. Aiche, C. Bielow, H. Weisser, F. Aicheler, S. Andreotti, H.C. Ehrlich, P. Gutenbrunner, E. Kenar, X. Liang, S. Nahnsen, L. Nilse, J. Pfeuffer, G. Rosenberger, M. Rurik, U. Schmitt, J. Veit, M. Walzer, D. Wojnar, W.E. Wolski, O. Schilling, J.S. Choudhary, L. Malmström, R. Aebersold, K. Reinert, O. Kohlbacher, OpenMS: A flexible open-source software platform for mass spectrometry data analysis, Nat. Methods 13 (9) (2016) 741–748, http://dx.doi.org/10.1038/nmeth.3959.

[8] E.M. Solovyeva, A.A. Lobas, A.T. Kopylov, M.V. Gorshkov, Semi-supervised quality control method for proteome analyses based on tandem mass spectrometry, Int. J. Mass Spectrom. 427 (2018) 59–64, http://dx.doi.org/10.1016/j.ijms.2017.09.008.

[9] M. The, F. Edfors, Y. Perez-Riverol, S.H. Payne, M.R. Hoopmann, M. Palmblad, B. Forsström, L. Käll, A protein standard that emulates homology for the characterization of protein inference algorithms, J. Proteome Res. (2018), Article ASAP, http://dx.doi.org/10.1021/acs.jproteome.7b00899.

[10] M. The, L. Käll, MaRaCluster: A fragment rarity metric for clustering fragment spectra in shotgun proteomics, J. Proteome Res. 15 (3) (2016) 713–720, http://dx.doi.org/10.1021/acs.jproteome.5b00749.

[11] M. Walzer, L.E. Pernas, S. Nasso, W. Bittremieux, S. Nahnsen, P. Kelchtermans, P. Pichler, H.W.P. van den Toorn, A. Staes, J. Vandenbussche, M. Mazanek, T. Taus, R.A. Scheltema, C.D. Kelstrup, L. Gatto, B. van Breukelen, S. Aiche, D. Valkenborg, K. Laukens, K.S. Lilley, J.V. Olsen, A.J.R. Heck, K. Mechtler, R. Aebersold, K. Gevaert, J.A. Vizcaino, H. Hermjakob, O. Kohlbacher, L. Martens, qcML: An exchange format for quality control metrics from mass spectrometry experiments, Mol. Cell. Proteomics 13 (8) (2014) 1905–1913, http://dx.doi.org/10.1074/mcp.M113.035907.

[12] X. Wang, M.C. Chambers, L.J. Vega-Montoto, D.M. Bunk, S.E. Stein, D.L. Tabb, QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics, Anal. Chem. 86 (5) (2014) 2497–2509, http://dx.doi.org/10.1021/ac4034455.

[13] S. Willems, D. Bouyssié, M. David, M. Locard-Paulet, K. Mechtler, V. Schwämmle, J. Uszkoreit, M. Vaudel, V. Dorfer, Proceedings of the EuBIC winter school, J. Proteomics 161 (2017) 78–80, http://dx.doi.org/10.1016/j.jprot.2017.04.001.

[14] B. Zhang, M. Pirmoradian, R. Zubarev, L. Käll, Covariation of peptide abundances accurately reflects protein concentration differences, Mol. Cell. Proteomics 16 (5) (2017) 936–948, http://dx.doi.org/10.1074/mcp.O117.067728.


Chapter 6

Discussion

Identifying peptides from mass spectra is the first step towards the investigation of a biological sample, the mechanisms taking place in a cell, and the active proteins. The application of an algorithm suited to these demands and designed to exploit the possibilities of modern instruments is therefore crucial. MS Amanda, the search engine described in this thesis (see Chapter 3), has been developed to meet these needs and has proven able to cope with the presented challenges. Compared to the search engine Mascot [76], which despite its drawbacks is still widely used in the proteomics community, MS Amanda is able to identify up to 50% more reliable PSMs at the same false discovery rate. This increase is enabled by the improved scoring function utilized in MS Amanda. The following three elements are key factors in this case:

1. Calculation of the binomial coefficient: In contrast to other search engines that also use a binomial scoring function, such as Andromeda [12], N, the maximum number of peaks that can be matched, is defined by the number of picked peaks in the experimental mass spectrum.

2. Estimation of probability p to match a peak by chance: The formula to calculate the probability to match a peak by chance has been designed to be able to accurately incorporate fragment mass tolerances in ppm (parts per million) to account for the potential of modern mass spectrometers.

3. Consideration of peak intensities: Peak intensities of matched peaks are incorporated in the scoring function, enabling a discrimination of peptides matching the same number of peaks and favouring peptides matching the higher peaks, as these are more relevant for the spectrum.
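The first two elements can be illustrated with a generic binomial tail score. The sketch below is not the MS Amanda scoring function, whose exact formula for p and intensity weighting are described in Chapter 3. It only shows, under simplifying assumptions, how N (the number of picked peaks), a ppm-based fragment tolerance, and a chance-match probability p combine into a score of the form -10 log10 P(X >= n). The function names and example values are hypothetical, and peak intensities are deliberately left out.

```python
from math import comb, log10


def ppm_to_da(mz, ppm):
    """Absolute tolerance in Da corresponding to a relative tolerance in ppm."""
    return mz * ppm / 1e6


def count_matches(observed, theoretical, ppm):
    """Number of picked peaks that lie within tolerance of a theoretical fragment."""
    return sum(
        any(abs(obs - theo) <= ppm_to_da(theo, ppm) for theo in theoretical)
        for obs in observed
    )


def binomial_score(n_matched, n_picked, p):
    """-10 * log10 of P(X >= n_matched) for X ~ Binomial(n_picked, p)."""
    tail = sum(
        comb(n_picked, k) * p ** k * (1 - p) ** (n_picked - k)
        for k in range(n_matched, n_picked + 1)
    )
    return -10 * log10(tail)


# Hypothetical picked peaks (m/z) and theoretical fragment masses.
observed = [175.1195, 304.1700, 499.9990]
theoretical = [175.1190, 304.1615, 500.0000]

n = count_matches(observed, theoretical, ppm=10)
score = binomial_score(n, n_picked=len(observed), p=0.05)  # p is an assumed value
```

Because the tolerance is relative, the matching window widens with m/z: 10 ppm corresponds to about 0.002 Da at m/z 175 but 0.005 Da at m/z 500. A smaller p, as obtained with tight ppm tolerances on high-resolution instruments, yields a higher score for the same number of matched peaks.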

MS Amanda has been the basis for further developments. Despite the achievements of this new algorithm in terms of spectrum identification, many spectra still lacked a confident peptide assignment, and a considerable number of identified spectra still contained high-intensity peaks that remained unexplained. These issues indicated the distinct presence of chimeric spectra, which has been observed before [63, 41, 3]. The existing solutions capable of identifying co-eluting precursors [94, 79, 12] were hardly used in an everyday proteomics workflow, although the potential of this information is tremendous and it is easily retrievable. To ease the access to and usage of chimeric spectra identification, new algorithms have been developed for a chimeric search functionality. Several aspects and strategies have been considered and tested; the following mechanism has proven to be most successful (see also Chapter 4):

• First search: A first peptide identification search is performed using the specified precursor mass and corresponding PSMs are stored and reported.

• Spectrum cleaning: Spectra are cleaned for peaks already identified by the best matching peptide of the first search. The overlap of shared peaks of peptides from co-eluting precursors has been proven to be negligible.

• Co-eluting precursor identification: MS1 spectra are investigated and potential co-eluting precursors within the isolation window are identified.

• Second search: For every potential identified co-eluting precursor the cleaned spectrum is cloned and the corresponding precursor is assigned. Spectra are searched again using the MS Amanda algorithm.
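The spectrum cleaning, co-eluting precursor identification, and cloning steps above can be sketched as follows. This is a simplified illustration, not the implementation described in Chapter 4: spectra are represented as plain lists of (m/z, intensity) pairs, co-eluting precursors are taken to be any other MS1 peak inside the isolation window, and isotope patterns and charge states are ignored. All names and values are hypothetical.

```python
def clean_spectrum(peaks, explained_mz, ppm=10.0):
    """Drop fragment peaks already explained by the best first-search match."""

    def is_explained(mz):
        return any(abs(mz - e) <= e * ppm / 1e6 for e in explained_mz)

    return [(mz, intensity) for mz, intensity in peaks if not is_explained(mz)]


def coeluting_precursors(ms1_peaks, selected_mz, isolation_width, ppm=10.0):
    """MS1 peaks inside the isolation window, excluding the selected precursor."""
    half_width = isolation_width / 2
    return [
        mz for mz, _ in ms1_peaks
        if abs(mz - selected_mz) <= half_width
        and abs(mz - selected_mz) > selected_mz * ppm / 1e6
    ]


def second_search_inputs(peaks, explained_mz, ms1_peaks, selected_mz, isolation_width):
    """One cloned, cleaned spectrum per potential co-eluting precursor."""
    cleaned = clean_spectrum(peaks, explained_mz)
    return [
        {"precursor_mz": mz, "peaks": list(cleaned)}
        for mz in coeluting_precursors(ms1_peaks, selected_mz, isolation_width)
    ]


# Hypothetical MS2 spectrum, first-search explanation, and MS1 survey scan.
ms2_peaks = [(175.119, 100.0), (304.162, 80.0), (401.250, 60.0)]
explained = [175.119, 304.162]
ms1_peaks = [(650.30, 1.0e6), (651.10, 4.0e5), (655.00, 2.0e5)]

jobs = second_search_inputs(ms2_peaks, explained, ms1_peaks,
                            selected_mz=650.30, isolation_width=2.0)
```

Each entry in `jobs` would then be submitted to the search engine as if it were an independent spectrum. Cloning the cleaned peak list per precursor keeps the second-search candidates independent of each other.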

Results show that, depending on the instrument settings, up to 50% of all spectra carry an additional peptide that can be reliably identified. More than 40% additional unique peptides can be identified even for narrow isolation windows, increasing to almost 50% for an isolation width of 4 m/z. On average, 20% of all spectra even contained more than two peptides. The identification of chimeric spectra has proven to be an indispensable task when it comes to gaining deep insight into a biological sample. Although the underlying approach is computationally challenging, results revealed that it is worth investing the time.

Once tandem mass spectra have been successfully identified, validation of the matches is of high importance. Several approaches to this task have already been pursued, as shown in Chapter 4 and Section 5.1 using retention time prediction and white box modeling. Similar strategies have recently been followed by Granholm et al. [38] and Tu et al. [85].

Several spectra still remain unexplained. This may be due to a lack of corresponding proteins in the database or, more likely, due to unconsidered PTMs. We have already conducted research into identifying modified spectra before the database search [25], but there is more work to be done. Another very promising approach is the work on spectral library identification methods, which we are pursuing further; we have already performed some work in this area [18]. This thesis will serve as a perfect foundation for further research endeavors.


Bibliography

[1] R. Aebersold and D. R. Goodlett. Mass Spectrometry in Proteomics. Chemical Reviews, 101(2):269–296, 2001.

[2] R. Aebersold and M. Mann. Mass spectrometry-based proteomics. Nature, 422(6928):198–207, 2003.

[3] G. Alves, A. Y. Ogurtsov, S. Kwok, W. W. Wu, G. Wang, R.-F. Shen, and Y.-K. Yu. Detection of co-eluted peptides using database search methods. Biology Direct, 3:27, 2008.

[4] T. E. Angel, U. K. Aryal, S. M. Hengel, E. S. Baker, R. T. Kelly, E. W. Robinson, and R. D. Smith. Mass spectrometry-based proteomics: existing capabilities and future directions. Chemical Society Reviews, 41(10):3912–28, 2012.

[5] A. Bateman, M. J. Martin, C. O’Donovan, M. Magrane, R. Apweiler, E. Alpi, R. Antunes, J. Arganiska, B. Bely, M. Bingley, C. Bonilla, R. Britto, B. Bursteinas, G. Chavali, E. Cibrian-Uhalte, A. Da Silva, M. De Giorgi, T. Dogan, F. Fazzini, P. Gane, L. G. Castro, P. Garmiri, E. Hatton-Ellis, R. Hieta, R. Huntley, D. Legge, W. Liu, J. Luo, A. Macdougall, P. Mutowo, A. Nightingale, S. Orchard, K. Pichler, D. Poggioli, S. Pundir, L. Pureza, G. Qi, S. Rosanoff, R. Saidi, T. Sawford, A. Shypitsyna, E. Turner, V. Volynkin, T. Wardell, X. Watkins, H. Zellner, A. Cowley, L. Figueira, W. Li, H. McWilliam, R. Lopez, I. Xenarios, L. Bougueleret, A. Bridge, S. Poux, N. Redaschi, L. Aimo, G. Argoud-Puy, A. Auchincloss, K. Axelsen, P. Bansal, D. Baratin, M. C. Blatter, B. Boeckmann, J. Bolleman, E. Boutet, L. Breuza, C. Casal-Casas, E. De Castro, E. Coudert, B. Cuche, M. Doche, D. Dornevil, S. Duvaud, A. Estreicher, L. Famiglietti, M. Feuermann, E. Gasteiger, S. Gehant, V. Gerritsen, A. Gos, N. Gruaz-Gumowski, U. Hinz, C. Hulo, F. Jungo, G. Keller, V. Lara, P. Lemercier, D. Lieberherr, T. Lombardot, X. Martin, P. Masson, A. Morgat, T. Neto, N. Nouspikel, S. Paesano, I. Pedruzzi, S. Pilbout, M. Pozzato, M. Pruess, C. Rivoire, B. Roechert, M. Schneider, C. Sigrist, K. Sonesson, S. Staehli, A. Stutz, S. Sundaram, M. Tognolli, L. Verbregue, A. L. Veuthey, C. H. Wu, C. N. Arighi, L. Arminski, C. Chen, Y. Chen, J. S. Garavelli, H. Huang, K. Laiho, P. McGarvey, D. A. Natale, B. E. Suzek, C. R. Vinayaka, Q. Wang, Y. Wang, L. S. Yeh, M. S. Yerramalla, and J. Zhang. UniProt: A hub for protein information. Nucleic Acids Research, 43(D1):D204–D212, 2015.

[6] S. A. Beausoleil, J. Villén, S. A. Gerber, J. Rush, and S. P. Gygi. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nature Biotechnology, 24(10):1285–1292, 2006.

[7] K. Biemann. Nomenclature for peptide fragment ions (positive ions). Methods in Enzymology, 193(C):886–887, 1990.

[8] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[9] J. S. Brodbelt. Photodissociation mass spectrometry: New tools for characterization of biological molecules. Chemical Society Reviews, 43(8):2757–2783, 2014.

[10] J. M. Chick, D. Kolippakkam, D. P. Nusinow, B. Zhai, R. Rad, E. L. Huttlin, and S. P. Gygi. A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nature Biotechnology, 33(7):743–749, 2015.

[11] T. U. Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Research, 41(Database issue):D43–7, 2013.

[12] J. Cox, N. Neuhauser, A. Michalski, R. A. Scheltema, J. V. Olsen, and M. Mann. Andromeda: A peptide search engine integrated into the MaxQuant environment. Journal of Proteome Research, 10(4):1794–1805, 2011.

[13] R. Craig and R. C. Beavis. TANDEM: Matching proteins with tandem mass spectra. Bioinformatics, 20(9):1466–1467, 2004.

[14] R. Craig, J. C. Cortens, D. Fenyo, and R. C. Beavis. Using annotated peptide mass spectrum libraries for protein identification. Journal of Proteome Research, 5(8):1843–1849, 2006.

[15] V. Dancik, T. A. Addona, K. R. Clauser, J. E. Vath, and P. A. Pevzner. De Novo Peptide Sequencing via Tandem Mass Spectrometry. Journal of Computational Biology, 64(3):327–342, 1999.

[16] S. Dasari, M. C. Chambers, M. A. Martinez, K. L. Carpenter, A. J. L. Ham, L. J. Vega-Montoto, and D. L. Tabb. Pepitome: Evaluating improved spectral library search for identification complementarity and quality assessment. Journal of Proteome Research, 11:1686–1695, 2012.

[17] E. de Hoffmann and V. Stroobant. Mass Spectrometry Principles and Applications. John Wiley & Sons ltd, 3rd edition, 2002.

[18] E. W. Deutsch, Y. Perez-Riverol, R. J. Chalkley, M. Wilhelm, S. Tate, T. Sachsenberg, M. Walzer, L. Käll, B. Delanghe, S. Böcker, E. L. Schymanski, P. Wilmes, V. Dorfer, B. Kuster, P.-J. Volders, N. Jehmlich, J. P. Vissers, D. W. Wolan, A. Y. Wang, L. Mendoza, J. Shofstahl, A. W. Dowsey, J. Griss, R. M. Salek, S. Neumann, P.-A. Binz, H. Lam, J. A. Vizcaíno, N. Bandeira, and H. Röst. Expanding the use of spectral libraries in proteomics. Journal of Proteome Research, 17(12):4051–4060, 2018.

[19] B. Domon and R. Aebersold. Mass Spectrometry and Protein Analysis. Science, 312(5771):212–217, 2006.

[20] V. Dorfer, G. Duernberger, S. Winkler, and K. Mechtler. Searching Chimeric MS/MS Spectra with MS Amanda. Proceedings of the Austrian Proteomics Research Symposium (APRS2014), 2014.

[21] V. Dorfer, S. Maltsev, S. Dreiseitl, K. Mechtler, and S. M. Winkler. A Symbolic Regression Based Scoring System Improving Peptide Identifications for MS Amanda. Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference - GECCO Companion ’15, pages 1335–1341, 2015.

[22] V. Dorfer, S. Maltsev, S. Winkler, and K. Mechtler. Chimeric MS/MS Spectra Identification using MS Amanda. Proceedings of 63rd ASMS Conference on Mass Spectrometry and Allied Topics (ASMS 2015), 2015.

[23] V. Dorfer, S. Maltsev, S. Winkler, and K. Mechtler. CharmeRT: Boosting Peptide Identifications by Chimeric Spectra Identification and Retention Time Prediction. Journal of Proteome Research, 17(8):2581–2589, 2018.

[24] V. Dorfer, P. Pichler, T. Stranzl, J. Stadlmann, T. Taus, S. Winkler, and K. Mechtler. MS Amanda, a universal identification algorithm optimized for high accuracy tandem mass spectra. Journal of Proteome Research, 13(8):3679–3684, 2014.

[25] S. Dorl, S. Winkler, K. Mechtler, and V. Dorfer. PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search. Journal of Proteome Research, 17(1), 2018.

[26] J. E. Elias and S. P. Gygi. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods, 4(3):207–214, 2007.

[27] J. K. Eng, A. L. McCormack, and J. R. Yates. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry, 5(11):976–989, 1994.

[28] J. B. Fenn, M. Mann, C. K. Meng, S. F. Wong, and C. M. Whitehouse. Electrospray ionization for mass spectrometry of large biomolecules. Science, 246(4926):64–71, 1989.

[29] K. Flikka, L. Martens, J. Vandekerckhove, K. Gevaert, and I. Eidhammer. Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering. Proteomics, 6(7):2086–2094, 2006.

[30] A. Frank and P. Pevzner. PepNovo: De novo peptide sequencing via probabilistic network modeling. Analytical Chemistry, 77(4):964–973, 2005.

[31] C. K. Frese, A. F. M. Altelaar, H. Van Den Toorn, D. Nolting, J. Griep-Raming, A. J. R. Heck, and S. Mohammed. Toward full peptide sequence coverage by dual fragmentation combining electron-transfer and higher-energy collision dissociation tandem mass spectrometry. Analytical Chemistry, 84(22):9668–9673, 2012.

[32] C. K. Frese, H. Zhou, T. Taus, A. F. M. Altelaar, K. Mechtler, A. J. R. Heck, and S. Mohammed. Unambiguous phosphosite localization using electron-transfer/higher-energy collision dissociation (EThcD). Journal of Proteome Research, 12(3):1520–1525, 2013.

[33] B. E. Frewen, G. E. Merrihew, C. C. Wu, W. S. Noble, and M. J. MacCoss. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Analytical Chemistry, 78(16):5678–5684, 2006.

[34] J. S. Garavelli. The RESID Database of Protein Modifications as a resource and annotation tool. Proteomics, 4(6):1527–1533, 2004.

[35] L. Y. Geer, S. P. Markey, J. A. Kowalak, L. Wagner, M. Xu, D. M. Maynard, X. Yang, W. Shi, and S. H. Bryant. Open mass spectrometry search algorithm. Journal of Proteome Research, 3(5):958–964, 2004.

[36] Google Scholar - MS Amanda. https://scholar.google.at/scholar?oi=bibs&hl=de&cites=6214716635964261011&as_sdt=5. Accessed on 2019-02-04.

[37] V. Gorshkov, S. Y. K. Hotta, T. Verano-Braga, and F. Kjeldsen. Peptide de novo sequencing of mixture tandem mass spectra. Proteomics, 16(18):2470–2479, 2016.