4.2 Analysis of Highly Expressed PIGs with Unknown Function

4.2.1 Cleavage Site Prognosis

A signal sequence with a probable cleavage site at the N-terminus of a protein gives an indication of where the mature protein will be localized. Therefore, all highly expressed PIGs were tested for the presence of an N-terminal, hydrophobic signal sequence. This test requires that the start of the cDNA sequence coding for the PIG is known, which could not easily be confirmed. The results indicated that six of the highly expressed PIGs with unknown function have a probable cleavage site. Here, "highly expressed" refers to PIGs found six times or more during the EST project, and "unknown function" means that a BLASTX search showed no, or only very minor, homology to known genes (see Table 4-1, Annex 8.1).

Table 4-6 lists the protein sequences of the PIGs that provided positive cleavage site prognosis results. As recommended for SignalP-neural networks, only the first 70 amino acids were used for the signal sequence prediction. The predicted signal sequence is underlined.
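The truncation to the first 70 amino acids can be sketched in a few lines of Python. This is a minimal illustration and not the procedure actually used in this project: the helper names `n_terminal_window` and `to_fasta` are hypothetical, and the example input is the beginning of the PIG5 sequence from Table 4-6, including a space left over from line wrapping.

```python
def n_terminal_window(sequence: str, length: int = 70) -> str:
    """Return the first `length` residues, ignoring whitespace from line wrapping."""
    cleaned = "".join(sequence.split()).upper()
    return cleaned[:length]


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


# Beginning of the PIG5 protein sequence as printed in Table 4-6
# (the embedded space is a line-wrap artifact and is removed below).
pig5_start = (
    "MQLHHLVTVLGLAFSQAQAAIPMVANHDQVIRQIVTLITDPTHVMEAVPGLKELVPEAKLSPTLSHAQ "
    "LNQGYREIFTSLRDVDNQHAPLIIDSLAGYHHTQDPARAEIYRNSLYSRIRDLSEHHPAQIRPHLDLTHH"
)

window = n_terminal_window(pig5_start)
print(to_fasta("PIG5_first70", window))
```

The resulting FASTA record could then be submitted to a signal peptide predictor; how the submission is done depends on the tool and is not shown here.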

Table 4-6 Protein sequences of highly expressed PIGs that gave positive cleavage site prognosis results with the SignalP neural networks (signal sequences underlined)

Protein Sequence of PIG5

MQLHHLVTVLGLAFSQAQAAIPMVANHDQVIRQIVTLITDPTHVMEAVPGLKELVPEAKLSPTLSHAQ LNQGYREIFTSLRDVDNQHAPLIIDSLAGYHHTQDPARAEIYRNSLYSRIRDLSEHHPAQIRPHLDLTHH RLLTTLNLQLQNLEESHLPLIGSDKAVQELKEFVARFKNRNEKEIKQLYHEFLSTIPPTPKSEMVYLSPE QVNKALMETNSASKTPALPVDGNLWISTITHHHHHQTAPVAQTSPSLAAHAPQPVAASA

Protein Sequence of PIG7 (RTP1)*

MSNLRLLFTIISLAAIARAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXKRELDQDANPGHRR HKSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Protein Sequence of PIG9

MKSEQLFILVVLCLPCLYLGRPLITAETAHIPVIAGPVQDALVSNEAIHSGSNLGLEVKDATAGGPSSIK DVVGHANKGASPASEIAPQADKVKPEANEVAPDASDPVPKSAKSKWSDRFKEGMKKTETYSKDLFER WFAKSKSAIKRWLAFIKDYFSRLRPKKATEKGTKVAAEEVELGLKPISQGKHDGPNVASPVTIVKTHP QSEATIERQTIAKNPLADKEVKSSSEDQQSFNDATVEVPSQVHPPVEPQTIAKNPVGDEVKTISEDQKG VSDAVPVKTLNPVGEEVTTISEDHQSLNNAVPVKVPSQVHPPVEPQTIAKNPVGEEVKTISEDQQGLNV AIPAKTPTHVQHPLESPGIAKSPQQSSSDTPPSKPS

Protein Sequence of PIG14

MPLTFAFLATILLSGSASANATGPSHVSRLAPRDEGANLRGTFIMSDNIDYSKGPLPVYYPNGSVAYLY DQWYKQAGISTSTLTPVFPGTHHPTPVITLHSVDDGCAAKSHYSEFEVPGTSHLTYKIDPRGIKSDRWY FEFVNHQGVRFRYRYHRNILSKGGKVYQYADKRPVRLVARLQEQLRWESWLNPGKSGAPTFTLSYDE STGLGDKLVTLMALVMSRVENCGL

Protein Sequence of PIG15

MHYSLFLVLSLFLQHARFAQSVRPYNRIPILTVNEYKSAWHSADNRLILLDNDGVLLPQHGRNDHDVK KAQDLLEALARDPKNTVWVITARGFDHVQEQYRVLRDRKVRLNLAGQLGTESRKWSEDTVRIQEGA RDDVAALHRSLVEATGATFMPRNCCILYPKNKGQNDKLVLKEMQRLAGDFTFEPNSGDHATLTHPTI NKGGFARDLFEKHERKTFVMSFGDADIDEKMHEAVNNSGFQTAVSTYVNTNGQSAARTRLDSHHDV HRFFREMVGPEWF

Protein Sequence of PIG23

MRSYLLQFFLIGIAPAVRSFSQAPVLMPRGMDHPKQLLKGKVSIEEISTTFTEATQSVVNTVSSPRSKTD SATIAQQVTNVHIYAQQLTISVENLNDHKKMVTHQDTMIGAFVSYISMINAISDSKTRTTQCRNQLVSI NIAFRSISTTYLASGIDLREEYNKHPHSPQYDPAKFAALDLDPLFDQVNPPAADLSTPEFDQSGEETDHA

*As agreed with Prof. Dr. K. Mendgen, only the signal sequence and the bipartite NLS of PIG7p (RTP1p) are shown

In Figure 4-3, the results for cleavage site prognosis using SignalP2.0 are presented in the form of graphs. The results for neural networks (NN) and hidden Markov models (HMM) trained on eukaryotes are shown next to each other.

Figure 4-3 Cleavage site prognosis for the predicted proteins encoded by PIG5, PIG7, PIG9, PIG14, PIG15 and PIG23 using SignalP2.0. Only the first 70 amino acids of the ORF were used. For each protein, the graphical result for the Neural Networks (NN) is shown on the left, and the result for the Hidden Markov Models (HMM) trained on eukaryotes is shown on the right.

Because the cleavage sites predicted by the Neural Networks and the Hidden Markov Models in SignalP2.0 differed for some proteins, a second prediction program (PSORT II, University of Tokyo, Tokyo, Japan) (Nakai and Kanehisa 1992) was also used. Without exception, PSORT II gave the same result as the Neural Networks, with every predicted cleavage site located after amino acid 19, 20 or 21. Thus, for subsequent phases of this project, the Neural Network result was assumed to identify the cleavage site correctly.

Table 4-7 Cleavage site prognosis for different PIGs using Neural Networks (NN), Hidden Markov Models (HMM) trained on eukaryotes and PSORT II

Possible cleavage site (after amino acid position no.), as identified by:

Gene     Neural Networks    Hidden Markov Models    PSORT II (using the method of G. von Heijne)
PIG5     19                 19                      19
PIG7     19                 19                      19
PIG9     20                 26                      20
PIG14    19                 19                      19
PIG15    21                 19                      21
PIG23    19                 19                      19
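The comparison in Table 4-7 can be expressed as a short Python sketch that flags every method whose predicted cleavage site differs from the Neural Network prediction. The positions are taken from the table; the dictionary layout and the function name `disagreeing_methods` are illustrative assumptions, not part of any prediction program.

```python
# Predicted cleavage site (after amino acid position) per gene and method,
# copied from Table 4-7.
predictions = {
    "PIG5":  {"NN": 19, "HMM": 19, "PSORT II": 19},
    "PIG7":  {"NN": 19, "HMM": 19, "PSORT II": 19},
    "PIG9":  {"NN": 20, "HMM": 26, "PSORT II": 20},
    "PIG14": {"NN": 19, "HMM": 19, "PSORT II": 19},
    "PIG15": {"NN": 21, "HMM": 19, "PSORT II": 21},
    "PIG23": {"NN": 19, "HMM": 19, "PSORT II": 19},
}


def disagreeing_methods(sites: dict, reference: str = "NN") -> list:
    """Return the methods whose predicted site differs from the reference method."""
    return [method for method, pos in sites.items() if pos != sites[reference]]


for gene, sites in predictions.items():
    differing = disagreeing_methods(sites)
    status = "consistent" if not differing else "differs: " + ", ".join(differing)
    print(f"{gene:6s} NN site after position {sites['NN']:2d}  ({status})")
```

With the values from the table, only the HMM predictions for PIG9 and PIG15 deviate from the Neural Network result, while PSORT II agrees with it for all six proteins.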

4.2.2 Analysis of Further Sequence-Specific Characteristics

Using PROSITE at http://www.expasy.org/prosite/ and PSORT II at http://psort.nibb.ac.jp/form2.html, the six PIGs were tested for sequence-specific characteristics. The scope and functionality of PROSITE and PSORT II are explained in chapter 3.11.

Below, the most important results obtained from the comparison between the studied PIGs and the abovementioned databases are briefly described. In this context, special emphasis was placed on the theoretical prediction of the localization of the PIG-proteins and their stability.

Mature Protein Localization

Using the MTOP algorithm (Hartmann et al. 1989; Nakai and Horton 1999), the N-terminal side was predicted to be inside the cell for PIG5p, PIG7p (RTP1p) and PIG23p.

PIG14p and PIG15p were predicted to have the C-terminal side inside the cell. This result is inconsistent with the predicted signal sequence.

Using the NNCN algorithm (Reinhardt and Hubbard 1998; Nakai and Horton 1999), all the PIGps were predicted to be nuclear proteins, with the exception of PIG14p and PIG15p, which were predicted to be cytoplasmic proteins.

According to the k-NN algorithm (Horton and Nakai 1997; Nakai and Horton 1999), PIG5p, PIG9p, PIG14p and PIG15p are located extracellularly or attached to the cell wall.

PIG7p and PIG23p seem to be mitochondrial proteins, which is inconsistent with the nuclear localization sequence found for PIG7p (see below) and with the NNCN prediction for both proteins.

Because the various prediction programs gave inconsistent results, the data of the in silico analysis can only be interpreted in combination with the data from the lab experiments. Both are discussed together in chapter 5.2.3.
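The disagreement between the localization predictions can be made explicit with a small Python sketch. The compartment labels are transcribed from the NNCN and k-NN results reported above; the dictionary layout and the function name `conflicting` are hypothetical and serve only to illustrate the comparison.

```python
# Localization predicted by NNCN, as reported in the text.
nncn = {
    "PIG5": "nuclear", "PIG7": "nuclear", "PIG9": "nuclear", "PIG23": "nuclear",
    "PIG14": "cytoplasmic", "PIG15": "cytoplasmic",
}

# Localization predicted by k-NN, as reported in the text.
knn = {
    "PIG5": "extracellular/cell wall", "PIG9": "extracellular/cell wall",
    "PIG14": "extracellular/cell wall", "PIG15": "extracellular/cell wall",
    "PIG7": "mitochondrial", "PIG23": "mitochondrial",
}


def conflicting(first: dict, second: dict) -> list:
    """Return the genes for which two methods predict different compartments."""
    return sorted(g for g in first if g in second and first[g] != second[g])


for gene in conflicting(nncn, knn):
    print(f"{gene}: NNCN = {nncn[gene]}, k-NN = {knn[gene]}")
```

For the six proteins studied here, the two methods do not agree on the compartment for any of them, which is why the predictions are not interpreted on their own.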

Presence of NLS

Using the NUCDISC algorithm (Nakai and Horton 1999), PIG7p (RTP1p) and PIG9p are predicted to have a nuclear localization sequence (NLS). For PIG7p in particular, this result is of great interest in relation to the in situ localization results, which will be discussed later.

Protein Stability

The in vivo half-life of the proteins was estimated using the “N-end rule” algorithm (Tobias et al. 1991). All six PIG proteins were predicted to have a half-life of > 20 h in yeast and > 10 h in E. coli, which indicates that they are stable. In contrast, according to the Stability Index (SI) algorithm (Guruprasad et al. 1990), only PIG5p (SI: 36.98) and PIG15p (SI: 34.83) are predicted to be stable, whereas the other four proteins are classified as unstable. The cutoff value for the stability index is generally defined as 40; proteins with values above 40 are classified as unstable.

During lab work, only PIG9p (SI: 56.71) and PIG23p (SI: 57.37) demonstrated instability and degraded during purification. Therefore, and because PIG23 does not seem to be a real in planta induced gene (see chapter 4.8), PIG9 and PIG23 were not further studied.

PIG5p, PIG15p, PIG7p (RTP1p) and PIG14p were successfully isolated and purified at room temperature (RT), despite the theoretically calculated instability of PIG7p (RTP1p) (SI: 41.50) and PIG14p (SI: 43.76). In this context, it must be noted that the overexpression, isolation and purification of PIG14p was repeated several times to compensate for a low protein yield.
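The relationship between the calculated SI values and the behaviour observed in the lab can be summarized with a short Python sketch. The SI values and the cutoff of 40 are those reported in this section; the function name `classify` and the data layout are illustrative assumptions, and the SI values themselves are taken as given rather than recomputed from the sequences.

```python
SI_CUTOFF = 40.0  # values above this cutoff are classified as unstable

# Stability index (SI) values reported in this section.
stability_index = {
    "PIG5": 36.98, "PIG7": 41.50, "PIG9": 56.71,
    "PIG14": 43.76, "PIG15": 34.83, "PIG23": 57.37,
}

# Proteins that degraded during purification in the lab.
degraded_in_lab = {"PIG9", "PIG23"}


def classify(si: float, cutoff: float = SI_CUTOFF) -> str:
    """Classify a protein as 'stable' or 'unstable' from its SI value."""
    return "unstable" if si > cutoff else "stable"


# List the proteins in order of increasing SI value.
for gene, si in sorted(stability_index.items(), key=lambda item: item[1]):
    observed = "degraded" if gene in degraded_in_lab else "purified"
    print(f"{gene:6s} SI = {si:5.2f}  predicted: {classify(si):8s}  observed: {observed}")
```

Listed this way, the two proteins that degraded are those with the highest SI values, whereas PIG7p and PIG14p lie only slightly above the cutoff and could still be purified.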

A table with the complete results generated by the prediction programs can be found in Annex 8.3.