
3. RESULTS

3.3. Identification and characterisation of the PPO multigene family from Physcomitrella

3.3.1. Identification, manual adaptation and sequence comparison of PPO genes on DNA level

In order to obtain a more profound understanding of the organisation of PPOs in the bryophyte, further PPO gene family members were identified and characterised.

Using the derived amino acid sequence of PpPPO1 (3.1) as a query for BLASTp against the P. patens V1.2_protein database and for tBLASTn against the P. patens V1.2_genome database (2.12.1), 15 loci with similarity to PPO1 were identified (cut-off: 35% identity over a length of 80 aa) (Fig. 3.3).
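
The cut-off described above can be sketched as a simple filter over BLAST hits. This is a minimal illustration only: the `Hit` record, the locus labels and the example values are invented, and treating both thresholds as inclusive (>=) is an assumption, since the text does not state how boundary cases were handled.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit (hypothetical record layout, not the BLAST output format)."""
    subject_id: str
    percent_identity: float  # % identical residues in the aligned region
    alignment_length: int    # aligned length in amino acids


MIN_IDENTITY = 35.0  # cut-off stated in the text (% identity)
MIN_LENGTH = 80      # cut-off stated in the text (aa)


def passes_cutoff(hit: Hit) -> bool:
    # Assumption: both thresholds are inclusive.
    return (hit.percent_identity >= MIN_IDENTITY
            and hit.alignment_length >= MIN_LENGTH)


# Invented example hits, for illustration only.
example_hits = [
    Hit("locus_A", 74.7, 530),
    Hit("locus_B", 36.2, 95),
    Hit("locus_C", 41.0, 60),   # aligned region too short
    Hit("locus_D", 30.5, 200),  # identity too low
]

kept = [h.subject_id for h in example_hits if passes_cutoff(h)]
print(kept)
```

Only hits that satisfy both conditions are retained, so a long alignment with low identity is rejected just like a short alignment with high identity.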

Fig. 3.3 Flow chart of bioinformatic identification of the PPO gene family members of Physcomitrella.

Using the amino acid sequence of PpPPO1, BLAST analysis was carried out (2.12.1) and 15 putative PPOs were identified. After manual evaluation and correction of the gene models according to transcript evidence and the presence of the complete tyrosinase domain PF00264 (2.12.2 and 2.12.3), 13 PPOs were selected for further studies.

These 15 loci were selected and named PpPPO1 to PpPPO15 in descending order of their appearance in the BLAST results obtained with PPO1 as a query. Their predicted gene models (Phypa numbers according to version V1.2) and intron/exon structures were evaluated in detail with respect to transcript evidence, the presence of the two copper-binding domains CuA and CuB (tyrosinase domain Pfam Tyr PF00264), and homology to PpPPO1 as well as to published plant PPO sequences (2.12.2). Where necessary, other gene models available on cosmoss.org were selected and proposed (Tab. 3.1).

For the putative polyphenol oxidase encoding genes PPO6, PPO13, PPO14, and PPO15, no ESTs were available to support the predicted gene models (Tab. 3.1). All other PPO gene models were supported by EST evidence, although only the gene models of PPO1, PPO9, and PPO11 were covered completely by ESTs. For the gene models of PPO2, PPO3, PPO4, PPO5, PPO7, PPO8, PPO10, and PPO12, ESTs covered only parts of the predicted gene structure.

According to BLAST homology analysis, no appropriate gene model was proposed by the cosmoss.org genome browser for PPO12 and PPO13. For this reason, the Phypa models predicted by version V1.2 were extended manually at the 5´ end in the case of PPO12, and at the 3´ end in the case of PPO13.

After manual evaluation and verification, the Phypa gene models proposed by version V1.2 were confirmed for PPO1, -2, -3, -7, -8, -14, and -15. For PPO4, -5, -6, -9, -10, -11, -12, and -13, gene models other than the server-proposed ones (all_Phypa numbers, available on cosmoss.org) were selected, based on EST evidence supporting the intron/exon structure and on homology analysis with other plant PPOs. Tab. 3.1 summarises the gene models with their introns before and after manual correction, along with the total number of corresponding ESTs and the properties of the derived amino acid sequences. Further details on the evaluation of the PPO gene models according to the analysis described in 2.12.2 can be found in the appendix (6.2.1).

Analysis of the organisation of the gene family within the genome revealed that PPO6 and PPO12 are located tail to tail on the same scaffold (No. 83), separated by approximately 15 kbp.

PPO7 and PPO10 are likewise located on one scaffold (No. 3), head to head, but approx. 1.89 Mbp apart. Owing to the way the genomic DNA was prepared prior to sequencing and the sequenced DNA was assembled into scaffolds, genes located on the same scaffold are located on the same chromosome. Hence, PPO6 and PPO12 share one chromosome, as do PPO7 and PPO10.
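
The stated distances can be checked against the scaffold coordinates listed in Tab. 3.1. The sketch below is illustrative only; the function name `intergenic_distance` is hypothetical, and the distance is taken here as the difference between the end coordinate of the upstream model and the start coordinate of the downstream model, which is an assumption about how the values in the text were derived.

```python
# (scaffold number, start, end) of the selected gene models, from Tab. 3.1.
GENE_COORDS = {
    "PPO6": (83, 1328380, 1330047),
    "PPO12": (83, 1344874, 1347619),
    "PPO7": (3, 1801505, 1803375),
    "PPO10": (3, 3692578, 3694295),
}


def intergenic_distance(gene_a: str, gene_b: str) -> int:
    """Distance in bp between two non-overlapping gene models on one scaffold."""
    scaffold_a, start_a, end_a = GENE_COORDS[gene_a]
    scaffold_b, start_b, end_b = GENE_COORDS[gene_b]
    if scaffold_a != scaffold_b:
        raise ValueError("gene models lie on different scaffolds")
    # Start of the downstream model minus end of the upstream model.
    return max(start_a, start_b) - min(end_a, end_b)


print(intergenic_distance("PPO6", "PPO12"))  # 14827 bp, i.e. approx. 15 kbp
print(intergenic_distance("PPO7", "PPO10"))  # 1889203 bp, i.e. approx. 1.89 Mbp
```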

a better fitting model according to homology analysis was selected (all_Phypa model). (a): gene models available on http://www.cosmoss.org/cgi/gbrowse/physcome/; (b): determined on http://scansite.mit.edu/calc_mw_pi.html; (c): conserved domain search (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi); (d): simple prediction determined with MultiLoc (http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc/); (e): prediction performed with TargetP (http://www.cbs.dtu.dk/services/TargetP/) as described in 2.12.3.

| Gene | BLAST V1.2 results (a): gene model (V1.2) | BLAST V1.2 results: intron [bp] | BLAST V1.2 results: EST [no.] | Selected gene model (a) after evaluation | scaff_no: from..to | Intron [bp] | Pos. of intron(s) | ORF [aa] | MW (b) [kDa] | pI (b) | Pfam Tyr (c) PF00264 | MultiLoc prediction (d) [likelihood] | TargetP prediction (e) [cleavage site] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PPO1 | Phypa_215905 | 94 | 15 | Phypa_215905 | 121: 167271..169785 | 94 | after CuB | 536 | 60.15 | 9.38 | yes | Golgi [0.51] | secretory pathway [23] |
| PPO2 | Phypa_173565 | 128 | 1 | Phypa_173565 | 491: 123370..124362 | 128 | after CuB | 537 | 60.71 | 9.18 | yes | Golgi [0.97] | secretory pathway [23] |
| PPO3 | Phypa_140409 | 138 | 6 | Phypa_140409 | 167: 513054..514871 | 138 | after CuB | 559 | 62.71 | 7.26 | yes | extracellular [0.54] | secretory pathway [22] |
| PPO4 | Phypa_2725 | 0 | 2 | all_Phypa_116543 | 16: 1780468..1782075 | 0 | no intron | 535 | 60.78 | 5.50 | yes | plasma membr. [0.68] | secretory pathway [29] |
| PPO5 | Phypa_102269 | 88 | 2 | all_Phypa_156596 | 559: 43493..45206 | 88 | after CuB | 541 | 61.28 | 5.31 | yes | plasma membr. [0.97] | mitochondrial [11] |
| PPO6 | Phypa_2707 | 93 | 0 | all_Phypa_130554 | 83: 1328380..1330047 | 93 | after CuB | 524 | 59.61 | 5.31 | yes | plasma membr. [0.93] | - [-] |
| PPO7 | Phypa_158623 | 143, 54 | 5 | Phypa_158623 | 3: 1801505..1803375 | 143, 54 | after CuB | 557 | 62.89 | 5.94 | yes | plasma membr. [0.94] | secretory pathway [24] |
| PPO8 | Phypa_130903 | 0 | 12 | Phypa_130903 | 85: 1247599..1249239 | 0 | no intron | 546 | 61.94 | 5.74 | yes | plasma membr. [0.92] | secretory pathway [28] |
| PPO9 | Phypa_155214 | 45 | 25 | all_Phypa_173397 | 455: 71482..73437 | 0 | no intron | 549 | 62.12 | 6.23 | yes | vacuolar [0.27] | secretory pathway [26] |
| PPO10 | Phypa_111922 | 0 | 2 | all_Phypa_174105 | 3: 3692578..3694295 | 0 | no intron | 541 | 61.23 | 6.79 | yes | ER [0.34] | secretory pathway [20] |
| PPO11 | Phypa_186186 | 139 | 59 | all_Phypa_131684 | 90: 966474..968220 | 139 | after CuB | 535 | 60.81 | 8.62 | yes | plasma membr. [0.92] | secretory pathway [19] |
| PPO12 | Phypa_212826 | 233 | 5 | 138 nt (5´-end) + Phypa_212826 | 83: 1344874..1347619 | 233 | after CuB | 535 | 60.50 | 8.21 | yes | plasma membr. [0.9] | secretory pathway [19] |
| PPO13 | Phypa_122830 | 845 | 0 | Phypa_122830 + 576 nt (3´-end) | 41: 343237..345686 | 845 | within CuA | 535 | 60.61 | 7.28 | yes | cytoplasmic [0.33] | - [-] |
| PPO14 | Phypa_169836 | 329 | 0 | Phypa_169836 | 222: 7055..8088 | 329 | in front of CuB | 234 | 26.34 | 6.75 | no | n.a. | n.a. |
| PPO15 | Phypa_86565 | 61 | 0 | Phypa_86565 | 147: 625749..626220 | 61 | in front of CuB | 136 | 15.87 | 4.76 | no | n.a. | n.a. |

Fig. 3.4 shows schematically the intron/exon structure of the evaluated gene models defined in Tab. 3.1. At the genomic level, four PPO genes (PPO4, -8, -9, and -10) were found to have no introns. By contrast, PPO1, -2, -3, -5, -6, -7, -11, and -12 possess a small intron that varies in size from 88 to 233 bp. These introns are located at the same corresponding position downstream of the CuB encoding region. For the gene model of PPO7, a second intron (54 bp) was predicted 84 bp downstream of the first intron. The predicted intron in the selected gene model of PPO13 is very large (845 bp) and located within the CuA encoding region, unlike the introns in the gene models of PPO1 to PPO12.

Fig. 3.4 Scheme of PPO gene models after manual adaptation according to Tab. 3.1. Coding sequences are displayed in green and yellow, UTRs in grey; position and length of introns are marked by black spikes. The sequence regions encoding the copper-binding domains CuA and CuB are indicated in yellow. PPO14 and PPO15 were considered to be incomplete genes because the Pfam domain Tyr PF00264 was not present in the gene models.

As shown in Tab. 3.1 and Fig. 3.4, the Pfam domain Tyr PF00264, consisting of the two copper-binding domains CuA and CuB, was found in the ORFs of the selected gene models of PPO1 to PPO13, but could not be detected for PPO14 and PPO15. Both latter sequences encode only a short ORF (234 and 136 aa, respectively) and contain only one copper-binding domain encoding region (CuB). PPO15 also possesses a small fragment homologous to a part of the copper-binding domain CuA. Other gene models for PPO14 and PPO15 with prolonged ORFs were not available due to start and stop codons upstream and downstream of the existing models. Moreover, LTR retrotransposons were found ca. 340 bp downstream of the PPO14 gene model and ca. 2100 bp upstream and ca. 1800 bp downstream of the predicted gene model for PPO15. These results suggested that PPO14 and PPO15 are incomplete, probably due to an insertion of transposable elements. Therefore, they were excluded from the putative PPO gene family.

Based on these observations, it was concluded that Physcomitrella possesses thirteen putative polyphenol oxidase encoding genes, PpPPO1 to PpPPO13. Further studies including detailed amino acid sequence comparison as well as phylogenetic analyses were conducted on these genes (3.3.2 and 3.3.3). Transcription levels of PPO1 to PPO12 were analysed under standard cultivation conditions as well as under influence of certain stress conditions (3.5).
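
The selection step from 15 candidate loci to 13 putative PPO genes can be expressed as a filter on the presence of the complete tyrosinase domain. The sketch below only restates the "Pfam Tyr PF00264" column of Tab. 3.1; the variable names are hypothetical, and the actual decision also drew on ORF length and the flanking retrotransposons, which are not modelled here.

```python
# Presence of the complete Pfam domain Tyr PF00264, as listed in Tab. 3.1.
HAS_TYR_DOMAIN = {f"PPO{i}": True for i in range(1, 14)}
HAS_TYR_DOMAIN.update({"PPO14": False, "PPO15": False})

# Genes retained for further study versus genes regarded as incomplete.
retained = [gene for gene, complete in HAS_TYR_DOMAIN.items() if complete]
excluded = [gene for gene, complete in HAS_TYR_DOMAIN.items() if not complete]

print(len(retained), "retained:", retained[0], "to", retained[-1])
print("excluded:", excluded)
```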

3.3.2. Sequence comparison of PpPPO1 to PpPPO13 on amino acid level

Properties of the derived amino acid sequences of PPO1 to PPO13 were analysed according to the in silico methods described in 2.12.3 and are summarised in Tab. 3.1.

The ORFs of the thirteen PPO genes encode proteins with a length ranging from 524 (PPO6) to 559 (PPO3) amino acids and a calculated molecular weight of 59.61 to 62.89 kDa. The predicted isoelectric points (pI) of the derived amino acid sequences range from 9.38 to 5.31 and can be grouped as follows:

PPO1/PPO2 > PPO11/PPO12 > PPO13 > PPO3 > PPO7/PPO8/PPO9/PPO10 > PPO4/PPO5/PPO6.
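
The ordering above can be reproduced by sorting the pI values from Tab. 3.1 in descending order. This is a minimal sketch; the boundaries between the groups were set by the author and are not derived by the code, which only ranks the values.

```python
# Predicted isoelectric points of the deduced amino acid sequences (Tab. 3.1).
PI = {
    "PPO1": 9.38, "PPO2": 9.18, "PPO3": 7.26, "PPO4": 5.50, "PPO5": 5.31,
    "PPO6": 5.31, "PPO7": 5.94, "PPO8": 5.74, "PPO9": 6.23, "PPO10": 6.79,
    "PPO11": 8.62, "PPO12": 8.21, "PPO13": 7.28,
}

# Rank genes from the most basic to the most acidic predicted pI.
ranked = sorted(PI, key=lambda gene: PI[gene], reverse=True)

for gene in ranked:
    print(f"{gene:6s} {PI[gene]:.2f}")
```

Within the block PPO7/PPO8/PPO9/PPO10 the individual values run from 6.79 (PPO10) down to 5.74 (PPO8), so the grouping reflects ranges of pI rather than a strict order of the listed names.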

Sequence comparison of PPO1 to PPO13 on amino acid level was performed using the MAFFT algorithm (2.12.4). Percentage identity of pairwise alignment of the overall amino acid sequences was determined using the EMBOSS::needle algorithm (2.12.1) and ranged from 28.9 % (PPO8 with PPO13) to 74.7 % (PPO1 with PPO2). Based on the MAFFT alignment, an average distance tree using the calculated BLOSUM62 scores was generated in Jalview 2.4 (2.12.4) as shown in Fig. 3.5.
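
For orientation, percentage identity of a pairwise alignment can be computed as sketched below. This is not the EMBOSS implementation: the function name is hypothetical, the two toy sequences are invented, and dividing the number of identical columns by the full alignment length (gap columns included) is an assumption about the denominator, which the text does not specify.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both rows carry the same residue;
    # gap characters ('-') never count as a match.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Invented toy alignment: 5 identical columns out of 7.
print(round(percent_identity("MKT-LLV", "MKSALLV"), 1))  # 71.4
```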

Fig. 3.5 Average distance tree (based on BLOSUM62 score calculated with the MAFFT algorithm) of PPO1 to PPO13. Parts of the alignment are given in Fig. 3.6. PPOs without introns are underlined. Groups formed by the fifth separation at the blue nodes are coloured in blue, groups formed by the sixth separation at the green nodes are marked in green.

The identified PPO family members were found to cluster in six groups. PPO13 stands apart from the other twelve PPOs. The remaining five groups are: group 1 [PPO1/PPO2] and group 2 [PPO3/PPO7], which together form one upper-level grouping; group 3 [PPO4/PPO5/PPO6] and group 4 [PPO11/PPO12], which both belong to a second upper-level grouping; and group 5 [PPO8/PPO9/PPO10]. PPOs within one group share similar protein properties, such as isoelectric points and target predictions (Tab. 3.1).

Fig. 3.6 shows parts of the MAFFT alignment that was used to establish the tree depicted in Fig. 3.5.

The alignment revealed that the copper-binding domain CuA consists of exactly 65 aa in all Physcomitrella PPOs and is highly conserved within the PPO family (Fig. 3.6B). Percentage identity of the copper-binding domain CuA within one group was high and ranged from 81 % to 90 % (PPO1 with PPO2). Compared across groups, lower identities were found (e.g., 49 % identity of CuA of PPO10 with CuA of PPO13).

The length of the copper-binding domain CuB was found to be less conserved than that of CuA and ranged from 41 aa to 59 aa (Fig. 3.6C). Conservation of CuB within one group was very high, ranging from 72.9 % identity (PPO3 with PPO7) to 95.8 % identity (PPO1 with PPO2), whereas lower identities were determined across groups (32.2 % identity of PPO7 CuB with PPO9 CuB).

Fig. 3.6 Multiple sequence alignment of PPO1 to PPO13 of the N-terminus (A.) and the region of the copper-binding domain CuA (B.) and CuB (C.). The alignment was calculated using the MAFFT algorithm and graphically displayed in Jalview 2.4 (2.12.4); sequences were not edited. The predicted signal sequence of PPO1 is underlined. The start of the putative mature form of PPOs possessing a signal sequence as predicted by TargetP is indicated by a vertical green line (for PPO5, PPO6 and PPO13 no signal peptides were predicted). In A. the alignment is coloured according to hydrophobicity: hydrophobic amino acids are coloured in red, intermediates in purple and hydrophilic amino acids in blue (Kyte and Doolittle, 1982). In B. and C. the colours indicate the BLOSUM62 score: high scores are designated by dark blue, lower scores by light blue. The regions of the copper-binding domains are framed in yellow; the three histidines within each copper-binding domain are framed in orange.

The subcellular localisations of the thirteen selected PPO gene products from Physcomitrella were predicted using the online applications TargetP and MultiLoc (2.12.3). As indicated in Tab. 3.1, in most cases both applications predicted similar targets. TargetP predicted that all PPOs except PPO5, -6 and -13 enter the secretory pathway. MultiLoc predicted nearly the same targets but specified the compartment that the protein was targeted to, such as Golgi, endoplasmic reticulum (ER), extracellular, vacuolar or plasma membrane. PPO5 was predicted to be localised in the mitochondria by TargetP, whereas MultiLoc predicted the sequence to be targeted to the plasma membrane. No targets were predicted for PPO6 and PPO13 with TargetP, whereas MultiLoc analysis suggested that PPO6 is targeted to the plasma membrane and that PPO13 is a cytoplasmic protein (likelihood 0.33).

TargetP was further used to determine the length of the putative N-terminal signal sequences, which were found to vary in length from 19 aa (PPO11) to 29 aa (PPO4) (Tab. 3.1).
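
The range of signal sequence lengths follows directly from the TargetP column of Tab. 3.1, as the short sketch below illustrates. The dictionary layout is hypothetical; `None` marks the two sequences for which TargetP returned no prediction, and the mitochondrial prediction for PPO5 is kept separate from the secretory signal peptides.

```python
# TargetP prediction and predicted cleavage site (aa), from Tab. 3.1.
TARGETP = {
    "PPO1": ("secretory pathway", 23), "PPO2": ("secretory pathway", 23),
    "PPO3": ("secretory pathway", 22), "PPO4": ("secretory pathway", 29),
    "PPO5": ("mitochondrial", 11), "PPO6": (None, None),
    "PPO7": ("secretory pathway", 24), "PPO8": ("secretory pathway", 28),
    "PPO9": ("secretory pathway", 26), "PPO10": ("secretory pathway", 20),
    "PPO11": ("secretory pathway", 19), "PPO12": ("secretory pathway", 19),
    "PPO13": (None, None),
}

# Keep only sequences predicted to enter the secretory pathway.
signal_lengths = {
    gene: site
    for gene, (target, site) in TARGETP.items()
    if target == "secretory pathway"
}

shortest = min(signal_lengths.values())
longest = max(signal_lengths.values())
print(len(signal_lengths), "signal peptides,", shortest, "to", longest, "aa")
```

According to Tab. 3.1, the minimum of 19 aa is shared by PPO11 and PPO12.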

In Fig. 3.6A the alignment of the N-terminal signal sequences of PPO1 to PPO13 is presented, and amino acids are coloured according to their hydrophobicity. All PPOs with a predicted signal peptide possess a hydrophobic region in their N-terminal sequence consisting of five amino acids with the consensus sequence G[A/L/V]LVL and eleven amino acids with the consensus sequence IV[S/V][F/I/L]ALV[A/E][A/I/Q]VE.
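
The two consensus sequences can be translated into regular expressions in order to search other sequences for the same hydrophobic motifs. The helper `consensus_to_regex` is hypothetical and written for this illustration, and the test peptides are invented and are not taken from the Physcomitrella PPO sequences.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert bracket notation such as G[A/L/V]LVL into a regex (G[ALV]LVL)."""
    return re.sub(
        r"\[([A-Z](?:/[A-Z])*)\]",
        lambda match: "[" + match.group(1).replace("/", "") + "]",
        consensus,
    )


MOTIF_1 = consensus_to_regex("G[A/L/V]LVL")
MOTIF_2 = consensus_to_regex("IV[S/V][F/I/L]ALV[A/E][A/I/Q]VE")

# Invented peptides, for illustration only.
print(MOTIF_1, bool(re.search(MOTIF_1, "MKGALVLSS")))
print(MOTIF_2, bool(re.search(MOTIF_2, "IVSFALVAAVE")))
```

Each bracketed position accepts any one of the listed residues, so the first pattern matches five consecutive residues and the second eleven.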

Pairwise alignments revealed that the N-terminal sequences were less conserved across groups (e.g., 13 % identity of PPO10 with PPO13), while percentage identities were higher within the same group (e.g., 57 % for PPO1 with PPO2).