
3.2 Parhyale orthodenticle and other paired-class homeobox genes involved in head development

3.2.2 Parhyale hawaiensis genes of the aristaless group

3.2.2.3 Characterisation of Ph hbn

The comparison of all recovered Ph hbn sequences led to the identification of 44 sites of nucleotide exchange within the complete Ph hbn cDNA (see also A3, table A1). 34 of those occur uniquely and are therefore considered sporadic (5.3.3.2). They are found predominantly in the ORF (21), where they account for 13 amino acid changes (missense alterations) and one frame shift. The remaining seven nucleotide exchanges found within the ORF do not alter the translated amino acid sequence (silent exchanges). These findings are consistent with the expected randomness of artificial nucleotide exchanges. Ten nucleotide exchanges are present in more than one sequence and are therefore considered polymorphic. Of the four polymorphic nucleotide exchanges found within the ORF, three do not alter the amino acid sequence. The remaining one leads to a change of the respective amino acid residue (T415>S415) outside the conserved HD. These observations suggest that the polymorphic nucleotide exchanges represent transcript variations resulting from different wild type Ph hbn alleles (5.3.3.2).
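The distinction between sporadic and polymorphic exchanges can be illustrated with a minimal Python sketch. It is not the procedure used in this work (see 5.3.3.2 for that); it assumes that all clone sequences have already been aligned to the reference as strings of equal length, that positions not covered by a clone are marked with "-", and that only substitutions are considered. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def classify_exchange_sites(reference, clones):
    """Label each nucleotide exchange as 'sporadic' or 'polymorphic'.

    reference: aligned reference sequence (string).
    clones:    dict mapping clone name -> aligned sequence of the same length,
               with '-' marking positions the clone does not cover (assumption).
    An exchange seen in exactly one clone is called sporadic; an exchange seen
    in two or more clones is called polymorphic.
    """
    counts = Counter()
    for sequence in clones.values():
        for position, (ref_nt, nt) in enumerate(zip(reference, sequence), start=1):
            if nt != "-" and nt != ref_nt:
                counts[(position, ref_nt, nt)] += 1
    return {
        site: ("polymorphic" if n > 1 else "sporadic")
        for site, n in counts.items()
    }


# Toy data for illustration only; these are not Ph hbn sequences.
toy_reference = "ATGGCTTCA"
toy_clones = {
    "clone_1": "ATGGCTTCA",
    "clone_2": "ATGGCATCA",
    "clone_3": "ATGGCATCG",
    "clone_4": "ATG------",
}
toy_sites = classify_exchange_sites(toy_reference, toy_clones)
```

In the toy data, the exchange at position 6 is shared by two clones and is therefore labelled polymorphic, whereas the exchange at position 9 occurs once and is labelled sporadic.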

Based on these findings, the sequence of the clone Ph_hbn_cDNA01 was chosen as the source of the Ph hbn reference sequence (Ph hbn_ref). It was extended 5' by adding 60 bp derived from clone Ph_hbn_5Rn25 and 3' by adding 32 bp derived from clone Ph_hbn_3RHH357 (A2.3.1). An alignment of Ph hbn_ref with all recovered Ph hbn sequences confirms that it represents the statistical consensus (A2.3.2). If polymorphic nucleotide exchanges are not taken into account, all recovered Ph hbn sequences show a maximum of 1.1% difference from Ph hbn_ref, confirming that they in fact derive from transcripts of Ph hbn. Ph hbn_ref has been used for phylogenetic studies.
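How a percentage difference from a reference can be computed while leaving polymorphic sites aside is sketched below in Python. This is an illustration under stated assumptions, not the calculation performed for this work: sequences are assumed to be pre-aligned strings of equal length, "-" marks uncovered positions, and ignored positions are excluded from both the mismatch count and the number of compared positions. The function name and toy sequences are hypothetical.

```python
def percent_difference(reference, clone, ignore_positions=frozenset()):
    """Percentage of compared positions at which clone differs from reference.

    Positions are 1-based. Positions listed in ignore_positions (for example
    polymorphic sites) and positions marked '-' in either sequence are skipped
    entirely, so they affect neither numerator nor denominator (assumption).
    """
    compared = 0
    mismatches = 0
    for position, (ref_nt, nt) in enumerate(zip(reference, clone), start=1):
        if ref_nt == "-" or nt == "-" or position in ignore_positions:
            continue
        compared += 1
        if ref_nt != nt:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy data for illustration only; these are not Ph hbn sequences.
toy_reference = "ATGGCTTCAG"
toy_clone = "ATGGCATCGG"
difference_all = percent_difference(toy_reference, toy_clone)
difference_without_site_6 = percent_difference(toy_reference, toy_clone, {6})
```

Applied to each clone in turn, the largest value returned would correspond to the kind of maximum difference quoted in the text.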

The Ph hbn transcript is 2.7 kb in length and encodes a protein of 493 amino acids. The Parhyale HBN protein has a HD (60 amino acids) that is highly similar to the HD of Drosophila HBN (3.2.2.8). Accordingly, glutamate is found at position 50 of the Parhyale HBN HD. N-terminally, an amino acid motif is present that shows high similarity with octapeptide/GEH domains found in Drosophila HBN and other closely related proteins (Figures 13, 14).


Figure 13: Schematic view of Ph hbn transcript. The length of the Ph hbn transcript is 2651 bases. Shown are: 5' UTR in grey (length 171 bases, nucleotide positions 1-171), ORF in black (length 1482 bases, encoding 493 amino acids; nucleotide positions 172-1653) and the 3' UTR in grey (length 998 bases, nucleotide positions 1654-2651). The ORF region encoding the octapeptide/GEH domain is depicted in orange (length 24 bases, encoding 8 amino acids; nucleotide positions 517-540). The ORF region encoding the HD is depicted in blue (length 180 bases, encoding 60 amino acids; nucleotide positions 694-873).
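The coordinates given for Figure 13 can be cross-checked arithmetically. The following Python sketch is a hypothetical helper, not part of the analysis pipeline of this work; it treats all positions as 1-based and inclusive and assumes that the stated ORF length includes the stop codon, which agrees with 1482 = 3 x (493 + 1).

```python
def segment_length(start, end):
    """Length of a segment given 1-based, inclusive nucleotide positions."""
    return end - start + 1


def check_transcript_layout(total_length, utr5, orf, utr3, protein_length):
    """Return a list of inconsistencies; an empty list means the layout adds up.

    utr5, orf and utr3 are (start, end) tuples of 1-based, inclusive positions.
    """
    problems = []
    if utr5[0] != 1:
        problems.append("5' UTR does not start at position 1")
    if orf[0] != utr5[1] + 1:
        problems.append("ORF does not start directly after the 5' UTR")
    if utr3[0] != orf[1] + 1:
        problems.append("3' UTR does not start directly after the ORF")
    if utr3[1] != total_length:
        problems.append("3' UTR does not end at the last transcript position")
    # Assumption: the ORF includes the stop codon, hence one extra codon.
    if segment_length(*orf) != 3 * (protein_length + 1):
        problems.append("ORF length does not match the protein length")
    return problems


# Values taken from the Figure 13 caption (Ph hbn).
hbn_problems = check_transcript_layout(
    total_length=2651,
    utr5=(1, 171),
    orf=(172, 1653),
    utr3=(1654, 2651),
    protein_length=493,
)
hbn_hd_length = segment_length(694, 873)          # homeodomain
hbn_octapeptide_length = segment_length(517, 540)  # octapeptide/GEH domain
```

For the Ph hbn values, the list of problems is empty and the two domain lengths evaluate to 180 and 24 bases, in agreement with the caption.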

0001 GGAATTTTCACTGTTCCCTTTACCACGTTTTACTGAGTCGAGGAGATACTTACCAAAGTGTAAAAATGAGGAAAAATTGAA
0082 CTTGCAAAATTATGTAAAATGAAACTCCAACGTTTCTACACTTCCTGAAGTTTTTTGACTGGTGACATAAACTCTAGACCGAAATCTACC

Figure 14: Ph hbn cDNA and derived amino acid sequence. The sequence is in FASTA format and represents the Ph hbn cDNA, derived from the mRNA transcript. 5' and 3' UTR are shown in grey, ORF in black. The translated amino acid sequence is printed bold and above the corresponding nucleotide sequence. Individual amino acids are above the central nucleotide of the respective codon. The putative start and stop codons are shown in green and red, respectively. The octapeptide/GEH domain is shown in pink and the HD in blue. The nucleotide sequences that encode these domains are shown in the respective colours. Numbers to the left give the relative nucleotide and amino acid sequence positions and share the font parameters of the corresponding sequence. The ends of the amino acid and the nucleotide sequences are indicated by numbers to the right of the corresponding line.

3.2.2.4 Isolation of Ph al1

From two-step 5' RACE, five independent Ph al1 5' cDNA fragment clones were obtained. Within the sequence they share, several polymorphic sites of nucleotide exchange have been identified (see below). If these are not taken into account, the nucleotide variation between any two of them is well below 1%. All Ph al1 5' cDNA fragment clones show identical transcription starts. These findings strongly support the conclusion that they derive from transcripts of the same gene.

Ph al1 3' cDNA sequence was fully recovered by two-step 3' RACE. Three independent Ph al1 3' cDNA fragment clones were obtained. The nucleotide variation between any two of their sequences does not exceed 1.7%, with most nucleotide exchanges found within the sequence of clone Ph_al1_3Rn7 (A2.4, A3, table A1). All three cover the same fraction of Ph al1 3' cDNA and show only minor differences in poly(A) tailing. These observations suggest that they derive from transcripts of the same gene.

Sequence information provided by all recovered 5' and 3' Ph al1 RACE clones was sufficient for assembling the complete Ph al1 cDNA sequence in silico. In order to verify the consistency of the established Ph al1 cDNA sequence, coherent Ph al1 cDNA and, independently, fragments covering only the ORF were isolated via long-distance PCR reactions (5.2.3.4), performed on Parhyale cDNA collections (5.2.2). Three independent Ph al1 cDNA sequences and six independent Ph al1 ORF sequences were obtained (A2.4.1).

They are consistent with the findings from 5' and 3' RACE (A2.4.2). Apart from polymorphic nucleotide exchanges (3.2.2.5), the sequences of any two of these vary in less than 1% of nucleotide positions. This strongly suggests that these sequences derive from transcripts of different alleles of the same gene (5.3.3.2, see also A3, table A1).

3.2.2.5 Characterisation of Ph al1

The comparison of all recovered Ph al1 sequences led to the identification of 76 sites of nucleotide exchange within the complete Ph al1 cDNA (see also A3, table A1). 39 of those occur uniquely and are therefore considered sporadic (5.3.3.2). Half (15) of those found within the ORF (30) alter the amino acid sequence (missense alterations). The other half do not change the amino acid sequence (silent exchanges). These findings suggest that these nucleotide exchanges are artificial. 37 nucleotide exchanges are present in more than one sequence and are therefore considered polymorphic. All of them are found within the ORF. Eleven of these cause amino acid exchanges outside the conserved homeodomain. It is important to note that these amino acid variations appear to be linked: within individual Ph al1 cDNA and ORF clones, either all of the amino acid alternatives are found or none, the latter case resembling the Ph al1 reference sequence (see below, A2.4.2, A3, table A1). The differences between the two resulting proteins are minor, however, arguing against the existence of two closely related Ph al1 paralogs. In two of the recovered Ph al1 5' RACE clones, an additional adenine is present within a stretch of six adenines. This causes a frame shift and would lead to a truncated AL1 protein in vivo. Notably, none of the Ph al1 cDNA and ORF clones shows this single nucleotide insertion. The remaining 25 nucleotide exchanges in the ORF that were identified as polymorphic do not alter the amino acid sequence. These observations suggest that at least two rather diverse Ph al1 alleles exist.

Based on these findings, the sequence of the clone Ph_al1_cDNA01 was chosen as the source of the Ph al1 reference sequence (Ph al1_ref). It was extended 5' by adding 35 bp derived from clone Ph_al1_5Rn2 and 3' by adding 62 bp derived from clone Ph_al1_3Rn8. One sporadic nucleotide exchange was altered in order to represent the statistically prevalent nucleotide residue (1048 T>C, A2.4.1). An alignment of Ph al1_ref with all recovered Ph al1 sequences confirms that it represents the statistical consensus (A2.4.2). Ph al1_ref has been used for phylogenetic studies.

The Ph al1 transcript is 1.8 kb in length and encodes a protein of 510 amino acids. The Parhyale AL1 protein has a HD (60 amino acids) that is highly similar to the HD of Aristaless proteins of other species (3.2.2.8). Accordingly, glutamate is found at position 50 of the Parhyale AL1 HD. C-terminally, a sequence motif is present that is highly similar to OAR/aristaless domains found in Drosophila AL and other closely related proteins (Figures 15, 16).


Figure 15: Schematic view of Ph al1 transcript. The length of the Ph al1 transcript is 1849 bases. Shown are: 5' UTR in grey (length 169 bases, nucleotide positions 1-169), ORF in black (length 1533 bases, encoding 510 amino acids; nucleotide positions 170-1702) and the 3' UTR in grey (length 147 bases, nucleotide positions 1703-1849). The ORF region encoding the HD is depicted in blue (length 180 bases, encoding 60 amino acids; nucleotide positions 791-970). The ORF region encoding the OAR is depicted in orange (length 63 bases, encoding 21 amino acids; nucleotide positions 1529-1591).

0001 GGAATTTTCACTGTTCCCTTTACCACGTTTTACTGAATTTGAATAATATGAAACTAATATATCAAATTAACTGCTATTT
0080 TAATTCATTAATAGTCTTCTACTTACTTCTGTTTACTGCTGTATAAAAAAATTATAAGCAGTCAAAATTGAATCATTCAAGTGAAGAAAT

Figure 16: Ph al1 cDNA and derived amino acid sequence. The sequence is in FASTA format and represents the Ph al1 cDNA, derived from the mRNA transcript. 5' and 3' UTR are shown in grey, ORF in black. The translated amino acid sequence is printed bold and above the corresponding nucleotide sequence. Individual amino acids are above the central nucleotide of the respective codon. The putative start and stop codons are shown in green and red, respectively. The HD is shown in blue and the OAR in orange. The nucleotide sequences that encode these domains are shown in the respective colours. Numbers to the left give the relative nucleotide and amino acid sequence positions and share the font parameters of the corresponding sequence. The ends of the amino acid and the nucleotide sequences are indicated by numbers to the right of the corresponding line.

3.2.2.6 Isolation of Ph al2

From single-step 5' RACE, supplemented by two-step 5' RACE, seven independent Ph al2 5' cDNA fragment clones were obtained. Except for clone Ph_al2_5Rg1, which appears 5' truncated, they all show identical transcription starts. A variation in the fraction of sequence they share manifests as a 15 bp sequence insertion (A2.5). In addition, several polymorphic nucleotide exchanges have been identified within the Ph al2 cDNA fraction they cover. If these are not taken into account, the nucleotide variation between any two of them is well below 1%. These findings suggest that they derive from transcripts of the same gene.

In order to fully recover the Ph al2 3' cDNA sequence, 3' extension RACE was employed to overcome artificial premature poly(A) tailing present in the single clone obtained from an initial round of two-step 3' RACE. Two additional, independent Ph al2 3' cDNA fragment clones were obtained. Their sequences extend 3' to different extents. Within the fraction of sequence they share, the nucleotide variation is well below 1%, suggesting that they derive from transcripts of the same gene.

Sequence information provided by all recovered 5' and 3' Ph al2 RACE clones was sufficient for assembling the complete Ph al2 cDNA sequence in silico. In order to verify the consistency of the established Ph al2 cDNA sequence, coherent Ph al2 ORF fragments were isolated via long-distance PCR reactions (5.2.3.4), performed on Parhyale cDNA collections (5.2.2). Three independent Ph al2 ORF sequences were obtained (A2.5.1). They are consistent with the findings from 5' and 3' RACE (A2.5.2) except that all three lack a nucleotide triplet encoding a threonine residue (T85') found in all Ph al2 5' RACE clones.

Apart from polymorphic nucleotide exchanges (3.2.2.7), the sequences of any two of these vary in a maximum of 1.2% of nucleotide positions. This suggests that these sequences derive from transcripts of different alleles of the same gene (5.3.3.2, see also A3, table A1).

3.2.2.7 Characterisation of Ph al2

The comparison of all recovered Ph al2 sequences led to the identification of 78 sites of nucleotide exchange within the complete Ph al2 cDNA (see also A3, table A1). 34 of those occur uniquely and are therefore considered sporadic (5.3.3.2). Of the 32 sporadic nucleotide exchanges found within the ORF, eight alter the amino acid sequence (missense alterations), two lead to premature translation stops (nonsense alterations) and two manifest as single nucleotide insertions or deletions that lead to frame shifts with regard to the translated amino acid sequence. The remaining 20 do not change the amino acid sequence (silent exchanges). The high number of silent nucleotide exchanges suggests that at least some of these might not be artificial, but occur naturally in existing Ph al2 alleles. 44 nucleotide exchanges are present in more than one sequence and are therefore considered polymorphic. 35 of them are found within the ORF. Six of these cause amino acid exchanges outside the conserved homeodomain. As mentioned above, all three recovered Ph al2 ORF sequences lack a nucleotide triplet encoding a threonine residue (T85') that is found in all Ph al2 5' RACE clones. A single nucleotide insertion is also found more than once, causing a frame shift. The remaining 20 polymorphic nucleotide exchanges found within the ORF do not alter the amino acid sequence. It is important to note that, similar to the situation observed for Ph al1, several of the amino acid variations appear to be linked (see also A3, table A1). The differences in the resulting protein variants are minor, however, arguing against the existence of two closely related Ph al2 paralogs. These observations suggest that several rather diverse Ph al2 alleles exist (5.3.3.2).
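The terms silent, missense and nonsense used above can be made concrete with a short Python sketch that classifies a single nucleotide substitution within an ORF. It is an illustration only, not the method applied in this work: it assumes the standard genetic code, an upper-case DNA sequence that starts in frame, and 1-based ORF positions. Insertions and deletions (frame shifts) are not handled. The function name and the toy ORF are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(orf, position, new_base):
    """Classify a substitution at a 1-based ORF position.

    Returns 'silent' if the encoded amino acid is unchanged, 'nonsense' if the
    new codon is a stop codon, and 'missense' otherwise.
    """
    index = position - 1
    offset = index % 3                      # position within the codon
    codon_start = index - offset
    old_codon = orf[codon_start:codon_start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa = CODON_TABLE[old_codon]
    new_aa = CODON_TABLE[new_codon]
    if new_aa == old_aa:
        return "silent"
    if new_aa == "*":
        return "nonsense"
    return "missense"


# Toy ORF encoding Met-Thr-Trp-stop; not a Parhyale sequence.
toy_orf = "ATGACTTGGTAA"
silent_example = classify_substitution(toy_orf, 6, "C")    # ACT -> ACC, Thr -> Thr
missense_example = classify_substitution(toy_orf, 4, "T")  # ACT -> TCT, Thr -> Ser
nonsense_example = classify_substitution(toy_orf, 9, "A")  # TGG -> TGA, Trp -> stop
```

The second toy example mirrors the kind of threonine-to-serine change reported for Ph hbn, although the sequence context here is invented.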

Based on these findings, the sequence of the clone Ph_al2_ORF05 was chosen as the source of the Ph al2 reference sequence (Ph al2_ref). It was extended 5' by adding 593 bp derived from clone Ph_al2_5Rn11 and 3' by adding 294 bp derived from clone Ph_al2_3Rfwx05. Several sporadic nucleotide exchanges were altered in order to represent the statistically prevalent nucleotide residues (1103 T>C, 1193 A>G, 1208 T>C, 1232 T>C, 1344 A>T, 1478 G>A, 1723 C>T, 1862 C>G, 1892 A>G, 1898 T>C, 2045 C>T, 2087 T>G, 2243 T>G, 2264 C>T, 2267 A>G, 2270 G>A, A2.5.1). An alignment of Ph al2_ref with all recovered Ph al2 sequences confirms that it represents the statistical consensus (A2.5.2). Ph al2_ref has been used for phylogenetic studies.

The Ph al2 transcript is 2.6 kb in length and encodes a protein of 586 amino acids. The Parhyale AL2 protein has a HD (60 amino acids) that is highly similar to the HD of Aristaless proteins of other species (3.2.2.8). Accordingly, glutamate is found at position 50 of the Parhyale AL2 HD. C-terminally, a sequence motif is present that is highly similar to OAR/aristaless domains found in Drosophila AL and other closely related proteins (Figures 17, 18).


Figure 17: Schematic view of Ph al2 transcript. The length of the Ph al2 transcript is 2594 bases. Shown are: 5' UTR in grey (length 566 bases, nucleotide positions 1-566), ORF in black (length 1761 bases, encoding 586 amino acids; nucleotide positions 567-2327) and the 3' UTR in grey (length 267 bases, nucleotide positions 2328-2594). The ORF region encoding the HD is depicted in blue (length 180 bases, encoding 60 amino acids; nucleotide positions 1464-1643). The ORF region encoding the OAR is depicted in orange (length 63 bases, encoding 21 amino acids; nucleotide positions 2226-2288).


Figure 18: Ph al2 cDNA and derived amino acid sequence. The sequence is in FASTA format and represents the Ph al2 cDNA, derived from the mRNA transcript. 5' and 3' UTR are shown in grey, ORF in black. The translated amino acid sequence is printed bold and above the corresponding nucleotide sequence. Individual amino acids are above the central nucleotide of the respective codon. The putative start and stop codons are shown in green and red, respectively. The HD is shown in blue and the OAR in orange. The nucleotide sequences that encode these domains are shown in the respective colours. Numbers to the left give the relative nucleotide and amino acid sequence positions and share the font parameters of the corresponding sequence. The ends of the amino acid and the nucleotide sequences are indicated by numbers to the right of the corresponding line.


3.2.2.8 Phylogeny of the Parhyale hawaiensis aristaless group genes Ph hbn, Ph al1