
4.7 Visualization of RNA Secondary Structure Alignments

4.7.1 ASCII Representation

The ASCII representation of an RNA secondary structure alignment extends the familiar sequence alignment representation, in which the aligned sequences are arranged on top of each other. Essentially, it is an alignment of Vienna strings (see Section 2.3): there is a gap in the structure line if and only if there is a gap in the corresponding sequence line. In the following example, a “*” character marks columns in which sequence or structure is conserved:

Input

> alanine
ggggcuauagcucagcugggagagcgcuugcauggcaugcaagaggucagcgguucgaucccgcuuagcuccacca
(((((((..((((...)))).(((((...)))))...(((((...))))))))))))....

> leucine
gccgaaguggcgaaaucgguagacgcaguugauucaaaaucaaccguagaaauacgugccgguucgaguccggccuucggcacca
(((((((..(((...))).(((((...))))).(((....)))..(((((...))))))))))))....

Output

alanine ggggcuauagcucagcugggag-agcgcuugcauggcaugcaagag--g---u-c
leucine gccgaaguggcgaaaucgguagacgcaguugauucaaaaucaaccguagaaauac
        *  *   * **  *   ** **  **  ***  *   *  ***  *  *   * *

alanine --agcgguucgaucccgcuuagcuccacca
leucine gugccgguucgaguccggccuucggcacca
            ********  ***     *  *****

alanine (((((((..((((...)-))).(((((...)))))..--.---.-.
leucine (((((((..(((...))).(((((...))))).(((....)))
        ************ ******** ********************** *

alanine --(((((...))))))))))))....
leucine ..(((((...))))))))))))....
        ****************************
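The marker row of such a block can be derived mechanically from the two aligned rows. The following Python lines are a minimal sketch of that idea, not the implementation used in this work: the function names conservation_line, gaps_consistent and format_block are hypothetical, and I assume that a column is marked only if both rows carry the same non-gap symbol.

```python
def conservation_line(row_a: str, row_b: str) -> str:
    """Return a marker row with '*' wherever both aligned rows agree.

    Assumption: columns in which either row has a gap ('-') are never marked.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(row_a, row_b)
    )


def gaps_consistent(seq_row: str, struct_row: str) -> bool:
    """Check that the structure row has a gap iff the sequence row has one."""
    if len(seq_row) != len(struct_row):
        return False
    return all((s == "-") == (t == "-") for s, t in zip(seq_row, struct_row))


def format_block(name_a: str, row_a: str, name_b: str, row_b: str) -> str:
    """Stack two labelled rows and append the marker row below them."""
    width = max(len(name_a), len(name_b)) + 1
    return "\n".join([
        name_a.ljust(width) + row_a,
        name_b.ljust(width) + row_b,
        " " * width + conservation_line(row_a, row_b),
    ])


# Second sequence block of the tRNA example.
ala = "--agcgguucgaucccgcuuagcuccacca"
leu = "gugccgguucgaguccggccuucggcacca"
print(format_block("alanine", ala, "leucine", leu))
```

Applied to the second sequence block of the example, the sketch yields the same grouping of marks (8, 3, 1 and 5 conserved columns) as the marker row shown there.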

100 Model

4.7.2 2d-Plot

RNA secondary structures are represented graphically as circle plots, dot plots, mountain plots or 2d-plots (refer to Section 2.3). I present a 2d-plot variant for RNA secondary structure alignments that emphasizes both sequence and structure similarity. I follow the strategy of using well-established layout algorithms for 2d-plots of RNA secondary structure [14, 109, 143, 179, 234] (see footnote 5). To this end, I derive a secondary structure from a structure alignment, which is then drawn and annotated further. Since bases paired in a structure S1 can be aligned to bases unpaired in a structure S2, the presentation of a common secondary structure leaves some choice. For an alignment A of structures S1 and S2, I draw an RNA secondary structure “S2-at-S1” that highlights the differences as deviations of S2 from S1, or vice versa “S1-at-S2”. Both are alternative visualizations of the same alignment A. The drawings can be annotated using all the information of the alignment, e.g. by showing alternative base pairings as dashed lines connecting bases.
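The choice between “S2-at-S1” and “S1-at-S2” can be illustrated with a small Python sketch. It is a simplification under stated assumptions and not the procedure used in this work: the alignment is given as two gapped Vienna strings of equal length, base pairs are compared by their alignment columns, and the names pair_table and classify_pairs are hypothetical. In an “S2-at-S1” drawing, the pairs of S1 would determine the layout, while pairs found only in S2 would be candidates for dashed lines.

```python
from __future__ import annotations


def pair_table(aligned: str) -> dict[int, int]:
    """Map the column of each '(' to the column of its matching ')'.

    Gap ('-') and unpaired ('.') columns are skipped.
    """
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for col, ch in enumerate(aligned):
        if ch == "(":
            stack.append(col)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at column %d" % col)
            pairs[stack.pop()] = col
    if stack:
        raise ValueError("unbalanced '(' at column %d" % stack[-1])
    return pairs


def classify_pairs(aligned_s1: str, aligned_s2: str) -> dict[str, list[tuple[int, int]]]:
    """Split base pairs into common ones and those unique to one structure."""
    if len(aligned_s1) != len(aligned_s2):
        raise ValueError("aligned structures must have equal length")
    p1 = set(pair_table(aligned_s1).items())
    p2 = set(pair_table(aligned_s2).items())
    return {
        "common": sorted(p1 & p2),   # drawn as ordinary pairs
        "only_s1": sorted(p1 - p2),  # part of the S1 layout only
        "only_s2": sorted(p2 - p1),  # dashed lines in "S2-at-S1"
    }


# Toy alignment (hypothetical data): both structures share the outer pair.
result = classify_pairs("((...))", "(.(.).)")
print(result)
```

Swapping the two arguments gives the classification for the “S1-at-S2” view of the same alignment.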

Figure 4.10 illustrates the visualization with an example. The visualization of local similarity follows the same strategy. If suboptimal local alignments are calculated, the locally similar regions are highlighted in the original structures.

Figure 4.13 shows locally similar regions of the structures in Figure 4.11 and Figure 4.12. I expect this visualization to make comparing RNA secondary structures considerably more convenient.

Footnote 5: I use an implementation of Bruccoleri et al.’s NAVIEW algorithm [14].

[Figure 4.10, panels: (a) Leucine; (b) Alanine; (c) Alanine-at-Leucine; (d) Leucine-at-Alanine]

Figure 4.10: Secondary structures of E. coli tRNA for leucine (anticodon CAA) (a) and alanine (anticodon GGC) (b), taken from the Genomic tRNA Database [116]. 2d-plots of the structure alignment of the two tRNAs as Alanine-at-Leucine (c) and Leucine-at-Alanine (d). The acceptor stem (red), anticodon stem (green) and TψC stem (blue) have the same length in both structures but differ in sequence. There are also sequence variations in the single-stranded regions, especially at the anticodon position. The visualization emphasizes these automatically by printing the affected bases in red. In the double-stranded regions, this presentation accentuates compensatory base exchanges. Bases printed in black belong to structural elements that occur in both structures with the same sequence. The CCA at the 3’ end is a typical invariant feature of tRNAs and is therefore printed in black. Structural elements that can only be found in the first structure are printed in blue. Thus, the fourth base pair in the D-stem (magenta) of the tRNA for alanine is shown as a dashed blue line in the alignment. In contrast, structural elements shown in green occur only in the leucine structure. The extra stem of the leucine tRNA is highlighted that way.


[Figure 4.11: 2d-plot of the predicted structure; image not reproduced]

Figure 4.11: Predicted structure of the human transferrin receptor 3’ UTR. The sequence was taken from the UTR database, accession number 3HSA008842 [154]. The colored regions highlight the parts that are locally similar to the structure in Figure 4.12.

[Figure 4.12: 2d-plot of the predicted structure; image not reproduced]

Figure 4.12: Predicted structure of the human ferritin 5’ UTR. The sequence was taken from the UTR database, accession number 5HSA015337 [154]. The colored regions highlight the parts that are locally similar to the structure in Figure 4.11.


[Figure 4.13, panels: (red) ferr.-at-trans.; (red) trans.-at-ferr.; (green) ferr.-at-trans.; (green) trans.-at-ferr.; (blue) ferr.-at-trans.; (blue) trans.-at-ferr.]

Figure 4.13: Local alignments of the human transferrin receptor 3’ UTR (Figure 4.11) and the human ferritin 5’ UTR (Figure 4.12). (red) shows the best-scoring local alignment, which is found at position 932 in transferrin and position 26 in ferritin. This motif is the well-studied Iron Responsive Element (see 7.1). (green) and (blue) show suboptimal local alignments that were found in the structures. These were found at positions 2392 and 147, and 1765 and 104, respectively. As I focus on the visualization technique here, the putative biological function of these regions is not discussed further.

4.8 Performance of Forest Alignment