
Softwarewerkzeuge der Bioinformatik

Prof. Dr. Volkhard Helms

PD Dr. Michael Hutter, Markus Hollander, Marie Detzler

Winter Semester 2020/2021

Saarland University Center for Bioinformatics

Exercise Sheet 3

Sequence Analysis: Multiple Sequence Alignment (MSA) and Phylogeny

Learning objective: The goal is to learn how to generate multiple sequence alignments, how to interpret them (e.g. with regard to sequence conservation), and how to judge their usefulness for different types of questions. Additionally, you are going to apply the Sankoff algorithm and learn how to work with phylogenetic trees.

Exercise 3.1: Homologous sequences, conserved domains and phylogenetic trees

Tools for generating multiple alignments: http://www.ebi.ac.uk/Tools/msa

a) Save the sequence of protein Q38856 together with 9 homologous sequences in multi-FASTA format.
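As a reminder of what multi-FASTA looks like, here is a minimal Python sketch: each record is a header line starting with `>` followed by one or more sequence lines, and several records are simply concatenated in one file. The record names and sequences below are hypothetical placeholders, not real database entries, and the helper names `write_multi_fasta` and `parse_multi_fasta` are made up for this illustration.

```python
from typing import Dict, List, Tuple


def write_multi_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Return the records as multi-FASTA text with wrapped sequence lines."""
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


def parse_multi_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Hypothetical placeholder records (not real sequences).
example = [
    ("query_placeholder", "MKTAYIAKQR"),
    ("homolog_1_placeholder", "MKSAYLAKQR"),
]
fasta_text = write_multi_fasta(example, width=5)
print(fasta_text)
```

The file you save for this exercise would contain ten such records: the query sequence and its nine homologues.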

b) Find highly conserved parts of the sequences with a tool of your choice.

c) Do all amino acids have to be highly conserved in order to conclude that the proteins are homologous?

d) Let’s assume that you want to locate the active centre of a protein but only have the protein sequence without the corresponding structure. How can a multiple sequence alignment help you to solve this problem?

e) Generate a multiple sequence alignment of 50 homologous sequences with the same tool.

f) What differences do you observe between the two alignments?

g) Look at the phylogenetic tree of the sequences in 3.1.e) and find three biological groups (plants, fungi and animals).


Exercise 3.2: Comparison of various tools

The following multiple sequence alignments were generated with different tools:

Tool           Protein      Alignment

ClustalW       FOS Rat      MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFC
               FOS MOU      MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFC
               FOS CHIC     MMYQGFAGEYEAPSSRCSSASPAGDSLTYYPSPADSFSSMGSPVNSQDFC
               FOSB MOU     -MFQAFPGDYDS-GSRCSS-SPSAESQ--YLSSVDSFGSPPTAAASQE-C
               FOSB HU      -MFQAFPGDYDS-GSRCSS-SPSAESQ--YLSSVDSFGSPPTAAASQE-C
               (consensus)  * : . . * . : * : : . ** * ** * * : . : * * * . . * * * . * : .. : * : *

MAFFT          FOS Rat      MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFC
               FOS MOU      MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFC
               FOS CHIC     MMYQGFAGEYEAPSSRCSSASPAGDSLTYYPSPADSFSSMGSPVNSQDFC
               FOSB MOU     -MFQAFPGDYD-SGSRCSS-SPSAES--QYLSSVDSFGSPPTAAASQE-C
               FOSB HU      -MFQAFPGDYD-SGSRCSS-SPSAES--QYLSSVDSFGSPPTAAASQE-C
               (consensus)  * : . . * . : * : . . ** * ** * * : . : * * * . . * * * . * : . . : * : *

MUSCLE         FOS Rat      MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFC
               FOS MOU      MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFC
               FOS CHIC     MMYQGFAGEYEAPSSRCSSASPAGDSLTYYPSPADSFSSMGSPVNSQDFC
               FOSB MOU     -MFQAFPGDYD-SGSRCSS-SPSAESQ--YLSSVDSFGSPPTAAASQE-C
               FOSB HU      -MFQAFPGDYD-SGSRCSS-SPSAESQ--YLSSVDSFGSPPTAAASQE-C
               (consensus)  * : . . * . : * : . . ** * ** * * : . : * * * . . * * * . * : .. : * : *

Clustal Omega  FOS Rat      MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFC
               FOS MOU      MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFC
               FOS CHIC     MMYQGFAGEYEAPSSRCSSASPAGDSLTYYPSPADSFSSMGSPVNSQDFC
               FOSB MOU     -MFQAFPGDYDSGS-RCSSSPSA---ESQYLSSVDSFGSPPTA-AASQEC
               FOSB HU      -MFQAFPGDYDSGS-RCSSSPSA---ESQYLSSVDSFGSPPTA-AASQEC
               (consensus)  * : . . * . : * : : * * * **: * : * * . * * * . * : : . : *

The consensus symbols are listed in column order; their exact column positions are not preserved.

Compare the MSAs.

a) Are there differences regarding the gap arrangement?

b) Does this change the degree of conservation in the coloured columns?
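One way to approach question a) systematically is to list the gap columns of the same protein in each alignment. The following Python sketch does this for the FOSB MOU row. It assumes that the rows read as below once the spaces are removed and `-` is used as the gap character; the helper name `gap_columns` is made up for this illustration.

```python
from typing import Dict, List

# FOSB MOU row from each of the four alignments (assumed reading, see above).
fosb_rows: Dict[str, str] = {
    "ClustalW":      "-MFQAFPGDYDS-GSRCSS-SPSAESQ--YLSSVDSFGSPPTAAASQE-C",
    "MAFFT":         "-MFQAFPGDYD-SGSRCSS-SPSAES--QYLSSVDSFGSPPTAAASQE-C",
    "MUSCLE":        "-MFQAFPGDYD-SGSRCSS-SPSAESQ--YLSSVDSFGSPPTAAASQE-C",
    "Clustal Omega": "-MFQAFPGDYDSGS-RCSSSPSA---ESQYLSSVDSFGSPPTA-AASQEC",
}


def gap_columns(row: str) -> List[int]:
    """Return the 1-based alignment columns that hold a gap character."""
    return [i for i, char in enumerate(row, start=1) if char == "-"]


# Sanity check: all tools aligned the same underlying sequence.
ungapped = {row.replace("-", "") for row in fosb_rows.values()}

for tool, row in fosb_rows.items():
    print(f"{tool:14s} gaps at columns {gap_columns(row)}")
```

All four rows contain six gaps, but the tools place them in different columns, so the residues paired with the FOS sequences differ from tool to tool.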

Exercise 3.3: Conserved motifs

Use Clustal Omega to generate a multiple sequence alignment of the sequences provided on the lecture website (sequences1.fasta). Locate uninterrupted, highly conserved areas of at least length 10 and save them as potential motifs for exercise sheet 4 (based on FOSB MOUSE).
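The search for uninterrupted conserved stretches can also be scripted. The sketch below is one possible approach in Python, under the simplifying assumption that "highly conserved" means the same residue in every row and no gaps; you may prefer a softer criterion. The toy alignment is hypothetical and is not taken from sequences1.fasta, and `conserved_runs` is a made-up helper name.

```python
from typing import List, Tuple


def conserved_runs(rows: List[str], min_length: int = 10) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for runs of identical, gap-free columns.

    start and end are 1-based, inclusive alignment columns.
    """
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("need a non-empty alignment with rows of equal length")
    length = len(rows[0])
    runs = []
    start = None
    # Iterate one column past the end so that a run touching the end is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and rows[0][col] != "-"
            and all(row[col] == rows[0][col] for row in rows)
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                runs.append((start + 1, col, rows[0][start:col]))
            start = None
    return runs


# Hypothetical toy alignment for illustration only.
toy_alignment = [
    "MKT-AYIAKQRQISFVKSHF",
    "MRTGAYIAKQRQISFVKAHF",
    "MKT-AYIAKQRQISFVKSHW",
]
motifs = conserved_runs(toy_alignment, min_length=10)
print(motifs)
```

For the toy alignment this reports a single stretch covering columns 5 to 17. With real data, the motif string would be read from the FOSB MOUSE row instead of the first row.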

Exercise 3.4: Outgroup

a) Generate an MSA of the sequences provided on the lecture website (sequences2.fasta).

b) Is everything conserved?

c) Which species differs from the rest?

d) Construct a phylogenetic tree.


Exercise 3.5: Sankoff algorithm

Which base was likely in the ancestor sequence at the given position of the alignment? Use the Sankoff algorithm and the given cost function.

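[Figure: phylogenetic tree for the Sankoff algorithm. The inner nodes carry empty cost tables with one field per base (A, C, G, T) to be filled in. The edges between the nodes are not preserved in this text version.]

Node levels, from the root to the leaves:

r
v10 v11
v7 v8 v9
v1 v2 v3 v4 v5 v6
l1 l2 l3 l4 l5 l6 l7 l8 l9 l10 l11 l12 l13

Bases at the leaves:

l1: G, l2: A, l3: T, l4: A, l5: C, l6: T, l7: G, l8: A, l9: T, l10: A, l11: A, l12: G, l13: G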

→ Base in the ancestor sequence:

Cost function:

    A  C  G  T
A   0  2  1  2
C   2  0  2  1
G   1  2  0  2
T   2  1  2  0
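To check a hand calculation, the bottom-up pass of the Sankoff algorithm can be sketched in a few lines of Python. The cost matrix is the one given above. The tree in the example is a small hypothetical one written as nested tuples, not the tree of this exercise, and the function names are made up for this illustration.

```python
import math
from typing import Dict, List, Tuple, Union

BASES = "ACGT"

# Cost function from the exercise sheet: COST[x][y] is the cost of x -> y.
COST = {
    "A": {"A": 0, "C": 2, "G": 1, "T": 2},
    "C": {"A": 2, "C": 0, "G": 2, "T": 1},
    "G": {"A": 1, "C": 2, "G": 0, "T": 2},
    "T": {"A": 2, "C": 1, "G": 2, "T": 0},
}

# A leaf is a base (str); an inner node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def sankoff(node: Tree) -> Dict[str, float]:
    """Return the minimal subtree cost for each base assigned to this node."""
    if isinstance(node, str):
        # Leaf: cost 0 for the observed base, infinity for all others.
        return {base: (0 if base == node else math.inf) for base in BASES}
    child_tables = [sankoff(child) for child in node]
    # Inner node: for each base x, add the cheapest transition into every child.
    return {
        x: sum(min(COST[x][y] + table[y] for y in BASES) for table in child_tables)
        for x in BASES
    }


def best_root_bases(tree: Tree) -> Tuple[List[str], float]:
    """Return all bases with minimal total cost at the root, and that cost."""
    table = sankoff(tree)
    best = min(table.values())
    return [base for base in BASES if table[base] == best], best


# Hypothetical example tree (not the tree from this exercise).
hypothetical_tree: Tree = (("G", "A"), ("T", ("A", "C")))
print(sankoff(hypothetical_tree))
print(best_root_bases(hypothetical_tree))
```

For this toy tree the root table is A: 5, C: 6, G: 6, T: 6, so A is the most parsimonious ancestral base with a total cost of 5. Nodes with more than two children are handled by the same sum over children.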

Have fun!
