
Bioinformatics III

Prof. Dr. Volkhard Helms Andreas Denger

Winter Semester 2019/2020

Saarland University Chair for Computational Biology

Exercise Sheet 7

Due: Dec 12, 2019 14:15

Submit your solutions on paper, hand-written or printed, at the beginning of the lecture. Alternatively, you can send an email with a single PDF attachment to andreas.denger@bioinformatik.uni-saarland.de. Your submission should include code listings for programming exercises. Additionally, hand in a .zip file with your source code via email.

Co-expression Correlation and Master Regulatory Genes

Exercise 7.1: Identification of master-regulatory genes (40 points)

[Figure: directed network with nodes A to P]

(a) Which dominating sets exist in the network shown above?

(b) What is the minimum dominating set (MDS) of this network?

(c) List the following sets of nodes and their sizes:

• Largest connected component in the directed graph

• Largest strongly connected component in the directed graph

• Largest connected component in the underlying undirected graph

Find the minimum connected dominating set (MCDS) for each of the three sets.

(d) Compare the MDS and MCDS in terms of size and write a short conclusion.
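
The definitions used in this exercise can be explored on a small example. The following is a minimal sketch, not a solution to the exercise: the network `EDGES` is a hypothetical toy graph (not the one in the figure), and the helper names `is_dominating`, `is_connected`, and `minimum_sets` are made up for illustration. It assumes that a set is dominating when every node outside the set has an incoming edge from a node inside it, and that connectivity of the set is judged on the underlying undirected graph. The search is exhaustive, so it is only feasible for small networks.

```python
from itertools import combinations

# Hypothetical toy network (NOT the network from the exercise figure):
# each key regulates the nodes in its list.
EDGES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def is_dominating(candidate, edges):
    """True if every node outside `candidate` has an incoming edge from it."""
    covered = set(candidate)
    for node in candidate:
        covered.update(edges[node])
    return covered == set(edges)


def is_connected(candidate, edges):
    """True if `candidate` induces a connected subgraph, ignoring direction."""
    nodes = set(candidate)
    if not nodes:
        return False
    neighbours = {n: set() for n in nodes}
    for source, targets in edges.items():
        for target in targets:
            if source in nodes and target in nodes:
                neighbours[source].add(target)
                neighbours[target].add(source)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == nodes


def minimum_sets(edges, require_connected=False):
    """Return all smallest (connected) dominating sets by exhaustive search."""
    nodes = sorted(edges)
    for size in range(1, len(nodes) + 1):
        hits = [
            set(c)
            for c in combinations(nodes, size)
            if is_dominating(c, edges)
            and (not require_connected or is_connected(c, edges))
        ]
        if hits:
            return hits
    return []


mds = minimum_sets(EDGES)
mcds = minimum_sets(EDGES, require_connected=True)
print("MDS candidates:", mds)
print("MCDS candidates:", mcds)
```

In this toy graph the smallest dominating set has two nodes, while the smallest connected dominating set needs three, because the two MDS nodes are not adjacent.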


Exercise 7.2: Co-expression based on Correlation and Mutual Information (60 points)

Mutual information (I) and Pearson correlation coefficient (Corr) between two random variables are defined as:

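$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \times \log \frac{p(x, y)}{p(x) \times p(y)}$$

$$\mathrm{Corr}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \mu_X) \times (y_i - \mu_Y)}{\sqrt{\sum_{i=1}^{n} (x_i - \mu_X)^2} \times \sqrt{\sum_{i=1}^{n} (y_i - \mu_Y)^2}}$$

where $p(x, y)$ is the joint probability distribution of expression levels $x$ and $y$, $p(x)$ is the marginal probability of expression value $x$, and $\mu_X$ is the (arithmetic) mean expression for gene X.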

(a) Calculate the Pearson correlation coefficient and mutual information for the data given below. The data comprise two genes whose expression was measured at 6 time points (t1 to t6). An expressed gene is denoted by the value 1. Solve this task by hand.

Gene   t1  t2  t3  t4  t5  t6
gene1  0   0   1   1   1   0
gene2  0   1   0   0   1   1
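
Although the task asks for a calculation by hand, the result can be cross-checked numerically. The sketch below applies the two definitions directly to the table. Two points are assumptions, since the sheet does not fix them: the logarithm is taken to base 2 (so mutual information is in bits), and probabilities are estimated as relative frequencies over the six measurements. The function names are illustrative only.

```python
from collections import Counter
from math import log2, sqrt

gene1 = [0, 0, 1, 1, 1, 0]
gene2 = [0, 1, 0, 0, 1, 1]


def pearson(xs, ys):
    """Pearson correlation coefficient as defined on the sheet."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return numerator / (spread_x * spread_y)


def mutual_information(xs, ys):
    """Mutual information in bits (base-2 logarithm is an assumption)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) combination
    count_x = Counter(xs)          # marginal counts for X
    count_y = Counter(ys)          # marginal counts for Y
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        total += p_xy * log2(p_xy / (p_x * p_y))
    return total


corr = pearson(gene1, gene2)
mi = mutual_information(gene1, gene2)
print(f"Corr = {corr:.4f}, I = {mi:.4f} bits")
```

Only combinations that actually occur are summed, which matches the usual convention that terms with p(x, y) = 0 contribute nothing.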

(b) Explain the main difference between mutual information and Pearson correlation.

(c) What is the advantage of using rank-based correlation coefficients?

(d) Write a program that reads the time-series gene expression data given in the supplement and calculates the Pearson correlation coefficients for all pairs of genes.
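
One possible shape for such a program is sketched below. The layout of the supplement file is not described on this sheet, so the parser assumes a tab-separated table with a header row, the gene name in the first column, and one expression value per time point in the remaining columns. The demo data and the names `geneA`, `geneB`, `geneC` are made up for illustration. Genes with constant expression yield `nan`, since the denominator of the correlation is zero.

```python
import csv
import io
from itertools import combinations
from math import sqrt


def read_expression(handle):
    """Parse a tab-separated table: header row, then gene name + values."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row (assumed layout)
    return {row[0]: [float(v) for v in row[1:]] for row in reader if row}


def pearson(xs, ys):
    """Pearson correlation; returns nan if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator if denominator else float("nan")


def pairwise_correlations(expression):
    """Correlation for every unordered pair of distinct genes."""
    return {
        (a, b): pearson(expression[a], expression[b])
        for a, b in combinations(sorted(expression), 2)
    }


# Made-up demo data in the assumed layout (not the real supplement).
DEMO = (
    "gene\tt1\tt2\tt3\tt4\n"
    "geneA\t1\t2\t3\t4\n"
    "geneB\t2\t4\t6\t8\n"
    "geneC\t4\t3\t2\t1\n"
)

correlations = pairwise_correlations(read_expression(io.StringIO(DEMO)))
for pair, value in correlations.items():
    print(pair, round(value, 3))
```

For the real data, `io.StringIO(DEMO)` would be replaced by an opened file, for example `open(path, newline="")`, where `path` points to the supplement.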

(e) Plot the distribution of correlation coefficients between pairs of distinct genes (e.g. by using the distplot function from the Python package seaborn).

Interpret the shape of the plot and include it in your submission.
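
A sketch of how the plot could be produced follows. It assumes the coefficients are held in a symmetric square matrix (a list of lists), so that pairs of distinct genes correspond to the entries above the diagonal. The matrix `DEMO_MATRIX` is made up, and the function names and the output file name are hypothetical. Recent seaborn releases deprecate `distplot` in favour of `histplot` and `displot`, so the sketch uses `histplot`. The plotting libraries are imported inside the function, so the extraction step also works where they are not installed.

```python
def distinct_pair_values(matrix):
    """Entries above the diagonal: one value per pair of distinct genes."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def plot_distribution(values, path="correlation_distribution.png"):
    """Save a histogram with a density estimate of the given coefficients."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.histplot(values, bins=50, kde=True)
    ax.set_xlabel("Pearson correlation coefficient")
    ax.set_ylabel("Number of gene pairs")
    ax.figure.savefig(path, dpi=150)
    plt.close(ax.figure)


# Made-up symmetric correlation matrix for three genes.
DEMO_MATRIX = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]

values = distinct_pair_values(DEMO_MATRIX)
print(values)
```

Calling `plot_distribution(values)` writes the figure to disk. Excluding the diagonal matters because every gene correlates perfectly with itself, which would add an artificial spike at 1.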

(f) Take a look at the correlation scores between the gene MCTS1 and the other genes. Write a function that finds the gene with the:

• Highest correlation to MCTS1

• Lowest correlation to MCTS1

• Correlation to MCTS1 that is closest to zero

Next, for each of these three genes, create a scatter plot with a linear regression model fit between its expression values and those of MCTS1 (e.g. with the regplot function from the Python package seaborn).

Include the three plots in your submission and describe what you see.
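
The selection step in (f) can be sketched as follows, assuming the correlations to MCTS1 are already available as a dictionary that maps each other gene to its coefficient, with MCTS1 itself left out. The gene names and values in `DEMO_SCORES` are made up, and the function names and the output file name are hypothetical. The plotting helper relies on seaborn's `regplot` and imports the plotting libraries lazily.

```python
def extreme_partners(correlations_to_target):
    """Pick the genes with the highest, lowest, and near-zero correlation."""
    scores = correlations_to_target
    return {
        "highest": max(scores, key=scores.get),
        "lowest": min(scores, key=scores.get),
        "closest_to_zero": min(scores, key=lambda gene: abs(scores[gene])),
    }


def plot_regression(target_values, partner_values, partner_name,
                    target_name="MCTS1", path="regression.png"):
    """Save a scatter plot with a fitted regression line."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.regplot(x=target_values, y=partner_values)
    ax.set_xlabel(f"{target_name} expression")
    ax.set_ylabel(f"{partner_name} expression")
    ax.figure.savefig(path, dpi=150)
    plt.close(ax.figure)


# Made-up correlation scores to the target gene (hypothetical gene names).
DEMO_SCORES = {
    "geneA": 0.91,
    "geneB": -0.75,
    "geneC": 0.03,
    "geneD": -0.10,
}

partners = extreme_partners(DEMO_SCORES)
print(partners)
```

Any `nan` scores, for example from genes with constant expression, should be filtered out before calling `extreme_partners`, because comparisons with `nan` make `max` and `min` unreliable.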
