Co-expression Correlation and Master Regulatory Genes

Academic year: 2022


Bioinformatics III

Prof. Dr. Volkhard Helms, Andreas Denger

Summer Semester 2021

Saarland University Chair for Computational Biology

Exercise Sheet 7

Due: June 10, 2021 12:00

Submit your solutions to andreas.denger@bioinformatik.uni-saarland.de with two attachments: (1) a ZIP file containing all your source code files, potential result files, figures and whatever else is needed to generate your solution, (2) a PDF file containing your answers. The subject of the email should be in the following format: BI3 A7LastName1 LastName2.

Co-expression Correlation and Master Regulatory Genes

Exercise 7.1: Co-expression based on Correlation and Mutual Information (60 points)

Mutual information (I) and the Pearson correlation coefficient (Corr) between two random variables are defined as:

$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \times \log \frac{p(x, y)}{p(x) \times p(y)}$$

$$\mathrm{Corr}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \mu_X) \times (y_i - \mu_Y)}{\sqrt{\sum_{i=1}^{n} (x_i - \mu_X)^2} \times \sqrt{\sum_{i=1}^{n} (y_i - \mu_Y)^2}}$$

where p(x, y) is the joint probability distribution of expression levels x and y, p(x) is the marginal probability of expression value x, and µ_X is the (arithmetic) mean expression for gene X.

(a) Calculate the Pearson correlation coefficient and the mutual information for the data given below. The data comprise two genes whose expression was measured at 6 time points. An expressed gene is denoted by the value 1. Solve this task by hand.

Gene   t1  t2  t3  t4  t5  t6
gene1  0   0   1   1   1   0
gene2  0   1   0   0   1   1
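
As a way to check a hand calculation, the two definitions above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: probabilities are estimated as relative frequencies, pairs that never occur contribute nothing to the sum, and the logarithm base (here 2, giving bits) is an assumption because the sheet does not fix it. The vectors `u` and `v` are hypothetical toy data, not the values from the table.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient, following the Corr(X, Y) formula."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    numerator = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    denominator = math.sqrt(sum((a - mu_x) ** 2 for a in x)) * math.sqrt(
        sum((b - mu_y) ** 2 for b in y)
    )
    return numerator / denominator


def mutual_information(x, y, base=2.0):
    """Mutual information with probabilities estimated as relative frequencies.

    The logarithm base is an assumption (base 2 gives bits).
    Only observed (x, y) pairs are summed, so p(x, y) = 0 terms are skipped.
    """
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        total += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)), base)
    return total


# Hypothetical toy vectors, not the data from the sheet.
u = [0, 1, 0, 1]
v = [1, 0, 1, 0]
print(pearson(u, v))             # perfectly anti-correlated: -1.0
print(mutual_information(u, v))  # v is fully determined by u: 1.0 bit
```

Passing `base=math.e` would give the result in nats instead of bits.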

(b) Explain the main difference between mutual information and Pearson correlation.

(c) What is the advantage of using rank-based correlation coefficients?

(d) Write a program that reads the time-series gene expression data given in the supplement and calculates the Pearson correlation coefficients for all pairs of genes.
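
One possible starting point for such a program is sketched below, using only the standard library. The file layout is an assumption, since the supplement is not described here: a tab-separated table with one header line, one gene per row, and the gene name in the first column. The names `read_expression` and `pairwise_correlations` are hypothetical helpers, and the in-memory example stands in for the real file.

```python
import csv
import io
import math
from itertools import combinations


def read_expression(handle, delimiter="\t"):
    """Parse rows of the form: gene name, then one expression value per time point.

    The layout (header line, genes as rows, tab-separated) is an assumption.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header line
    return {row[0]: [float(v) for v in row[1:]] for row in reader if row}


def pearson(x, y):
    """Pearson correlation; returns NaN for a gene with constant expression."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    num = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mu_x) ** 2 for a in x) * sum((b - mu_y) ** 2 for b in y))
    return num / den if den else float("nan")


def pairwise_correlations(expression):
    """Return {(gene_a, gene_b): r} for every unordered pair of distinct genes."""
    return {
        (a, b): pearson(expression[a], expression[b])
        for a, b in combinations(sorted(expression), 2)
    }


# Hypothetical in-memory example standing in for the supplement file.
example = io.StringIO(
    "gene\tt1\tt2\tt3\n"
    "geneA\t1.0\t2.0\t3.0\n"
    "geneB\t2.0\t4.0\t6.0\n"
    "geneC\t3.0\t2.0\t1.0\n"
)
correlations = pairwise_correlations(read_expression(example))
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

For a real file, the same functions can be given a handle from `open(..., newline="")`. Storing only unordered pairs avoids computing each coefficient twice, because the Pearson correlation is symmetric.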

(e) Plot the distribution of correlation coefficients between pairs of genes, for example by using the displot function from the Python package seaborn. Ignore pairs that contain the same gene twice.

Interpret the shape of the plot and include it in your submission.
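
The step of dropping self-pairs and then plotting might look like the sketch below. It assumes the coefficients are held in a nested dictionary (gene to gene to coefficient), which is an illustrative choice and not prescribed by the sheet. The helper names and the output file name are hypothetical. The plotting function imports seaborn only when called, so the rest runs without it.

```python
def distinct_pair_values(corr):
    """Collect one coefficient per unordered pair of distinct genes.

    `corr` is assumed to map each gene to a dict {other gene: coefficient}.
    """
    genes = sorted(corr)
    return [
        corr[a][b]
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    ]


def plot_distribution(values, path="correlation_distribution.png"):
    """Draw a histogram with a KDE overlay; needs seaborn and matplotlib installed."""
    import seaborn as sns

    grid = sns.displot(x=values, kde=True)
    grid.set_axis_labels("Pearson correlation coefficient", "Number of gene pairs")
    grid.savefig(path)


# Hypothetical symmetric matrix for three genes.
toy = {
    "g1": {"g1": 1.0, "g2": 0.8, "g3": -0.2},
    "g2": {"g1": 0.8, "g2": 1.0, "g3": 0.1},
    "g3": {"g1": -0.2, "g2": 0.1, "g3": 1.0},
}
values = distinct_pair_values(toy)
print(values)  # [0.8, -0.2, 0.1]
```

Calling `plot_distribution(values)` would then write the figure to disk. Taking each unordered pair once keeps the histogram from counting every coefficient twice.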

(f) Take a look at the correlation scores between the gene ITGB2 and the other genes. Write a function that finds the gene with the:

- Highest correlation to ITGB2

- Lowest correlation to ITGB2

- Correlation to ITGB2 that is closest to zero

Next, for each of these three genes, create a scatter plot with a linear regression model fit between its expression values and those of ITGB2, for example with the regplot function from the Python package seaborn.

Include the three plots in your submission and describe what you see.
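
A rough sketch of the selection and plotting steps in (f) follows. It assumes the correlations with the target gene are available as a plain dictionary from gene name to coefficient. The function names, the gene names other than ITGB2, and the toy coefficients are all hypothetical. Seaborn and matplotlib are imported only inside the plotting helper.

```python
def extreme_partners(target, correlations):
    """Return the genes with the highest, lowest and closest-to-zero correlation.

    `correlations` is assumed to map gene name -> coefficient with `target`.
    """
    others = {g: r for g, r in correlations.items() if g != target}
    return {
        "highest": max(others, key=others.get),
        "lowest": min(others, key=others.get),
        "closest_to_zero": min(others, key=lambda g: abs(others[g])),
    }


def plot_regression(x, y, x_label, y_label, path):
    """Scatter plot with a linear fit; needs seaborn and matplotlib installed."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots()
    sns.regplot(x=x, y=y, ax=ax)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    fig.savefig(path)
    plt.close(fig)


# Hypothetical coefficients; geneA, geneB and geneC are placeholder names.
toy = {"ITGB2": 1.0, "geneA": 0.9, "geneB": -0.7, "geneC": 0.05}
result = extreme_partners("ITGB2", toy)
print(result)
```

Excluding the target itself matters here, because its self-correlation of 1 would otherwise always be reported as the highest value.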

Exercise 7.2: Identification of master-regulatory genes (40 points)

[Figure: directed network with nodes A to P]

(a) Which dominating sets exist in the network shown above?

(b) What is the minimum dominating set (MDS) of this network?
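
For small networks, a minimum dominating set can be found by exhaustive search, as in the sketch below. It assumes the directed reading of domination that is common for regulatory networks: every node is either in the set or is the target of an edge from a node in the set. That definition, the function names, and the toy graph are assumptions for illustration. The graph is not the network from the sheet.

```python
from itertools import combinations


def is_dominating(graph, candidate):
    """True if every node is in `candidate` or is the target of an edge from it."""
    covered = set(candidate)
    for node in candidate:
        covered.update(graph[node])
    return covered == set(graph)


def minimum_dominating_sets(graph):
    """Enumerate all dominating sets of the smallest size by exhaustive search."""
    nodes = sorted(graph)
    for size in range(1, len(nodes) + 1):
        found = [set(c) for c in combinations(nodes, size) if is_dominating(graph, c)]
        if found:
            return found
    return []


# Hypothetical toy network (regulator -> targets), not the one on the sheet.
# Every node must appear as a key, even if it has no outgoing edges.
toy = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": ["E"],
    "E": [],
}
print(minimum_dominating_sets(toy))
```

The search tries all subsets in order of increasing size, so its running time grows exponentially with the number of nodes. That is acceptable for a network of 16 nodes but not for genome-scale networks.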

(c) List the following sets of nodes and their sizes:

- Largest connected component in the directed graph

- Largest strongly connected component in the directed graph

- Largest connected component in the underlying undirected graph

Find the minimum connected dominating set (MCDS) for each of the three sets.
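
The component types in (c) can be told apart with a simple reachability search, sketched below for two of them: strongly connected components of the directed graph and connected components of the underlying undirected graph. The function names and the toy graph are hypothetical, and the graph is not the network from the sheet. The quadratic reachability approach is chosen for brevity, not efficiency.

```python
def reachable(graph, start):
    """All nodes reachable from `start` along edges, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected_components(graph):
    """Group nodes that can reach each other in both directions."""
    reach = {n: reachable(graph, n) for n in graph}
    components, assigned = [], set()
    for n in sorted(graph):
        if n not in assigned:
            comp = {m for m in reach[n] if n in reach[m]}
            components.append(comp)
            assigned |= comp
    return components


def undirected_components(graph):
    """Connected components after ignoring edge directions."""
    undirected = {n: set(targets) for n, targets in graph.items()}
    for n, targets in graph.items():
        for t in targets:
            undirected[t].add(n)
    components, assigned = [], set()
    for n in sorted(undirected):
        if n not in assigned:
            comp = reachable(undirected, n)
            components.append(comp)
            assigned |= comp
    return components


# Hypothetical toy network; every node must appear as a key.
toy = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": [], "E": ["F"], "F": []}
largest_scc = max(strongly_connected_components(toy), key=len)
largest_undirected = max(undirected_components(toy), key=len)
print(sorted(largest_scc), sorted(largest_undirected))
```

In this toy graph, node D belongs to the largest undirected component but not to the largest strongly connected one, because no directed path leads from D back to the cycle A, B, C.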

(d) Compare the MDS and MCDS in terms of size.
