
Processing of Biological Data

Prof. Dr. Volkhard Helms

Summer Semester 2017

Saarland University Chair for Computational Biology

Exercise Sheet 5

Due: July 11, 2017 10:15

Submit your solutions on paper, hand-written or printed, at the beginning of the lecture or in building E2 1, Room 3.03. Alternatively, you may send an email with a single PDF attachment. Additionally, hand in all source code by email to maryam.nazarieh@bioinformatik.uni-saarland.de.

Gene Expression Prediction

Exercise 5.1: Read data and Normalization (25 points)

The supplementary material contains two data sets from a mouse cell: gene expression and histone modification. Each data set is divided into training data and test data. In this assignment, we aim to predict gene expression from histone modification.

• Read the data into a data matrix in which rows correspond to genes and columns correspond to the different samples.

• Filter both the expression and the methylation data by removing entries with empty/NA expression or methylation values. If several entries share the same gene name, replace those rows with a single row holding the mean expression and methylation value of that gene in every sample.

• Submit the final matrices as your solution.
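The reading and filtering steps above can be sketched as follows. This is a minimal, hedged example that assumes each file is tab-separated text with a header row (`gene`, then one column per sample) and one row per measurement; the actual layout of the supplementary files may differ. The function name `load_matrix` and the set of missing-value markers are illustrative assumptions. The same function would be applied separately to the expression file and to the histone-modification file.

```python
import csv
import io
from collections import defaultdict

# Assumption: these strings mark a missing value in the input files.
MISSING = {"", "NA", "NaN", "nan"}


def load_matrix(text, delimiter="\t"):
    """Return (sample names, {gene: [mean value per sample]}).

    Rows with a missing gene name or any missing sample value are dropped;
    rows sharing a gene name are collapsed into their per-sample mean.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    grouped = defaultdict(list)  # gene name -> list of complete rows
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0].strip(), row[1:]
        incomplete = len(values) != len(samples) or any(
            v.strip() in MISSING for v in values
        )
        if gene in MISSING or incomplete:
            continue  # filter out empty/NA entries
        grouped[gene].append([float(v) for v in values])
    # Average duplicate rows of the same gene, sample by sample.
    matrix = {
        gene: [sum(col) / len(col) for col in zip(*rows)]
        for gene, rows in grouped.items()
    }
    return samples, matrix


if __name__ == "__main__":
    demo = "gene\ts1\ts2\nA\t1\t2\nA\t3\t4\nB\tNA\t5\nC\t2\t\n"
    print(load_matrix(demo))
```

In the demo input, gene `B` is dropped because of `NA`, gene `C` because of an empty cell, and the two rows of gene `A` are averaged to `[2.0, 3.0]`. For the regression step one would presumably also keep only genes present in both matrices.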

Exercise 5.2: Model Prediction (50 points)

Linear regression is a method for modelling the relationship between a dependent (response) variable Y and one or more explanatory variables denoted by X. The model is a linear combination of the parameters:

$$Y = \alpha + \beta X \qquad (1)$$

The formula above describes a line with slope β and y-intercept α. Linear regression models are often fitted with the least-squares approach, which minimizes the sum of squared errors (SSE), i.e. the sum of the squared differences between the observed response values in the training data and the values predicted by the best-fit regression line:

$$SSE = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 \qquad (2)$$
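For a single explanatory variable, the α and β that minimize the SSE have a standard closed form: β = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and α = ȳ − βx̄. The sketch below computes these estimates and evaluates the SSE of the fitted line; the function names `fit_simple` and `sse` are illustrative, not part of the exercise.

```python
def fit_simple(x, y):
    """Closed-form least-squares estimates (alpha, beta) for Y = alpha + beta * X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    return alpha, beta


def sse(x, y, alpha, beta):
    """Sum of squared errors with fitted values alpha + beta * x_i."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


if __name__ == "__main__":
    xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
    a, b = fit_simple(xs, ys)
    print(a, b, sse(xs, ys, a, b))  # points lie exactly on y = 1 + 2x
```

Because the demo points lie exactly on y = 1 + 2x, the fit recovers α = 1 and β = 2 with an SSE of 0.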

Your task in this assignment is to build a linear regression model from the training data (gene expression and histone modification) and use it to predict gene expression from histone modification in the test data. (You can write a program that calculates the best-fitting α and β, or alternatively use one of the packages available in Python or R.)
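If the histone-modification data provides several marks per gene, the model becomes Y = α + β₁X₁ + … + βₚXₚ. A minimal sketch, assuming each gene is one row of X with p mark values and one expression value in y, solves the least-squares normal equations (XᵀX)b = Xᵀy with plain Gaussian elimination. The names `fit_ols` and `predict` and the singularity tolerance are illustrative assumptions; a statistics package would normally be used instead. With one column in X this reduces to the α, β of equation (1).

```python
def fit_ols(X, y):
    """Least-squares coefficients [alpha, beta_1, ..., beta_p] via the normal equations.

    X: list of rows (one per gene), each with p explanatory values.
    y: list of response values (one per gene).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented system [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("X^T X is singular: some predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(col + 1, k):
            factor = A[i][col] / A[col][col]
            for c in range(col, k + 1):
                A[i][c] -= factor * A[col][c]
    # Back substitution.
    coef = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (A[i][k] - rest) / A[i][i]
    return coef


def predict(coef, X):
    """Predicted response alpha + sum_j beta_j * x_j for each row of X."""
    return [coef[0] + sum(b * v for b, v in zip(coef[1:], row)) for row in X]


if __name__ == "__main__":
    X_train = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y_train = [1, 3, 4, 6, 8]  # generated from y = 1 + 2*x1 + 3*x2
    coef = fit_ols(X_train, y_train)
    print(coef, predict(coef, [[3, 2]]))
```

The coefficients are fitted on the training genes only and then passed to `predict` together with the test-set histone values.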

• Quantify the strength of the relationship between Y and each of the explanatory variables.

• Determine which variables have no relationship with Y at all.

• Identify which subsets of X contain redundant information about Y.
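One simple, hedged way to approach these three questions is a correlation screen: the Pearson correlation r between each explanatory variable and Y quantifies the strength of its linear relationship (for a single predictor, R² = r²), a correlation close to 0 suggests no linear relationship, and predictors that are strongly correlated with each other carry largely redundant information. The thresholds `weak=0.1` and `redundant=0.9` below are arbitrary assumptions, and the function names are illustrative. A more formal analysis would use the t-statistics or p-values of the fitted regression coefficients.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(va * vb)


def screen_predictors(columns, y, weak=0.1, redundant=0.9):
    """columns: {predictor name: values per gene}; thresholds are assumptions."""
    strength = {name: pearson(vals, y) for name, vals in columns.items()}
    unrelated = [name for name, r in strength.items() if abs(r) < weak]
    redundant_pairs = [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > redundant
    ]
    return strength, unrelated, redundant_pairs


if __name__ == "__main__":
    marks = {"m1": [1, 2, 3, 4], "m2": [2, 4, 6, 8], "m3": [1, -1, -1, 1]}
    print(screen_predictors(marks, [1, 2, 3, 4]))
```

In the demo, `m3` is uncorrelated with Y and is reported as unrelated, while `m1` and `m2` are perfectly correlated with each other and form a redundant pair.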


Exercise 5.3: Performance Measurement (25 points)

An ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. In an ROC curve, the sensitivity is plotted as a function of (1 − specificity) for different cut-off points of a parameter. Your task is to evaluate the classifier from Exercise 5.2 with an ROC curve.

• Plot the ROC curve for the classifier.

• Provide the area under the curve (AUC) for the classifier.

• Provide the sensitivity and specificity for a probability cutoff of 0.5.
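The three steps above can be sketched as follows, under the assumption that each test gene has a binary label (1 = expressed, 0 = not expressed; how the continuous expression values are binarized is not specified here) and a model score in [0, 1] that is read as a probability. The ROC points are obtained by sweeping the cutoff over every distinct score, the AUC by the trapezoidal rule, and sensitivity and specificity by counting at a fixed cutoff. All function names are illustrative.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the cutoff over all distinct scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= cut and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= cut and l == 0)
        points.append((fp / neg, tp / pos))  # x = 1 - specificity, y = sensitivity
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule (points sorted by FPR)."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def sens_spec(labels, scores, cutoff=0.5):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= cutoff)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < cutoff)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < cutoff)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]
    y_score = [0.8, 0.7, 0.3, 0.2]
    pts = roc_points(y_true, y_score)
    print(pts, auc(pts), sens_spec(y_true, y_score))
```

The returned points can be passed to any plotting library to draw the curve. Each cutoff rescans all genes, which is quadratic in the number of genes; sorting once by score would be faster for large test sets.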

Have fun!
