
6 Concluding Remarks

6.2 Future Perspectives

6.2.2 Vision

This work contributed to the dissection of the genetics of restless legs syndrome (RLS). In the future, it might help to define a genetic RLS pathway and, as a consequence, putative genetic RLS subtypes. This might improve the diagnosis and treatment of RLS.

Acknowledgements

This doctoral thesis would not have been possible without the support of many people.

First, I want to thank my supervisors for their constant support. Prof. Dr. Juliane Winkelmann was my supervisor at the Helmholtz Zentrum München and made this work possible. She was always open to enthusiastic discussions, be it day or night, even when we were on different continents for many months. Prof. Dr. Hans-Rudolf Fries was my advisor at the Technical University of Munich, and I thank him for all of his support. My third advisor, Prof. Dr. Bertram Müller-Myhsok, sparked my interest in statistics and computer science, and the discussions with him improved the quality of this thesis. It was a pleasure to be a guest in his group for a long period during the work on this thesis. I also thank Prof. Dr. Thomas Meitinger, my fourth advisor, for his special support of the MIPseq project and many interesting discussions.

I also want to thank Dr. Barbara Schormair for many helpful discussions, for her expertise, for sharing data and results from the parallel meta-GWAS, and for her support of the MIPseq project. The MIPseq project was also supported by Ana Antic, Jelena Golic and our technical assistants, who prepared the genomic DNA and the MIPseq libraries and pooled thousands of MIPs. Thanks also go to Dr. Aaro Salminen and Ana Antic for their assistance in reviewing hundreds of sample records for the family project. I also want to thank Dr. Peter Lichtner, Milena Radivojkov-Blagojevic, Dr. Gertrud Eckstein and Jelena Golic for their expertise and support with regard to genotyping arrays and next-generation sequencing.

The MIPseq project was also supported by Prof. Dr. Alexander Hoischen and his team, Marloes Steehouwer, Christian Gilissen Ph.D. and Maartje van de Vorst. I thank them for the warm welcome in their laboratory in Nijmegen and for supporting the project with sample MIPs, expertise and protocols. I also want to mention that the graduate school HELENA supported the week in Nijmegen financially.

I thank PD Dr. Tim Strom and his team, especially Dr. Thomas Wieland, Dr. Riccardo Berutti, Elisabeth Graf and Sandy Loesecke, but also all the others, for discussions and support and for sequencing the MIPseq libraries on their machines.

Many collaborators also contributed to the work, e.g. by providing samples. That is why I want to thank them all as well.

Special thanks go to Dr. Benno Pütz, who was a member of Prof. Dr. Bertram Müller-Myhsok’s group. The discussions with him helped me to gain the know-how in R and Linux that was needed for many parts of this thesis. I also want to thank all the others from Prof. Dr. Bertram Müller-Myhsok’s group for helpful discussions.

Many thanks go to all the others who supported my thesis but who might not have been mentioned due to limited space. For example, many collaborators provided samples that were used in these studies.

I also thank my family for their support during this work, especially three individuals. First, my brother Dr. Sebastian Tilch. Second, my wife Irene Tilch: without her love and support, not only this thesis would have been impossible, and I dedicate this work to her. Third, my seven-month-old son Lucius, who supported me in the final phase of this thesis. He showed me how incredible genetics is.

Appendix A – MIPseq Primers

The MIPseq primer sequences [78] are listed in the following table:

Table 31: MIPseq PCR and sequencing primers

Primer Sequence Usage


SLXA_PE_MIPBC2_REV_377 CAAGCAGAAGACGGCATACGAGATCTAAGTTAACACGCACGATCCGACGGTAGTGT PCR
SLXA_PE_MIPBC2_REV_378 CAAGCAGAAGACGGCATACGAGATGTTAGCCTACACGCACGATCCGACGGTAGTGT PCR
SLXA_PE_MIPBC2_REV_379 CAAGCAGAAGACGGCATACGAGATTTCGTGAGACACGCACGATCCGACGGTAGTGT PCR
SLXA_PE_MIPBC2_REV_380 CAAGCAGAAGACGGCATACGAGATAGTCTCTTACACGCACGATCCGACGGTAGTGT PCR
SLXA_PE_MIPBC2_REV_381 CAAGCAGAAGACGGCATACGAGATGGATATACACACGCACGATCCGACGGTAGTGT PCR
SLXA_PE_MIPBC2_REV_382 CAAGCAGAAGACGGCATACGAGATAAGACGCTACACGCACGATCCGACGGTAGTGT PCR
SLXA_PE_MIPBC2_REV_383 CAAGCAGAAGACGGCATACGAGATGTGTCTGAACACGCACGATCCGACGGTAGTGT PCR
SLXA_PE_MIPBC2_REV_384 CAAGCAGAAGACGGCATACGAGATAGAGTGCAACACGCACGATCCGACGGTAGTGT PCR

Appendix B – R Functions for Processing MIP Designs and Setting Up MIPseq

The following functions were written in R [4]. They were used to process mipgen [399] output and MIP design outputs from the Nijmegen platform. Specifically, they enable the combination of different MIP designs, their quality control and annotation, and automated iterative MIP design. The function “iterative_redesign.fct” requires the path to the mipgen [399] design software, the path to a wrapper (shell script, see appendix) for the mipgen [399] design software, an indexed reference genome and a compressed VCF file of the SNPs to be considered in the MIP design.

# function to check for factors//////////////////////////////////////

data_contains_factor.fct <- function(data.x) {
  if (nrow(data.x) == 0) {
    return(0)
  }
  is_factor.v <- apply(as.matrix(1:ncol(data.x)), 1, function(index.s) {
    is.factor(data.x[, index.s])
  })
  if (any(is_factor.v)) {
    warning("Data contains factors!", immediate. = TRUE)
    return(1)
  } else {
    return(0)
  }
}
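The factor check can be tried on a toy data frame. The following is a minimal sketch and not the thesis code: `has_factor_column` is a hypothetical helper that uses `vapply` over the columns instead of `apply` over column indices, and both toy data frames are invented for illustration.

```r
# Hypothetical helper (not part of the thesis code): returns TRUE if any
# column of a data frame is a factor.
has_factor_column <- function(data.x) {
  any(vapply(data.x, is.factor, logical(1)))
}

# Invented toy BED-like data; here the chr column is deliberately a factor.
bed_with_factor <- data.frame(chr = factor(c("1", "2")),
                              start = c(100, 200),
                              stop = c(150, 250))

# The same data with chr stored as character.
bed_without_factor <- data.frame(chr = c("1", "2"),
                                 start = c(100, 200),
                                 stop = c(150, 250),
                                 stringsAsFactors = FALSE)

has_factor_column(bed_with_factor)
has_factor_column(bed_without_factor)
```

Checking for factors up front matters because `as.numeric()` applied to a factor returns the internal level codes rather than the printed values, which would silently corrupt genomic positions.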

# function to collapse a bed matrix/df///////////////////////////////

collapse_bed.fct <- function(bed.df, integer.merge.mode = TRUE,
    buffer.region = 0, verbose = T, feed = "\n", ...) {
  if (verbose) {
    cat(feed, "[collapse_bed.fct]: beginning")
  }
  # DESCRIPTION: bed.df should be a df with at least three
  # columns (col 1 = chr..., col 2 & 3 = positions, col 4 =
  # annotation)
  # if integer merge mode, then neighbouring integer positions
  # should be merged, e.g. range 1 to 4 and 5 to 6 will be
  # merged to 1 to 6
  colnames(bed.df) <- c("chr", "start", "stop", "anno")
  bed.df$chr <- as.character(bed.df$chr)
  bed.df$start <- as.numeric(bed.df$start)
  bed.df$stop <- as.numeric(bed.df$stop)
  bed.df$anno <- as.character(bed.df$anno)
  # perform region merging
  # order the entries according to positions
  if (verbose) {
    cat(feed, "[collapse_bed.fct]: sorting positions")
  }
  for (index in 1:nrow(bed.df)) {
    positions.v <- bed.df[index, c("start", "stop")]
    bed.df[index, c("start", "stop")] <-
      positions.v[order(positions.v)]
  }
  bed.df <- bed.df[order(bed.df$chr, bed.df$start, bed.df$stop), ]
  # if just one line is supplied, then return the bed.df
  if (nrow(bed.df) == 1) {
    return(bed.df)
  }

  # assign overlaps, get vector hinting to overlapping regions
  if (verbose) {
    cat(feed, "[collapse_bed.fct]: searching overlaps for region ", feed)
  # important, because a subsequently queried region might
  # intersect with the maximum extent, but not with the
  # previously queried member and a new region might be opened
  max.stop.current_region.s <- max(bed.df[region.v %in%
    zaehler.region.s, "stop"])
  max.stop.current_region.s + buffer.region & bed.df[index, "stop"] + buffer.region >= bed.df[index - 1,
  cat(feed, "[collapse_bed.fct]: collapsing regions ")
  }

  new.bed.df <- as.data.frame(t(apply(matrix(1:zaehler.region.s), 1,
    function(region.s) {
      auswahl.v <- region.v %in% region.s
      minimal.start <- min(bed.df[auswahl.v, "start"], na.rm = T)
      maximal.stop <- max(bed.df[auswahl.v, "stop"], na.rm = T)
      collapsed.chr <- unique(bed.df[auswahl.v, "chr"])
      collapsed.anno <- paste(unique(bed.df[auswahl.v, "anno"]),
        collapse = ",")
      # NOTE: Here just characters are returned!!!
      return(c(collapsed.chr, minimal.start, maximal.stop,
        collapsed.anno))
    })), stringsAsFactors = F)
  colnames(new.bed.df) <- c("chr", "start", "stop", "anno")
  # return result
  new.bed.df$start <- as.numeric(new.bed.df$start)
  new.bed.df$stop <- as.numeric(new.bed.df$stop)
  if (verbose) {
    cat(feed, "[collapse_bed.fct]: done", feed)
  }
  return(new.bed.df)
}
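The idea of collapsing regions can be illustrated with a compact, self-contained sketch. It is not the thesis implementation: `merge_regions`, its `buffer` argument and the toy data are hypothetical, and the sketch always merges directly adjoining integer ranges (the behaviour described for the integer merge mode, where 1 to 4 and 5 to 6 become 1 to 6).

```r
# Hypothetical sketch of region collapsing (not the thesis implementation).
# Regions on the same chromosome are merged when they overlap or directly
# adjoin each other; start <= stop is assumed for every row.
merge_regions <- function(bed.df, buffer = 0) {
  bed.df$chr <- as.character(bed.df$chr)
  bed.df <- bed.df[order(bed.df$chr, bed.df$start, bed.df$stop), ]
  region.id <- integer(nrow(bed.df))
  current.id <- 0L
  current.chr <- NA_character_
  current.stop <- -Inf
  for (i in seq_len(nrow(bed.df))) {
    # open a new region on a chromosome change or when a gap is found
    if (!identical(bed.df$chr[i], current.chr) ||
        bed.df$start[i] > current.stop + 1 + buffer) {
      current.id <- current.id + 1L
      current.chr <- bed.df$chr[i]
      current.stop <- bed.df$stop[i]
    } else {
      # extend the running maximum stop of the current region
      current.stop <- max(current.stop, bed.df$stop[i])
    }
    region.id[i] <- current.id
  }
  data.frame(
    chr = as.vector(tapply(bed.df$chr, region.id, function(x) x[1])),
    start = as.vector(tapply(bed.df$start, region.id, min)),
    stop = as.vector(tapply(bed.df$stop, region.id, max)),
    anno = as.vector(tapply(bed.df$anno, region.id,
                            function(x) paste(unique(x), collapse = ","))),
    stringsAsFactors = FALSE
  )
}

# Invented toy data: the first two regions adjoin and should be merged.
toy.bed <- data.frame(chr = c("1", "1", "1", "2"),
                      start = c(1, 5, 20, 3),
                      stop = c(4, 6, 30, 8),
                      anno = c("geneA", "geneA", "geneB", "geneC"),
                      stringsAsFactors = FALSE)
merged.bed <- merge_regions(toy.bed)
print(merged.bed)
```

Tracking the running maximum stop of the current region, rather than only the stop of the previous row, is what keeps a short region nested inside a long one from prematurely closing the merged region.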

# function to compare the coverage of one bed df with another////////

uncovered_regions_of_bed1_by_bed2.fct <- function(bed1.df, bed2.df,
    verbose = T, feed = "\n", ...) {
  if (verbose) {
    cat(feed, "[uncovered_regions_of_bed1_by_bed2.fct]: beginning")
  }
  # DATA CHECK:
  data_contains_factor.fct(bed1.df)
  data_contains_factor.fct(bed2.df)
  if (nrow(bed1.df) == 0) {
    warning("bed1.df has no rows", immediate. = TRUE)
    return(NULL)
  }
  if (nrow(bed2.df) == 0) {
    warning("bed2.df has no rows", immediate. = TRUE)
    return(NULL)
  }
  # DESCRIPTION: bed.df should be a df with at least three
  # columns (col 1 = chr..., col 2 & 3 = positions, col 4 =
  # annotation)
  # NOTE: This function checks whether regions from bed1 are
  # completely covered by regions from bed2 and returns a report
  # for each region
  # NOTE: Start and stop positions should be ordered
  if (verbose) {
    cat(feed, "[uncovered_regions_of_bed1_by_bed2.fct]: preparing")
  }
  if (ncol(bed1.df) == 3) {
    bed1.df$anno <- NA
  }
  colnames(bed1.df) <- c("chr", "start", "stop", "anno")
  bed1.df$chr <- as.character(bed1.df$chr)
  colnames(bed2.df) <- c("chr", "start", "stop", "anno")
  bed2.df$chr <- as.character(bed2.df$chr)

cat(feed, "[uncovered_regions_of_bed1_by_bed2.fct]:", "calculating ")

}

result.uncovered <- apply(as.matrix(1:nrow(bed1.df)), 1, function(index.s) {


sub.bed2.df <- sub.bed2.df[order(sub.bed2.df$start), ]

uncovered.between <- apply(matrix(c(2:nrow(sub.bed2.df))), 1, function(index.s) {

uncovered.between.df <- do.call(rbind, uncovered.between) if (nrow(uncovered.between.df) != 0) {

if (sub.bed2.df[nrow(sub.bed2.df), "stop"] <

data.bed.df$stop) {

uncovered.df <- rbind(rbind(uncovered.start.df,


cat(feed, "[uncovered_regions_of_bed1_by_bed2.fct]: done ") }

return(result.uncovered) }

# function for missingness stats for a specific bed /////////////////

coverage_stats.fct <- function(bed.df, uncovered_regions.bed.df,
    verbose = TRUE, feed = "\n", ...) {
  if (verbose) {
    cat(feed, "[coverage_stats.fct]: beginning")
  }
  warning("[coverage_stats.fct]: bed.df is null")
  return(NULL)
  }
  if (ncol(bed1.df) == 3) {
    warning("[coverage_stats.fct]: no annotation supplied...",
      "treating supplied regions of interest as one gene",
      immediate. = TRUE)
    bed1.df$anno <- NA
  }
  colnames(bed1.df) <- c("chr", "start", "stop", "anno")
  bed1.df$chr <- as.character(bed1.df$chr)

missingness.df <- as.data.frame(t(apply(

as.matrix(unique(bed1.df$anno)), 1, function(anno.s) { total.target.s <- sum(apply(

warning("[coverage_stats.fct]: no annotation supplied...", "treating supplied uncovered regions of interest as one gene", immediate. = TRUE)

bed2.df$anno <- NA }

colnames(bed2.df) <- c("chr", "start", "stop", "anno") bed2.df$chr <- as.character(bed2.df$chr)

bed2.df$start <- as.numeric(bed2.df$start) bed2.df$stop <- as.numeric(bed2.df$stop) bed2.df$anno <- as.character(bed2.df$anno)

# the input df should contain 4 columns: 1 = chr, 2 = start, # 3 = stop, 4 = anno, data merging is based on anno!!! bed1 # is the desired target, bed2 the missing regions


missingness.df <- as.data.frame(t(apply(

as.matrix(unique(bed1.df$anno)), 1, function(anno.s) { total.target.s <- sum(apply(

colnames(missingness.df) <- c("anno", "target_size", "missing_size", "missing_percent")

missingness.df$target_size <- as.numeric(

cat(feed, "[coverage_stats.fct]: done") }

return(missingness.df) }

# Function for redundancy check for a bed file //////////////////////

redundancy_check.fct <- function(bed.df, verbose = T, feed = "\n",
    verbose_output_information = FALSE, ...) {
  if (verbose) {
    cat(feed, "[redundancy_check.fct]: beginning")
  }
  # DATA CHECK:
  data_contains_factor.fct(bed.df)
  # DESCRIPTION: bed.df should be a df with at least three
  # columns (col 1 = chr..., col 2 & 3 = positions, col 4 =
  # annotation)
  # if integer merge mode, then neighbouring integer positions
  # should be merged, e.g. range 1 to 4 and 5 to 6 will be
  # merged to 1 to 6
  # prepare the bed.df
  colnames(bed.df) <- c("chr", "start", "stop", "anno")
  bed.df$chr <- as.character(bed.df$chr)
  bed.df$start <- as.numeric(bed.df$start)
  bed.df$stop <- as.numeric(bed.df$stop)
  bed.df$anno <- as.character(bed.df$anno)
  # order the bed file according to chromosome and start
  # position
  bed.df <- bed.df[order(bed.df$chr, bed.df$start, bed.df$stop), ]
  # check the number of overlaps up and downstream of a region
  nrow.bed.df <- nrow(bed.df)
  number.overlaps.x <- apply(
    matrix(1:nrow.bed.df), 1, function(index) {
      if (verbose) {
        cat("\r")
        cat(index)
        cat("/")
        cat(nrow.bed.df)
      }
      # check how many regions other than the index region share a
      # region with the index region. NOTE: just check the
      # surrounding 5 regions to speed up the analysis
      auswahl.v <- bed.df[-index, "chr"] %in% bed.df[index, "chr"] &
        bed.df[-index, "start"] <= bed.df[index, "stop"] &
        bed.df[-index, "stop"] >= bed.df[index, "start"]

      if (verbose_output_information) {
        # get a vector of whether the overlap is up- or downstream
        downstream_overlap.v <- bed.df[-index, "start"] >=
          bed.df[index, "start"]
        upstream_overlap.v <- bed.df[-index, "stop"] <=
          bed.df[index, "stop"]
        overlaps_querry_upstream = upstream_overlap.v[auswahl.v],
        overlaps_querry_downstream =
    cat(feed, "[redundancy_check.fct]: done")
    }
    return(number.overlaps.x)
  } else {
    number.overlaps.m <- t(number.overlaps.x)
    # create an output dataframe with the input data.frame
    new.bed.df <- bed.df
    new.bed.df$further_overlap_count <- as.numeric(
      number.overlaps.m[, 1])
    new.bed.df$further_overlap_anno <- as.character(
      number.overlaps.m[, 2])
    if (verbose) {
      cat(feed, "[redundancy_check.fct]: done")
    }
    return(new.bed.df)
  }
}
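The overlap test at the core of the redundancy check (same chromosome, start not beyond the other region's stop, stop not before the other region's start) can be sketched on its own. The helper `count_overlaps` and the toy data below are hypothetical and only illustrate that condition; they do not reproduce the annotation output of the thesis function.

```r
# Hypothetical sketch (not the thesis implementation): for every region,
# count how many other regions on the same chromosome overlap it.
# Assumes start <= stop for every row.
count_overlaps <- function(bed.df) {
  vapply(seq_len(nrow(bed.df)), function(i) {
    overlaps <- bed.df$chr == bed.df$chr[i] &
      bed.df$start <= bed.df$stop[i] &
      bed.df$stop >= bed.df$start[i]
    # every region overlaps itself, so subtract one
    sum(overlaps) - 1L
  }, integer(1))
}

# Invented toy data: only the first two regions on chromosome 1 overlap.
toy.bed <- data.frame(chr = c("1", "1", "1", "2"),
                      start = c(1, 5, 20, 5),
                      stop = c(10, 15, 30, 15),
                      stringsAsFactors = FALSE)
overlap.counts <- count_overlaps(toy.bed)
print(overlap.counts)
```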

# Function for check for steric interaction of MIPs /////////////////

steric_check.fct <- function(mip.df, skip.SNP.based.duplicates.s = TRUE,
    verbose = TRUE, feed = "\n", ...) {
  if (verbose) {
    cat(feed, "[steric_check.fct]: beginning")
  }
  # DATA CHECK:
  data_contains_factor.fct(mip.df)
  if (nrow(mip.df) == 0) {
    return(0)
  }
  # DESCRIPTION: NOTE: This function returns the MIPs, which
  # show steric (thus physical strand) overlap with other MIPs.
  # create the maximal extent of a MIP
  start_stop.mip.m <- t(apply(as.matrix(mip.df[, c("lig_probe_start",
    "lig_probe_stop", "ext_probe_start", "ext_probe_stop")]),
    1, function(x) {
      return(c(min(as.numeric(x)), max(as.numeric(x))))
    }))

# for each mip, check overlaps from the same strand

resultat.steric_analysis.list <- apply(as.matrix(1:nrow(mip.df)), 1, function(index.mip.s) {

# select the strand

auswahl.strand.v <- mip.df[, "probe_strand"] %in%

mip.df[index.mip.s, "probe_strand"]

auswahl.steric.v <- auswahl.strand.v & auswahl.chr.v &

auswahl.overlap.v

mip.df[index.mip.s, "mip_target_start_position"] &

mip.df[, "mip_target_stop_position"] %in%

      output.df <- data.frame(steric_count = sum(auswahl.steric.v),
        steric_mips_index = paste(mip.df[auswahl.steric.v,
          "X.mip_pick_count"], collapse = ","),
        stringsAsFactors = FALSE)
      return(output.df)
    })
  if (is.list(resultat.steric_analysis.list)) {
    resultat.steric_analysis.list <- do.call(rbind,
      resultat.steric_analysis.list)
  }
  if (verbose) {
    cat(feed, "[steric_check.fct]: done")
  }
  return(resultat.steric_analysis.list)
}

# Function for removal of far intronic MIPs /////////////////////////

remove_far_intronic_MIPs.fct <- function(mip.df, CNV.mode = FALSE,
    verbose = TRUE, feed = "\n", ...) {
  if (verbose) {
    cat(feed, "[remove_far_intronic_MIPs.fct]: beginning")
  }
  # DATA CHECK:
  data_contains_factor.fct(mip.df)
  if (nrow(mip.df) == 0) {
    return(NULL)
  }
  # DESCRIPTION: input is the MIP design output df
  # create an internal feature (= region of interest) identifier
  ident.feature.v <- apply(
    as.matrix(1:nrow(mip.df)), 1, function(index.s) {
      paste(mip.df[index.s, c("chr", "feature_start_position",
        "feature_stop_position")], collapse = "_")
    })
  # for each feature, check whether its MIPs target region is
  # overlapping
  if (verbose) {
    cat(feed, "[remove_far_intronic_MIPs.fct]: filtering feature")
  }
  length.unique.ident.feature.v <- length(unique(ident.feature.v))
  selected.mips <- apply(as.matrix(unique(ident.feature.v)), 1,
    function(feature.s) {
      if (verbose) {
        cat("\r")
        cat(feature.s, "/", length.unique.ident.feature.v, ": ")

        sub.mip.df$mip_target_start_position,
        sub.mip.df$mip_target_stop_position), ]
      mip.is.upstream.of.feature.logic <- as.numeric(
        sub.mip.df$mip_target_start_position) <
        as.numeric(sub.mip.df$feature_start_position) &
        as.numeric(sub.mip.df$mip_target_stop_position) <
        as.numeric(sub.mip.df$feature_stop_position)
      if (any(mip.is.upstream.of.feature.logic)) {
        start.index.s <- max(
          c(1:nrow(sub.mip.df))[mip.is.upstream.of.feature.logic])
        # check for same targeting mips
        # and resize the selection (because those might be SNP mips)
        start.index.s <- (c(1:nrow(sub.mip.df))[
          (sub.mip.df$mip_target_start_position ==
            sub.mip.df[start.index.s, "mip_target_start_position"]) &
          (sub.mip.df$mip_target_stop_position ==
            sub.mip.df[start.index.s, "mip_target_stop_position"])])
      } else {
        start.index.s <- NULL
      }
      mip.is.downstream.of.feature.logic <- as.numeric(
        sub.mip.df$mip_target_start_position) >
        as.numeric(sub.mip.df$feature_start_position) &
        as.numeric(sub.mip.df$mip_target_stop_position) >
        as.numeric(sub.mip.df$feature_stop_position)
      if (any(mip.is.downstream.of.feature.logic)) {
        stop.index.s <- min(
          c(1:nrow(sub.mip.df))[mip.is.downstream.of.feature.logic])
        # check for same targeting mips and
        # resize the selection (because those might be SNP mips)
        stop.index.s <- (c(1:nrow(sub.mip.df))[
          (sub.mip.df$mip_target_start_position ==
            sub.mip.df[stop.index.s, "mip_target_start_position"]) &
          (sub.mip.df$mip_target_stop_position ==
            sub.mip.df[stop.index.s, "mip_target_stop_position"])])
      } else {
        stop.index.s <- NULL
      }
      # get mips with target region totally in feature
      target.in.feature.logic <- (as.numeric(
        sub.mip.df$mip_target_start_position) >=
        as.numeric(sub.mip.df$feature_start_position) &
        as.numeric(sub.mip.df$mip_target_stop_position) <=
        as.numeric(sub.mip.df$feature_stop_position)) |
        (as.numeric(sub.mip.df$mip_target_start_position) <=
        as.numeric(sub.mip.df$feature_start_position) &
        as.numeric(sub.mip.df$mip_target_stop_position) >=
        as.numeric(sub.mip.df$feature_stop_position))
      index.target.in.feature.v <- c(
        1:nrow(sub.mip.df))[target.in.feature.logic]
      # select the mips within or just at the border of the feature
      auswahl.v <- unique(c(start.index.s, index.target.in.feature.v,
        stop.index.s))
      auswahl.v <- auswahl.v[order(auswahl.v)]

  if (!CNV.mode) {
    selected.mips <- do.call(rbind, selected.mips)
  }
  if (verbose) {
    cat(feed, "[remove_far_intronic_MIPs.fct]: done")
  }
  return(selected.mips)
}

# Function for getting estimates for MIPs sequencing approaches /////

setup.sequencing.fct <- function(number.mips.s, size.samples.s,
    gb_per_lane = 45, coverage.s = 400, read.s = 200,
    max.multiplex.s = 384, sequencing.time.s = 4,
    exact_values.s = FALSE) {
  # this function helps to estimate the experimental sequencing
  # setup after MIP design
  # calculating the multiplex level
  level.multiplex.s <- floor((gb_per_lane * 1e+09)/(number.mips.s *
    coverage.s * read.s))
  if (level.multiplex.s > max.multiplex.s) {
    level.multiplex.s <- max.multiplex.s
  }
  # calc sequencing size
  seq_size.s <- level.multiplex.s * read.s * coverage.s * number.mips.s
  # calc required number of lanes and flow cells and sequencing
  # time
  number.lanes.s <- if (exact_values.s) {
    seq_size.s/(gb_per_lane * 1e+09)  # (size.samples.s/level.multiplex.s)
  } else {
    ceiling(seq_size.s/(gb_per_lane * 1e+09))
  }
  number.flow_cells.s <- if (exact_values.s) {
    (number.lanes.s/8)
  } else {
    ceiling(number.lanes.s/8)
  }
  number.days.s <- sequencing.time.s * ceiling(number.flow_cells.s)
  # calc number of lanes with distributing maximal multiplex
  distributed_multiplex_lanes <- if (exact_values.s) {
    (((number.mips.s * coverage.s * read.s * max.multiplex.s)/
      (1e+09))/gb_per_lane)
  } else {
    ceiling(((number.mips.s * coverage.s * read.s * max.multiplex.s)/
      (1e+09))/gb_per_lane)
  }
  all_samples_distributed_multiplex_lanes <- if (exact_values.s) {
    distributed_multiplex_lanes * (size.samples.s/max.multiplex.s)
  } else {
    distributed_multiplex_lanes * ceiling(
      size.samples.s/max.multiplex.s)
  }
  return(data.frame(Multiplex_Level = level.multiplex.s,
    sequencing_output_bases = seq_size.s,
    Required_total_lanes_at_multiplex_level = number.lanes.s,
    Required_Total_Flow_Cells = number.flow_cells.s,
    Required_Sequencing_Days_at_multiplex_level = number.days.s,
    number_distributed_lanes_at_full384_multiplex_level =
      distributed_multiplex_lanes,
    all_samples_required_lanes_at_full384_multiplex_level =
      all_samples_distributed_multiplex_lanes,
    stringsAsFactors = FALSE))
}
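The multiplex arithmetic used above can be followed with a small worked example. This is only a sketch under assumed inputs: the design size of 5000 MIPs is invented, the remaining values mirror the function defaults, and the variable names are hypothetical.

```r
# Worked example of the multiplex-level arithmetic (assumed inputs).
number.mips <- 5000    # assumed design size, not taken from the thesis
coverage <- 400        # default coverage.s
read.length <- 200     # default read.s
gb.per.lane <- 45      # default gb_per_lane
max.multiplex <- 384   # default max.multiplex.s

# bases needed per sample: MIPs x coverage x read length
bases.per.sample <- number.mips * coverage * read.length

# samples per lane, capped at the maximal multiplex level
multiplex.level <- min(floor(gb.per.lane * 1e9 / bases.per.sample),
                       max.multiplex)

# bases sequenced for one fully multiplexed lane
sequenced.bases <- multiplex.level * bases.per.sample

print(c(bases_per_sample = bases.per.sample,
        multiplex_level = multiplex.level,
        sequenced_bases = sequenced.bases))
```

With these inputs, one sample needs 4e8 bases, so a 45 Gb lane holds floor(112.5) = 112 samples. That is below the cap of 384, so here the design size rather than the barcode set limits the multiplex level.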

# function to create ucsc tracks ////////////////////////////////////

create_feature_track_bed.fct <- function(mip.df = NULL,

stop("No general track name supplied!") }

  if (is.null(mip.df)) {
    stop("No MIP df supplied!")
  }
  if (is.null(file.s)) {
    stop("No destination file supplied!")
  }
  # try to detect a logistic score and output that score rather
  # than a nijmegen -1 to 5 score
  if (!is.null(mip.df$logistic_score)) {
    mip.df$rank_score <- mip.df$logistic_score
  }
  # This function reformats the mip dataframe to a ucsc track
  # bed file
  # create an identifier for each mip including the mip
  # index, the quality and the feature, the strand & a note
  feature.v <- apply(
    as.matrix(as.numeric(as.factor(apply(as.matrix(mip.df[,
      c("chr", "feature_start_position", "feature_stop_position")]),
      1, function(x) {
        paste(x, collapse = "_")
      })))), 1, function(x) {
      paste(c("feat", "#", as.character(x)), collapse = "")
    })
  id.v <- apply(as.matrix(mip.df$X.mip_pick_count), 1, function(x) {
    paste(c("ID#", x), collapse = "")
  })
  rankscore.v <- apply(as.matrix(mip.df$rank_score), 1, function(x) {
    paste(c("RankScore:", x), collapse = "")
  })
  strand.v <- apply(as.matrix(mip.df$probe_strand), 1, function(x) {
    paste(c("Strand:", x), collapse = "")
  })
  notes.v <- apply(as.matrix(mip.df$notes), 1, function(x) {
    paste(c("Note:", x), collapse = "")
  })

ident.v <- apply(matrix(c(feature.v, id.v, rankscore.v, strand.v, notes.v), ncol = 5), 1, function(x) {

as.matrix(1:length(unique.feature.v)), 1, function(x) {

paste(as.character(sample(1:200, 3, replace = TRUE)), collapse = ",")

})

  colors.feature.v <- apply(as.matrix(feature.v), 1, function(feature.s) {
    return(colors.unique.feature.v[unique.feature.v %in% feature.s])
  })
  # create output df
  chr.v <- apply(as.matrix(mip.df$chr), 1, function(chr.s) {
    paste("chr", as.character(chr.s), sep = "")
  })
  start_stop.m <- t(apply(as.matrix(mip.df[, c("ext_probe_start",
    "ext_probe_stop", "lig_probe_start", "lig_probe_stop")]), 1,
    function(x) {
      return(c(min(x), max(x)))
    }))
  # NOTE: coordinates have to be one-based and the ‘end’
  # coordinate always hints to the first base, NOT being of the
  # specific interest
  output.df <- data.frame(chr = chr.v,
    start = start_stop.m[, 1] - 1,
    stop = start_stop.m[, 2],
    ident = ident.v,
    unknown = 100,
    strand = mip.df$probe_strand,
    target_start = mip.df$mip_target_start_position - 1,
    target_stop = mip.df$mip_target_stop_position,
    color = colors.feature.v)
  # create the plus track
  auswahl.plus.v <- mip.df$probe_strand %in% "+"
  spec.track.s <- paste(c("track name=", name.s, "_plus itemRgb=on\n"),
    collapse = "")
  sink(file = file.s)
  cat(spec.track.s)
  sink()
  write.table(file = file.s, output.df[auswahl.plus.v, ], col.names = F,
    row.names = F, sep = "\t", quote = FALSE, append = T)
  # and for the minus track
  spec.track.s <- paste(c("track name=", name.s, "_minus itemRgb=on\n"),
    collapse = "")
  sink(file = file.s, append = TRUE)
  cat(spec.track.s)
  sink()

write.table(file = file.s, output.df[!auswahl.plus.v, ], col.names = F, row.names = F, sep = "\t", quote = FALSE,

remap_based_on_mip_target = FALSE, verbose = TRUE, feed = "\n", ...) {

if (verbose) {

cat(feed, "[remap_mips_to_bed.fct]: beginning") }

# DATA CHECK:


# nijmegen like mip dataframe) to the desired regions in a # bed file (col 1= chromosome, 2 = start, 3 = stop, 4 =

# annotation). NOTE: bed regions should be collapsed together # before send to this function

  if (ncol(bed.df) < 4) {
    bed.df$anno <- NA
  }
  # CAUTION: Check whether both df are based on the same
  # coordinate’s start (either 1- or 0-based)
  # create a MIP start and stop matrix
  if (remap_based_on_mip_target) {
    start_stop.mip.m <- as.matrix(mip.df[, c(
      "mip_target_start_position", "mip_target_stop_position")])
  } else {
    start_stop.mip.m <- t(apply(as.matrix(mip.df[, c(
      "lig_probe_start", "lig_probe_stop", "ext_probe_start",
      "ext_probe_stop")]), 1, function(x) {
        return(c(min(as.numeric(x)), max(as.numeric(x))))
      }))
  }
  # go through each bed region
  total.index.s <- nrow(bed.df)
  if (verbose) {
    cat(feed, "[remap_mips_to_bed.fct]: mapping mips to feature", feed)
  }
  nrow.bed.df <- nrow(bed.df)
  filtered.mips.x <- apply(
    as.matrix(1:nrow.bed.df), 1, function(index.bed.s) {
      auswahl.v <- mip.df$chr %in% bed.df[index.bed.s, 1] &

ident.index.bed.s <- paste(as.character(bed.df[index.bed.s, ]), collapse = "_")

# get the feature suffix

suffix.feature.s <- which(ident.auswahl.bed.v ==


      feature.s <- (paste(as.character(c(prefix.feature.s,
        suffix.feature.s)), collapse = "_"))
      sub.mip.df$feature_mip_count <- feature.s
      # reset the feature start and stop positions
      sub.mip.df$feature_start_position <- bed.df[index.bed.s, 2]
      sub.mip.df$feature_stop_position <- bed.df[index.bed.s, 3]
  filtered.mips.x <- do.call(rbind, filtered.mips.x)
  }
  if (verbose) {
    cat(feed, "[remap_mips_to_bed.fct]: done")
  }
  return(filtered.mips.x)
}

# Function to get MIPs with overlapping arms at different strands////

get_arm_overlaps.fct <- function(mip.df, threshold.filter.s = 0,
