
4. Materials and methods

4.2 Methods

4.2.6 Bioinformatic tools and methods

4.2.6.1 Determination of the dMi-2-enriched regions in the heat shock ChIP-sequencing experiment

To identify regions with differential dMi-2 enrichment between heat shock (HS) and non-heat shock (NHS) conditions, DESeq was used with the size factors set to the number of aligned reads in each library. When DESeq reported an adjusted p-value ≤ 0.05 between the NHS and HS alignments, the region was assigned to the condition with the higher read count.
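The assignment rule can be illustrated with a minimal Python sketch. It assumes the adjusted p-values have already been computed by DESeq (in R) and exported per region; the function name, tuple layout and example numbers are hypothetical. Because the size factors equal the aligned-read totals, the sketch compares counts divided by those totals, which is our assumption about how "higher read count" was judged.

```python
from typing import Optional


def assign_region(hs_count: int, nhs_count: int, hs_total: int, nhs_total: int,
                  padj: float, alpha: float = 0.05) -> Optional[str]:
    """Return "HS", "NHS", or None when the region is not differentially enriched.

    Size factors are taken to be the aligned-read totals of each library, so the
    counts are compared after dividing by those totals (assumption).
    """
    if padj > alpha:
        return None  # adjusted p-value above the cutoff: no assignment
    hs_norm = hs_count / hs_total
    nhs_norm = nhs_count / nhs_total
    return "HS" if hs_norm > nhs_norm else "NHS"


# Hypothetical regions: (HS count, NHS count, HS aligned reads, NHS aligned reads, padj)
regions = {
    "region_1": (120, 30, 10_000_000, 12_000_000, 0.001),
    "region_2": (50, 55, 10_000_000, 12_000_000, 0.40),
}
calls = {name: assign_region(*values) for name, values in regions.items()}
print(calls)  # {'region_1': 'HS', 'region_2': None}
```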

4.2.6.2 dMi-2 read distribution around the TSS

Custom Python scripts were used to extract ChIP-sequencing reads within 3 kb around the transcription start sites (TSSs). Reads were extended to 200 bp.

Read coverage was summed at each position relative to the TSSs. TSSs were extracted from the Ensembl transcript annotations so that internal TSSs were included.
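The summation of extended-read coverage around TSSs can be sketched as below. This is a minimal illustration, not the original scripts: it assumes reads are given as (chromosome, 0-based 5' end, strand), interprets "within 3 kb" as ±3 kb via the hypothetical `flank` parameter, and orients offsets by TSS strand. The nested loop is deliberately naive; genome-scale data would need an interval index.

```python
from collections import defaultdict


def tss_profile(reads, tss_sites, flank=3000, frag_len=200):
    """Sum coverage of extended reads at each offset in [-flank, +flank] from TSSs.

    reads:     iterable of (chrom, five_prime_pos, strand), 0-based
    tss_sites: iterable of (chrom, tss_pos, strand), 0-based
    Offsets are oriented by TSS strand, so positive values lie downstream.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, strand in tss_sites:
        by_chrom[chrom].append((pos, strand))

    profile = [0] * (2 * flank + 1)
    for chrom, start, strand in reads:
        # Extend each read to frag_len bp in its sequencing direction, half-open [lo, hi).
        if strand == "+":
            lo, hi = start, start + frag_len
        else:
            lo, hi = start - frag_len + 1, start + 1
        for tss, tss_strand in by_chrom.get(chrom, ()):
            for pos in range(max(lo, tss - flank), min(hi, tss + flank + 1)):
                offset = pos - tss if tss_strand == "+" else tss - pos
                profile[offset + flank] += 1
    return profile


profile = tss_profile([("chr2L", 100, "+"), ("chr2L", 104, "-")],
                      [("chr2L", 100, "+")], flank=10, frag_len=5)
print(profile[8:16])  # [0, 0, 2, 2, 2, 2, 2, 0]
```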

4.2.6.3 Distribution of the chromatin-associated proteins around the dMi-2 binding sites

ChIP-sequencing read counts at the 850 robust dMi-2 binding sites were averaged, normalized to 1 million reads and aligned at position 0 bp. The modENCODE ChIP-chip data sets (Pol II: data set 329, H1: data set 3300, H4: data set 3304, Ez: data set 284, Gaf: data set 285, RPD3: data set 946, MBD: data set 3057) were averaged and aligned to the dMi-2 binding sites in a 16 kb window. Read alignment was performed with bowtie 0.12.3, allowing two mismatches in the seed and a maximum mismatch quality sum of 70.

The read signal intensity is given in arbitrary units (AU).
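Averaging per-position read counts over the binding sites and scaling to one million reads can be sketched as follows. The input layout (one equal-length count vector per site, centred so the middle index is 0 bp) and the function name are assumptions for illustration. Because the scaling is linear, averaging before or after normalization yields the same profile.

```python
def average_rpm_profile(site_counts, total_reads):
    """Average per-position counts over binding sites and scale to reads per million.

    site_counts: list of equal-length count vectors, one per binding site, each
                 centred on the site so that the middle index corresponds to 0 bp.
    total_reads: number of aligned reads in the ChIP-seq library.
    """
    if not site_counts:
        raise ValueError("no binding sites given")
    width = len(site_counts[0])
    if any(len(counts) != width for counts in site_counts):
        raise ValueError("all count vectors must have the same length")
    scale = 1_000_000 / total_reads
    n_sites = len(site_counts)
    return [sum(counts[i] for counts in site_counts) / n_sites * scale for i in range(width)]


profile = average_rpm_profile([[0, 2, 4], [2, 6, 0]], total_reads=500_000)
print(profile)  # [2.0, 8.0, 4.0]
```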

4.2.6.4 Genomic distribution of the dMi-2 binding sites

ChIP-sequencing reads were classified according to their genomic location using custom Python scripts. Genomic locations were defined with Ensembl release 65.
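A classification of reads by genomic location might look like the sketch below. The categories (promoter, exon, intron, intergenic) and their priority order are hypothetical, since the text does not list them; the annotation is assumed to have been converted beforehand from Ensembl gene models into half-open intervals.

```python
from collections import Counter

# Illustrative classes, checked in this priority order; the real categories may differ.
PRIORITY = ("promoter", "exon", "intron")


def classify_position(chrom, pos, annotation, priority=PRIORITY):
    """Return the first class in `priority` whose interval contains pos, else "intergenic".

    annotation: dict mapping class name -> list of (chrom, start, end) half-open intervals.
    """
    for label in priority:
        for c, start, end in annotation.get(label, ()):
            if c == chrom and start <= pos < end:
                return label
    return "intergenic"


def classify_reads(reads, annotation):
    """Count reads, given as (chrom, pos) tuples, per genomic class."""
    return Counter(classify_position(chrom, pos, annotation) for chrom, pos in reads)


annotation = {
    "promoter": [("chr3R", 0, 100)],
    "exon": [("chr3R", 50, 300)],
    "intron": [("chr3R", 300, 500)],
}
reads = [("chr3R", 60), ("chr3R", 200), ("chr3R", 400), ("chr3R", 900), ("chrX", 10)]
counts = classify_reads(reads, annotation)
print(counts["promoter"], counts["exon"], counts["intron"], counts["intergenic"])  # 1 1 1 2
```

The read at position 60 lies in both a promoter and an exon interval; the priority order resolves it to "promoter".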

4.2.6.5 dMi-2 distribution over the hsp and the RpS gene bodies

ChIP-sequencing reads were processed as described in 4.2.6.3, except that read coverage was restricted to regions around and within the hsp or RpS genes. dMi-2 reads were shifted 95 bp downstream to the approximate binding site (shift estimated from fragment lengths with MACS) and binned into 50 bins per subregion. Bin read counts were normalized to one million reads.
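The shift-and-bin step can be sketched as below, assuming reads are given as (0-based 5' end, strand) on the chromosome of the subregion and that the function is called once per subregion (for example upstream flank, gene body, downstream flank). The function name and example coordinates are hypothetical.

```python
def binned_profile(reads, region_start, region_end, total_reads, shift=95, n_bins=50):
    """Shift reads toward the binding site, count them in n_bins equal bins, scale to RPM.

    reads: (five_prime_pos, strand) tuples on the region's chromosome, 0-based.
    The region is half-open [region_start, region_end).
    """
    width = region_end - region_start
    counts = [0] * n_bins
    for pos, strand in reads:
        # Move the 5' end `shift` bp downstream in the read's own direction.
        centre = pos + shift if strand == "+" else pos - shift
        if region_start <= centre < region_end:
            counts[(centre - region_start) * n_bins // width] += 1
    scale = 1_000_000 / total_reads
    return [c * scale for c in counts]


gene_body = binned_profile([(1000, "+"), (1300, "-")], region_start=1000,
                           region_end=1500, total_reads=2_000_000)
print(gene_body[9], gene_body[20])  # 0.5 0.5
```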

4.2.6.6 Distribution of chromatin states in dMi-2 binding sites

The 850 robust dMi-2 binding sites were visualized in the genome browser of the modMINE website, which contains the chromatin state data set for S2 cells. The proportion of each chromatin state was determined for each site, and the average proportion of each chromatin state across the 850 robust dMi-2 binding sites was calculated. The genomic proportions of the chromatin states were taken from Kharchenko et al. (2011) Nature.
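The per-site proportions and their average could be computed programmatically rather than from the browser view. The sketch below assumes chromatin states are available as non-overlapping (chromosome, start, end, state) segments and binding sites as half-open intervals; the names and example intervals are made up. The resulting averages can then be set side by side with the genome-wide proportions.

```python
from collections import defaultdict


def average_state_proportions(sites, state_segments):
    """Mean fraction of each binding site covered by each chromatin state.

    sites:          (chrom, start, end) half-open binding-site intervals
    state_segments: (chrom, start, end, state) non-overlapping chromatin-state segments
    """
    totals = defaultdict(float)
    for chrom, site_start, site_end in sites:
        length = site_end - site_start
        for seg_chrom, seg_start, seg_end, state in state_segments:
            if seg_chrom != chrom:
                continue
            overlap = min(seg_end, site_end) - max(seg_start, site_start)
            if overlap > 0:
                totals[state] += overlap / length  # this site's fraction in this state
    return {state: value / len(sites) for state, value in totals.items()}


sites = [("chr2L", 0, 100), ("chr2L", 200, 300)]
segments = [("chr2L", 0, 50, 1), ("chr2L", 50, 250, 2), ("chr2L", 250, 400, 9)]
props = average_state_proportions(sites, segments)
print(props)  # {1: 0.25, 2: 0.5, 9: 0.25}
```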

4.2.6.7 Co-occurrences between the dMi-2 binding sites and the chromatin-associated protein binding sites

The 850 robust dMi-2 binding sites were converted into a BED file. A co-occurrence was defined as an overlap of at least 1 bp between a dMi-2 binding site and a binding site from one of the following data sets: Gaf: data set 285, RPD3: data set 946, MBD: data set 3057, H3K4me3: data set 914, H3K9ac: data set 309, H3K4me1: data set 304, H3K18ac: data set 292, H3K27ac: data set 296, H3K36me3: data set 303, H4K16ac (L): data set 319, H4K16ac (M): data set 320, H3K27me3: data set 298, H3K9me2: data set 311, H3K9me3: data set 313, CTCF: data set 283, CP190 HB: data set 925, CP190 VC: data set 280, Beaf-32 HB: data set 274, Beaf-32 70: data set 922, Su(Hw) HB: data set 330, Su(Hw) VC: data set 331, Mod(mdg4): data set 2674. Co-occurrences were analyzed by visual inspection in the Generic Genome Browser v2.52.
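Co-occurrences were inspected visually here; a programmatic check of the same "at least 1 bp" criterion could look like the minimal sketch below. It is an alternative, not the procedure used. It relies on BED coordinates being 0-based and half-open, so two intervals share a base exactly when each starts before the other ends. The peak intervals in the example are made up.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}; extra columns are ignored."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open BED intervals share >= 1 bp iff each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def co_occurring_sites(sites, peaks):
    """Return the (chrom, start, end) sites that overlap at least one peak."""
    hits = []
    for chrom, intervals in sites.items():
        for start, end in intervals:
            if any(overlaps(start, end, p_start, p_end) for p_start, p_end in peaks.get(chrom, ())):
                hits.append((chrom, start, end))
    return hits


sites = read_bed(["chr2L\t100\t200\tsite1", "chr2L\t500\t600\tsite2"])
gaf = read_bed(["chr2L\t199\t250", "chr2L\t600\t700"])
hits = co_occurring_sites(sites, gaf)
print(hits)  # [('chr2L', 100, 200)]
```

The second site ends where a peak begins (position 600), so in half-open coordinates the two are adjacent but share no base.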

4.2.6.8 Identification of the DNA sequences enriched in dMi-2 binding sites

DREME (MEME Suite version 4.8.1) was used to identify de novo DNA motifs enriched in the 850 robust dMi-2 binding sites (Bailey (2011) Bioinformatics).

A DNA motif was considered confident if (1) its threshold ended with a support value of 400 or more, (2) most of the threshold (2/3) had a support value ≥ 600 and (3) its support value was relatively stable. Confident DNA motifs were then compared to the JASPAR database to find transcription factors associated with the de novo DNA motifs (Sandelin et al. (2004) Nucleic Acids Res).

To assess the enrichment of TATA boxes in the robust dMi-2 sites, TATA box sequences were identified with the motif matrix of the TATA-binding protein (TBP) in regions covering the 35 bp upstream of the genome-wide TSSs. The co-occurrence of TATA and non-TATA promoters with the 850 robust dMi-2 binding sites was then analyzed with custom Python scripts.
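Scanning the 35 bp upstream of each TSS with a motif matrix can be sketched as a standard log-odds position weight matrix scan. Everything concrete here is an assumption: the toy four-column matrix is NOT the real TBP matrix (that would come from a motif database), the uniform background, the pseudocount and the threshold of 80% of the maximum score are illustrative choices, and the TSS is taken as the 0-based first transcribed base.

```python
import math


def log_odds_matrix(counts, pseudocount=0.25, background=0.25):
    """Turn a position frequency matrix (list of {base: count}) into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({base: math.log2((column.get(base, 0) + pseudocount) / total / background)
                    for base in "ACGT"})
    return pwm


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def best_score(seq, pwm):
    """Highest motif score over all windows of seq (scanned in the promoter's orientation)."""
    width = len(pwm)
    best = -math.inf
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) <= set("ACGT"):  # skip windows containing N or other symbols
            best = max(best, sum(pwm[j][base] for j, base in enumerate(window)))
    return best


def is_tata_promoter(genome, tss, strand, pwm, threshold, upstream=35):
    """True if the `upstream` bp 5' of a 0-based TSS contain a motif hit >= threshold."""
    if strand == "+":
        region = genome[max(0, tss - upstream):tss]
    else:
        region = reverse_complement(genome[tss + 1:tss + 1 + upstream])
    return best_score(region.upper(), pwm) >= threshold


# Toy 4-column matrix favouring "TATA"; NOT the real TBP matrix.
pwm = log_odds_matrix([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
threshold = 0.8 * sum(max(column.values()) for column in pwm)  # assumed cutoff
genome = "G" * 10 + "TATA" + "G" * 20 + "A" * 20
print(is_tata_promoter(genome, 34, "+", pwm, threshold))  # True
```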

The enrichment of TATA boxes and InR was also investigated in a subset of dMi-2 binding sites (rhoGap93B, mep1, dco, ttk, e2f, kismet, mnt, hairy, for, CG1832, InR, lanA, bnl, cdk4 and dm). An intergenic region and a promoter that were not bound by dMi-2 were used as negative control regions. CRE motifs were identified with jPREdictor v1.0 in each investigated region (Fiedler (2008) Dissertation, University of Bielefeld; Fiedler and Rehmsmeier (2006) Nucleic Acids Res). CRE frequencies were normalized to the length of the dMi-2-bound region. Enrichment was defined as a lower CRE frequency in the dMi-2 binding site than in the negative region (either the promoter or the intergenic region) in the majority (≥50%) of the investigated dMi-2 regions.
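The length normalization and the majority rule can be sketched as follows. The sketch implements the criterion exactly as worded in the text (site frequency lower than the control frequency in at least half of the regions); expressing frequencies per kilobase is an assumption, and the motif counts and lengths in the example are made up. The motif counts themselves are assumed to come from jPREdictor output.

```python
def cre_frequency(n_motifs, region_length_bp, per_bp=1000):
    """CRE motif count normalized to region length (motifs per kb by default)."""
    return n_motifs * per_bp / region_length_bp


def majority_call(site_freqs, control_freq, min_fraction=0.5):
    """Criterion as worded in 4.2.6.8: the site frequency is lower than the control
    frequency in at least `min_fraction` of the investigated dMi-2 regions."""
    lower = sum(1 for freq in site_freqs.values() if freq < control_freq)
    return lower / len(site_freqs) >= min_fraction


# Made-up motif counts and lengths, for illustration only
site_freqs = {
    "ttk": cre_frequency(3, 1500),
    "hairy": cre_frequency(1, 2000),
    "mnt": cre_frequency(4, 1000),
}
control = cre_frequency(2, 1000)  # e.g. the unbound promoter; repeat for the intergenic region
print(majority_call(site_freqs, control))  # False
```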

4.2.6.9 Gene ontology analysis of the dMi-2-associated genes

The gene ontology of the genes closest to the 850 robust dMi-2 binding sites was analyzed using the DAVID bioinformatics database (DAVID Bioinformatics Resources 6.7, National Institute of Allergy and Infectious Diseases, NIH) (Huang et al. (2009) Nat Protoc; Dennis et al. (2003) Genome Biol). The dMi-2-associated genes were compared against the Drosophila melanogaster background.

The gene ontology terms were ranked by p-value, and only the ten most significant terms were considered.

4.2.6.10 dMi-2 association with gene expression level

Custom Python scripts were used to calculate fragments per kilobase (FPK) of exon transcripts for each gene of the Drosophila genome. The relative distributions of genes associated with a dMi-2 binding site and of genes without one (no association) were plotted in 1-FPK-wide bins.

Each bin was normalized to the sum of all bins of the corresponding group (either dMi-2-associated or no association).
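The FPK calculation and the per-group normalized histogram can be sketched as below. It assumes FPK means fragments divided by total exon length in kilobases, without the per-million scaling of FPKM, and that genes are already labeled as dMi-2-associated or not; the gene names, counts and lengths are hypothetical.

```python
from collections import Counter


def fpk(fragment_count, exon_length_bp):
    """Fragments per kilobase of exon model (no per-million scaling)."""
    return fragment_count * 1000 / exon_length_bp


def fpk_distribution(fpk_values):
    """Relative frequency of genes in 1-FPK-wide bins; bin k covers [k, k + 1)."""
    bins = Counter(int(value) for value in fpk_values)  # int() floors non-negative values
    total = sum(bins.values())
    return {k: n / total for k, n in sorted(bins.items())}


# Hypothetical genes: (fragments, total exon length in bp, dMi-2 associated?)
genes = {"geneA": (50, 10_000, True), "geneB": (12, 4_000, False), "geneC": (30, 20_000, True)}
associated = fpk_distribution(fpk(n, length) for n, length, bound in genes.values() if bound)
no_association = fpk_distribution(fpk(n, length) for n, length, bound in genes.values() if not bound)
print(associated, no_association)  # {1: 0.5, 5: 0.5} {3: 1.0}
```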

4.2.6.11 Gene regulation by dMi-2

Genes associated with dMi-2 binding sites were compared to genes regulated in dMi-2 knockdown S2 cells (RNA sequencing performed by Eugenia Wagner). A gene was considered up-regulated by dMi-2 (i.e. activated by dMi-2) when its expression showed a fold change ≤ -2.00 upon dMi-2 depletion. Conversely, a gene with a fold change ≥ 2.00 upon dMi-2 knockdown was considered down-regulated by dMi-2.
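The classification can be sketched as a simple threshold function. It assumes a signed fold change (negative values meaning lower expression after knockdown), which is what the "-2.00" cutoff implies; gene identifiers and values in the example are hypothetical. Genes not quantified in the RNA-seq data are skipped.

```python
def classify_regulation(fold_change, cutoff=2.0):
    """Label a gene from its signed fold change upon dMi-2 knockdown."""
    if fold_change <= -cutoff:
        return "up-regulated by dMi-2"    # expression falls when dMi-2 is depleted
    if fold_change >= cutoff:
        return "down-regulated by dMi-2"  # expression rises when dMi-2 is depleted
    return "not regulated"


def regulation_of_bound_genes(bound_genes, fold_changes):
    """Classify dMi-2-associated genes that were quantified in the knockdown RNA-seq."""
    return {gene: classify_regulation(fold_changes[gene])
            for gene in bound_genes if gene in fold_changes}


# Hypothetical gene IDs and fold changes
fold_changes = {"g1": -3.1, "g2": 2.0, "g3": 1.2, "g4": -2.0}
print(regulation_of_bound_genes(["g1", "g2", "g3", "g5"], fold_changes))
```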

4.2.6.12 Identification of the dMi-2-containing complexes in dMi-2 binding sites

To determine whether dNuRD could be present at the robust dMi-2 binding sites, the co-occurrence between the robust dMi-2 binding sites (as a BED file) and the two dNuRD subunit data sets available on the modENCODE website (MBD: data set 3057, RPD3: data set 946) was determined. A site was considered a potential dNuRD binding site when dMi-2, MBD and RPD3 overlapped by at least 1 bp.
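One reading of the criterion, in which all three intervals must share at least one common base pair, can be sketched as follows. This interpretation is an assumption (a looser reading would only require the dMi-2 site to overlap an MBD peak and an RPD3 peak separately), and the intervals in the example are made up. Inputs are assumed to be half-open BED intervals grouped by chromosome.

```python
from itertools import product


def shared_bp(*intervals):
    """Number of base pairs common to all half-open (start, end) intervals."""
    return max(0, min(end for _, end in intervals) - max(start for start, _ in intervals))


def putative_dnurd_sites(dmi2_sites, mbd_peaks, rpd3_peaks):
    """dMi-2 sites sharing at least 1 bp with an MBD peak and an RPD3 peak simultaneously.

    Each argument maps chrom -> list of (start, end) half-open BED intervals.
    """
    hits = []
    for chrom, sites in dmi2_sites.items():
        pairs = list(product(mbd_peaks.get(chrom, ()), rpd3_peaks.get(chrom, ())))
        for site in sites:
            if any(shared_bp(site, mbd, rpd3) >= 1 for mbd, rpd3 in pairs):
                hits.append((chrom, *site))
    return hits


dmi2 = {"chr3L": [(100, 300), (1000, 1100)]}
mbd = {"chr3L": [(250, 400), (1050, 1060)]}
rpd3 = {"chr3L": [(280, 500), (1070, 1200)]}
nurd = putative_dnurd_sites(dmi2, mbd, rpd3)
print(nurd)  # [('chr3L', 100, 300)]
```

The second dMi-2 site overlaps an MBD peak and an RPD3 peak, but those two peaks do not share a base inside it, so it is not reported under this stricter reading.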

From the document: Genome-wide analysis of dMi-2 binding sites (pages 95-101)