

4. Materials and methods

4.2 Methods

4.2.6 Bioinformatic tools and methods

4.2.6.1 Determination of the dMi-2 enriched regions in the heat shock ChIP-sequencing experiment

To identify differential dMi-2 enrichment between heat shock (HS) and non-heat shock (NHS) conditions, DESeq was used with the size parameter set to the number of aligned reads. When DESeq reported an adjusted p-value ≤ 0.05 between the NHS and the HS alignments, the region was assigned to the condition with the higher read count.
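The assignment rule can be illustrated with a minimal Python sketch. It assumes that the adjusted p-value and the two read counts of each region are already available from the DESeq output; the function name, the example regions and the handling of ties are hypothetical and not taken from the original scripts.

```python
def assign_condition(padj, nhs_count, hs_count, alpha=0.05):
    """Assign a region to 'HS' or 'NHS', or return None if it is not differential.

    padj      -- adjusted p-value reported for the region (None if unavailable)
    nhs_count -- read count of the region in the NHS alignment
    hs_count  -- read count of the region in the HS alignment
    """
    if padj is None or padj > alpha:
        return None  # no significant difference between the conditions
    if hs_count > nhs_count:
        return "HS"
    if nhs_count > hs_count:
        return "NHS"
    return None  # equal counts: a case the text does not cover (assumption)


# Hypothetical regions: (name, adjusted p-value, NHS count, HS count)
regions = [
    ("region_a", 0.001, 12, 85),
    ("region_b", 0.20, 40, 44),
    ("region_c", 0.05, 90, 30),
]
calls = {name: assign_condition(padj, nhs, hs) for name, padj, nhs, hs in regions}
```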

 

4.2.6.2 Distribution of dMi-2 reads around the TSS

Custom Python scripts were used to extract ChIP-sequencing reads within 3 kb around the transcription start sites (TSSs). Reads were extended to 200 bp. The read coverage relative to the transcription start sites was summed. Transcription start sites were extracted from the Ensembl transcript annotations so that internal transcription start sites were included.
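The read extension and coverage summation can be sketched as follows. This is not the original script: it assumes that reads are given by their 5' position and strand, that extension proceeds in the direction of the read strand, and that "3 kb around" means 3 kb on each side of the TSS. All names are hypothetical, and the nested loop is written for clarity rather than speed.

```python
import numpy as np


def tss_profile(reads, tss_sites, flank=3000, ext=200):
    """Sum the coverage of extended reads relative to a set of TSSs.

    reads     -- iterable of (chrom, five_prime_pos, strand)
    tss_sites -- iterable of (chrom, tss_pos, strand)
    Returns an array of length 2 * flank + 1; index `flank` is the TSS itself.
    """
    profile = np.zeros(2 * flank + 1, dtype=np.int64)
    for t_chrom, t_pos, t_strand in tss_sites:
        for r_chrom, r_pos, r_strand in reads:
            if r_chrom != t_chrom:
                continue
            # Extend the read to `ext` bp in the direction of its strand.
            if r_strand == "+":
                first, last = r_pos, r_pos + ext - 1
            else:
                first, last = r_pos - ext + 1, r_pos
            # Coordinates relative to the TSS (inclusive on both ends).
            lo, hi = first - t_pos, last - t_pos
            if t_strand == "-":
                lo, hi = -hi, -lo  # orient the profile along the transcript
            lo, hi = max(lo, -flank), min(hi, flank)
            if lo <= hi:
                profile[lo + flank:hi + flank + 1] += 1
    return profile
```

For genome-wide data, an interval index or a per-chromosome coverage array would replace the nested loop.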

 

4.2.6.3 Distribution of the chromatin-associated proteins around the dMi-2 binding sites

ChIP-sequencing read counts at the 850 robust dMi-2 binding sites were averaged, normalized to 1 million reads and aligned at position 0 bp. The modENCODE ChIP-chip data sets (Pol II: data set 329, H1: data set 3300, H4: data set 3304, Ez: data set 284, Gaf: data set 285, RPD3: data set 946, MBD: data set 3057) were averaged and aligned to the dMi-2 binding sites in a window of 16 kb. Read alignment was done using bowtie 0.12.3, allowing two mismatches in the seed and a mismatch quality sum of 70. The read signal intensity is given in arbitrary units (AU).
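The averaging and normalization step can be sketched as below, assuming that the read counts have already been collected into one row per binding site, with each column corresponding to a position relative to the site center. The function name and the input layout are hypothetical.

```python
import numpy as np


def average_rpm_profile(site_counts, total_reads):
    """Average per-site read counts and scale them to reads per million.

    site_counts -- 2D array-like, one row per binding site, one column per
                   position relative to the site center
    total_reads -- total number of aligned reads in the data set
    """
    counts = np.asarray(site_counts, dtype=float)
    rpm = counts * 1e6 / total_reads  # normalize to 1 million reads
    return rpm.mean(axis=0)           # average over all binding sites
```

Because both operations are linear, averaging before or after the normalization gives the same profile.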

 

4.2.6.4 Genomic distribution of the dMi-2 binding sites

ChIP-sequencing reads were classified according to their genomic location using custom Python scripts. Ensembl revision 65 was used to identify the genomic locations.

4.2.6.5 dMi-2 distribution over the hsp and the RpS gene bodies

ChIP-sequencing reads were treated as described in 4.2.6.3, except that read coverage was determined only around and within the hsp or the RpS genes. dMi-2 reads were shifted 95 bp downstream to the approximate binding site (estimated from fragment lengths via MACS) and binned into 50 bins per subregion. Bin read counts were normalized to one million reads.
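The shifting, binning and normalization can be sketched as follows. The sketch assumes that "downstream" refers to the direction of the read strand and that each subregion is a half-open interval; the function names are hypothetical and do not come from the original scripts.

```python
def shift_read(pos, strand, shift=95):
    """Move a read 5' position `shift` bp downstream along its own strand."""
    return pos + shift if strand == "+" else pos - shift


def bin_counts(positions, region_start, region_end, n_bins=50):
    """Count shifted read positions in `n_bins` equal bins of [start, end)."""
    counts = [0] * n_bins
    length = region_end - region_start
    for p in positions:
        if region_start <= p < region_end:
            # Integer arithmetic keeps the bin index within 0 .. n_bins - 1.
            counts[(p - region_start) * n_bins // length] += 1
    return counts


def normalize_per_million(counts, total_reads):
    """Scale bin counts to one million aligned reads."""
    return [c * 1e6 / total_reads for c in counts]
```

Using a fixed number of bins per subregion makes genes of different lengths comparable on a common axis.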

 

4.2.6.6 Distribution of chromatin states in dMi-2 binding sites

The 850 robust dMi-2 binding sites were visualized in the genome browser of the modMINE website, which contains the chromatin state data set for S2 cells. The proportion of each chromatin state was determined. The average of each chromatin state present in the 850 robust dMi-2 binding sites was calculated. The genomic proportions of the chromatin states were taken from Kharchenko et al. (2011) Nature.

 

4.2.6.7 Co-occurrences between the dMi-2 binding sites and the chromatin-associated protein binding sites

The 850 robust dMi-2 binding sites were converted into a BED file. A co-occurrence was defined as an overlap of at least 1 bp between binding sites of the different data sets (Gaf: data set 285, RPD3: data set 946, MBD: data set 3057, H3K4me3: data set 914, H3K9ac: data set 309, H3K4me1: data set 304, H3K18ac: data set 292, H3K27ac: data set 296, H3K36me3: data set 303, H4K16ac (L): data set 319, H4K16ac (M): data set 320, H3K27me3: data set 298, H3K9me2: data set 311, H3K9me3: data set 313, CTCF: data set 283, CP190 HB: data set 925, CP190 VC: data set 280, Beaf-32 HB: data set 274, Beaf-32 70: data set 922, Su(Hw) HB: data set 330, Su(Hw) VC: data set 331, Mod(mdg4): data set 2674). The co-occurrences were analyzed by visual inspection in the Generic Genome Browser v.2.52 view.
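The co-occurrences in this work were scored by visual inspection. For illustration only, the 1 bp overlap criterion could also be expressed programmatically, as in the following sketch. It assumes BED-style half-open intervals of the form (chromosome, start, end); the function names are hypothetical.

```python
def overlaps(a, b, min_bp=1):
    """Return True if two BED intervals share at least `min_bp` base pairs.

    a, b -- (chrom, start, end) tuples with 0-based, half-open coordinates
    """
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    return shared >= min_bp


def count_cooccurrences(sites, other_sites):
    """Count the sites that overlap at least one interval in `other_sites`."""
    return sum(any(overlaps(s, o) for o in other_sites) for s in sites)
```

With half-open coordinates, two intervals that merely touch (end of one equal to start of the other) share 0 bp and are not counted.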

 

4.2.6.8 Identification of the DNA sequences enriched in dMi-2 binding sites

DREME (MEME version 4.8.1) was used to identify de novo DNA motifs that were enriched in the 850 robust dMi-2 binding sites (Bailey (2011) Bioinformatics). Confident DNA motifs have (1) a threshold ending with a support value of 400 or more, (2) a support value ≥ 600 over most of the threshold (2/3) and (3) a relatively stable support value. The confident DNA motifs were then compared to the JASPAR database to find transcription factors associated with the de novo DNA motifs (Sandelin et al. (2004) Nucleic Acids Res).

To assess the enrichment of TATA boxes in the robust dMi-2 sites, TATA box sequences were defined via the motif matrix of the TATA-binding protein (TBP) on regions covering the 35 bp upstream of the genome-wide TSSs. The co-occurrences of TATA and non-TATA promoters with the 850 robust dMi-2 binding sites were then analyzed with custom Python scripts.

 

The enrichment of TATA boxes and InR was also investigated on a subset of dMi-2 binding sites (rhoGap93B, mep1, dco, ttk, e2f, kismet, mnt, hairy, for, CG1832, InR, lanA, bnl, cdk4 and dm). An intergenic region and a promoter that were not bound by dMi-2 were used as negative control regions. The CRE motifs were recognized with jPREdictor v1.0 in each investigated region (Fiedler (2008) Dissertation, University of Bielefeld; Fiedler and Rehmsmeier (2006) Nucleic Acids Res). The CRE frequencies were calculated relative to the length of the dMi-2 bound region. Enrichment was defined by a CRE frequency lower in the dMi-2 binding site relative to the negative region (either the promoter or the intergenic region) in the majority of the investigated dMi-2 regions (≥50%).

 

4.2.6.9 Gene ontology analysis of the dMi-2 associated genes

The gene ontology of the closest genes associated with the 850 robust dMi-2 binding sites was analyzed using the DAVID bioinformatic database (DAVID Bioinformatics Resources 6.7, National Institute of Allergy and Infectious Diseases, NIH) (Huang et al. (2009) Nat Protoc; Dennis et al. (2003) Genome Biol). The dMi-2 associated genes were compared to the Drosophila melanogaster background. The gene ontology terms were ranked by their p-values and only the ten most significant gene ontology terms were considered.

 

4.2.6.10 dMi-2 association with gene expression level

Custom Python scripts were used to calculate fragments per kilobase (FPK) of exon transcripts for each gene of the Drosophila genome. The relative distributions of genes associated with a dMi-2 binding site or devoid of one (no association) were plotted in 1 FPK wide bins. Each FPK bin was normalized to the sum of all bins of the corresponding condition (either dMi-2 associated or no association).
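The FPK calculation and the binned relative distribution can be sketched as follows. The sketch assumes that the fragment count and the total exon length of each gene are already known; the function names are hypothetical and the original scripts may differ in detail.

```python
import math
from collections import Counter


def fpk(fragments, exon_length_bp):
    """Fragments per kilobase of exon for one gene."""
    return fragments / (exon_length_bp / 1000.0)


def relative_distribution(fpk_values, bin_width=1.0):
    """Fraction of genes per FPK bin; the fractions of one condition sum to 1."""
    bins = Counter(math.floor(v / bin_width) for v in fpk_values)
    total = sum(bins.values())
    return {b: n / total for b, n in sorted(bins.items())}
```

Calling `relative_distribution` separately on the dMi-2 associated genes and on the genes without association yields two distributions that can be compared directly despite different group sizes.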

4.2.6.11 Gene regulation by dMi-2

Genes associated with dMi-2 binding sites were compared to genes regulated in dMi-2 knockdown S2 cells (RNA-sequencing performed by Eugenia Wagner). A gene was considered upregulated by dMi-2 when its expression showed a fold change of -2.00 or lower upon dMi-2 depletion. Conversely, a gene with a fold change of 2.00 or higher upon dMi-2 knockdown was considered downregulated by dMi-2.
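The classification rule can be written as a short Python sketch. It assumes a signed fold change convention in which negative values denote reduced expression upon dMi-2 knockdown; the function name and the returned labels are hypothetical.

```python
def regulation_by_dmi2(fold_change, cutoff=2.0):
    """Classify a gene from its signed fold change upon dMi-2 knockdown.

    A drop in expression after depletion implies that dMi-2 normally
    increases expression of the gene, and vice versa.
    """
    if fold_change <= -cutoff:
        return "up"    # gene is upregulated by dMi-2
    if fold_change >= cutoff:
        return "down"  # gene is downregulated by dMi-2
    return "none"      # below the cutoff in either direction
```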

 

4.2.6.12 Identification of the dMi-2 containing complexes in dMi-2 binding sites

To determine whether dNuRD could be present at the robust dMi-2 binding sites, the co-occurrence between the robust dMi-2 binding sites (as a BED file) and the two dNuRD subunit data sets available on the modENCODE website (MBD: data set 3057, RPD3: data set 946) was determined. An overlap of at least 1 bp between dMi-2, MBD and RPD3 was required for a site to be considered a potential dNuRD binding site.