5.4.5 Sequence motifs and transcription factor binding in normal cells correlate with CpG

To study the correlation between motif occurrence, transcription factor binding in normal cells and aberrant DNA methylation in the tumor cell lines, ChIP-on-chip (chromatin immunoprecipitation combined with microarray) analyses were performed with antibodies against the transcription factors Sp1, NRF1 and YY1 in normal peripheral blood monocytes. The distribution of binding events was analyzed according to genomic location (promoter, intergenic and intragenic regions; for ChIP-on-chip peak calling and motif annotation see section 4.5.2). Like their consensus sites, these three general factors preferentially bound to promoter regions (Figure 5-35A). Enrichments or depletions at the three position classes were highly significant (hypergeometric test: P<0.001). Furthermore, the factors often bound in the vicinity (± 250 bp) of each other, as illustrated in Figure 5-35B. Using the bound regions defined by the ChIP-on-chip experiments, de novo motif analysis revealed enriched consensus sequences for general transcription factors at a peak size of 200-500 bp (Figure 5-35C). Within a distance of 100 bp of the Sp1-bound motifs, all four other motifs for general transcription factors (NFY, GABP, YY1 and NRF1) as well as the unknown motif were significantly enriched. At NRF1-bound peaks, motifs for Sp1, NFY and GABP were enriched with high significance within a radius of 100 bp around the bound motif. At YY1-bound peaks, likewise at a peak size of 200 bp, consensus sites for YY1 and GABP and the unknown motif were enriched. Within a distance of ± 250 bp around the Sp1-bound motif, the consensus site for CRE-BP1 was enriched with high significance in addition to the other motifs (P value: 2.7×10⁻¹⁰¹), and the YY1-bound peaks were additionally co-enriched with motifs for NRF1 and vJUN within this greater distance.
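The hypergeometric test quoted for the position classes can be illustrated with a minimal sketch. The function name `hypergeom_sf` and all counts below are hypothetical and are not taken from the thesis; the sketch only shows how an upper-tail enrichment P value of this kind might be computed.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper tail P(X >= k) for X ~ Hypergeometric.

    N: number of regions on the array
    K: regions belonging to the class of interest (e.g. promoters)
    n: regions bound by the factor
    k: bound regions that fall into the class
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts for illustration only
n_regions = 2000          # all regions
n_promoter = 800          # regions annotated as promoter
n_bound = 150             # regions bound by one factor
n_bound_promoter = 110    # bound regions that are promoters

p_enrich = hypergeom_sf(n_bound_promoter, n_regions, n_promoter, n_bound)
print(f"P(X >= {n_bound_promoter}) = {p_enrich:.3g}")
```

A depletion would be tested analogously with the lower tail, P(X <= k).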

Figure 5-35 Basic analysis of ChIP-on-chip experiments for Sp1, NRF1 and YY1

(A) The distribution of binding events was analyzed dependent on their genomic location. Enrichments or depletions at the three position classes were highly significant (hypergeometric test: P<0.001). (B) The Venn diagram illustrates the overlap of bound regions between the three studied transcription factors (maximum distance between two peaks: 250 bp). (C) De novo motif analysis using the bound regions defined by ChIP-on-chip experiments. Shown are enriched motifs and corresponding TRANSFAC motifs for each transcription factor analyzed at a peak size of 200 or 500 bp.
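The overlap criterion used for the Venn diagram (two peaks count as overlapping if they lie at most 250 bp apart) can be sketched as follows. The data layout (peaks as chromosome and centre position) and the helper names are assumptions made for illustration; the thesis does not describe its implementation here.

```python
from bisect import bisect_left
from collections import defaultdict

def index_by_chrom(peaks):
    """Group peak centres by chromosome and sort them for binary search."""
    idx = defaultdict(list)
    for chrom, pos in peaks:
        idx[chrom].append(pos)
    for positions in idx.values():
        positions.sort()
    return idx

def has_neighbour(chrom, pos, idx, max_dist=250):
    """True if any indexed peak on `chrom` lies within `max_dist` bp of `pos`."""
    positions = idx.get(chrom, [])
    i = bisect_left(positions, pos - max_dist)
    return i < len(positions) and positions[i] <= pos + max_dist

def count_co_bound(peaks_a, peaks_b, max_dist=250):
    """Number of peaks in `peaks_a` with a peak of `peaks_b` within `max_dist` bp."""
    idx_b = index_by_chrom(peaks_b)
    return sum(has_neighbour(c, p, idx_b, max_dist) for c, p in peaks_a)

# Hypothetical peak centres (chromosome, position in bp)
sp1_peaks = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
nrf1_peaks = [("chr1", 1200), ("chr2", 900)]

print(count_co_bound(sp1_peaks, nrf1_peaks))
```

Counting from the perspective of one factor at a time avoids having to decide how to merge nearby peaks into a single region, which a full three-way Venn diagram would require.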

Plotting the enrichment of one of the six consensus sites for known transcription factors against the distance to a bound motif (NRF1, Sp1, YY1) shows that some motifs have preferences in terms of orientation or distance to each other (Figure 5-36). In general, motif distances show periodic preferences in most cases, which is consistent with steric constraints imposed by the helical structure of DNA. For example, YY1-bound motifs preferentially associate with NRF1, Sp1, GABP or NFY sites in 5’-direction, […] enriched 20 bp upstream of bound Sp1 sites, and the unknown motif is preferentially located 30 bp upstream or downstream of bound NRF1 sites (Figure 5-36).

Figure 5-36 Distribution of transcription factor motifs relative to the three bound motifs for NRF1, Sp1 and YY1

YY1-bound motifs preferentially associate with NRF1, Sp1, GABP or NFY sites in 5’-direction, […] enriched 20 bp upstream of bound Sp1 sites, and the unknown motif is preferentially located 30 bp upstream or downstream of bound NRF1 sites.
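A distance distribution of this kind could be tabulated roughly as sketched below. The sketch assumes that all motifs lie on one chromosome, that each bound motif carries a strand, and that negative offsets denote positions upstream (5') of the bound motif; the function name and the input format are hypothetical.

```python
from collections import Counter

def relative_offsets(anchors, partners, window=100):
    """Count signed distances of partner motifs around bound anchor motifs.

    anchors:  list of (position, strand) for bound motifs, strand "+" or "-"
    partners: list of positions of the second motif
    Negative offsets are upstream (5') relative to the anchor's orientation.
    """
    counts = Counter()
    for a_pos, a_strand in anchors:
        for p_pos in partners:
            offset = p_pos - a_pos
            if a_strand == "-":
                offset = -offset  # flip so that upstream is always negative
            if offset != 0 and abs(offset) <= window:
                counts[offset] += 1
    return counts

# Hypothetical positions for illustration only
bound_motifs = [(1000, "+"), (2000, "-")]
other_motifs = [980, 2020, 1500]

offsets = relative_offsets(bound_motifs, other_motifs)
print(sorted(offsets.items()))
```

A histogram of these offsets over all bound motifs would make positional preferences and any periodicity visible.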

The next question to be addressed was how the binding of specific factors to their consensus motif influences transcription of the respective gene. Comparing the expression data of CD34+, CD14+ and U937 cells with the ChIP-on-chip data revealed that genes associated with transcription factor-bound CpG islands generally showed significantly higher mRNA levels in CD34+ cells, CD14+ cells or the leukemia cell line as compared to all genes. The box plots showing the distribution of mRNA expression ratios are illustrated in Figure 5-37A.

Moreover, the expression data were analyzed according to the number of bound transcription factors. Figure 5-37B demonstrates that binding of more factors generally increased overall expression levels of associated genes.

Figure 5-37 Expression status dependent on the binding of general transcription factors

(A) The box plots show the distribution of mRNA expression ratios (CD34+ progenitor cells, CD14+ normal blood monocytes, U937 cells) conditional on the binding status at individual, gene-associated peaks. The red lines denote medians, boxes the interquartile ranges, and whiskers the 5th and 95th percentiles. Pairwise comparisons of total mRNA expression ratios (all genes) and transcription factor-bound regions are significant (P<0.001, Mann–Whitney U test, two-sided). (B) The box plots show the distribution of mRNA expression ratios (CD34+ progenitor cells, CD14+ normal blood monocytes, U937 cells) conditional on the binding status (binding of one, two or three factors) at individual, gene-associated peaks. The red lines denote medians, boxes the interquartile ranges, and whiskers the 5th and 95th percentiles. Pairwise comparisons of total mRNA expression ratios (all genes) and transcription factor-bound regions are significant (P<0.001, Mann–Whitney U test, two-sided).
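For readers unfamiliar with the two-sided Mann–Whitney U test named in the legend, the following is a minimal sketch using the normal approximation with tie correction and no continuity correction. It is not the software used in the thesis, the example values are invented, and for small samples an exact test would be preferable.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided P value).
    Raises ZeroDivisionError if all pooled values are identical.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i                           # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / sqrt(var_u)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_x, p_two_sided

# Invented expression ratios for illustration only
all_genes = [1, 2, 3, 4, 5]
bound_genes = [6, 7, 8, 9, 10]
u, p = mann_whitney_u(all_genes, bound_genes)
print(u, round(p, 4))
```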

To directly compare transcription factor binding patterns in normal cells with aberrant methylation profiles of leukemia cell lines, the signal intensity ratios of ChIP enrichment for each transcription factor were plotted against the MCIp enrichment of the leukemia cell lines (THP-1 and U937) versus normal human blood monocytes. Figure 5-38 demonstrates that both events were mutually exclusive for all three transcription factors in U937 as well as THP-1 cells. This indicates that transcription factor binding protects from de novo methylation in leukemia cells.
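One simple way to quantify such mutual exclusivity is to count array probes by whether they exceed a ChIP cutoff, an MCIp cutoff, both or neither. The sketch below assumes log2 signal ratios and cutoffs of 1.0; both cutoffs, the function name and the example values are hypothetical and not taken from the thesis.

```python
from collections import Counter

def classify_probes(chip_log2, mcip_log2, chip_cut=1.0, mcip_cut=1.0):
    """Count probes by binding status (ChIP) and hypermethylation status (MCIp)."""
    counts = Counter()
    for chip, mcip in zip(chip_log2, mcip_log2):
        bound = chip >= chip_cut
        methylated = mcip >= mcip_cut
        if bound and methylated:
            counts["both"] += 1
        elif bound:
            counts["bound_only"] += 1
        elif methylated:
            counts["methylated_only"] += 1
        else:
            counts["neither"] += 1
    return counts

# Invented log2 ratios for illustration only
chip = [2.1, 1.8, 0.1, -0.2, 0.3, 2.5]
mcip = [-0.1, 0.2, 1.9, 2.2, 0.0, 0.1]

counts = classify_probes(chip, mcip)
print(dict(counts))
```

If binding and aberrant methylation exclude each other, the "both" class stays empty or nearly empty while the two single-event classes are populated.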

Figure 5-38 Correlation between transcription factor binding in normal cells and aberrant de novo methylation in leukemia cells

The three transcription factors Sp1, NRF1 and YY1 were analyzed using ChIP-on-chip on human 244K CpG island arrays. In the diagrams the signal intensity ratios of ChIP enrichment of each transcription factor are plotted against the MCIp enrichment of the leukemia cell line (THP-1) versus normal human blood monocytes.

We also observed that transcription factor binding was not detected at every motif. Based on ChIP-on-chip data, the motifs for Sp1, NRF1 and YY1 could be subdivided into those that are not bound and those that are actually bound by the corresponding factor in CpG islands. About 35% of the Sp1 motifs, 25% of the NRF1 motifs and 16% of the YY1 motifs were bound by the respective transcription factor. Using the de novo motif discovery algorithm, bound and non-bound motifs were compared, revealing that motifs, only if bound by transcription factors, were highly significantly co-enriched for consensus motifs of the other “protective” motifs within the distance of ± 250 bp around each motif. Sp1-bound motifs compared to the unbound Sp1 motifs were co-enriched with motifs for NFY, NRF1, GABP, CRE-BP1 and YY1. NRF1-bound motifs compared to the unbound NRF1 motifs were co-enriched with motifs for GABP, vJUN, Sp1 and the unknown factor. The YY1-bound motif versus the unbound YY1 motif showed co-enrichment for motifs of an E-Box, Sp1, GABP and an unknown factor. P values ranged from 10[…]. Furthermore, ratios of observed versus expected motif occurrences were calculated for sequence motifs that are either bound by the corresponding factor or not, dependent on the number of additional consensus sequences in their close vicinity. As shown in Figure 5-39B, a sequence motif was more likely bound if it contained at least one or, better, two other motifs in close proximity (± 250 bp). Enrichment in the bound fraction and depletion in the unbound fraction were highly significant (in most cases hypergeometric test: P<0.001). Moreover, genes associated with transcription factor-bound motifs showed significantly higher mRNA levels as compared to genes that were associated with non-bound motifs. The mRNA expression levels in CD34+ progenitor cells, CD14+ normal blood monocytes and U937 cells conditional on the binding status of the associated motif (NRF1, Sp1, YY1) are shown in Figure 5-39C. The data suggest that the stable binding of these general transcription factors (as measured by ChIP) to their consensus motif depends on the presence of neighboring motifs that are cooperatively bound by other general transcription factors. Thus, the combinatorial presence of two or more of the identified consensus sequences may serve to stabilize transcription factor binding and to confer the resistance of certain CpG islands (preferably those acting as promoters) to aberrant methylation.
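The observed-versus-expected ratios can be illustrated with a small sketch. It assumes that the expected count is derived under independence of binding status and neighbour count, which is one plausible reading but is not stated in the text; the function name and the example data are hypothetical.

```python
def obs_exp_ratios(motifs, min_neighbours=1):
    """Observed/expected ratios for motifs with at least `min_neighbours` other motifs.

    motifs: list of (is_bound, n_neighbouring_motifs) per motif instance
    Expected counts assume binding is independent of the neighbour count.
    Returns (ratio for bound motifs, ratio for unbound motifs).
    Raises ZeroDivisionError if a class or the neighbour set is empty.
    """
    total = len(motifs)
    n_bound = sum(1 for bound, _ in motifs if bound)
    n_unbound = total - n_bound
    with_nb = [(bound, k) for bound, k in motifs if k >= min_neighbours]
    frac_with_nb = len(with_nb) / total
    obs_bound = sum(1 for bound, _ in with_nb if bound)
    obs_unbound = len(with_nb) - obs_bound
    ratio_bound = obs_bound / (n_bound * frac_with_nb)
    ratio_unbound = obs_unbound / (n_unbound * frac_with_nb)
    return ratio_bound, ratio_unbound

# Invented motif instances: (bound by its factor?, other motifs within 250 bp)
example = [(True, 2), (True, 1), (True, 1), (True, 0),
           (False, 0), (False, 0), (False, 0), (False, 1), (False, 0), (False, 0)]

r_bound, r_unbound = obs_exp_ratios(example, min_neighbours=1)
print(round(r_bound, 3), round(r_unbound, 3))
```

A ratio above 1 in the bound fraction together with a ratio below 1 in the unbound fraction corresponds to the enrichment and depletion pattern described above.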

Figure 5-39 Properties of consensus sequences that are bound or not bound by the corresponding transcription factor

(A) Based on ChIP-on-chip data, the motifs for Sp1, NRF1 and YY1 could be subdivided into those that are not bound and those that are actually bound by the corresponding factor in CpG islands. De novo motif searches of bound motifs against non-bound motifs revealed a highly significant association of bound motifs with consensus sites for other general factors within the range of ± 250 bp around each motif. (B) Ratios of observed versus expected motif occurrences are shown for sequence motifs that are either bound by the corresponding factor (blue bars) or not bound (green bars) and had at least one (top panel) or two other consensus sites (bottom panel) within a 250 bp distance. Enrichment in the bound fraction and depletion in the unbound fraction were highly significant (hypergeometric test: P<0.001) except for the cases marked with a hash. (C) The box plots show the distribution of mRNA expression ratios (CD34+ progenitor cells, CD14+ normal blood monocytes, U937 cells) conditional on the binding status of the associated motif. The red lines denote medians, boxes the interquartile ranges, and whiskers the 5th and 95th percentiles. Pairwise comparisons of mRNA expression ratios associated

5.4.6 Properties of CpG island-associated genes in conjunction with