
7.6 Analysis of GxE interaction effects

7.6.2 Comparison of different GxE methods by their top SNPs

Comparing the top 100 GxE-interacting SNPs of the different GxE methods within each study and smoking model, we see quite similar trends for all four lung cancer studies.

Tables 7.12 and 7.13 show the data for GLC and CE-IARC as examples. For all studies, the overlap of the top 100 SNPs between the different methods is in general larger for never vs. ever smokers (NE) than for moderate vs. heavy smokers (MH). In the never vs. ever analysis, the overlap between the different methods is always larger in Central Europe (CE-IARC) than in GLC, whereas we see the reverse trend for moderate vs. heavy smokers. SLRI generally tends to show fewer common SNPs in both analyses; for MDACC no clear trend can be observed.
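Overlap counts of the kind reported in tables 7.12 and 7.13 can be obtained by intersecting the top-k SNP sets of every pair of methods. The following is only a minimal sketch of that counting step: the function names, the input layout (one ranked list of SNP identifiers per method, best first) and the toy SNP identifiers are assumptions for illustration, not the code used for the analyses.

```python
from itertools import combinations


def top_k(ranking, k=100):
    """Return the set of the k best-ranked SNP identifiers."""
    return set(ranking[:k])


def overlap_counts(rankings, k=100):
    """Count shared top-k SNPs for every pair of methods.

    rankings: dict mapping a method name to a list of SNP ids, best first
              (hypothetical input layout).
    Returns a dict keyed by (method_a, method_b), filled in both orders.
    """
    tops = {method: top_k(ranking, k) for method, ranking in rankings.items()}
    counts = {}
    for a, b in combinations(tops, 2):
        shared = len(tops[a] & tops[b])
        counts[(a, b)] = shared
        counts[(b, a)] = shared
    return counts


# Toy rankings with made-up SNP ids; k=3 instead of 100 to keep it readable.
rankings = {
    "CASES": ["rs1", "rs2", "rs3", "rs4"],
    "EHB": ["rs2", "rs1", "rs3", "rs9"],
    "CC": ["rs7", "rs8", "rs1", "rs2"],
}
counts = overlap_counts(rankings, k=3)
```

Running this once per smoking model gives the two triangles of such a table; the order of SNPs within the top k does not affect the counts.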

Table 7.12: Comparison of the top 100 SNPs between the different G×E interaction methods for GLC. CC: case-control, CASES: case-only, TWO: intuitive two-step, MUK: Mukherjee’s, MUR: Murcray’s, EHB: empirical hierarchical Bayes, HBP-GxE: hierarchical Bayes prioritization based on GxE interaction effects (see section 7.7), EHB-PW: empirical hierarchical Bayes integrating pathway information (see section 7.7). Upper triangle: moderate vs. heavy; lower triangle: never vs. ever.

          HBP-GxE   CC  CASES  TWO  MUK  MUR  EHB  EHB-PW
HBP-GxE         -   59      9    4   15    0    9       2
CC             24    -      9   43   15    0    9       1
CASES          13   30      -   64   60   16  100       5
TWO            22   59     66    -   56   11   64       3
MUK            18   31     77   63    -    4   60       6
MUR             0    0      2    1    0    -   16       5
EHB            13   30    100   66   77    2    -       5
EHB-PW         12   28     96   65   78    2   96       -

Table 7.13: Comparison of the top 100 SNPs between the different G×E interaction methods for CE-IARC. CC: case-control, CASES: case-only, TWO: intuitive two-step, MUK: Mukherjee’s, MUR: Murcray’s, EHB: empirical hierarchical Bayes, HBP-GxE: hierarchical Bayes prioritization based on GxE interaction effects (see section 7.7), EHB-PW: empirical hierarchical Bayes integrating pathway information (see section 7.7). Upper triangle: moderate vs. heavy; lower triangle: never vs. ever.

          HBP-GxE   CC  CASES  TWO  MUK  MUR  EHB  EHB-PW
HBP-GxE         -   57      7   32   13    0    7       8
CC              6    -      8   45   12    0    8       8
CASES           2   41      -   60   65   15   97      91
TWO             3   63     72    -   11   48   50      58
MUK             2   46     88   71    -    1   67      64
MUR             0    0      3    2    1    -   13      13
EHB             2   41     99   72   89    3    -      94
EHB-PW          2   43     94   72   88    3   95       -

Focusing on our new empirical hierarchical Bayes method (EHB), its results are nearly the same as those of the case-only test in all analyses. For both models of the smaller studies SLRI and GLC, the top 100 SNPs are even identical; for CE-IARC and MDACC, 1-3 discordant SNPs occur. We also observe a strong agreement of both tests with the simple two-step method and the approach of Mukherjee. For MUK the concordance in NE is stronger than in MH. The simple two-step method shows the same tendency, but not as strongly as MUK. Around 75-90 of the EHB top 100 SNPs for never vs. ever smokers are in the top 100 of MUK, compared with 60-70 for moderate vs. heavy smokers. For TWO, we observe 48-72 SNPs in common with EHB for NE and 41-64 for MH.

In particular, when comparing the empirical hierarchical Bayes approach to the traditional case-control test of interaction, we see a strong difference between never vs. ever and moderate vs. heavy. While the similarity is even lower than 10% for never vs. ever smokers, we have 20-40 common SNPs in the moderate vs. heavy model, constituting a higher consistency. Notably, in SLRI we see a strongly decreased number of common SNPs for both models compared to the other three data sets. A possible reason may be the low number of cases and controls. The similarity between the method of Mukherjee and Chatterjee (2008) and the case-control test is only slightly higher than that between the empirical hierarchical Bayes method and the case-control test. In contrast, the simple two-step method shows a much larger overlap with the case-control test, of around 40-60 of the top 100 SNPs for both moderate vs. heavy and never vs. ever smokers.

Figure 7.17: List comparison plots of the EHB SNP ranking with rankings of different other GxE interaction methods for GLC. The y-axis shows the proportion of common SNPs for a particular number of top SNPs given on the x-axis.
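The curve drawn in a list comparison plot such as figure 7.17 can be computed directly from two rankings: for each number k of top SNPs, the proportion of SNPs that appear in the top k of both lists. The sketch below illustrates this under assumptions: the function name, the input layout (ranked lists of SNP identifiers, best first) and the toy identifiers are hypothetical and not taken from the analysis code.

```python
def list_comparison_curve(ranking_a, ranking_b, ks):
    """Proportion of SNPs shared by the top k of two rankings, for each k.

    ranking_a, ranking_b: lists of SNP ids, best first (hypothetical layout).
    ks: iterable of positive list lengths k, each at most the length of
        both rankings.
    """
    n_max = min(len(ranking_a), len(ranking_b))
    curve = []
    for k in ks:
        if not 0 < k <= n_max:
            raise ValueError(f"k={k} is outside 1..{n_max}")
        shared = len(set(ranking_a[:k]) & set(ranking_b[:k]))
        curve.append(shared / k)
    return curve


# Toy rankings with made-up SNP ids.
ehb = ["rs1", "rs2", "rs3", "rs4"]
other = ["rs2", "rs1", "rs4", "rs3"]
curve = list_comparison_curve(ehb, other, ks=[1, 2, 3, 4])
```

If both rankings cover the same set of SNPs, the proportion necessarily reaches 1 when k equals the total number of SNPs, so the informative part of such a curve is its behavior for small and intermediate k.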

Although Murcray’s method has been shown to be more powerful than other GxE interaction methods (Murcray et al., 2009; Mukherjee et al., 2012), it yields no significant result in any of the analyses. This is not surprising, since even the very powerful but biased case-only test showed nearly no such results. Murcray’s method shares no top SNPs with the case-control test in any of the analyses. For never vs. ever, its overlap with the other methods is limited to only a few SNPs as well. For moderate vs. heavy, however, up to 17 common SNPs with empirical hierarchical Bayes are observed. Hence, while never vs. ever shows the larger overlap when the other methods are compared with each other, the effect is reversed for Murcray’s method.

Taking a closer look at the top 100 SNPs of the case-only and empirical hierarchical Bayes methods, the ranking order stays nearly constant. Only a few neighboring entries switch their ranking positions (results not shown). We went further and considered not only the top 100 SNPs of the different methods but the overall ranking as well. Figure 7.17 shows the list comparison plots of the case-only, case-control, Mukherjee’s and simple two-step methods with our new EHB for the top 1,000, the top 5,000 and all SNPs, using GLC as an example. For never vs. ever smokers we observe a higher consistency of EHB and CC across the different studies than for moderate vs. heavy smokers within the top 5,000. Note that this holds for case-control and case-only as well, since EHB and CASES are highly correlated. For never vs. ever, the curves start with 10-30% common SNPs, rise to 30-40% within the first 1,000 and even reach 40-50% for the top 5,000. For moderate vs. heavy, they start with 0-20%, stabilize at around 20% for the top 1,000 in GLC, MDACC and CE-IARC and increase only slightly, up to 30%, within the top 5,000. For SLRI, the consistency reaches only 5% within the top 1,000 and only around 15% for the top 5,000. For all analyses, a strong increase of consistency is only seen in the plots considering all SNPs.

The overlap of case-only and EHB is at around 100% from the beginning and keeps that level, with slight deviations only, for all studies and both models. The consistency of MUK and of TWO with EHB lies between that of case-only and case-control. In all analyses, MUK starts with a higher consistency with EHB than the two-step method. However, since the overlap with the EHB ranking increases more strongly for TWO than for MUK, the order reverses in each case. In general, this reversal occurs earlier for moderate vs. heavy than for never vs. ever. For CE-IARC and SLRI, we see that switch for moderate vs. heavy already after the top 1,000 and 4,000 SNPs; for the other studies it occurs somewhere around the top 10,000.

For Mukherjee’s method in GLC, the consistency with EHB is slightly higher (around 10% more) for the top 50 SNPs, then decreases a little before slowly increasing again. This is particularly pronounced in GLC when analyzing moderate vs. heavy smokers: here MUK starts at around 70% and then drops slightly before it stabilizes at 60-65%. Unlike in all other analyses, we see this effect for the two-step method in this analysis as well, and even more strongly. Nine of the 10 first-ranked SNPs are identical, and even 80% of the top 30 are the same. However, when a larger number of top SNPs is considered, the consistency decreases to around 50%, where it nearly stays for the top 1,000.

Table 7.14: Comparison of top 100 genes between smoking models and studies for case-control test (CC) (upper triangle) and HBP-GxE (hierarchical Bayes prioritization based on GxE interaction effects, see section 7.7) (lower triangle).

              GLC  GLC  CE-IARC  CE-IARC  MDACC  SLRI  SLRI
               MH   NE       MH       NE     MH    MH    NE
GLC      MH     -    1        0        2      2     3     0
GLC      NE     1    -        2        2      1     0     1
CE-IARC  MH     3    0        -        4      4     3     3
CE-IARC  NE     1    3        8        -      2     2     3
MDACC    MH     2    0        2        6      -     1     1
SLRI     MH     2    7        4        0      0     -     3
SLRI     NE     2    6        5       10      1     2     -

7.6.3 Comparison of top genes between studies

When comparing the top 100 genes per GxE method between the different lung cancer studies, we see a low number of common genes not exceeding 5 in all cases. Furthermore, the consistency between the results of the studies is similar for all different methods. Hence, we do not see any method harmonizing the different study results. For the traditional case-control test of GxE interaction and our new empirical hierarchical modeling approach, the results are shown in the upper triangles of tables 7.14 and 7.15.

The number of genes occurring within the top 100 genes for at least two studies varies per method between 14 and 26. In total we have 73 such genes across the different methods. Of these, 40 are replicated by one method only (CC, TWO, MUK or MUR), 18 by two different methods (mainly CASES and EHB, or CC and TWO), 7 by CASES, EHB and either MUK or TWO, and 1 by CC, MUK and TWO. We observe 6 genes supported by four different methods. One gene occurred in the top gene lists of two different studies for all methods with the exception of Murcray’s. Five genes were identified by one method only (MUR, 2x CC, 2x MUK), but for three different studies.
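The tally above (genes replicated in at least two studies, grouped by the number of supporting methods) follows a simple two-stage count. The sketch below shows one way to organize it; the nested input layout (method, then study, then list of top genes), the function names and the placeholder gene names are assumptions made for illustration only.

```python
from collections import Counter


def replicated_genes(top_genes_by_study, min_studies=2):
    """Genes that appear in the top list of at least `min_studies` studies."""
    counts = Counter(
        gene
        for genes in top_genes_by_study.values()
        for gene in set(genes)  # count each gene once per study
    )
    return {gene for gene, n in counts.items() if n >= min_studies}


def methods_per_gene(top_genes):
    """Map each replicated gene to the set of methods that replicate it.

    top_genes: dict method -> dict study -> list of top genes
               (hypothetical input layout).
    """
    support = {}
    for method, by_study in top_genes.items():
        for gene in replicated_genes(by_study):
            support.setdefault(gene, set()).add(method)
    return support


# Toy input with placeholder gene names.
top_genes = {
    "CC": {"GLC": ["GENE_A", "GENE_B"], "SLRI": ["GENE_A", "GENE_C"], "MDACC": ["GENE_D"]},
    "TWO": {"GLC": ["GENE_A", "GENE_C"], "SLRI": ["GENE_C", "GENE_A"], "MDACC": ["GENE_B"]},
}
support = methods_per_gene(top_genes)
# Number of replicated genes by how many methods support them.
tally = Counter(len(methods) for methods in support.values())
```

In this toy input, one gene is replicated by both methods and one by a single method; the same grouping applied to the real top 100 gene lists would yield the breakdown by number of supporting methods.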

7.6.4 Resulting SNPs and genes

Taking a look at the case-control GxE test results with a p-value ≤ 10^-5, we have a noticeable signal of 3 SNPs (rs4563628, rs7708669, rs4392618) on chromosome 5 for CE-IARC MH. Two of these SNPs are within ±500 kb of the gene TAG (tumor antigen gene, miscellaneous RNA), which interacts with TP53; the third SNP is close to CTNND2, which is involved in cell adhesion. Another signal of two SNPs (rs145910, rs4939359) for the same analysis is identified in the gene OR4C15 on chromosome 11. For never vs. ever smokers, two SNPs (rs404074 and rs403746) on chromosome 21 between the micRNA gene LOC100506471 and the protein coding gene PSMG1 had p-values ≤ 10^-5. In the SLRI