
7.5 Analysis of main effects integrating pathway information

7.5.3 Comparison of top pathways between studies

When comparing the pathway rankings according to the β- or µ-coefficients, the correlation between the different studies is high. In particular, the analyses split into two groups, within which the tops of the ranking lists are more similar than expected by chance. The first group (group A) comprises model 1 of GLC and model 2 of CE-IARC and SLRI; the second group (group B) comprises model 1 of CE-IARC, MDACC and SLRI and model 2 of MDACC and GLC. In all cases the correlation according to β is higher than that according to µ. Figure 7.9 shows list comparison plots between the models of GLC and CE-IARC, which are representative of the correlation within and between the two groups. The corresponding numbers of overlapping top 10 pathways between the different studies and pathway models are given in table 7.4.
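The curve drawn in a list comparison plot can be sketched as follows. This is only a minimal illustration, assuming that each ranking is an ordered list of pathway identifiers; the function name `top_k_overlap_curve` and the identifiers `P1`, `P2`, ... are hypothetical and not taken from the analysis itself.

```python
def top_k_overlap_curve(ranking_a, ranking_b, max_k):
    """Proportion of shared pathways among the top k of two rankings, for k = 1..max_k."""
    proportions = []
    for k in range(1, max_k + 1):
        # Pathways that appear in the top k of both rankings.
        shared = set(ranking_a[:k]) & set(ranking_b[:k])
        proportions.append(len(shared) / k)
    return proportions


# Hypothetical pathway identifiers, ordered from best to worst rank.
study_1 = ["P1", "P2", "P3", "P4", "P5"]
study_2 = ["P2", "P1", "P6", "P3", "P7"]

curve = top_k_overlap_curve(study_1, study_2, max_k=5)
for k, proportion in enumerate(curve, start=1):
    print(f"top {k}: {proportion:.2f}")
```

The value at k = 10 multiplied by 10 corresponds to the kind of count reported for a pair of analyses in table 7.4.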

Possible reasons for this particular grouping are the role of smoking as a confounding factor in the given context and the differences between the populations underlying the four studies. MDACC includes no never smokers, so that M1 and M2 differ only in the adjustment for the amount of smoking. This results in very similar pathway rankings for both models, so that they fall into the same group. For GLC, SLRI and CE-IARC the pathway rankings of the two models differ more strongly, since the distinction between never and ever smokers is additionally relevant. Model M1 does not consider smoking status as a confounder at all, resulting in top pathways that may be related to smoking rather than directly to lung cancer. M2, however, accounts for smoking status, leading to different, lung cancer relevant pathways. Therefore, the two models fall into different groups. The list comparison plots comparing the pathway rankings of M1 and M2 for GLC and MDACC are shown in figure 7.10. The fact that GLC involves only young individuals may explain why its models are assigned to the groups the other way round, since the importance of smoking may be different at younger ages.

Figure 7.9: List comparison plots of pathway rankings according to β and µ of HBP between different studies. The y-axis shows the proportion of common pathways for a particular number of top pathways given on the x-axis. The stars indicate a significant overlap.

Figure 7.10: List comparison plots of pathway rankings according to β and µ of HBP between the different pathway models in GLC and MDACC. The y-axis shows the proportion of common pathways for a particular number of top pathways given on the x-axis. The stars indicate a significant overlap.

Table 7.4: Numbers of common top 10 pathways between the different studies and pathway models using β regression coefficients as ranking criterion on the upper triangle or µ regression coefficients on the lower triangle.

              GLC       CE-IARC   MDACC     SLRI
              M1   M2   M1   M2   M1   M2   M1   M2
GLC      M1    -    2    2    9    2    2    2    9
GLC      M2    0    -    7    2   10    9    9    2
CE-IARC  M1    0    5    -    2    7    6    7    2
CE-IARC  M2    9    1    0    -    2    2    2    9
MDACC    M1    0    9    4    1    -    9    9    2
MDACC    M2    0    2    6    0    2    -    9    2
SLRI     M1    0    9    4    1    9    1    -    2
SLRI     M2    6    1    0    7    1    0    1    -
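The split into groups A and B can be retraced from the β values in the upper triangle of table 7.4. The sketch below links two analyses whenever they share at least a given number of top 10 pathways and then reads off the connected components. The threshold of 5 and the function name `overlap_groups` are assumptions made for this illustration only; the grouping in the text was not necessarily derived this way.

```python
# Beta-based top 10 overlaps transcribed from the upper triangle of table 7.4.
labels = ["GLC M1", "GLC M2", "CE-IARC M1", "CE-IARC M2",
          "MDACC M1", "MDACC M2", "SLRI M1", "SLRI M2"]
upper = [
    [2, 2, 9, 2, 2, 2, 9],  # GLC M1 versus the seven following analyses
    [7, 2, 10, 9, 9, 2],    # GLC M2
    [2, 7, 6, 7, 2],        # CE-IARC M1
    [2, 2, 2, 9],           # CE-IARC M2
    [9, 9, 2],              # MDACC M1
    [9, 2],                 # MDACC M2
    [2],                    # SLRI M1
]


def overlap_groups(labels, upper, threshold):
    """Connected components of the graph linking analyses with >= threshold shared pathways."""
    n = len(labels)
    neighbours = {i: set() for i in range(n)}
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + 1 + offset
            if value >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)
    groups, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(sorted(labels[i] for i in component))
    return groups


# threshold=5 is an illustrative assumption, not a value from the analysis.
groups = overlap_groups(labels, upper, threshold=5)
for group in groups:
    print(group)
```

With this threshold the first component contains GLC M1, CE-IARC M2 and SLRI M2 (group A) and the second contains the remaining five analyses (group B); every pair across the two groups shares only 2 pathways.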

Table 7.5: Numbers of common top 10 pathways between the different studies and pathway models using β regression coefficients or µ regression coefficients as ranking criterion. Both pathway models (M1 and M2) were combined for this comparison.

     one study   two studies   three studies   four studies
β    6           2             9               7
µ    10          7             11              5

Appendix tables B.2 and B.3 list the pathways that belong to the top 10 for at least two different studies. These are 16 pathways using β as pathway ranking criterion and 15 using µ. All of these pathways were in the top 10 not just for two but for at least three of the studies. As many as 7 (β) and 4 (µ) of the pathways were in the top 10 for all four studies. Table 7.5 gives the numbers of pathways occurring in only one, two, three or four different studies, regardless of the pathway model (M1 or M2). Comparing the top pathways for β and µ, 2 pathways occurred in both lists.
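The tally behind a table such as table 7.5 can be sketched as follows: the top lists of both models are merged per study, and each pathway is then counted once per study in which it occurs. The study names, pathway identifiers and function names below are hypothetical toy values, not data from the four lung cancer studies.

```python
from collections import Counter


def studies_per_pathway(top_lists):
    """Count, for every pathway, the studies whose top list (M1 or M2) contains it."""
    counts = Counter()
    for model_lists in top_lists.values():
        # Union over both models, so a pathway is counted once per study.
        in_study = set().union(*model_lists.values())
        counts.update(in_study)
    return counts


def tally(counts):
    """Number of pathways that occur in exactly 1, 2, ... studies."""
    return Counter(counts.values())


# Hypothetical toy input: three studies, two models each, short top lists.
top_lists = {
    "study_A": {"M1": ["P1", "P2"], "M2": ["P2", "P3"]},
    "study_B": {"M1": ["P1", "P4"], "M2": ["P1", "P3"]},
    "study_C": {"M1": ["P1", "P5"], "M2": ["P3", "P5"]},
}

summary = tally(studies_per_pathway(top_lists))
for n_studies in sorted(summary):
    print(f"{summary[n_studies]} pathway(s) in {n_studies} study/studies")
```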

Gene set enrichment analysis

The gene set enrichment analysis identified only one pathway overall as significant at FDR ≤ 0.05, namely in CE-IARC. The corresponding enrichment score was driven by 35 of the 123 genes involved in total. However, this pathway did not even reach a nominal p-value ≤ 0.05 in any of the other studies.

Several pathways reached nominal significance in each of the studies (nominal p ≤ 0.05). The number of significant pathways for each of the 8 analyses, as well as the overlap between model 1 and model 2 per study, is shown in table 7.6. The overlap between the two pathway models was significant for the two larger studies, MDACC and CE-IARC.

Table 7.6: Number of nominally significant pathways for both smoking models of the different lung cancer studies with GSEA and SUMSTAT. The gene set analysis is based on single SNP main effects. The last two rows give the overlap (∩) and the union (∪) of the pathways found with model 1 and model 2.

               GSEA                            SUMSTAT
               GLC   CE-IARC   MDACC   SLRI    GLC   CE-IARC   MDACC   SLRI
model 1          8        14       5     10     26        51      15     16
model 2          8        15      10     14      5        49      14     19
model 1 ∩ 2      1         8       5      2      4        39      12      9
model 1 ∪ 2     15        21      10     22     27        61      17     26
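The two combined rows of table 7.6 can be checked against the per-model rows: reading them as the overlap and the union of the pathways found with model 1 and model 2, the union must equal model 1 plus model 2 minus the overlap. The short check below uses the numbers from the table; only the variable names are introduced here.

```python
# (model 1, model 2, overlap, union) per method and study, taken from table 7.6.
table_7_6 = {
    ("GSEA", "GLC"): (8, 8, 1, 15),
    ("GSEA", "CE-IARC"): (14, 15, 8, 21),
    ("GSEA", "MDACC"): (5, 10, 5, 10),
    ("GSEA", "SLRI"): (10, 14, 2, 22),
    ("SUMSTAT", "GLC"): (26, 5, 4, 27),
    ("SUMSTAT", "CE-IARC"): (51, 49, 39, 61),
    ("SUMSTAT", "MDACC"): (15, 14, 12, 17),
    ("SUMSTAT", "SLRI"): (16, 19, 9, 26),
}

# Inclusion-exclusion: |M1 or M2| = |M1| + |M2| - |M1 and M2|.
mismatches = [
    key
    for key, (m1, m2, overlap, union) in table_7_6.items()
    if m1 + m2 - overlap != union
]
print("inconsistent entries:", mismatches)
```

All eight columns satisfy the relation, which supports reading the third row as the overlap and the fourth as the union.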

Figure 7.11 shows the overlap of nominally significant pathways between the four studies for GSEA. In total, 8 pathways were identified in two different studies and one pathway in three of the studies (CE-IARC M1+M2, SLRI M1, GLC M2). Appendix table B.4 lists these 9 pathways and their corresponding nominal p-values. Considering the union of the pathways for model 1 and model 2, the overlap is not significant for any pair of studies. However, when the two models are considered separately and the common pathways are examined per model, the overlap is significant for CE-IARC M1 and MDACC M1 (2 common pathways), CE-IARC M1 and SLRI M2 (3 shared pathways), and SLRI M1 and GLC M2 (2 shared pathways).
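One common way to judge whether two sets of significant pathways overlap more than expected by chance is a one-sided hypergeometric test. The sketch below assumes this test; this section does not state which test was actually used, nor the total number of pathways considered, so `n_total = 300` is a purely hypothetical value and `overlap_p_value` is a name chosen for illustration.

```python
from math import comb


def overlap_p_value(n_total, n_a, n_b, n_shared):
    """P(X >= n_shared) for the overlap X of two random sets of sizes n_a and n_b
    drawn from n_total pathways (hypergeometric upper tail)."""
    largest = min(n_a, n_b)
    tail = sum(
        comb(n_a, x) * comb(n_total - n_a, n_b - x)
        for x in range(n_shared, largest + 1)
    )
    return tail / comb(n_total, n_b)


# Set sizes from table 7.6 (GSEA): CE-IARC M1 with 14 and SLRI M2 with 14
# nominally significant pathways, 3 of them shared.
# n_total=300 is an assumption; the true number of pathways is not given here.
p = overlap_p_value(n_total=300, n_a=14, n_b=14, n_shared=3)
print(f"p = {p:.4f}")
```

The resulting p-value depends directly on the assumed total, so it illustrates the calculation rather than reproducing the reported significance.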

SUMSTAT method

As with GSEA, the SUMSTAT method identified pathways that were significant according to FDR (≤ 0.05) only for CE-IARC M1. These were the pathway that was also found with GSEA and two additional ones. Only one of the latter had a nominally significant result in one of the other studies (SLRI M2).

Several pathways reached nominal significance in each of the studies (nominal p ≤ 0.05). Their number is shown in table 7.6. The numbers are clearly much higher than for GSEA, with the exception of GLC M2. In particular, CE-IARC showed a very high number of significant pathways, accounting for more than 20% of all pathways considered. The overlap between the two models was significant for all four studies (p ≤ 0.05). Comparing the identified pathways between the different studies, only SLRI and CE-IARC M2 had a significant overlap, with 8 common pathways. This also leads to a significant overlap between SLRI and CE-IARC for the union of the pathways over model 1 and model 2. The consistency between the different studies is illustrated in figure 7.11b. In total, 26 pathways were identified in at least two different studies. Two of these were detected in three of the studies, and one pathway had a p-value ≤ 0.05 in all four. The latter was found with both models for the larger studies CE-IARC and MDACC, and with M1 only for SLRI and GLC. Most of the 26 pathways were found with at least one of the models for the CE-IARC data. Appendix table B.5 lists the 26 pathways and their corresponding nominal p-values.