Linear regression models - Computational methods

6.2 Computational methods

6.2.2 Linear regression models

To analyse the effects of the cytokines on STAT phosphorylation, we computed the relative GMI (rGMI) of each pSTAT (pSTAT1, pSTAT4 and pSTAT6) by subtracting the baseline of pSTAT expression from the GMI of the pSTAT staining and dividing the result by the GMI of the corresponding total STAT staining:

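$$
\mathrm{rGMI}^{i}_{\mathrm{pSTAT},x} = \frac{\mathrm{GMI}^{i}_{\mathrm{pSTAT},x} - \min\left(\mathrm{GMI}_{\mathrm{pSTAT},x}\right)}{\mathrm{GMI}^{i}_{\mathrm{STAT},x}}
$$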
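
The baseline subtraction and ratio can be illustrated with a minimal Python sketch. It is not the Mathematica code used for the analysis; the function name `relative_gmi` and the input values are hypothetical, and the baseline is assumed to be the minimum pSTAT GMI across the supplied conditions.

```python
def relative_gmi(pstat_gmi, stat_gmi):
    """Subtract the pSTAT baseline, then divide by the matching total STAT GMI."""
    # Assumption: the baseline is the lowest pSTAT GMI over all supplied conditions.
    baseline = min(pstat_gmi)
    return [(p - baseline) / s for p, s in zip(pstat_gmi, stat_gmi)]


# Hypothetical GMIs for three conditions (illustrative values, not study data).
pstat = [120.0, 300.0, 480.0]
stat = [600.0, 900.0, 1200.0]
rgmi = relative_gmi(pstat, stat)
```

With this definition the condition carrying the lowest pSTAT signal always has an rGMI of zero.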

Furthermore, in order to be able to compare the weights of the regressors of the linear models, the data for each protein used as an explanatory variable in the linear modelling was further normalised to have a mean of 0 and a standard deviation of 1:

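$$
\mathrm{GMI}^{i}_{\mathrm{TF}} = \frac{\mathrm{GMI}^{i}_{\mathrm{TF}} - \operatorname{mean}\left(\mathrm{GMI}_{\mathrm{TF}}\right)}{\operatorname{stdev}\left(\mathrm{GMI}_{\mathrm{TF}}\right)}
$$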
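
As a sketch of this standardisation, the following Python snippet centres and scales a list of GMIs. The name `standardise` and the input values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def standardise(values):
    """Rescale values to a mean of 0 and a standard deviation of 1."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical GMIs of one transcription factor (illustrative values only).
gmi_tf = [200.0, 400.0, 600.0, 800.0]
z = standardise(gmi_tf)
```

After this rescaling, the fitted weights of different regressors are expressed in the same units (standard deviations of each regressor), which is what makes them comparable.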

The linear modelling was done with Wolfram Mathematica 10. The LinearModelFit function was used to fit the parameters to the data as well as to compute R², Akaike’s Information Criterion (AIC), the statistical significance of the parameters, their confidence intervals, the residuals and Cook’s distance. Unless specified otherwise, the models were fitted to the data generated by the IFN-γ, IL-12 and IL-4 titrations performed in the presence and absence of the other cytokines with Ifng-/- (IFN-γ and IL-12 titrations) or Il4-/- (IL-4 titrations) cells.

The linear regression models were compared using several statistics. The coefficient of determination

$$
R^{2} = 1 - \frac{SS_{r}}{SS_{t}}
$$

where SS_t is the total sum of squares and SS_r the residual sum of squares, measures the goodness of fit. The Akaike information criterion, in the case of linear regression models,

$$
\mathrm{AIC} = 2k + n \ln(SS_{r})
$$

where k is the number of parameters and n the sample size, penalises the complexity of the model while rewarding goodness of fit. Finally, the significance of the parameters was assessed according to the F-statistic in ANOVA calculations¹⁸².
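
To make the two statistics concrete, here is a minimal Python sketch that fits a straight line by ordinary least squares and evaluates R² and the AIC as defined above. It stands in for, and is not, the LinearModelFit call; the helper names, the data and the choice k = 2 (intercept and slope) are assumptions made for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2_and_aic(x, y, a, b, k=2):
    """R^2 = 1 - SS_r/SS_t and AIC = 2k + n*ln(SS_r) for the fitted line."""
    n = len(y)
    my = sum(y) / n
    ss_t = sum((yi - my) ** 2 for yi in y)  # total sum of squares
    ss_r = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual
    return 1 - ss_r / ss_t, 2 * k + n * math.log(ss_r)


# Hypothetical data (illustrative values only).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]
a, b = fit_line(x, y)
r2, aic = r2_and_aic(x, y, a, b)
```

Because n is the same for all models fitted to one data set, only differences in AIC between those models matter; a lower value indicates a better trade-off between fit and complexity.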

pSTATs as functions of cytokine concentrations

The family of linear models explaining the pSTAT values in condition i as functions of the cytokine concentrations was built as follows:

$$
\mathrm{rGMI}^{i}_{x} = \alpha_{x,0} + \dots
$$

For each pSTAT on each day, seven models with different explanatory variables were fitted to the relative pSTAT values: IFN-γ only, IL-12 only, IL-4 only, IFN-γ and IL-12, IFN-γ and IL-4, IL-12 and IL-4, or all three cytokines. The models were compared using the R², the AIC and the significance of the parameters. The best model was the one having the lowest AIC and only significant parameters.
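
The seven candidate models are the non-empty subsets of the three cytokines (2³ − 1 = 7). The Python sketch below enumerates them and applies one reading of the selection rule: among the models whose parameters are all significant, take the one with the lowest AIC. The function names and the layout of the `fits` mapping are hypothetical.

```python
from itertools import combinations


def candidate_models(variables):
    """All non-empty subsets of the explanatory variables, smallest first."""
    return [subset
            for size in range(1, len(variables) + 1)
            for subset in combinations(variables, size)]


def best_model(fits):
    """Lowest-AIC model among those with only significant parameters.

    `fits` maps a variable subset to (aic, all_parameters_significant);
    this layout is a hypothetical convenience, not a Mathematica structure.
    """
    eligible = {model: aic for model, (aic, ok) in fits.items() if ok}
    return min(eligible, key=eligible.get)


cytokines = ("IFN-γ", "IL-12", "IL-4")
models = candidate_models(cytokines)
```

The same enumeration gives 15 models for four explanatory variables and 31 for five.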

Master transcription factors as functions of pSTATs and each other

The family of linear models explaining the expression of the transcription factors T-bet and GATA-3 was built similarly to the previous family, but weighted sums over time until day T of the active transcription factors (pSTAT1, pSTAT4, pSTAT6 and GATA-3 or T-bet, respectively) were used as explanatory variables for T-bet and GATA-3 in each condition i on day T. The transcription factor being modelled could not itself be included among the explanatory variables due to the nature of linear regression analysis.

The explanatory variables were {GATA-3, pSTAT1, pSTAT4, pSTAT6} for x = T-bet and {T-bet, pSTAT1, pSTAT4, pSTAT6} for x = GATA-3, with T = {1, 2, 3, 4, 5}. As for the models explaining STAT phosphorylation, fifteen models covering all possible combinations of the four explanatory variables were fitted to the GMIs of T-bet and GATA-3 independently for each day. λ was determined first by comparing the R² of the fitted model including all explanatory variables for increasing λ values and choosing the λ value leading to the highest R². The models with fixed λ values were then fitted again and compared using the AIC, the R² and the significance of the parameters. The best model was the one having the lowest AIC and only significant parameters.

STATs as functions of activated transcription factors

The family of linear models explaining the expression of the three STAT transcription factors was built like the one explaining T-bet and GATA-3 expression, using the sums over time until day T of the active transcription factors (T-bet, GATA-3, pSTAT1, pSTAT4, pSTAT6) as explanatory variables for total STAT expression on day T in each condition i. Again, thirty-one models covering all possible combinations of the five explanatory variables were fitted to the GMIs of STAT1, STAT4 or STAT6 independently for each day. λ was determined first by comparing the R² of the fitted model including all explanatory variables for increasing λ values and choosing the λ value leading to the highest R². The models with fixed λ values were then fitted again and compared using the AIC, the R² and the significance of the parameters. The best model was the one having the lowest AIC and only significant parameters.

Cytokine producers as functions of T-bet and GATA-3

The percentage of cytokine producers during the recall response after five days of differentiation was expressed as a function of the T-bet and GATA-3 GMIs on day 5 before TCR restimulation in each condition i. Linear functions were fitted to the flow cytometry data of cytokine expression:

$$
P^{i}_{x} = \alpha_{x,0} + \sum_{j} \alpha_{x,j} \cdot \mathrm{GMI}^{i}_{j}
$$

where P is the percentage of cytokine producers, x = {IFN-γ, IL-13, IL-4, TNF-α, IL-2, IL-10}, and j = {T-bet, GATA-3}. For each cytokine, three different models were fitted to the data of the IL-12/IL-4 cross-titration performed in wild-type cells, taking both T-bet and GATA-3, only T-bet or only GATA-3 as explanatory variables. The models were compared using the AIC, the R² and the significance of the parameters. The best model was the one having the lowest AIC and only significant parameters. These linear models were used as a basis to empirically find better-fitting non-linear models using the NonlinearModelFit function.

The standardised residuals and Cook’s distance, both provided by the LinearModelFit function, were examined for each model. Three data points were excluded after analysis of Cook’s distance because of biologically aberrant STAT4 or STAT1 GMIs caused by abnormally low isotype control stainings.
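
For orientation, Cook’s distance can be sketched in Python for the simplest case of a straight-line fit, using the standard formula D_i = e_i² / (p · MSE) · h_i / (1 − h_i)², where e_i is the residual, h_i the leverage and p the number of fitted parameters. This is an illustration only and not the LinearModelFit computation; the function name and the data are hypothetical, and no exclusion threshold is implied.

```python
def cooks_distances(x, y):
    """Cook's distance of every point for the least-squares line y = a + b*x."""
    n, p = len(x), 2  # p: fitted parameters (intercept and slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)  # residual mean square
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(resid, leverage)]


# Hypothetical data; the last point is a deliberate outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
d = cooks_distances(x, y)
```

A point with a large residual and high leverage, such as the last one here, receives the largest distance, which flags it for closer inspection rather than automatic removal.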

In document: Systematic inference of regulatory networks that drive cytokine-stimulus integration by T cells (pages 117-120)