3. RESULTS
3.1 Clustering of different lymphocyte groups: proof of data quality and the
To assess data quality, PCA and hierarchical clustering (HC) were first performed using data from all available samples (n=91), including NK, T, and B lymphocytes, and monocytes. With a limit of detection (LoD) of 1, 20858 genes of the complete transcriptome were included in the analysis. PCA and HC were performed using the top 100 most differentially expressed genes (Appendix 1). Based on these expression data, the following sample groups were observed on both the PCA score plot and the clustered heatmap (Figures 1 and 2): lymphocyte precursors (stages 1 and 2), NK cell precursors (stage 3), NK cells of stages 4 and 5, T lymphocytes, B lymphocytes, and monocytes. Thus, the sequencing data are complete enough and of sufficient quality for the analysis.
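The preprocessing behind such an analysis can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: a gene is kept if it reaches the LoD in at least one sample, genes are ranked by variance across samples as a stand-in for the differential-expression ranking, and the heatmap values are gene-wise Z-scores. All gene names and values below are hypothetical toy data.

```python
from statistics import mean, pstdev, pvariance


def filter_by_lod(expr, lod=1.0):
    """Keep genes that reach `lod` in at least one sample (assumed rule)."""
    return {gene: values for gene, values in expr.items() if max(values) >= lod}


def top_variable_genes(expr, n=100):
    """Rank genes by variance across samples (assumed ranking criterion)."""
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:n]


def zscore_rows(expr, genes):
    """Gene-wise Z-score: (x - mean) / sd; constant genes map to zeros."""
    scaled = {}
    for gene in genes:
        m = mean(expr[gene])
        s = pstdev(expr[gene])
        scaled[gene] = [(x - m) / s if s > 0 else 0.0 for x in expr[gene]]
    return scaled


# Hypothetical expression matrix: gene -> expression value per sample.
expr = {
    "GENE_A": [0.0, 0.2, 0.1, 0.3],  # never reaches the LoD, so it is dropped
    "GENE_B": [1.0, 9.0, 1.0, 9.0],  # strongly variable across samples
    "GENE_C": [5.0, 5.0, 5.0, 5.0],  # detected but constant
    "GENE_D": [2.0, 4.0, 2.0, 4.0],  # moderately variable
}

kept = filter_by_lod(expr, lod=1.0)
top = top_variable_genes(kept, n=2)
z = zscore_rows(kept, top)
print(top)
print(z)
```

The resulting Z-score matrix is the kind of input that PCA and hierarchical clustering would then operate on; the clustering step itself is omitted from this sketch.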
Figure 1 PCA score plot of analyzed samples based on expression levels of the top 100 differentially expressed genes.
Genes were selected from the total number of 20858, LoD=1. 1 - lymphocyte precursors; 2 - immature NK cells (stage 3); 3 - NK stage 4 and 5; 4 - monocytes; 5 - B lymphocytes; 6 - T lymphocytes.
As can be seen on the clustered heatmap (Figure 2), specific expression patterns were observed for most of these groups (Table 2). Furthermore, the known functions of most of these specifically expressed genes were consistent with the observed expression patterns.
In particular, among genes expressed in the group of all committed NK cells (stage 3 to stage 5) were classical NK cell markers such as NCR1, NCR3, NCAM1, and KLRC1.
Furthermore, genes encoding NK effector molecules, such as PRF1, IFNG, CST7, GNLY and granzymes (GZMA, GZMB, GZMH, GZMM), and other molecules typical for mature NK cells (FCRL3, FCRL6, CX3CR1, FCGR3A, FGFBP2, S1PR5) were expressed by NK cells stage 4 and stage 5, but not by stage 3 NK cells.
Figure 2 HC heatmap of analyzed samples based on expression levels of the top 100 differentially expressed genes.
Genes were selected from the total number of 20858, LoD=1. The color scale shows the gene Z-score (-3 to 3).
Table 2 Expression patterns of 100 top differentially expressed genes among all samples, LoD1
Lymphocyte precursors: ALDH1A1, ANGPT1, BCAT1, CPA3, CTSG, EGFL7, FLT3, IGLL1, LMO2, LPCAT2, MAP7, MPO, MSRB3, MYB, PLD4, PRTN3, SPINK2, TNS3
T lymphocytes: AC010468.2, CACNA1I, CAMK4, CCR4, CD28, CD5, CTC-499J9.1, LEF1-AS1, MAL, NELL2, PKIA-AS1, SLC16A10, TRAT1
T lymphocytes and NK stage 5: BCL11B, CD247, KZF3, PYHIN1, SH2D1A, TENM1
NK stages 3-5: AC017104.6, ATP8B4, B3GNT7, CCNJL, CD300A, FAT4, FGR, GNLY, GRIK4, IL18RAP, ITGAM, KLRC1, KRT86, LINC00299, MLC1, NCAM1, NCR1, NCR3, NMUR1, PDGFD, PTGDR, RASSF4, RGS9, RNF165, RP11-104L21.3, RP11-121A8.1, RP11-705C15.5, SH2D1B, SIGLEC17P, SIGLEC7, TRDC, TRDJ1, TRGV9, TYROBP, XCL1, XCL2
NK stages 4 and 5: ATP8B4, BNC2, CLIC3, CMKLR1, COL13A1, CX3CR1, DTHD1, FCGR3A, FCRL3, FCRL6, FGFBP2, GZMA, GZMB, GZMH, GZMM, IFNG, ITK, KLRD1, L3MBTL4, LINGO2, NKG7, PDGFRB, PDZD4, PRF1, PRSS30P, S1PR5, SLAMF7, TBX21
Notably, CD56dimCXCR6- hepatic NK cells clustered together with stage 5 NK cells from other tissues and shared a mature expression profile, while the CD56dimCXCR6+ sample clustered together with stage 4 NK cells. In comparison to stage 5 NK cells, CXCR6- samples had lower expression levels of markers of mature NK cells such as GNLY, GZMB, GZMH, TBX21, CX3CR1, CMKLR1, COL13A1 and FGFBP2, as well as of KRT86, LINGO2, PDGFRB and PRSS30P. Furthermore, the expression of some genes (e.g. RGS9, FGFBP2, GZMB and GZMH) was lower than in both stage 4 and stage 5 NK cells.
In the group of lymphocyte precursors, hematopoiesis-related genes were expressed, such as MYB, CPA3, FLT3, LMO2, MPO and IGLL1, as well as genes involved in angiogenesis (EGFL7 and ANGPT1) (Su et al., 2004; Surmiak et al., 2012; Yang et al., 2002). Notably, stage 3 samples from bone marrow and cord blood clustered together with stages 1 and 2 and shared with them a specific cluster of 18 highly expressed genes (Figure 2).
Among the genes expressed mainly in mature T lymphocytes were classical T-cell markers such as CD28, CD5 and TRAT1, and other genes known to be expressed in T lymphocytes, such as CAMK4, CCR4, and MAL (Illario et al., 2008).
To test whether the obtained data are reproducible, a correlation analysis of the complete transcriptomic data was performed on the sample pairs that were most similar with respect to sorting strategy and tissue origin and were derived from the same donor. Two such pairs were present in the sample set: pbTKIRpos_1 and pbTKIRpos_2, and pbNKst5lic_6 and pbNKst5lic_7. In both cases a high similarity of gene expression data was observed on linear regression plots (Figure 3), with correlation coefficients of 0.91 and 0.92, respectively, indicating that the obtained data are reproducible.
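A correlation coefficient between two samples can be computed as in the following minimal sketch. The text does not state which coefficient was used; Pearson's r is assumed here, and the two expression vectors are hypothetical toy values, not data from the samples named above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squared deviations and of cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


# Hypothetical expression values of the same five genes in two replicates.
replicate_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
replicate_2 = [1.2, 1.9, 3.2, 3.8, 5.1]

r = pearson_r(replicate_1, replicate_2)
print(round(r, 3))
```

In practice the vectors would hold the expression values of all genes passing the LoD for the two samples of a pair, in the same gene order.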
To test whether the functional enrichment of immune system-related genes was significant, a PANTHER GO overrepresentation test was performed. Of the top 100 most differentially expressed genes, 88 were mapped to particular biological processes (Table 3); the significantly overrepresented categories included immune system process (GO:0002376) and, in particular, natural killer cell activation (GO:0030101) and B cell mediated immunity (GO:0019724).
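The quantities reported by an overrepresentation test can be illustrated with a generic sketch. Fold enrichment is the observed number of genes in a category divided by the number expected by chance; the p-value below is the upper tail of a hypergeometric distribution. This is an assumption for illustration only: the text does not specify which statistical test or reference gene list PANTHER used, and the counts in the example are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Observed / expected count of category genes in the gene list.

    k: category genes in the list, n: list size,
    K: category genes in the reference, N: reference size.
    """
    expected = n * K / N
    return k / expected


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N, K of them annotated."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 6 of 88 mapped genes fall into a category
# that contains 100 of 20000 reference genes.
fe = fold_enrichment(k=6, n=88, K=100, N=20000)
p = hypergeom_upper_tail(k=6, n=88, K=100, N=20000)
print(round(fe, 2), p)
```

A correction for multiple testing across all GO categories would normally follow; it is left out of this sketch.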
An analogous analysis, including PCA, HC and the PANTHER GO overrepresentation test, was also performed using the top 400 differentially expressed genes (data not shown). As no differences in the separation of sample groups or in clustering were observed compared with the analysis based on the top 100 genes, it was concluded that the top 100 genes are sufficient for the analysis.
Table 3 Gene ontology overrepresentation among top 100 differentially expressed genes
PANTHER GO-Slim Biological Process            Gene number   Fold enrichment   P-value
Cellular process (GO:0009987)                 45            1.59              4.02E-02
Response to stimulus (GO:0050896)             33            3.6               4.24E-09
Immune system process (GO:0002376)            27            4.59              2.43E-09
Immune response (GO:0006955)                  18            8.2               1.35E-09
B cell mediated immunity (GO:0019724)         6             10                7.20E-03
Natural killer cell activation (GO:0030101)   6             14.28             9.93E-04
In summary, a successful differentiation between blood cell populations is possible based on the available transcriptomic data and top 100 genes are sufficient for this. The groups observed upon PCA and HC correspond to FACS sorting strategies and most of the genes that were differentially expressed between the groups and defined the specific clustering are well-known and typical for the corresponding cell populations.
3.2 Identification of specific expression patterns between mature PBMC