Abstract
The mononuclear phagocyte system (MPS) is a family of cells including progenitors, circulating blood monocytes, resident tissue macrophages and dendritic cells (DC) present in every tissue in the body. To test the relationships between markers and transcriptomic diversity in the MPS, we collected from NCBI-GEO >500 quality RNA-seq datasets generated from mouse MPS cells isolated from multiple tissues. The primary data were randomly down-sized to a depth of 10 million reads and requantified. The resulting dataset was clustered using the network analysis tool Graphia. A sample-to-sample matrix revealed that MPS populations could be separated based upon tissue of origin. Cells identified as classical DC subsets, cDC1 and cDC2, and lacking Fcgr1 (CD64), were centrally-located within the MPS cluster and no more distinct than other MPS cell types. A gene-to-gene correlation matrix identified large generic co-expression clusters associated with MPS maturation and innate immune function. Smaller co-expression gene clusters including the transcription factors that drive them showed higher expression within defined isolated cells, including macrophages and DC from specific tissues. They include a cluster containing Lyve1 that implies a function in endothelial cell homeostasis, a cluster of transcripts enriched in intestinal macrophages and a generic cDC cluster associated with Ccr7. However, transcripts encoding many other putative MPS subset markers including Adgre1, Itgax, Itgam, Clec9a, Cd163, Mertk, Retnla and H2-a/e (class II MHC) clustered idiosyncratically and were not correlated with underlying functions. The data provide no support for the concept of markers of M2 polarization or the specific adaptation of DC to present antigen to T cells. Co-expression of immediate early genes (e.g. Egr1, Fos, Dusp1) and inflammatory cytokines and chemokines (Tnf, Il1b, Ccl3/4) indicated that all tissue disaggregation protocols activate MPS cells. Tissue-specific expression clusters indicated that all cell isolation procedures also co-purify other unrelated cell types that may interact with MPS cells in vivo. Comparative analysis of public RNA-seq and single cell RNA-seq data from the same lung cell populations showed that the extensive heterogeneity implied by the global cluster analysis may be even greater at a single cell level with few markers strongly correlated with each other. This analysis highlights the power of large datasets to identify the diversity of MPS cellular phenotypes, and the limited predictive value of surface markers to define lineages, functions or subpopulations.
INTRODUCTION
The mononuclear phagocyte system (MPS) [1] is a family of cells present in every tissue in the body including progenitors, circulating blood monocytes and resident tissue macrophages [2-5]. Within each tissue, resident macrophages occupy territories with a regular distribution, commonly associated with epithelial and endothelial surfaces (reviewed in [5]). The proliferation, differentiation and survival of most resident macrophage populations depends upon signals from the macrophage-colony-stimulating factor receptor (CSF1R) initiated by one of two ligands, CSF1 or IL34 [6, 7]. Based upon detection of macrophage-restricted mRNA, including Csf1r, the relative abundance of resident macrophages in most organs in mice was shown to reach a maximum in the first week of postnatal life and remains stable thereafter during postnatal growth [8]. Lineage-trace studies in the C57BL/6 strain suggest that many macrophage populations established in the mouse embryo are maintained in adults mainly by self-renewal, whereas others are replaced progressively to differing extents by blood monocytes derived from bone marrow progenitors throughout life [9-11]. Most if not all tissue macrophage populations can be generated and maintained in the absence of blood monocytes due to the intrinsic homeostatic regulation by circulating CSF1 [12]. The precise details of ontogeny, turnover and homeostasis of resident macrophages may not be conserved across mouse strains or species [5]. However, regardless of their steady-state turnover, all resident macrophages including the microglia of the brain can also be rapidly replaced by blood monocytes following experimental depletion ([3-5, 12] and references therein).
Within individual tissues, resident macrophages acquire specific adaptations and gene expression profiles [2, 5, 13-15]. These adaptations contribute to survival as well as function and involve inducible expression of transcription factors and their downstream target genes. At least some of these transcription factors act by regulating Csf1r expression. Deletion of a conserved enhancer in the mouse Csf1r gene leads to selective loss of some tissue macrophage populations, whereas others express Csf1r normally [16]. In the mouse embryo, where abundant macrophage populations are engaged with phagocytosis of apoptotic cells [17], the macrophage transcriptome does not differ greatly between organs. Tissue-specific macrophage adaptation occurs mainly in the postnatal period as the organs themselves exit the proliferative phase and start to acquire adult function [8, 15].
Classical dendritic cells (cDC) are commonly defined functionally on the basis of a proposed unique ability to present antigen to naïve T cells, a concept that requires a clear distinction between DC and macrophages [18]. It remains unclear as to whether cDC should be considered part of the MPS and the extent to which they can be defined by surface markers [12]. The situation is confused by the widespread use of the term DC to describe any antigen-presenting cell (APC) including cells that are clearly derived from blood monocytes [19]. An attempt at consensus proposed an MPS nomenclature classification based upon ontogeny, and secondarily upon location, function and phenotype [20]. The proposal separates monocyte-derived APC from cDC subsets: cDC1, dependent on the transcription factor BATF3, and cDC2, dependent upon IRF4. Some support for this separation came from analysis of an Ms4a3 reporter transgene, which labelled cells derived from committed granulocyte-macrophage progenitors and distinguished monocyte-derived cells from tissue DC [10]. Secondary classification is based upon cell surface markers that are presumed to be linked in some way to ontogeny. The proposed development pathway of these DC subsets from a common myeloid progenitor, via a common DC progenitor (CDP), has been reviewed recently [21].
Even within tissues resident macrophages are extremely heterogeneous [22, 23]. Since the advent of monoclonal antibodies and later development of transgenic reporter genes [24] numerous markers have been identified that segregate the MPS into subpopulations. Amongst the recent suggestions, LYVE1 was proposed as a marker of macrophages associated with the vasculature [25], CD64 (Fcgr1gene) and MERTK as markers that distinguish macrophages from classical DC [26, 27] and CD206 (Mrc1 gene) as a marker of so-called M2 macrophage polarization [28]. Several surface markers have also been identified that are encoded by genes expressed only in macrophages in specific tissues (e.g. Clec4f, Tmem119, Siglecf [22, 23]. Other markers define macrophages in specific locations within a tissue, for example CD169 (encoded by Siglec1) in the marginal zone of spleen and hematopoietic islands in bone marrow [29]. In the case of blood monocytes, the subpopulations are clearly a differentiation series in which short-lived LY6Chi “classical” monocytes give rise in a CSF1R-dependent manner [30] to long-lived LY6Clo non-classical monocytes via an intermediate state [11, 30, 31]. This is likely also the case in tissues such as the liver [32] and intestine [33, 34]. More recently, mouse tissue macrophage heterogeneity has been analysed using multiparameter flow cytometry and single cell RNA-seq [35]. Mechanistically, the association between marker expression and cellular function depends upon coordinated transcriptional regulation. One way to identify coregulated sets of transcripts is to cluster large transcriptomic datasets. This approach was used to create transcriptional atlases in multiple species and identify lineage-specific transcription factors and their target genes [36-40]. It enabled the extraction of a generic tumour-associated macrophage signature from multiple large cancer datasets [41]. Previous meta-analysis of large microarray datasets [36, 37, 40] as well as a reanalysis of data from the ImmGen Consortium [42] indicated a clear separation between mouse MPS cells and other leukocyte lineages but did not support the basic premise that markers can separate macrophages from DC or define lineages within the MPS.
Over the past 5 years, RNA sequencing (RNA-seq) has supplanted microarrays as an approach to expression profiling. The recent cascade of interest in tissue-specific macrophage adaptation has produced RNA-seq data for MPS cells isolated from most major organs of C57BL/6 mice. To enable comparative analysis of datasets from multiple laboratories, we devised an automated informatics pipeline employing random sampling of RNA-seq data to a common depth and quantification using the pseudo-aligner Kallisto. Robust transcriptional atlases for the chicken [43] and pig [44] were generated using datasets from numerous divergent sources. Using the same basic pipeline, we identified a total of around 500 RNA-seq libraries generated from isolated macrophage and cDC populations from 24 different studies that sample mouse MPS transcriptional diversity (Table 1). Here we apply transcriptional network clustering to this large dataset to analyse transcriptional adaptation across the entire mouse MPS.
METHODS
The RNA-seq datasets from within the BioProjects shown in Table 1 were downloaded from the European Nucleotide Archive (ENA). Table S1 contains all the SRA and NCBI accessions and sample descriptions. Individual BioProjects differ in methods of mRNA isolation, library preparation and sequencing methods, length, depth and strandedness, but previous analysis in other species [43, 44] indicated that they can still produce comparable expression level estimates. We initially included data from the large ImmGen UL1 project (GSE127267l; GSE124829; see [45]) but this project uses a novel ultra-low input RNA-seq pipeline based upon 1000 sorted cells and the single cell RNA sequencing (scRNA-seq) platform Smartseq2. Our analysis revealed a large batch effect relative to all other samples, and we therefore excluded these data. To reduce possible effects of sampling depth, and generate a common normalisation, the sequences were each randomly down-sampled to a depth of 10 million reads per library as described [43] and requantified using Kallisto v0.44.0 [46]. Kallisto quantifies expression at the transcript level, as transcripts per million (TPM), by building an index of k-mers from a set of reference transcripts and then ‘pseudo-aligning’ reads to it, matching k-mers in the reads to k-mers in the index. The selected BioProjects include subsets of resident tissue macrophages defined using surface markers and isolated by FACS from one tissue as well as temporal profiles of adaptation from monocytes to tissue macrophages. The purpose of this analysis was to identify clusters of transcripts that are robustly correlated. For this purpose, the diversity of transcriptomic space sampled is an asset.
Prior to network analysis, transcripts that were not detected at an arbitrary threshold of 10 TPM in at least one sample were removed to minimise stochastic sampling noise intrinsic in RNA-seq data. Given the nature of the samples, this also helps to reduce the low-level representation of transcripts derived from contaminating cells of non-myeloid origin. Of the 18,175 transcripts that met this minimum threshold, 11,578 were detected in at least 90% of RNA-seq datasets and 6,901 had a median expression >10 TPM. The TPM estimates for the 18,175 transcripts in all of the datasets included are provided in Table S1.
Network analysis was performed using the program Graphia Professional (https://kajeka.com/graphia-professional/). Pairwise Pearson correlations (r) were calculated between all samples to produce a sample-to-sample correlation matrix and inversely between all pairs of genes to produce a gene-to-gene correlation matrix. Gene co-expression networks (GCNs) were generated from the matrix, where nodes represent genes and edges represent correlations between nodes above a defined correlation threshold. For the sample-to-sample analyses an initial screen at the r value which entered all samples was performed, followed by subsequent analyses with higher r value to remove outliers and reveal more substructure in the networks. For each gene-to-gene analysis the r value was adjusted to retain the maximum number of transcripts with the minimum number of edges [43].
RESULTS AND DISCUSSION
Expression profiles of individual transcripts
To overview the heterogeneity of the macrophages and the effectiveness of normalisation we first considered the expression profiles of selected individual transcripts to explore the housekeeping genes and surface markers used in studies of MPS cells. The choice of appropriate reference genes for qRT-PCR is a significant issue in many studies, including macrophage differentiation [47]. Figure 1A shows candidate house-keeping genes (Hprt, Actb, B2m, Gapdh, Ppia) that are commonly used in qRT-PCR as reference genes. These transcripts vary between datasets and BioProjects but in pairwise analysis were only weakly correlated with each other (Figure 1B).
Figure 2A shows the expression pattern of transcripts encoding surface markers used to separate some of the subpopulations herein: Adgre1 (F4/80), Cd4, Cd74 (Class II MHC), Csf1r (CD115), Cx3cr1, Fcgr1 (CD64), Icam2, Itgax (CD11C), Lyve1, Mertk, Mrc1 (CD206), and Tnfrsf11a (RANK). Figure 2B shows a summary of the correlations between them. Consistent with studies using Csf1r reporter transgenes [48, 49], Csf1r mRNA was universally expressed in MPS cells albeit with significant variation in level, being highest in microglia and lowest in cDC1. Csf1r was correlated (r>0.5) with Adgre1, Fcgr1, Cx3cr1, Mertk and Tnfrsf11a but these transcripts were less correlated with each other. Mrc1 was reported to be correlated with expression of Lyve1 and inversely with MHCII [25, 50]. Across the entire spectrum of macrophage transcriptomes, Mrc1 was correlated with Lyve1 but was more widely-expressed (Figure 2A). There was no evidence of an inverse correlation between Mrc1 and Cd74 or other MHCII-associated transcripts.
Network analysis of relationships of MPS populations and expressed transcripts
To determine whether any transcripts encoding surface markers were correlated with cellular phenotype we used the graph-based network analysis tool Graphia Professional. Figure 3 presents a sample-to-sample correlation matrix generated using the Fruchterman-Rheingold algorithm in Graphia, showing the clear segregation of the tissue-specific macrophage populations (Figure 3A). Consistent with previous analysis of microarray datasets [37, 39, 40, 42] the isolated spleen, lung and lymph node DC subpopulations clustered together in the middle of the graph (red nodes in Figure 3B). Based upon their overall transcriptomic profile, the DC were no more divergent from other MPS populations than the isolated macrophages purified from different tissues were from each other. The apparent relationship to BioProject (Figure 3C) occurs mainly because most studies were focussed on a particular tissue or cell type. There may also be impacts from differing methods of extracting and processing RNA and low depth and single end libraries compared to high depth/paired end libraries but these did not produce obvious outliers.
The gene-centred network (GCN) for the same dataset was developed at an r value of 0.75 chosen based on the graph of network size vs correlation threshold shown in Figure S1. Figure 4A shows the whole network and Figure 4B highlights the tissue specific clusters and those that contain markers of other cell types, as discussed below. Table S2 summarises the coexpressed gene clusters and the average gene expression profiles of the clusters containing at least 10 nodes (transcripts). The graphs are colour-coded to indicate the tissue origin and cell-type as in Figure 1 (samples are listed in the Readme sheet of Table S2). An additional sheet in Table S2 provides GO term enrichment of the larger clusters. For ease of visualisation relative to sample information, profiles of surface markers and transcription factors discussed below are provided as an additional sheet in Table S1. Table 2 provides an overview of the major functional clusters discussed in more detail below. It is beyond the scope of this study to analyse and cite published evidence related to every transcript in detail. In Table 2, individual genes from within the cluster have been included based their candidate role as transcriptional regulators and upon known associations with mononuclear phagocyte biology determined by PubMed search on Genename AND macrophage or dendritic cell. On the principal of guilt-by-association [36-40] there are hundreds of other genes within these clusters that have inferred functions in innate immunity and mononuclear phagocyte biology.
Major macrophage-enriched co-regulated clusters
At the chosen r value of 0.75, the GCN approach using the normalised data from multiple laboratories identified many co-regulated clusters of transcripts that are consistent with knowledge inferred from smaller datasets.
Cluster 1 is a generic MPS cluster which drives the relatively close association between all of the samples, including the different subclasses of DC, in the sample-to-sample network (Figure 2A) and distinguishes MPS cells from other leukocytes. It includes Csf1r, Fcgr1, Cd68, Sirpa, Tnfrsf11a and the core myeloid transcription factor gene Spi1 alongside many other known macrophage-enriched transcription factors [51, 52]. One notable inclusion is the glucocorticoid receptor gene, Nr3c1, which mediates transcriptional activation of a wide-range of anti-inflammatory genes in macrophages [53]. As one might expect from the known endocytic and secretory activity of MPS cells, the cluster is enriched for GO terms related to endosome/lysosome and intracellular transport/secretion that are major constitutive functions of mononuclear phagocytes [36] (Table S2). Transcripts in Cluster 3 were also expressed widely in MPS cells but the cluster has a distinct average expression profile. Cluster 3 includes genes encoding several forkhead transcription factors (Foxo3, Foxo4, Foxk1 and Foxk2), the key transcriptional regulators of autophagy [54-56], and Nfat5 which controls macrophage apoptosis [57]. This cluster also contains Mertk, the perforin-like immune effector gene Mpeg1, Aim2, which encodes a sensor for cytoplasmic DNA [58] and transcripts for numerous DEAH- and DEAD box helicases all implicated in DNA sensing in innate immunity [59]. There are also members of the NAIP family of inflammasome regulators (Naip2, 5, 6); reviewed in [60]). We infer that this cluster of transcripts reflects an independently-regulated capacity for innate immune recognition of internalised pathogens. Other than Mertk, there is no obvious plasma membrane marker associated with this set of candidate innate immune effector genes.
Genes in Cluster 4 were strongly expressed in samples from brain and include microglia-enriched markers that are depleted in brains of Csf1r-deficient mice and rats, such as Cx3cr1, Tmem119, P2ry12 and the key transcription factor genes Sall1, Sall2 and Sall3 [16, 61]. Cluster 9 contains the S phase transcription factor gene Foxm1 and numerous cell cycle-associated transcripts [62] and the GO term enrichment supports a cell cycle association. The cell cycle cluster was expressed in all isolated MPS populations at various levels consistent with evidence that they are capable of self-renewal in the steady state [5, 12]. The separation of this cluster indicates that proliferative activity is not tightly-linked to any MPS differentiation state or surface marker.
Identification of a capillary-associated expression cluster
Most macrophages and DC included in this analysis were purified by FACS based upon their expression of specific markers including those shown in Figure 1B (see Table 1). Chakarov et al. [25] identified a population of pericapillary cells in the lung that expressed LYVE1 and extended their analysis to FACS-separated cells from fat, heart and dermis. Their RNA-seq results are included in our dataset. Based upon analysis of differentially expressed genes, the authors identified a set of genes with high expression in sorted LYVE1hi macrophages relative to LYVE1lo macrophages across the four tissues, including Mrc1,Timd4, Cd5l, Fcna and Vsig4 [25, 50]. The GCN reveals that there is, indeed, a set of transcripts (Cluster 22) that is strongly correlated with Lyve1 expression across a larger spectrum of tissues. The cluster includes Mrc1 but excludes Timd4, Cd5l, Fcna, and Vsig4, which were associated with distinct tissue-specific clusters (Table 2). The correlation between Lyve1 and Mrc1 is actually lower than the cluster threshold of 0.75 (r=0.62, Figure 2B). The two genes were included in Cluster 22 because of shared links to other genes. In fact, Mrc1 was only marginally-enriched in the purified LYVE1hi macrophages from fat, heart, lung and skin [25] and it was highly-expressed in isolated cells from adipose, brain, intestine, kidney and liver that lack Lyve1 mRNA (see Table S1/selected transcripts). We conclude that most LYVE1hi macrophages express Mrc1, but the reciprocal relationship does not hold.
The set of co-expressed genes in Cluster 22 suggests a function for LYVE1hi macrophages in control of endothelial biology and vascular permeability. It includes genes for two of the sphingosine-1-phosphate receptors (S1pr1/2) which have been implicated in many aspects of inflammation, lymphangiogenesis and angiogenesis [63, 64], the vanilloid receptor (Trpv4), which controls capillary barrier function and inflammation [65, 66] and neuropilin 1 (Nrp1) which controls endothelial homeostasis [67]. Cluster 22 also contains the erythropoietin receptor gene (Epor), which was shown to synergise with S1P to promote apoptotic cell clearance by macrophages [68] and the EGF receptor gene (Egfr) which has also been shown to regulate macrophage function in a range of inflammatory models [69]. Indeed, the co-expressed genes might support the known functional association of macrophages with lymphatic as well as blood vessels [70]. The Lyve1-associated cluster contains genes for three novel candidate transcriptional regulators Etv1, Nfatc2 and Tcf4. Etv1 expression in macrophages has been implicated in functional polarization in vitro and the response to altered mitochondrial membrane potential [71]. Nfatc2 is required for osteoclast differentiation in vitro [72] but roles in macrophage differentiation/function have not been explored. Tcf4 encodes a transcription effector of Wnt/β-catenin pathway, implicated in responses to E-cadherin and other effectors in macrophage differentiation [73].
Mrc1 is commonly referred to as a marker for alternative or M2 macrophage polarization [74]. Another putative marker of M2 polarization is the somatic growth factor insulin-like growth factor 1 (Igf1 gene) [75]. Igf1 was correlated with Mrc1 (r=0.67) but did not form part of a co-expression cluster. It was absent from monocytes and DC but was highly-expressed in most resident tissue macrophages (see Table S1/selected transcripts). Igf1 is CSF1-inducible and of particular interest because of the profound impact of Csf1r mutations in multiple species on postnatal growth and development [7]. Unlike hepatocytes and mesenchymal cells, tissue macrophages did not express transcripts encoding the growth hormone receptor (Ghr), Igf1r, or the Igf1 binding protein genes (Igfals, Igfbp1,2,3,5,6). The exception is Igfbp4 which was highly-expressed in most macrophage populations and did form part of the Lyve1/Mrc1-associated Cluster 22. Interestingly, Igfbp4 knockout in mice mimics impacts of Csf1r deficiency on somatic growth and adipose formation [76, 77].
The intimate association of macrophages with capillaries was evident from the first localization of the F4/80 antigen [78]. Adgre1 expression was also correlated with Mrc1 (r=0.64; Figure 2B), but it was more widely-expressed than either Mrc1 or Lyve1 and therefore not within Cluster 22. Adgre1 was not enriched in any of the purified LYVE1hi macrophage populations relative to LYVE1lo cells from the same tissue [25]. It was high in most isolated tissue macrophages and induced during differentiation of monocytes in situ as in the liver dataset [32] and the intestinal developmental series [33, 34]. F4/80 was proposed as a marker of macrophages of embryonic origin [79] but Adgre1 was equally high in intestinal macrophages, which turn over rapidly from monocytes [80, 81] and in cDC2. It was also strongly induced during monocyte differentiation to occupy a vacant Kupffer cell niche [32]. Whatever the association with ontogeny, the pattern is rodent-specific. Adgre1 is a rapidly-evolving gene and the expression pattern also varies across species [82].
Tissue-specific macrophage clusters
Several co-expressed clusters were associated with MPS cells isolated from a single tissue. Aside from the large brain-enriched expression cluster (Cluster 4) that contains many microglia markers, Cluster 10 was lung-enriched and contains the alveolar macrophage marker Siglecf and key transcription factor Pparg [15]. Cluster 12 was shared amongst liver Kupffer cells (KC), peritoneal macrophages and splenic macrophages and includes the transcription factors Id3, Nr1h3 and Smad6 and markers Cd5l, Clec4f and Vsig4 [15, 32, 34]. A novel finding was the strong coexpression (r = 0.81) between Nr1h3 and Rxra, the gene encoding its promiscuous heterodimerisation partner, which is also implicated in control of KC lipid and iron metabolism [83] and may have independent function in innate immune regulation [84].
The average expression of Cluster 12 increased progressively in the monocyte-KC differentiation series [32] included in this dataset (see profile in Figure S2). Cluster 12 also reveals the regulated and coordinated expression of the thyroid hormone receptor (Thrb), likely mediating the many impacts of thyroid hormones in innate immune function [85]. One other novel candidate regulator identified in this cluster is Zbtb4 which encodes an epigenetic regulator with a high affinity for methylated CpG. Zbtb4-/- mice are viable and fertile but growth retarded compared to littermates [86]. Impacts on myeloid differentiation have not been reported. The transcription factor SPIC is implicated in splenic red pulp macrophage differentiation and iron homeostasis [87, 88]. Although Spic mRNA was highest in red pulp macrophages, KC and bone marrow macrophages, it was detected in other macrophage and DC populations and therefore has a unique expression profile. Cluster 21 contains transcripts most highly-expressed in resident peritoneal macrophages and includes the genes for the transcription factor Gata6 and the retinoic acid receptor (Rarb) which control peritoneal macrophage survival and adaptation [89, 90]. The data confirm the specific high expression of the enigmatic plasminogen activator inhibitor encoded by Serpinb2 in resident peritoneal macrophages, first described >20 years ago [91] and still seeking a function [92].
Genes in Cluster 15, including the monocyte-specific chemotactic receptor Ccr2, were highly expressed in classical monocytes. Genes in Cluster 43 were expressed specifically in Langerhans cell (LC). They include the marker Cd207 (langerin), used in the purification of LC [93] but also expressed at lower levels in many other tissue macrophage populations. This cluster did not include the gene for another LC marker, Epcam [93]. It was highly-expressed in LC but also detected in one set of intestinal macrophage samples, most likely a contamination with epithelial cells (Cluster 5, see below). Epidermal LC have at times been considered as DC-like because of their migratory and APC properties but are now considered to be specialized resident tissue macrophages [94]. Unlike most classical DC in lymphoid tissue, they are clearly CSF1R-dependent and share with several other macrophage populations dependence on the conserved enhancer in the Csf1r locus [16]. Cluster 43 did not include a transcriptional regulator specific to LC. In common with several other macrophage populations, LC differentiation is regulated by TGFβ signaling, involving transcription factors RUNX3 and ID2 [94]. Both transcription factor genes were highly-expressed in LC but also present in several other tissue macrophage populations.
Intestinal macrophage-enriched gene expression profiles, which have not previously been identified, emerge in Cluster 38. Two large separate datasets of intestinal macrophages were included here [33, 34], both likely reflecting a differentiation series of adaptation from blood monocytes to resident intestinal tissue macrophages [5]. In one case, CD4 and TIM4 were used as markers [34] but each of these markers is shared with other macrophage populations. Cd4 mRNA expression was shared uniquely with lung, skin and kidney macrophage subpopulations (see Figure 2B). A third dataset tracks the adaptation of transferred blood monocytes to the intestinal niche [95]. Cluster 38 identified Cxcr4 as a candidate intestinal macrophage marker consistent with their continuous derivation from CXCR4+ monocytes. The high expression of Wnt4 in lamina propria macrophages was recently confirmed by IHC. Conditional deletion of Wnt4 using Itgax-cre led to dysregulation of immunity against an intestinal parasite [96]. WNT4 is a candidate mediator of the key trophic role of lamina propria macrophages in the intestinal stem cell niche [97]. Fosb, Hes1 and Hic1 encode identified potential transcriptional regulators of intestinal macrophage differentiation and adaptation. HES1 inhibits inflammatory responses in macrophages and contributes to gut homeostasis [98, 99]. FOSB has not previously been implicated in macrophage adaptation to any niche. Unfortunately, we were not able to include data from a microarray analysis of resident colonic macrophages which identified a set of 108 genes >2-fold higher in the colon relative to other macrophage populations in the ImmGen database [100]. However, Cluster 38 confirmed the gut macrophage-specific expression of several of these transcripts including Dna1l3, Fgl2, Gpr31b, Hes1, Mmp13, Ocstamp, Pgf and Tlr12.
There were no unique expression profiles enriched in macrophages isolated from any other major tissues including adipose, brain (non-microglia), heart, kidney, pancreas or skin. The abundant resident macrophages of adipose are especially topical in light of the obesity epidemic. The literature on adipose macrophages focusses on “M2-like” markers [101]. Amongst resident macrophage populations, Apoe and Retnla, both detected in most tissue macrophages and not included in a cluster, were highest in adipose-derived macrophages. RETNLA (aka RELMα) has been referred to as an adipokine, regulated by food intake and controlling lipid homeostasis [102]. Kumamoto et al. [103] claimed that Retlna was co-expressed with Mgl2 (another putative M2 marker) in many mouse tissues including adipose and attributed it a role in maintenance of energy balance. The two transcripts were not correlated in this larger dataset. In fact, Mgl2 was part of a small cluster (Cluster 83) with Cd14. Like Retnla, mRNA for the related lectin, MGL1 (Clec10a gene), also considered an M2 macrophage marker [101] was highest in the adipose-associated macrophages but also expressed in macrophages from other tissues including dura, heart, lung and skin (Cluster 101).
Dendritic cell co-expression clusters
Despite evidence that it is expressed by many resident tissue macrophages (reviewed in [24]), CD11C (ITGAX) is still widely-used as a surface marker in mouse DC purification. Ongoing studies of the impacts of conditional mutations using Itgax-cre continue to be interpreted solely in terms of DC specificity (e.g. [96, 104, 105]. Consistent with the literature, Itgax was expressed in multiple macrophage populations (Figure 2A) at levels at least as high as in purified DC, and correlated only with Cd22, Cd274 (encoding PD-L1), Csf2rb, Csf2rb2, Slc15a3, Tmem132a and the transcription factor gene Prdm1 (Cluster145). Class II MHC is also commonly used as a marker to purify DC and expression is obviously a prerequisite for antigen presentation to T cells. The ImmGen consortium compared DC from multiple sources with various macrophage populations to identify transcripts that distinguish DC from macrophages [26, 27]. Since the macrophages used for comparison were MHCIIlo, the DC signature included class II MHC genes. In our meta-analysis, one small cluster (Cluster 165) contained the transcription factor gene Ciita and its targets Cd74, H2-Aa, H2Ab1, H2-DMa/b1, H2-Eb1. The genes in this cluster were clearly highly-expressed in many tissue macrophages, (see profile for Cd74 in Figure 2A) but regulated independently of any other markers and expressed no higher in cells annotated as DC than in cells annotated as macrophages from intestine, lung, heart and kidney. Interestingly, again highlighting the issue with a definition of DC based upon unique APC function, isolated lung MHCIIhi interstitial macrophages were as active as cDC2 in antigen-presentation assays in vitro [25].
The GCN analysis did identify three separate DC-associated co-expression clusters that are consistent with current knowledge of putative DC subsets and adaptation in mice [20, 21, 106]. Cluster 13 includes Ccr7 and transcription factors Spib and Stat4, Cluster 28 includes Flt3, Kit and the transcription factor Relb and Cluster 49 includes cDC1 markers Itgae (CD103) and Xcr1. CCR7 is associated with DC migration [107] and the transcript was abundant in both cDC1 and cDC2 isolated from spleen and lymph node (LN). By contrast, the expression was much lower in isolated lung DC and in kidney DC from a separate dataset (see below), similar to levels in isolated macrophages from multiple tissues. Several putative DC markers were excluded from DC-specific clusters. The transcription factor gene Batf3, implicated in cDC1 differentiation [108] did not form part of a cluster and was detected in most macrophage populations (consistent with[15]). Similarly, IRF4 has been attributed a specific function in cDC2 differentiation [105]. Irf4 mRNA was more abundant in cDC2 compared to cDC1 but was also expressed in monocytes and monocyte-derived macrophage populations. Transcripts encoding NFIL3 and IRF8, which interact in the regulation of cDC1 differentiation [109] were also highly-expressed in cDC2 and in monocytes and many tissue macrophages. Although the transcription repressor gene Zbtb46, encoding a putative DC lineage marker [110] was highest in DC it was also detectable in most isolated tissue macrophages, notably in kidney and lung. Another putative DC marker gene, Clec9a [111] also clustered independently because of expression in isolated intestine, kidney, liver and lung macrophages.
Interestingly, tissue macrophages may contribute to homeostatic regulation of cDC differentiation. The transcript encoding the FLT3 ligand (Flt3l) was expressed constitutively to varying degrees in all of the MPS populations studied. Fujita et al. [104] showed that FLT3L is cleaved from the cell surface of expressing cells by ADAM10. Conditional deletion of Adam10 using Itgax-cre led to reduced differentiation of cDC2. Adam10 is also expressed by CD11C+ macrophages; it forms part of Cluster 3, low in monocytes and expressed by all resident macrophages at higher levels than in DC.
Aside from CLEC9A, many other lectin-like receptors have been proposed as DC markers and inferred to have a function in antigen uptake. Figure 5 shows the profiles for the 12 members of the so-called dendritic cell immunoreceptor (DCIR) family. The original member of this family, Clec4a2, the likely ortholog of the single CLEC4A gene in humans, encodes a lectin with a broad binding specificity for mannose and fucose [112]. Studies on knockout mice lacking Clec4a2 continue to be based upon the claim that the lectin is mainly expressed by DC [113] but the global analysis showed that it is more highly-expressed in most isolated macrophage populations. Two of the DC-associated clusters contained other members of the family, Clec4a4 and Clec4b2. Clec4a4 has been attributed a specific role in cDC1 dendritic cell function [114] but it was equally expressed in cDC2 and forms part of Cluster 28. Most of the Clec4 genes in the mouse genome are in a single location on Chromosome 6. They also include macrophage-inducible C-type lectin (Mincle) encoded by Clec4e, which mediates innate immune responses to Candida [115]. The related Clec4f (Kupffer cell marker) and Cd207 (langerin) are located together in a separate locus on Chromosome 6. Each of the Clec4 genes had a unique expression profile in tissue macrophage populations. Analysis of the entire dataset reveals that “DCIR” is a misnomer for this family. The DC designation has also been misapplied to other putative markers, including DC-SIGN (CD209 in humans), DEC205 (Ly75) and DC-HIL (Gpnmb). In mice there are multiple Cd209 paralogs. Cd209b was highly-expressed in marginal zone macrophage populations in spleen and is Csf1r-dependent [61]. These cells have not been successfully isolated by tissue disaggregation. Four members of the CD209 family (Cd209a,d,f,g) were co-expressed in a unique pattern (Cluster 100) together with Cbr2, Ccl24 and Clec10a. Ly75 was detected in both cDC subpopulations but was most highly-expressed in lung macrophages (Cluster 10).
CD64 was used as an exclusion criterion to remove or separate macrophages from DC or to enrich macrophages in all of the datasets included herein based upon the earlier studies of the ImmGen Consortium [26]. This exclusion was clearly successful in that all the purified DC have very low Fcgr1 (Figure 2A and Table S1), but the expression of this gene in macrophage populations was also highly variable. As a simple screen for additional markers that distinguish all “macrophages” from all “DC”, we averaged expression across all macrophage and DC samples and compared them (see Table S1). Amongst the transcripts that were robustly-expressed and highly-enriched in macrophages to at least the same extent as Fcgr1, those encoding surface markers were also variably-expressed amongst macrophage populations. However, we identified three transcription factor genes, Cebpb, Mafb and Klf10, that were apparently excluded from all of the cDC. The role of Cebpb in macrophage differentiation is well-recognised [116-118] and one of the datasets includes progenitors from Cebpb-/- mice [118]. There is evidence of a negative feedback relationship with Irf8 in monocyte-derived DC [119]. Cebpb was detected in most tissue macrophages but uniquely excluded from some populations, notably the heart and intestinal muscularis. Mafb has been proposed previously as a lineage marker separating macrophages from DC [120, 121]. The literature on Klf10 is more limited, with evidence that it participates in TGFβ-induced macrophage differentiation [122].
Resident macrophage activation during isolation
Cluster 41 contains numerous immediate early genes (IEG) encoding transcription factors and feedback regulators (e.g. Fos, Egr1, Dusp1) consistent with evidence that isolation of cells from tissues produces cell activation, from single cell sequencing of disaggregated cells [123, 124]. In many samples, IEG were amongst the most highly-expressed transcripts. The majority of isolated macrophage populations also had high levels of macrophage-specific LPS-inducible genes. Cluster 224 contains Ccl2, Ccl7, Ccl12, Cxcl1 and Il6, Cluster 329 includes Il1b and Ptgs2 (Cox2), and Cluster 485 contains Tnf and inducible chemokines Ccl3 and Ccl4. The anti-inflammatory cytokine Il10, which is also LPS-inducible, formed part of the intestinal macrophage cluster (Cluster 38). IL10 is essential to intestinal homeostasis [80] but Il10 mRNA was detected in only one of the three intestinal macrophage datasets [34] alongside very high expression of IEG and proinflammatory cytokines (e.g. IL1b, Tnf). Inflammation-associated transcripts were highlighted as evidence of activation in vivo in sensory neuron-associated macrophages [125]. Similarly, Chakarov et al.[25] highlighted selective expression of Il10 in interstitial lung macrophages, and differential expression in the LYVE1hi/MHCIIlo subpopulation. They did not comment upon the reciprocal pattern observed with Tnf and Ilb, both more highly-expressed in the LYVE1lo macrophages. Both populations of interstitial lung macrophages (and all the samples from other tissues in this BioProject) expressed very high levels of all of the IEG transcripts in Cluster 41. Whereas macrophage-expressed transcripts such as Adgre1 are readily detected in total tissue mRNA, and are CSF1R-dependent, inflammatory cytokines and IEG transcripts are not [16,61]. Accordingly, in each of these studies, the expression of IEG and inducible cytokines is most likely an artefact of tissue disaggregation. Consistent with that conclusion, the clear exception in which IEG were not detected is peritoneal macrophages, which are not subjected to the stress of enzymic digestion during isolation.
Interestingly, Acod1, which was massively-induced within 1 hour by LPS in mouse macrophages in vitro (see http://biogps.org) was only detected at low levels in a small subset of samples and not correlated with IEG or any other inflammatory activation markers. Induction of this gene has attributed functions in adaptive immunometabolism and accumulation of TCA cycle intermediates in activated macrophages [126]. The lack of detection in the isolated macrophages suggests either that induction is specific to recruited inflammatory macrophages or that inducible expression is purely an in vitro phenomenon. The Acod1 expression pattern was correlated only with Il23a (encoding a subunit of the cytokine IL23) at the stringency used here (r ≥ 0.75).
Contamination of macrophage populations with other cell types
Table 3 and Figure 4B highlight other clusters that were tissue-specific and contained markers and transcription factors associated with organ/tissue-specific differentiation, with corresponding enrichment for GO terms associated with specific tissues (Table S2). There are three ways in which mRNA from purified macrophage/DC populations may be contaminated with mRNA from unrelated cells. The most straightforward is poor separation of macrophages from unrelated contaminating cells by FACS for purely technical reasons. A second source derives from active phagocytosis by macrophages of senescent cells, where RNA from the engulfed cell may be detected. Finally, there is a phenomenon that arises from the extensive ramification of macrophages and their intimate interactions with other cells. Gray et al. [127] found that cells purified from lymph nodes with the surface marker CD169 were in fact lymphocytes coated with blebs of macrophage membrane and cytoplasm. Similarly, Lynch et al. [128] found that all methods to isolate Kupffer cells (KC) for flow cytometry produced significant contamination with CD31+ endothelium tightly adhered to remnants of KC membrane.
Cluster 2 appears to be generic “rubbish” cluster, containing transcripts detected at relatively low levels only in specific BioProjects and unrelated to tissue of origin. Other clusters were driven by a single RNA-seq result from within one BioProject. These clusters most likely represent technical noise as well as contamination.
Consistent with the proposal from Lynch et al. [128] three endothelial-associated transcripts, Cdh5, Pecam1 and Stab 2, were contained with the KC-enriched cluster (Cluster 12) and apparently increased in expression during KC differentiation. However, other endothelial transcripts were. Bonnardel et al. [129] generated RNA-seq data from purified liver sinusoidal endothelial cells (EC). We examined the profiles of the most highly-expressed EC genes in the macrophage dataset. Many of them were detectable in isolated KC but at much lower levels than Cdh5, Pecam1 and Stab2. They contributed to a separate liver-specific endothelial cluster (Cluster 76). So, whilst there is evidence that EC contaminate KC preparations reflecting the close apposition in the sinusoids, Chd5, Pecam1 and Stab2 are likely also genuine KC-expressed transcripts.
The detection of mature red cell transcripts encoding haemoglobin (Hba, Hbb), which are quite abundant in many macrophage populations, most likely reflects ongoing erythrophagocytosis. Macrophages isolated from the intestinal lamina propria in one of the two large datasets from small intestine [33] were heavily contaminated with markers of intestinal epithelium (Clusters 5/17). This might be a separation artefact but could also reflect an active role of macrophages in clearance of senescent epithelial cells [80]. Cluster 18 and Cluster 45 were restricted to samples from a study of pancreatic islet and peri-islet macrophage populations [130]. The authors noted the expression of insulin (Ins1) mRNA in their islet macrophage populations and attributed it to an intimate interaction with β-cells. Contamination or β-cell-macrophage fusion was said to be excluded on the basis that β-cell markers such as Pdx1 were not detected. However, many other islet-associated transcripts were abundant and formed part of Cluster 18, notably transcription factors Isl1, Foxa2, Nkx6.1, Nkx2.2 and other abundant islet-specific transcripts, Inhba, Chga/b, Iapp, Gipr and Gcg. Similarly, Cluster 45 was relatively enriched in the peri-islet macrophages and contains transcripts encoding many pancreatic enzymes. Cluster 65 includes Acta2 and other smooth muscle markers which selectively contaminated macrophages isolated from the intestinal muscularis [33].
The bone marrow contains several populations of macrophages [29] including those associated with hematopoietic islands expressing CD169 (Siglec1 gene) and VCAM1. One of the datasets included in the present study profiled the transcriptome of macrophages associated with erythroblastic islands, based upon isolation using an Epor-EGFP reporter gene [131]. A second bone marrow dataset separated macrophages based upon their engagement in phagocytosis of blood-borne material [132]. The putative erythroblastic island macrophages did not actually express increased Epor mRNA (although Epor was detected in other macrophage populations as reported recently [68] and fell within the Cluster 22). However, in the isolated bone marrow macrophages, Siglec1 was correlated with high levels of both immature neutrophil (Cluster 32) and erythroid-associated (Cluster 33) mRNAs. The separation of these two clusters implies that the contamination occurs in distinct macrophage populations, enriched selectively in each preparation and perhaps derived from separate hematopoietic islands [29]. Cluster 32 also contains the myeloid progenitor transcription factor Myb and the GM-progenitor marker Ms4a3. Given the extensive ramification of marrow macrophages and their intimate interactions with progenitors [29], this contamination likely reflects the same isolation artefact reported in lymph node [127], namely haemopoietic progenitor cells cloaked in macrophage clothing.
There are separate clusters including B cell and NK cell-specific markers. The B cell cluster, Cluster 87, shows highest average expression in intestine, bone marrow, lung and spleen and likely reflects close association between macrophages and B cells in lamina propria and germinal centres [33]. The cluster containing NK cell markers, Cluster 67, had highest average expression in one of the DC preparations. Those DC came from a study that proposed a further subdivision of cDC2 based upon expression of transcription factors T-bet (Tbx21) and RORγT [133] and separated cDC2 based upon expression of a Tbx21 reporter allele. Tbx21 was detected in all purified splenic cDC preparations presented on http://biogps.org, but at much lower levels than in NK cells. NK cells also express Itgax, used in purification of the cDC. Accordingly, it seems likely that apparent Tbx21 expression in DC is due to NK cell contamination.
Clustering of transcription factor (TF) expression
Most of the co-regulated clusters identified above contain genes encoding transcriptional regulators that are known to be essential for tissue-specific adaptation. These represent only a small subset of the transcriptional factors detected in MPS cells. The r value of 0.75 was chosen empirically for the analysis of the whole dataset to maximise the number of genes included while minimizing the number of edges between them (Figure S1) and aimed at assessing the predictive value of markers including those shown in Figure 2. To test the effect of reducing the stringency we focused on annotated transcription factors [134] to reduce the complexity and remove noise. 1103 transcriptional regulators were detected above the 10 TPM threshold in at least one macrophage population. The sample-to-sample matrix (Figure 6) shows that populations sourced from different tissues could be distinguished based upon TF expression alone. It also shows that TF expression in the DC populations was similar to that in macrophage cells. We generated GCN at three different Pearson correlation coefficient thresholds, 0.5, 0.6 and 0.7. The results are provided in Table S3. As the cut-off was reduced, more TF transcripts were included in the network. At the highest stringency r value ≥ 0.7, the largest cluster includes Spi1 alongside many of the transcription factors identified in the largest generic MPS co-expression clusters above (Clusters 1, 3 and 4). We conclude that the basic shared identity of MPS cells involves coordinated expression of around 100-150 transcription factors. Even at the lowest r value (≥ 0.5), transcription factor genes identified as specific to particular tissue-specific MPS populations made few additional connections, indicating that local adaptation is dependent on highly-correlated and regulated expression of a small cohort of TF. Nevertheless, associations that become evident at lower r value may identify combinatorial interactions in particular cell populations; Mycl, associated with DC fitness ([135] was weakly-correlated with Irf8 and Zbtb46; Cebpb with Nfil3 and interferon-related transcription factors (Batf2, Irf1/7/9, Stat1/2) were connected at the threshold of 0.5 (Table S3).
Expression of solute carriers and metabolism genes in macrophage populations
The burgeoning field of immunometabolism has focused on regulation of intermediary metabolism in recruited monocytes and macrophages in various states of activation or polarization [126]. Amongst emerging concepts is the view that M1 polarization (classical activation) is associated with aerobic glycolysis and mitochondrial dysfunction, whereas M2 polarization requires an active tricarboxylic acid cycle [126]. Cluster 7 contains mitochondria-associated transcripts and transcripts encoding ribosomal subunits, with variable expression across all samples, even from the same tissue, indicating the resident tissue macrophages vary in their dependence upon mitochondrial oxidative phosphorylation irrespective of surface markers or differentiation state.
In many cases metabolic pathways are regulated at the level of solute transport [126]. There were > 400 members of the large solute carrier (SLC) family expressed in mononuclear phagocytes above the 10 TPM threshold. Some were more highly expressed in intestine and kidney epithelial cells and clustered with tissue-specific epithelial markers. However, many contributed to macrophage-enriched expression clusters. One such gene, Slco2b1, which encodes an organic anion transporter of unknown function, has been proposed as a marker gene to distinguish macrophages from DC subsets and the promoter was used in an inducible macrophage depletion strategy [25]. The larger dataset does not support this dichotomy. Slco2b1 is part of Cluster 4, enriched in microglia and absent from multiple other macrophage populations as well as both cDC subsets.
Macrophages depend to varying degrees upon glutamine, glucose and fatty acids as fuels [136] and glutamine is an important immune regulator [137]. 14 different solute carriers from 4 families have been shown to transport glutamine [138]. Of the genes encoding these carriers, Slc38a1 was widely expressed in MPS cells and did not fall within a cluster, whereas Slc7a5, Slc7a7, Slc7a8 and Slc38a7 were part of distinct macrophage-enriched clusters. Consistent with the importance of glutamine as a fuel for MPS cells, transcripts encoding enzymes of glutamine metabolism (Gls, Glud1, Glul, Slc25a11) were also highly-expressed and part of Clusters 1 and 3. By contrast, resident MPS cells apparently have limited expression of glucose transporters. Slc2a1 (encoding glucose transporter GLUT1) was low, highly variable and idiosyncratic amongst tissues. A myeloid-specific conditional knockout of Slc2a1 confirmed that GLUT1 is the major glucose transporter in macrophages but the loss of glucose as a fuel had remarkably little impact on macrophage function [139]. The expression of Slc2a1 in cells isolated from tissues is difficult to interpret since the transporter is induced by hypoxia [140], which might arise during isolation, and Slc2a1 was barely detectable in peritoneal macrophages, which are less likely to undergo stress during isolation. In the absence of Slc2a1, macrophages increase oxidation of fatty acids [139]. The Slc27a1 gene, encoding the fatty acid transporter FATP1 which also contributes to functional regulation in macrophages [141, 142], was widely-expressed in tissue macrophages and, with carnitine acyl transferase genes (Crat, Crot), formed part of Cluster 1. Slc2a5 (found in Cluster 4) encodes a fructose-specific transporter [143] and was expressed primarily in microglia. Slc2a6 is a lysosome-associated glucose transporter that was recently knocked out in the mouse genome [144]. It also has a novel expression profile being highest in monocytes and cDC2.
One of the best known functional solute carriers in macrophages is natural resistance associated membrane protein 1 (NRAMP1; Slc11a1 gene), which is associated with genetic resistance to intracellular pathogens. SLC11A1/NRAMP1 is expressed in lysosomes and contributes to pathogen resistance by restricting available iron [145]. The role in iron metabolism is reflected by its presence in Cluster 12, alongside Slc40a1, encoding ferriportin, the macrophage-enriched iron exporter [146]. One other prominent class of solute carriers highly expressed in macrophages (Slc30a6, Slc30a7, Slc30a9; Slc39a3, Slc39a7, Slc39a9 in Cluster 1; Slc39a12 in Cluster 4) is involved in transport of zinc, which is a component of antimicrobial defense [147, 148]. Two further zinc transporters, Slc39a2 and Slc39a11, were enriched in lung macrophages (Cluster 10). This lung macrophage-enriched cluster also contains Slc52a3, encoding a riboflavin transporter, Slc6a4 (sodium and chloride dependent sodium symporter) and 2 members of the Slc9 family of sodium-hydrogen exchange (NHE) transporters (Slc9a4 and Slc9a7) which are more traditionally associated with epithelial function [149].
Validation of co-expression clustering with an independent kidney dataset
The abundant macrophage populations of the kidney were first described in detail using F4/80 as a marker in situ [150]. There has been considerable debate about the relationships between resident macrophages, monocyte-derived macrophages and cDC subsets in the kidney [151]. The main cluster analysis did not reveal a separate kidney resident macrophage-enriched profile. The kidney dataset in the analysis included F4/80+, CD64+ macrophages isolated from control and ischemic kidneys, further subdivided based upon expression of CD11B and CD11C [152]. Salei et al. [111] recently produced RNA-seq data for isolated populations of resident macrophages, monocyte-derived cells, cDC1 and cDC2 from kidney compared to similar populations from spleen. The primary data were not available for download by our automated pipeline through the ENA at the time we pooled and froze our dataset (February 2020). We therefore obtained the processed data directly from the authors and carried out network analysis using the 33 samples and 9,795 genes with normalized expression of at least 10 in at least one sample. The macrophages of the kidney are intimately associated with the capillaries [153] but Lyve1 was not detectable in resident macrophages in this dataset or in [152]. Published IHC on mouse kidney reveals that LYVE1 is restricted to lymphatic vessels [154].
Figure 7 illustrates the way in which the sample-to-sample matrix revealed relationships between the cell populations with increasing correlation coefficient threshold. Even at the lowest correlation cut-off, used in the main atlas (0.75), the splenic red pulp macrophages separated from all kidney and DC samples. As the cut-off was made more stringent, the cDC1 from both spleen and kidney separated, but the resident kidney macrophages, cDC2 and monocyte-derived macrophages remained closely connected until r ≥ 0.98 when the spleen cDC2 separated from the monocyte-derived macrophages and kidney cDC2. At r ≥ 0.99 the kidney cDC2 and monocyte-derived macrophages were still not separated indicating that the expression profiles of these cell types are very similar. Salei et al. [111] performed a principal components analysis based upon the 500 most variable transcripts and also identified the close relationship between cDC2 and monocyte-derived cells. Our analysis further emphasizes their conclusion that the main axis of difference is between spleen macrophages and all other cells. cDC1 from both tissues were more similar to each other than to the other cells but spleen cDC2 were only separated from kidney cDC2 and monocyte derived macrophages at the highest stringencies. We also performed a gene-to-gene analysis on these data. The profiles of kidney myeloid cells other than cDC1 were very similar and differed by only a small number of genes. Consistent with this conclusion, the two largest clusters in this analysis (see Table S4) were shared between all of the isolated populations and contain Spi1 as well as many of the DC-enriched markers identified in the main analysis. However, Ccr7 and many of the genes associated with it in the main dataset (Cluster 13, Table 2; e.g. Spib, Stat4, Vsig10, Cd200, Itgb8) were expressed at low levels in isolated kidney DC as in lung DC. Cluster 3 of the kidney analysis was specific to splenic red pulp macrophages and contains the known transcriptional regulators Pparg, Spic and Nr1h3. Transcripts in Cluster 4 were enriched in the resident kidney macrophages compared to both splenic macrophages and other kidney myeloid populations. Interestingly, the resident kidney macrophage cluster includes many genes that are also highly-expressed in microglia and depleted in the brain in Csf1r mutant mice and rats, including Cx3cr1, C1qa/b/c, Csf3r, Ctss, Fcrls, Hexb, Laptm5, Tgfbr1, Tmem119 and Trem2 [16, 61]. These were also detected in the isolated kidney macrophages in Table S1. Both microglia and resident F4/80hi kidney macrophages are selectively lost in a mouse line with a mutation in a conserved enhancer of the Csf1r locus [16]. Runx1, which regulates the activity of the Csf1r enhancer [155] and has also been implicated in the establishment of microglial cells during development [156] was within this cluster. Csf1r mRNA was expressed at high levels in cells defined as cDC2 as well as monocyte-derived cells and resident macrophages. All cells expressing a Csf1r-EGFP reporter in the kidney were depleted by treatment with anti-CSF1R antibody[30]. This suggests that despite their expression of FLT3, renal cDC2 are CSF1R-dependent. Cluster 6 of the kidney analysis, including Itgam, was enriched in the selected CD11B+ populations from kidney but highly-expressed in all of the populations. This cluster includes all of the co-regulated immediate early genes identified in Cluster 41 in the extended MPS dataset above, suggesting that recent monocyte-derived cells may be more susceptible to activation during isolation.
The relationship between single cell and bulk RNA-seq data
The advent of scRNA-seq has been heralded as a revolution promising new approaches to classification of myeloid heterogeneity [35, 157, 158]. Single cell (sc) RNA-seq is intrinsically noisy, non-quantitative stochastic sampling of a subset of the most abundant mRNAs in individual cells ([159, 160]). Algorithms that support non-linear dimensional reduction (e.g. t-distributed stochastic neighbour embedding [t-SNE] or Uniform Manifold Approximation and Projection (UMAP)) [161] followed by some form of clustering are then used to join together groups of cells in which the members share detection of an arbitrarily-defined set of markers. There is an implicit assumption in this approach that defined cell types, with approximately identical transcriptomes, actually exist and that sampling noise can be overcome by pooling the transcripts detected in a sufficiently large number of cells to create meta-cells. Based upon scRNA-seq analysis of interstitial lung macrophages, Chakarov et al.[25] inferred the existence of a subpopulation that expressed LYVE1. They then generated bulk RNA-seq data from separated LYVE1hi and LYVE1lo subpopulations. Their data allow a critical comparison of the two approaches. For this purpose, the primary scRNA-seq data were downloaded, reanalysed and expressed as TPM using the Kallisto pseudoaligner. Table S5 contains these reprocessed data, alongside the bulk RNA-seq data for the lung macrophage subpopulations, with the level of expression ranked based upon the bulk RNA-seq data for the purified LYVE1hi interstitial macrophages.
Consistent with Zipf’s law, the power-law distribution of transcript abundance [162, 163], the top 200 expressed transcripts in the bulk RNA-seq data contribute around 50% of the total detected transcripts in the scRNA-seq data and it is clear that these are the only transcripts detected reliably (Table S5). The abundant transcripts from bulk RNA-seq that are also detected in scRNA-seq samples include many cell type-specific surface markers which explains the ability to use scRNA-seq to discover such markers. These abundant transcripts also include IEG such as Dusp1, Egr1, Fos, Ier2 and Junb, indicative of the activation that occurs during isolation as discussed above. The inducible cytokines including Ccl2, Tnf, Il1b, Il6 and Il10 were each detected in a subset of the cells. Of the most highly-expressed transcripts only a very small subset (e.g. Actb, Apoe, B2m, Ccl6, Cd74, Ctsb, Fth1, Ftl1, Lyz2) had non-zero values in all cells. The average expression of the top 500 transcripts in the single cells was similar to the bulk RNA-seq but the detected expression level varied over 4 orders of magnitude among individual cells. Fcgr1 and Mertk mRNAs, encoding markers used to purify the interstitial macrophages for scRNA-seq, were actually detected in only a small subset of the cells and were not correlated with each other. Both this study and a subsequent study [50] state that Mrc1 and Lyve1 expression is shared by overlapping populations of lung interstitial macrophages. That conclusion is not supported by the data. The separation of these two markers was evident from the separate study of lung interstitial macrophage populations [164] included in our analysis and has been discussed above. Even in the bulk RNA-seq data from lung interstitial macrophages, the expression of Mrc1 was only marginally-enriched in purified LYVE1hi cells relative to LYVE1lo cells (Table S1). Consistent with that conclusion, in the scRNA-seq data the two are not strictly correlated with each other, with Mrc1 being detected in many more cells than Lyve1 (Figure 8A) despite similar absolute levels of expression in the total RNA-seq data.
To identify whether any robust correlations actually exist in the scRNA-seq data, the top 500 expressed transcripts in the scRNA-seq samples were used for network analysis. The sample-to-sample network (r≥0.53) is shown in Figure 8B and the gene-to-gene network (r≥0.5) in Figure 8C. The cluster list and average expression profiles are provided in Table S6. One clear-cut finding is the co-expression of genes involved in APC activity (H2-Aa, H2-Ab1, H2-Eb1, Cd74 and Ctss; Cluster 13 of the scRNA-seq analysis), which were effectively present or absent in individual cells. Chakarov et al. [25] defined two subpopulations as LYVE1hi/MHCIIlo and LYVE1lo/MHCIIhi but only six of the scRNA-seq samples expressed Lyve1 and half of those also expressed MHC II genes (Figure 8A). This is consistent with the lack of any inverse correlation between Lyve1 and Cd74 in Figure 2B. Even at this low r value known highly-expressed markers segregated from each other. Lyve1 forms a cluster with Mgl2, Cd209 and Cd302 (Cluster 7 in the scRNA-seq analysis; Figure 8C). Adgre1 is in a co-expression cluster that includes Lyz2 and Msr1 (Cluster 4 of the scRNA-seq analysis), Csf1r is co-expressed with Mrc1 and Cd163 (Cluster 2) and Lgals3 with Retnla and Fcrls (Cluster 1). The co-regulation of MHC-related genes, and genes located in the same chromosomal region (e.g. C1qa, C1qb, C1qc; Cluster25 of the scRNA-seq analysis) as well as the relatively uniform detection of genes such as Actb (Figure 8A) suggest that a significant proportion of the all-or-nothing differences in expression between cells in the scRNA-seq data is real.
Implications of transcriptional network analysis for the utility of surface markers
As discussed in the previous section, scRNA-seq provides an ambiguous view of cellular heterogeneity. Non-linear dimensionality reduction algorithms can hide or emphasise diversity by grouping cells in which an overlapping set of markers is detected. The number of populations defined depends upon the parameters applied and different approaches do not always give the same answers [161]. In the lung interstitial population we reanalysed, the detected expression of transcripts encoding plasma membrane proteins was essentially all-or-nothing. That conclusion may be considered a reflection of the limitations of the technology, but it is actually supported by other evidence. Tan and Krasnow [165] defined subpopulations of interstitial lung macrophages based upon expression of F4/80, Mac-2 (Lgals3) and MHCII, and tracked the changes in their relative abundance during development. Interestingly, they did not detect LYVE1 on adult lung interstitial macrophages by IHC. Consistent with their data, in the scRNA-seq data most lung interstitial macrophages expressed high levels of either Adgre1 or Lgals3, but some expressed both or neither. Protein expression at a single cell level clearly does not vary to the same extent as mRNA since proteins have different rates of turnover and decay [166]. Markers such as F4/80 and CD11C, and transgenes based upon macrophage-enriched promoters such as Csf1r and Cx3cr1 do appear to label the large majority of MPS cells in most tissues. The disconnect between scRNA-seq and cell surface markers may partly reflect the nature of transcription. At the single cell level transcription occurs in pulses interspersed by periods of inactivity and mRNA decay, which can manifest as random monoallelic transcript expression [167]. If gene expression is genuinely probabilistic at the level of individual loci [166] the assumption of transcriptomic homogeneity in definable cell types upon which scRNA-seq analysis is based is clearly invalid. The number of macrophage subpopulations that can be defined in any scRNA-seq dataset becomes a matter of choice and model. As an extreme example, one recent scRNA-seq study identified 25 distinct myeloid cell differentiation “states” in a mouse lung cancer model [168].
The RNA-seq data included as representative of cDC subsets [133] was from cells purified using CD64 as a marker to exclude macrophages. Despite this choice, an unbiased assessment of the sample-to-sample matrix in Figure 3B (based on all genes) and Figure 6 (based on transcription factor genes) would class all of these DC as part of the same family as macrophages. The use of CD64 as a definitive marker distinguishing macrophages from DC was criticized when it was proposed [42] and it remains untenable. It is actually a curious choice as a marker to define a cell as a macrophage since the protein FCGR1 (CD64) has been implicated functionally in APC activity [169]. From our analysis, the clear separation of DC from all other members of the MPS based upon APC function, surface markers, transcription factors or ontogeny [20] remains problematic. The one criterion that remains is location. We suggest that the only defensible definition of a cDC is a mononuclear phagocyte that is adapted specifically to the T cell areas of secondary lymphoid organs, responding to a specific growth factor (FLT3L) and the chemokine receptor CCR7. The kidney data [111] suggest that there is also tissue-specific adaptation of “cDC2” which may remain more “macrophage-like”. In that sense cDC1 and cDC2 are no more unique than a peritoneal macrophage adapting to signals from retinoic acid via induction of Gata6.
A critical view of markers of macrophage polarization states
The concept of M1/M2 polarization derives from analysis of classical and alternative activation of recruited monocytes by Th1 (γIFN) and Th2 (IL4/IL13) cytokines [74]. Previous meta-analysis indicated that proposed M2 markers defined by others [74] correlate poorly with each other in isolated inflammatory macrophages and are not conserved across species [28]. The M1/M2 concept was also challenged in a recent comparative analysis of in vitro and in vivo data on macrophage gene expression [170] which concluded that “valid in vivo M1/M2 surface markers remain to be discovered”. We would suggest that they do not exist. The pro-inflammatory cytokines that we identified as inducible in all the resident MPS cells during isolation would be considered indicators of M1-like activation. Aside from proposed M2 markers already mentioned that each have idiosyncratic expression (Mrc1, Retnla, Igf1, Mgl2), Chil3 (aka Ym1) was highly-expressed in lung macrophages (Cluster 10 of the whole dataset analysis; Table S2), Arg1 and Alox15 were restricted to peritoneal macrophages (Cluster 21) and Cd163 was part of a small cluster of 4 transcripts (Cluster 312). The cluster analysis indicates that the detection of M2 markers on resident tissue macrophages has little predictive value. Detection of CD206 cannot imply that the cells have been stimulated with IL4/IL13, nor that they share any functions with alternatively-activated recruited monocytes. Nevertheless, IL4/IL13/STAT6 signalling could contribute to resident MPS cell differentiation. The IL13 receptor (Il13ra) is part of the generic MPS Cluster 1 and Il4ra is also highly and widely expressed. IL4 administration to mice can drive resident tissue macrophage proliferation beyond levels controlled homeostatically by CSF1[171].
How do transcriptional networks contribute to understanding macrophage heterogeneity in situ?
An important question that arises from the transcriptomic analysis of subpopulations of resident MPS cells is precisely where are they located and how do they relate to each other? A significant concern with analysis of the cells isolated by tissue digestion and analysed here is whether recovered cells are representative of the tissue populations. Analysis of the lung has suggested that interstitial macrophages are a minority of the lung macrophage population [164]. That conclusion is not compatible with our own studies visualising Csf1r reporter genes, where the stellate interstitial populations are at least as abundant as alveolar macrophages [48, 49]. The description of subpopulations is not often linked to precise location with the tissue. One exception is the apparent location of LYVE1hi macrophages with capillaries in the lung [25]. On the other hand, it is unclear where the putative long-lived CD4+, TIM4+ population in the gut [34] is located. In broad overview, macrophages in every organ, detected with Csf1r reporter transgenes that are expressed in all myeloid cells including DC, have a remarkably regular and uniform distribution. The concept of a macrophage territory [5] or a niche [172, 173] has been proposed. But despite their apparent homogeneity in location and morphology, multicolour localisation of macrophage surface markers suggests that they are almost infinitely heterogeneous (reviewed in [23]). Most of the datasets analysed here suggest that monocytes and macrophages in each organ are a differentiation series. We take the view that macrophages in tissues have a defined half-life such that some cells survive by chance and continue to change their gene expression [5]. Each macrophage that occupies a new territory, either following infiltration as a monocyte or self-renewal by cell division, starts a life history that involves changes in gene expression and surface markers with time. In that view, MPS subpopulations are no more than arbitrary windows within a temporal profile of adaptation.
Conclusion
The transcriptional network analysis confirms that using our unique approach to down-sizing and a common quantification pathway, the RNA-seq data from different laboratories can be merged to provide novel insights. The network analysis indicates the power of large datasets to detect sets of co-regulated transcripts that define metastable states of MPS adaptation and function. The merged dataset we have created provides a resource for the study of MPS biology that extends and complements resources such as ImmGen (http://www.immgen.org). It can be readily expanded to include any new RNA-seq data for comparative analysis. Clusters of transcripts that are robustly correlated give clear indications of shared functions and transcriptional regulation. However, our analysis also reveals two important artefacts in the study of isolated tissue macrophages, the clear evidence of inflammatory activation during isolation and the extensive contamination of isolated preparations with transcripts derived from other cell types.
A discussion review of MPS heterogeneity in 2010 [174] suggested that in order for the field of immunology to advance and communicate “all cells have to be called something”. This Linnaean view continues to drive efforts to classify MPS cells into subsets based upon markers. The analysis we have presented shows that surface markers are poorly associated with each other and have very limited predictive value. Aside from MHCII, there are no markers that can be correlated with predicted APC activity. Resident tissue MPS cells, including cells that are currently defined as DC, belong to a closely-related family of cells in which the transcriptomic similarities are much greater than the differences. The cumulative function of the population of MPS cells acting together within each tissue is likely to be more important to homeostasis and immunity than the individual heterogeneity.
ACKNOWLEDGEMENTS
We would like to thank Dr Barbara Schraml, Ludwig-Maximilians-University of Munich, for providing access to data not yet available in the public domain. Mater Research Institute-University of Queensland is grateful for support from the Mater Foundation, Brisbane, Australia. The Translational Research Institute receives funding from the Australian Government.
Footnotes
Minor text changes in response to feedback from colleagues. Format for journal submission
References
- 1.↵
- 2.↵
- 3.↵
- 4.
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.↵
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.↵
- 107.↵
- 108.↵
- 109.↵
- 110.↵
- 111.↵
- 112.↵
- 113.↵
- 114.↵
- 115.↵
- 116.↵
- 117.
- 118.↵
- 119.↵
- 120.↵
- 121.↵
- 122.↵
- 123.↵
- 124.↵
- 125.↵
- 126.↵
- 127.↵
- 128.↵
- 129.↵
- 130.↵
- 131.↵
- 132.↵
- 133.↵
- 134.↵
- 135.↵
- 136.↵
- 137.↵
- 138.↵
- 139.↵
- 140.↵
- 141.↵
- 142.↵
- 143.↵
- 144.↵
- 145.↵
- 146.↵
- 147.↵
- 148.↵
- 149.↵
- 150.↵
- 151.↵
- 152.↵
- 153.↵
- 154.↵
- 155.↵
- 156.↵
- 157.↵
- 158.↵
- 159.↵
- 160.↵
- 161.↵
- 162.↵
- 163.↵
- 164.↵
- 165.↵
- 166.↵
- 167.↵
- 168.↵
- 169.↵
- 170.↵
- 171.↵
- 172.↵
- 173.↵
- 174.↵
- 175.
- 176.
- 177.
- 178.
- 179.
- 180.
- 181.
- 182.