Abstract
Background Genome-wide association studies (GWAS) have contributed significantly to the field of complex disease genetics. However, GWAS only report signals associated with a given trait and do not necessarily identify the precise location of culprit genes. Because most association signals occur in non-coding regions of the genome, it is often challenging to assign genomic variants to the underlying causal mechanism(s). Topologically associating domains (TADs) are largely cell-type-independent genomic regions that define interactome boundaries and can help delimit the region within which a GWAS locus most likely affects gene function.
Results We describe and validate a computational method that uses the genic content of TADs to assign GWAS signals to likely causal genes. Our method, called “TAD_Pathways”, performs a Gene Ontology (GO) analysis over all genes that reside within the boundaries of all TADs corresponding to the GWAS signals for a given trait or disease. We applied our pipeline to the GWAS catalog entries associated with bone mineral density (BMD), identifying ‘Skeletal System Development’ (Benjamini-Hochberg adjusted p = 1.02 × 10⁻⁵) as the top-ranked pathway. Often, the causal gene identified at a given locus was well known and/or the nearest gene to the sentinel SNP. In other cases, our method implicated a gene further away. Our molecular experiments describe a novel example: ACP2, implicated at the canonical ‘ARHGAP1’ locus. We found ACP2 to be an important regulator of osteoblast metabolism, whereas a causal role of ARHGAP1 was not supported.
Conclusions Our results demonstrate how basic principles of three-dimensional genome organization can help define biologically informed windows of signal association. We anticipate that incorporating TADs will aid in refining and improving the performance of a variety of algorithms that linearly interpret genomic content.
Background
Genome-wide association studies (GWAS) have been applied to over 300 different traits, leading to the discovery and subsequent validation of several important disease associations [1]. However, GWAS can only detect association signals in the data. Assigning these signals to causal genes has proven difficult because the signals fall principally within noncoding genomic regions [2–4] and do not necessarily implicate the nearest gene [5]. For example, a signal within an intron of FTO, a well-studied gene previously thought to be important for obesity [6], has been shown to physically interact with, and alter the expression of, two neighboring genes (IRX3 and IRX5) rather than FTO itself [7–9]. Moreover, there is evidence that a type 2 diabetes GWAS association previously attributed to TCF7L2 [10] also influences the nearby ACSL5 gene [11]. It remains unclear how pervasive such associations with non-nearest genes are, but strategies that systematically map association signals to causal genes are needed for GWAS to better guide research and precision medicine [12].
Three-dimensional genomics has changed the way geneticists think about genome organization and its functional implications [13,14]. Genome-wide chromatin interaction maps have led to several principles of genome organization, including topologically associating domains (TADs) [15–18]. TADs are sub-architectural units of the overall genome organization with consistent and functionally important distributions of genomic elements, including an enrichment of housekeeping genes, insulator elements, and early replication timing regions at their boundaries [19–21]. TADs are largely consistent across different cell types and are conserved across species in syntenic regions [22,23]. These properties allow TADs to be used to set the bounds within which non-coding causal variants most likely affect promoters, enhancers and genes in a tissue-independent fashion [24,25]. We therefore sought to develop a method that integrates GWAS data with interactome boundaries to map signals more accurately to the most likely candidate gene(s).
We developed a computational approach, called “TAD_Pathways”, that is agnostic to the position of each gene relative to the GWAS signal within a TAD. We scanned publicly available GWAS data for a given trait and used TAD boundaries to output lists of genes likely to be causal. We demonstrate this approach by assessing the influence of GWAS signals on bone mineral density (BMD) [26–29]. This trait is of great clinical importance because low BMD is an important precursor to osteoporosis, a disease affecting millions of patients annually [30]. We also chose BMD because BMD GWAS primarily point to very well-known genes involved in bone development (positive controls), yet a number of established loci have no obvious candidate gene, offering the opportunity to uncover novel biology. After applying our TAD_Pathways discovery approach, we investigated putative causal genes using cell culture-based assays, identifying ACP2 as a novel regulator of osteoblast metabolism.
Results
The Genomic Landscape of SNPs across Topologically Associating Domains
We observed a consistent, non-random distribution of SNPs across TADs derived from human embryonic stem cells (hESC), human fibroblasts (IMR90), mouse embryonic stem cells (mESC), and mouse cortex (mcortex) cells. As expected, the number of SNPs in a TAD was tightly correlated with TAD length in each cell type, but there were substantial outlier TADs (Figure 1). For example, the TAD harboring the largest number of common SNPs (minor allele frequency (MAF) greater than 0.05) in hESC is located on chromosome 6 (UCSC hg19: chr6:31492021-32932022) and contains 19,431 SNPs. Not surprisingly, this TAD harbors an abundance of genes, including the HLA genes, which are well known to have many polymorphic sites [31]. By contrast, the outlier TAD in the other human cell line (IMR90), located on chromosome 8 (UCSC hg19: chr8:2132593-6252592), contains 27,220 SNPs and could be biologically meaningful. Although this TAD harbors relatively few genes, it includes CSMD1, a gene implicated in cancer and in neurological disorders such as epilepsy and schizophrenia [32,33].
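The outlier TADs described above were identified from Figure 1. One hypothetical way to make "outlier" explicit is to regress SNP count on TAD length and flag TADs with unusually large residuals. The sketch below assumes an ordinary least-squares fit and a cutoff of three standard deviations of the residuals; neither choice comes from the original analysis, and the function name `flag_outlier_tads` and the toy data are illustrative.

```python
from statistics import mean, pstdev


def flag_outlier_tads(lengths, snp_counts, z=3.0):
    """Return indices of TADs whose SNP count deviates strongly from the length trend.

    Fits snp_count = intercept + slope * length by ordinary least squares, then
    flags TADs whose absolute residual exceeds z population standard deviations
    of all residuals. The default z=3.0 is an illustrative choice.
    """
    mx, my = mean(lengths), mean(snp_counts)
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, snp_counts))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(lengths, snp_counts)]
    sd = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > z * sd]


# Toy data: about 10 SNPs per unit of TAD length, except one SNP-dense TAD.
lengths = list(range(1, 21))
counts = [10 * x for x in lengths]
counts[5] = 500
print(flag_outlier_tads(lengths, counts))  # [5]
```

A residual-based rule like this separates "large TAD with many SNPs" from "TAD with more SNPs than its length predicts", which is the distinction the HLA and CSMD1 examples draw.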
Common SNPs were enriched near the center of TADs (Figure 1A), the opposite of the gene (Supplementary Figure S1) and repeat element (Supplementary Figure S2) distributions (see also Dixon et al. 2012 [22]). The repeat element distribution was driven largely by SINE/Alu repeats, and this pattern could not be explained by either GC content or estimated evolutionary divergence (Supplementary Figure S3). We also observed that common SNPs were significantly enriched in the 3’ half of TADs in hESC and mcortex cells (Supplementary Table S1). In addition, GWAS-implicated SNPs were slightly enriched near hESC TAD boundaries (Figure 1B). Given these non-random patterns across TADs, we went on to explore TAD gene content further in an attempt to infer causality at given GWAS loci.
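Comparing SNP density at TAD centers and boundaries requires expressing each SNP's location as a fraction of its TAD's length. A minimal sketch, assuming the TADs on one chromosome are given as sorted, non-overlapping half-open `(start, end)` intervals; the function names, toy coordinates and ten-bin histogram are illustrative choices, not the published pipeline.

```python
from bisect import bisect_right


def relative_tad_position(snp_pos, tads):
    """Locate a SNP within sorted, non-overlapping half-open TADs [start, end).

    Returns (tad_index, fraction), where fraction is 0.0 at the TAD's start
    boundary and approaches 1.0 at its end boundary, or None if the SNP lies
    outside every TAD (for example, in a gap between domains).
    """
    starts = [start for start, _ in tads]
    i = bisect_right(starts, snp_pos) - 1  # last TAD starting at or before the SNP
    if i < 0:
        return None
    start, end = tads[i]
    if snp_pos >= end:
        return None
    return i, (snp_pos - start) / (end - start)


def bin_fractions(fractions, n_bins=10):
    """Count relative positions in n_bins equal-width bins spanning boundary to boundary."""
    counts = [0] * n_bins
    for f in fractions:
        counts[min(int(f * n_bins), n_bins - 1)] += 1
    return counts


tads = [(0, 1000), (1000, 3000)]
snps = [250, 480, 520, 2000, 2950, 5000]
hits = [relative_tad_position(p, tads) for p in snps]
fractions = [h[1] for h in hits if h is not None]
print(bin_fractions(fractions))  # [0, 0, 1, 0, 1, 2, 0, 0, 0, 1]
```

Normalizing by TAD length lets SNPs from TADs of very different sizes be pooled into one center-versus-boundary profile.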
TAD_Pathways reveals potentially causal genes within phenotype-associated TADs
Seeking to leverage TADs and disease-associated SNPs, we integrated GWAS results and TAD boundaries in an effort to assign GWAS signals to causal genes. Approaches that do not consider TAD boundaries typically assign genes to a GWAS signal either by nearest gene [34] or by an arbitrary or linkage disequilibrium-based window of several kilobases [35,36] (Figure 2A). Instead, we used TAD boundaries and the full catalog of GWAS findings for a given trait or disease to assign genes to GWAS variants based on overrepresentation in a gene set, in an approach we termed “TAD_Pathways”. For a given trait or disease, we collected all genes located in TADs harboring significant GWAS signals. We then applied a statistical enrichment analysis for biological pathways to this TAD gene set and assigned candidate genes within each TAD based on the pathways significantly associated with the phenotype (Figure 2B). In our implementation, we used GO biological processes, GO cellular components, and GO molecular functions as the pathway sets [37]. We included both experimentally confirmed and computationally inferred GO gene annotations, which permits the inclusion of putative causal genes that lack literature support but are predicted by a variety of computational methods.
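The gene-collection step can be sketched as follows. This is a hypothetical, simplified version: it assumes SNPs, TADs and genes share one coordinate system (hg19), treats TADs as half-open intervals, and counts a gene as belonging to a TAD if the two intervals overlap at all. The published pipeline may use a different inclusion rule, and all names and coordinates here are illustrative.

```python
def tad_gene_set(gwas_snps, tads, genes):
    """Collect genes overlapping any TAD that contains a GWAS SNP.

    gwas_snps: list of (chrom, pos) for significant GWAS SNPs.
    tads: dict mapping chrom -> list of half-open (start, end) TAD intervals.
    genes: list of (gene_name, chrom, start, end).
    """
    # Step 1: find the TADs that harbor at least one GWAS signal.
    signal_tads = set()
    for chrom, pos in gwas_snps:
        for start, end in tads.get(chrom, []):
            if start <= pos < end:
                signal_tads.add((chrom, start, end))
                break
    # Step 2: keep every gene that overlaps a signal TAD, regardless of its
    # distance to the SNP (all genes in the TAD are treated equally).
    selected = set()
    for name, chrom, g_start, g_end in genes:
        for t_chrom, t_start, t_end in signal_tads:
            if chrom == t_chrom and g_start < t_end and g_end > t_start:
                selected.add(name)
                break
    return selected


tads = {"chr11": [(0, 100), (100, 300)]}
snps = [("chr11", 150)]
genes = [("A", "chr11", 10, 50), ("B", "chr11", 120, 180),
         ("C", "chr11", 250, 400), ("D", "chr1", 120, 180)]
print(sorted(tad_gene_set(snps, tads, genes)))  # ['B', 'C']
```

The resulting set, pooled across all signals for a trait, is what the enrichment analysis then tests.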
To validate our approach, we applied TAD_Pathways to bone mineral density (BMD) GWAS results reported in journals that require replication [26–29]. Our method implicated ‘Skeletal System Development’ (Benjamini-Hochberg adjusted p = 1.02 × 10⁻⁵) as the top-ranked pathway. Full TAD_Pathways results for BMD are provided in Supplementary Table S2. Despite a high content of presumably non-causal genes, which we expect to contribute noise to the overrepresentation analysis [38], our method detected enrichment of a skeletal system-related pathway and selected a subset of potentially causal genes belonging to that pathway. Many of these genes (24/38) were not the nearest gene to the GWAS signal, and several also had independent expression quantitative trait loci (eQTL) support (Supplementary Table S3, Supplementary Figure S4).
siRNA Knockdown of TAD Pathway Gene Predictions in Osteoblast Cells
The loci rs7932354 (cytoband: 11p11.2) and rs11602954 (cytoband: 11p15.5) are currently assigned to ARHGAP1 and BET1L, respectively, but our method implicated ACP2 and DEAF1. Neither ACP2 nor DEAF1 had eQTL support, and neither was the nearest gene to its BMD GWAS signal. Although both were annotated to the identified GO process, the annotations had been made computationally, and their known biology provided no obvious link to bone biology. We tested the gene expression activity and metabolic importance of all four genes: ARHGAP1, BET1L, DEAF1, and ACP2. Specifically, our assays in a human fetal osteoblast cell line (hFOB) evaluate whether the TAD_Pathways method identifies causal genes at GWAS signals beyond those captured by closest and eQTL-connected genes.
We targeted the expression of all four genes in vitro using small interfering RNA (siRNA), assessing knockdown efficiency at the mRNA level relative to untreated controls and determining p values relative to scrambled siRNA controls. An siRNA targeting tissue-nonspecific alkaline phosphatase (TNAP) served as a positive control. Knockdown efficiencies were: TNAP siRNA 48.7±9.9% (p = 0.141), ARHGAP1 siRNA 68.7±14.3% (p = 0.015), ACP2 siRNA 48.9±6.4% (p = 0.035), BET1L siRNA 56.4±1.0% (N.S.) and DEAF1 siRNA 52.7±9.2% (p = 0.021) (Figure 3). siRNA targeted against each gene of interest did not down-regulate the expression of the other genes under investigation, indicating specificity of knockdown. We noted, however, that TNAP siRNA reduced DEAF1 gene expression, although this did not reach statistical significance (p = 0.077).
We noted significant variation across the three controls, with the scrambled siRNA control altering expression of OCN (osteocalcin), IBSP (bone sialoprotein), TNAP and BET1L (p < 0.05). Relative to the scrambled siRNA control, OCN was downregulated in all siRNA groups (p < 0.05) except for BET1L siRNA (p = 0.122). OSX, IBSP and TNAP were not significantly altered by any siRNA treatment (Figure 3).
Metabolic Activity of TAD Pathway Gene Predictions
Use of ACP2 siRNA led to a 66.0% reduction in MTT metabolic activity versus the scrambled siRNA control (p = 0.012). ARHGAP1 siRNA caused a 38.8% reduction, which fell short of statistical significance (p = 0.088). siRNA targeted against TNAP, BET1L or DEAF1 did not alter MTT metabolic activity (Figure 4A).
Influence of TAD_Pathways Gene Predictions on Alkaline Phosphatase Activity
Alkaline phosphatase (ALP) is highly expressed in osteoblasts; disruption of proliferation or osteoblast differentiation would be expected to downregulate ALP. Treatment with siRNA changed ALP staining, which we analyzed further by quantitation. TNAP siRNA significantly reduced ALP staining intensity by 5.98±1.77 versus the scrambled siRNA control (p = 0.006). ACP2 siRNA also significantly reduced ALP staining intensity, by 8.74±2.11 versus the scrambled siRNA control (p = 0.003). The scrambled siRNA group stained less intensely than the untreated or transfection reagent control wells, but this difference did not reach statistical significance (0.05 < p < 0.10) (Figure 4B).
Discussion
We observed a nonrandom enrichment of SNPs in the center of TADs that was consistent across different cell types but opposite to the gene and repeat element distributions. The gene distribution may drive this phenomenon, since coding regions are under greater evolutionary constraint and are thus more depleted of SNPs [39]. Nevertheless, GWAS SNPs appeared to be slightly enriched near boundary regions in hESC cells. This may help explain why GWAS signals frequently implicate the nearest gene, since genes are also concentrated near boundaries. The observation may also suggest that polymorphisms near TAD boundaries are more important drivers of disease risk associations than polymorphisms in the center of TADs. However, we did not observe this pattern in IMR90 cells. The SNP distribution was also opposite to the SINE/Alu repeat distribution. Given that Alu elements tend to insert into GC-rich regions [40], we tracked GC content across TADs and observed only a slight increase in GC content near TAD boundaries. We also observed a slightly inverse distribution of Alu evolutionary divergence [40]. Our results suggest that the Alu distribution is driven primarily by intronic clustering [41] rather than by GC-biased insertion or evolutionary divergence. Recently, retrotransposons have been shown to act as genomic insulators [42], and Alu repeats have been shown to correlate with functional elements [43]. However, the relationships between polymorphic sites, repeat elements, and genes across TADs and higher-level genome organization have yet to be explored in detail and warrant further investigation.
TAD boundaries offer a unique computational opportunity to use biologically informed windows to predefine regions of the genome whose elements are more likely to interact with one another. As a proof of concept using BMD, we showed that TADs can reveal functional relationships between GWAS variants and genes. Several TAD_Pathways-implicated genes, including LRP5 and other Wnt signaling genes, are bona fide BMD genes already identified by nearest-gene GWAS assignment, eQTL analyses and human clinical syndromes [44,45], thus providing positive controls for our approach. However, several BMD GWAS signals have no obvious nearest-gene association, which allowed us to validate our approach with two candidate causal, non-nearest gene predictions: ACP2 and DEAF1. Neither gene had eQTL associations, but this is likely because the eQTL browser lacks bone tissue.
To assess the validity of our predictions, we experimentally knocked down ACP2 and DEAF1 in hFOB cells. siRNA for ACP2 and DEAF1 did not significantly alter expression of the osteoblast marker genes OSX, IBSP or TNAP. OCN was downregulated in each of the experimental groups relative to the scrambled siRNA control, but comparison with the reagent control indicated no significant difference in any group, suggesting an off-target effect on OCN in the scrambled siRNA group. Because osteoblast differentiation genes were not downregulated following knockdown of the genes of interest, we concluded that these genes do not directly regulate the transcriptional processes of osteoblast differentiation in vitro. The decrease in DEAF1 expression following TNAP siRNA treatment, though not statistically significant, suggests that DEAF1 may function downstream of TNAP.
There was a pronounced and statistically significant reduction in metabolic activity in hFOB cells treated with ACP2 siRNA. This result carried through to the ALP assay, in which staining intensity and ALP+ area fraction were dramatically reduced in only the TNAP siRNA and ACP2 siRNA groups. The combination of these results with the gene expression data suggests that ACP2 regulates early osteoblast proliferation/viability, but does not directly regulate osteoblast differentiation.
We provide evidence that our approach can steer researchers from GWAS signals toward genes relevant to the pathogenesis of a given trait. Furthermore, because our method treats all genes in implicated TADs equally, functional classification can extend to pleiotropic events in which a single variant affects multiple genes, as is the case for an intronic FTO variant that impacts both IRX3 and IRX5 [8].
Despite the advantages presented by TAD_Pathways, the method has a number of limitations. Currently, our method cannot overcome the possibility that a gene is inappropriately included in a pathway to which it does not actually contribute, nor other errors that propagate from pathway curation and analysis [46]. Network-based methods built on gene-gene interaction data suffer from similar biases [47], though potentially to a lesser extent than curated pathways. We include both curated and computationally predicted GO annotations to mitigate this bias. The computational predictions provide additional support for genes that may be important disease-associated genes but that we would have missed using only experimentally validated pathway genes. We are also unable to implicate a gene in a trait if it is not assigned to a curated or predicted pathway, or if it does not fall within a TAD corresponding to a GWAS signal. Our approach is also unlikely to work well with every GWAS. Importantly, we implicate genes as causal but do not make a direct connection between each gene and the associated variant. Furthermore, our method cannot identify disease-associated genes that are affected by alternative chromatin looping, which has been observed in cancer [48,49] and sickle cell anemia [50]. As research on 3D genome organization expands, it is likely that more diseases will be found to include chromosome looping deficiencies as part of their etiology. Additionally, we used the TAD boundaries defined by Dixon et al. 2012. A more recent, higher-resolution Hi-C analysis substantially reduced the estimated average size of TADs [51], although there remains disagreement about how TADs should be defined [52]. Even though the larger TAD boundaries we used include more presumably false-positive genes, our method still identified biologically logical pathways.
The larger boundaries allow us to screen a larger number of candidate genes but make the method analytically conservative, because a stronger pathway overrepresentation signal is required to surpass the adjusted significance threshold.
The validation screen is also limited: it was performed in a simplified in vitro cell culture system lacking organismal complexity, and the selected cell line is largely tetraploid, which may partially compensate for gene knockdown. This may be particularly relevant to the lack of a significant reduction in TNAP gene expression in the TNAP siRNA group, given that the hFOB cell line was historically selected for robust ALP staining in culture [53]. In addition, although the TAD approach potentially identifies several GWAS-associated genes per TAD, here we examined only two genes per TAD: one immediately adjacent to the GWAS SNP and another that we postulated could play a role in the skeleton. Further work would need to systematically examine the relative importance of each gene in a TAD.
Other recent methods for assigning association signals or enhancers to causal genes typically leverage multiple data types, including expression [54] or epigenetic features [24]. For example, TargetFinder uses several high-throughput genomic marks to identify features predictive of chromatin looping that brings enhancers and promoters together [24]. Looping occurs at sub-TAD resolution [55], and sub-TADs vary across cell types. Therefore, for a chromosome looping signature to generalize to GWAS signals, a user must generate tissue-specific, high-resolution Hi-C data to identify more specific interactions. Alternatively, one could query variants that affect gene expression in a high-throughput manner and systematically match signals to gene expression [56]. A major limitation of these approaches is that several diseases do not yet have a known tissue source or involve multiple tissues. This is particularly true for osteoporosis, in which multiple cell types, as well as systemic factors, influence bone mass [57]. Identifying these specific signals may therefore require repeating the procedures across competing tissue types. In contrast, a TAD_Pathways analysis is computationally cheap and uses publicly available TAD boundaries and GO terms as a guide for assigning genes to GWAS signals. Because TADs are consistent across cell types [25], the method should be applicable in a wide variety of settings and across tissues. In summary, TAD_Pathways can guide researchers toward the most likely causal gene implicated by a GWAS signal. We have also identified ACP2 as a gene involved in BMD determination, which warrants further investigation.
Conclusions
TADs offer a novel tool in the investigation of genome function. We present an approach, called TAD_Pathways, that leverages 3D genomics to prioritize and predict causal genes implicated by GWAS signals. At the foundation of our method is the principle that genomic regions within the same TAD interact with each other more often and therefore provide the genomic scaffolding that can impact gene function and gene regulation within each TAD. We applied our method to established BMD GWAS signals. By selecting two GWAS signals and two classes of genes for each signal (the nearest gene and the gene predicted by TAD_Pathways), we demonstrated that our approach can causally implicate genes kilobases away from their associated GWAS signal. We validated ACP2 (TAD prediction), but not ARHGAP1 (nearest gene), and showed that ACP2 influences the proliferation and differentiation of osteoblast cells. We were unable to validate either DEAF1 or BET1L and conclude that neither impacts osteoblast gene expression or metabolic activity. Whether these genes influence other aspects of skeletal biology cannot be determined within the scope of the current study. Future studies focused on BMD GWAS should explore both osteoblast- and osteoclast-associated changes. In conclusion, as more information and data are collected regarding 3D genome principles, we propose that algorithms that leverage dynamic 3D structure rather than static linear organization will more accurately predict and discover the basic genomic biology of diseases.
Methods
Data Integration
We used previously identified TAD boundaries for hESC, IMR90, mESC, and mcortex cells for all TAD-based analyses [22,58]. To describe the genomic content of TADs, we extracted common SNPs (major allele frequency ≤ 0.95) from the 1000 Genomes Phase III data (2 May 2013 release) [59] and downloaded hg19 GENCODE genes [60] and hg19 RepeatMasker repeat elements [61]. We downloaded hg19 FASTA files for all chromosomes as provided by the Genome Reference Consortium [62]. We also downloaded the NHGRI-EBI GWAS catalog on 25 February 2016, which holds the significant findings of GWAS for over 300 traits [1]. Because the GWAS catalog reports hg38 coordinates, we used the UCSC hg38-to-hg19 chain file [63] and PyLiftover [64] to convert coordinates to hg19. We assessed relevant expression quantitative trait loci (eQTLs) using all tissues in the NCBI GTEx eQTL Browser [65].
TAD_Pathways
Our TAD_Pathways method is a lightweight approach that uses TAD boundaries, rather than explicit distance, to identify putative causal genes. We first build a comprehensive TAD-based gene list consisting of all genes that fall inside TADs harboring a GWAS signal (see Figure 2). This gene list assumes that all genes within each signal TAD have an equal likelihood of functional impact on the trait or disease of interest. We then input the TAD-based gene list into a WebGestalt overrepresentation analysis [66]. WebGestalt is a web application that provides an interface for rapid, customizable gene set analyses. We perform a pathway overrepresentation test of the input TAD-based genes against GO biological process, molecular function, and cellular component terms, using the human genome as background. Specifically, this tests whether the input gene set is associated with any particular GO term more often than expected by chance given the background genes. We include both experimentally validated and computationally inferred genes in each GO term, which allows the method to discover associations for genes that lack literature support. We consider the genes annotated to the most significantly enriched GO term to be the associated set [65].
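The overrepresentation step was run in WebGestalt; the sketch below only illustrates the underlying statistics. It assumes a one-sided hypergeometric test per GO term followed by Benjamini-Hochberg adjustment, which is the standard formulation of overrepresentation analysis, but WebGestalt's exact settings (for example, minimum term size) are not reproduced. Function names, GO terms and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in the term, n drawn)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def overrepresentation(input_genes, go_terms, background):
    """Return (term, overlapping genes, p, adjusted p), sorted by adjusted p."""
    background = set(background)
    query = set(input_genes) & background
    N, n = len(background), len(query)
    rows = []
    for term, members in go_terms.items():
        members = set(members) & background
        overlap = query & members
        rows.append((term, overlap, hypergeom_sf(len(overlap), N, len(members), n)))
    adjusted = benjamini_hochberg([p for _, _, p in rows])
    results = [(t, o, p, q) for (t, o, p), q in zip(rows, adjusted)]
    return sorted(results, key=lambda r: r[3])


background = {f"g{i}" for i in range(1, 11)}
go_terms = {"T1": {"g1", "g2", "g3"}, "T2": {"g4", "g5", "g6", "g7"}}
top_term, associated_set, p, q = overrepresentation({"g1", "g2"}, go_terms, background)[0]
print(top_term, sorted(associated_set))  # T1 ['g1', 'g2']
```

In this toy run the genes in the top-ranked term that also came from signal TADs form the "associated set" described above.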
Cell culture and siRNA transfection
A human fetal osteoblast cell line (ATCC hFOB 1.19 CRL-11372) was obtained and subcultured twice at 34°C, 5% CO2 and 95% relative humidity in 1:1 DMEM/F12 with 2.5mM L-glutamine without phenol red (Gibco 21041025) supplemented with 10% FBS (Atlas USDA F0500D) and 0.3mg/mL G418 sulfate (Gibco 10131035). All experiments were conducted in three temporally separated independent technical replicates from cryopreserved P2 aliquots of these cells. 48 hours prior to transfection, media was switched to a G418-free formulation. Transfections were conducted in single-cell suspension using a commercial siRNA reagent system (Santa Cruz sc-45064) according to the manufacturer’s instructions, with 6μL of siRNA duplex and 4μL transfection reagent in 750μL of transfection media per 100,000 cells. Following trypsinization, cells were counted and divided into one of 8 experimental groups: 1) untreated control, 2) siRNA-negative transfection reagent control, 3) scrambled control siRNA (sc-37007), 4) TNAP siRNA (sc-38921), 5) ARHGAP1 siRNA (sc-96477), 6) ACP2 siRNA (sc-96327), 7) BET1L siRNA (sc-97007) or 8) Suppressin (DEAF1) siRNA (sc-76613). Samples for RNA isolation were generated by plating 200,000 cells per well in tissue culture treated 6-well plates (Falcon 353046). Samples for MTT assay and alkaline phosphatase (ALP) staining were generated by plating 50,000 cells per well in tissue culture treated 24-well plates (Falcon 353047). Cells were then switched to a 37°C incubator and transfected for 6 hours, at which point transfection cocktails were diluted with 2x hFOB media concentrate. Media was completely changed to 1x standard hFOB media 16 hours later.
Quantitative PCR
RNA was harvested at Day 4 using TRIzol (Ambion) followed by acid guanidinium thiocyanate-phenol-chloroform extraction and RNeasy (Qiagen) spin-column purification with DNase (Qiagen), then reverse-transcribed using a high-capacity RNA-to-cDNA kit (Applied Biosystems). Duplicate qPCR reactions were conducted on 20ng of whole-RNA template using SYBR Select master mix (Applied Biosystems) in an Applied Biosystems 7500 Fast Real-Time PCR System. Primers for GAPDH, OSX, OCN and IBSP were adopted from the literature and synthesized by R&D Systems. New primer sets spanning exon-exon junctions were designed for ARHGAP1, ACP2, BET1L and DEAF1 using NCBI Primer Blast and verified by melt curve analysis and agarose gel electrophoresis. Sequences follow: ARHGAP1 F-GCGGAAATGGTTGGGGATAG R-CCTTAAGAGAAACCGCGCTC (127bp), ACP2 F-AGCGGGTTCCAGCTTGTTT R-TGGCGGTACAGCAAGGTAAC (165bp), BET1L F-GGATGGCATGGACTCGGATT R-TCCTCTGGAGCCCAAAACAC (254bp), DEAF1 F-GGAAGGAGCAGTCCTGCGTT R-TCACCTTCTCCATCACGCTTT (195bp). Results were analyzed using the 2^(−ΔΔCt) method with GAPDH as the housekeeping gene and reported as (mean ± standard deviation) fold-change versus untreated controls. Statistical significance was determined using 2-way homoscedastic Student’s t-tests versus the scrambled siRNA control, annotated using *p≤0.05 and #p≤0.10. The acronym N.S. stands for “not significant”.
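For readers unfamiliar with the 2^(−ΔΔCt) calculation, the sketch below shows the arithmetic for one target gene. It assumes duplicate Ct values are averaged before subtraction and that "knockdown efficiency" is the percentage reduction in fold-change relative to untreated controls; the text does not state these details, and the function names and Ct values are illustrative.

```python
from statistics import mean


def fold_change_ddct(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^(-ddCt) method.

    ct_target / ct_ref: replicate Ct values for the gene of interest and the
    housekeeping gene (e.g. GAPDH) in the treated sample; the *_ctrl arguments
    hold the same measurements for the untreated control.
    """
    dct_sample = mean(ct_target) - mean(ct_ref)        # normalize to housekeeping gene
    dct_control = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    ddct = dct_sample - dct_control                     # compare with control
    return 2 ** (-ddct)


def knockdown_percent(fold_change):
    """Percentage reduction in expression relative to control (assumed definition)."""
    return (1 - fold_change) * 100


# Target amplifies one cycle later after siRNA, with an unchanged GAPDH Ct:
fc = fold_change_ddct([25.0, 25.0], [18.0, 18.0], [24.0, 24.0], [18.0, 18.0])
print(fc, knockdown_percent(fc))  # 0.5 50.0
```

Because each PCR cycle roughly doubles the product, a one-cycle delay in the target (with the reference unchanged) corresponds to half the starting transcript.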
MTT metabolic assay
Cellular metabolism/proliferation was assessed using a commercial cell growth determination kit (Sigma CGD1). At 1 or 4 days post-transfection, media in assigned 24-well plates was switched to 450μL hFOB media plus 50μL MTT solution. Cells were incubated at 37°C for 3.5 hours, after which the media was aspirated and the resulting formazan crystals were solubilized in MTT solvent. Plates were shaken and read in a BioTek Synergy H1 microplate reader. Results are reported as the difference in mean absorbance at 570nm-690nm from Day 1 to Day 4. Error bars represent root-mean-square standard deviation from the measurements at both days. Statistical significance was determined using 2-way homoscedastic Student’s t-tests versus the scrambled siRNA control, annotated using *p≤0.05 and #p≤0.10.
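The Day 1 to Day 4 difference and its pooled error bar can be computed as below. "Root-mean-square standard deviation" is interpreted here as the square root of the mean of the two daily variances, and the percentage reduction versus scrambled siRNA reported in the Results is assumed to be computed from these differences; both interpretations, the function names, and the absorbance values are assumptions rather than details from the source.

```python
from math import sqrt
from statistics import mean, stdev


def mtt_growth(day1, day4):
    """Return (Day 4 minus Day 1 mean absorbance, pooled RMS standard deviation).

    day1, day4: replicate (A570 - A690) readings; at least two per day.
    """
    delta = mean(day4) - mean(day1)
    rms_sd = sqrt((stdev(day1) ** 2 + stdev(day4) ** 2) / 2)
    return delta, rms_sd


def percent_reduction(treated_delta, control_delta):
    """Percentage drop in metabolic activity of a treatment relative to a control."""
    return (1 - treated_delta / control_delta) * 100


delta, sd = mtt_growth([0.2, 0.4], [1.0, 1.2])
print(round(delta, 3), round(sd, 4))  # 0.8 0.1414
```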
Alkaline phosphatase (ALP) staining
Plates were stained at 4 days post-transfection using a commercial ALP kit (Sigma 86C). Dried plates were then imaged at 600dpi using an Epson V370 photo scanner and staining was quantified using mean whole-well intensity measurements in ImageJ using raw output files. ALP area fraction was calculated using color thresholding. Values are reported as mean ± standard deviation. Statistical significance was determined using 2-way homoscedastic Student’s t-tests versus the scrambled siRNA control, annotated using *p≤0.05 and #p≤0.10. Images of plates presented as figures were edited using a warming filter and for brightness/contrast using Adobe Photoshop CS6.
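ImageJ performed the quantification in the original workflow; the sketch below shows only the arithmetic behind the two reported measures. It assumes a well has already been cropped and converted to 8-bit grayscale with darker pixels indicating stain, and treats "area fraction" as the share of pixels at or below a hypothetical intensity threshold. This simplifies the color thresholding actually used, and the function name and pixel values are illustrative.

```python
def whole_well_stats(pixels, threshold):
    """Return (mean intensity, stained area fraction) for one well.

    pixels: 2D list of 8-bit grayscale values (0 = black, 255 = white).
    threshold: pixels at or below this value are counted as ALP-stained.
    """
    flat = [p for row in pixels for p in row]
    mean_intensity = sum(flat) / len(flat)
    stained = sum(1 for p in flat if p <= threshold)
    return mean_intensity, stained / len(flat)


print(whole_well_stats([[10, 200], [50, 250]], threshold=100))  # (127.5, 0.5)
```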
Computational Reproducibility
We provide all of our source code under a permissive open source license and encourage others to modify and build upon our work [67]. Additionally, we provide an accompanying docker image [68] to replicate our computational environment (https://hub.docker.com/r/gregway/tad_pathways/) [69].
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and material
All data used to construct the TAD_Pathways approach are publicly available. We make all software used to develop this approach publicly available in a GitHub repository (http://github.com/greenelab/tad_pathways). We also provide a docker image (https://hub.docker.com/r/gregway/tad_pathways/) and archive the GitHub software on Zenodo (https://zenodo.org/record/163950).
Competing interests
The authors declare no competing interests.
Funding
This work was supported by the Genomics and Computational Biology Graduate program at The University of Pennsylvania (to G.P.W.); the Gordon and Betty Moore Foundation’s Data Driven Discovery Initiative (grant number GBMF 4552 to C.S.G.); and the National Institute of Dental & Craniofacial Research (grant number NIH F32DE026346 to D.W.Y.). S.F.A.G. is supported by the Daniel B. Burke Endowed Chair for Diabetes Research.
Authors’ contributions
GPW wrote the software, analyzed the data, developed the method and wrote the manuscript; DWY performed the experimental validation and wrote the manuscript; KDH performed the experimental validation and wrote the manuscript; CSG analyzed the data, developed the method and wrote the manuscript; SFAG analyzed the data, developed the method and wrote the manuscript.
Tables
We provide Supplementary Tables S2 and S3 as attached .xls files.
Abbreviations
- TAD
- Topologically associating domain
- GWAS
- Genome wide association study
- SNP
- Single nucleotide polymorphism
- BMD
- Bone mineral density
- hESC
- Human embryonic stem cells
- mESC
- Mouse embryonic stem cells
- IMR90
- Human fibroblast cells
- mcortex
- Mouse cortex cells
- siRNA
- Small interfering RNA
- eQTL
- Expression quantitative trait loci
- hFOB
- Human fetal osteoblast
- TNAP
- tissue-nonspecific alkaline phosphatase
- OCN
- osteocalcin
- IBSP
- bone sialoprotein
- ALP
- Alkaline phosphatase
- GO
- Gene Ontology
Acknowledgements
Hannah E. Sexton and Troy L. Mitchell assisted in optimizing siRNA transfection conditions. Daniel Himmelstein and Amy Campbell performed analytical code review.
Footnotes
These authors directed this work jointly
Author Emails: GPW: gregway{at}mail.med.upenn.edu; DWY: dwy{at}msu.edu; KDH: kdhank{at}msu.edu; CSG: csgreene{at}mail.med.upenn.edu; *SFAG: grants{at}email.chop.edu