Abstract
Gestation is a crucial timepoint in human development. Deviation from a term gestational age correlates with both acute and long-term adverse health effects for the child. Both being born pre and post-term, i.e. having short and long gestational ages, are heritable and influenced by the pre- and perinatal environment. Despite the obvious heritable component, specific genetic influences underlying differences in gestational age are poorly understood. Here we identify one globally significant intronic genetic variant within the ADAMTS13 gene that is associated with prolonged gestation in 9,141 white European individuals from the 1966 and 1986 Northern Finland birth cohorts. Additional variants that reached suggestive levels of significance were identified within introns at the TKT, and ARGHAP42 genes, and in the upstream (5’) intergenic regions of the B3GALT5 and SSBP2 genes. The variants near the ADAMTS13, B3GALT5, SSBP2 and TKT loci are linked to alterations in gene expression levels (cis-eQTLs). Luciferase assays confirmed the allele specific enhancer activity for the BGALT5 and TKT loci. Our findings provide the first evidence of a specific genetic influence associated with prolonged gestation.
Introduction
Gestation is a crucial period of human development. Being born too early (preterm, < 37 weeks gestation) or too late (post-term, ≥42 weeks gestation) can have significant acute and long-term health consequences [1]. While preterm birth has received substantial attention[2–4], post-term birth has been scantily explored despite approximately 3-5% of all births each year being post-term[5]. Prolonged gestation poses a unique set of acute and long-term adverse health outcomes, including an increased need for intervention during labour and risk factors for truncal obesity, insulin resistance, altered lipids and elevated blood pressure [6, 7]. Thus, there is a vital need to understand the role of genetic variants on post-term birth.
The acute health risks, for both the mother and the child, of being born post-term are well documented (for review see [8]). Consequently, induction of birth at or before 41 weeks gestation is recommended in order to reduce the acute risks associated with post-term birth[5, 8]. As a result, approximately 25% of the routine inductions in Australia in 2010 were primarily performed to prevent prolonged pregnancy. However, the rules regarding the decision to induce labour are not consistently applied across different hospitals, reflecting the influence of opinions of individual practitioners and differing staff routines[9]. Despite this, induction remains an excellent intervention, and its application has reduced post-term births from approximately 20% of all births in the 1960’s[10] to the modern-day rate of under 5%[5, 11]. However, there is a possibility that the long-term risks associated with post-term birth are a part of the genetically-informed trajectory for induced individuals. In this case, induction would not change this aspect of the biology of post-term individuals. Family and twin studies attribute 25-40% of the variation in gestational age to genetic factors [12–19] with fetal (26%) and maternal (21%) factors each explaining nearly half of this variation [18]. Thus, there is a large population of individuals who have “post-term potential”[20] and possibly face the long-term health risks of post-term birth without actually having been born post-term (due to pregnancy ending from obstetric management).
Evidence supports the hypothesis that it is the fetus that determines the timing of labor rather than the mother. Therefore, we have investigated the heritability and the genetic architecture of gestational age in 9,141 Northern Finnish (white European) individuals (1,167 post-term) across two birth cohorts (Northern Finland Birth Cohort [NFBC] 1966 and NFBC1986). Here we identify intronic genetic variants within the TKT, ARGHAP42, and ADAMTS13 genes and intergenic upstream (5’) of the B3GALT5 and SSBP2 genes that are associated with prolonged gestation.
Results
Discovery Phase
NFBC1966 Variants Associated with post-term birth
Analysis of the NFBC1966 post-term cohort identified six GWAS peaks that were suggestive of global significance (p < 1x10−5, Figure 1a and 2a). Two clusters of variants were located within introns in the B3GALT5 (lead SNP rs1534080) and DNHD1 (rs12285957) genes. An additional four clusters of variants were located within intergenic regions on chromosomes 10 (2 regions), 12, and 15 (Table 1).
NFBC1986 Variants Associated with post-term birth
Analysis of the NFBC1986 post-term cohort identified twenty-five significant GWAS peaks (p < 1x10−5, Table 1, Figure 1b and 2b). The lead SNPs for fourteen of these GWAS peaks were intronic: AC079779.5 (rs72774524), DCDC2C (rs12612077), TKT (rs4687715), DTWD2 (rs17440178), ESR1 (rs117533178), KCNB2 (rs79648768), ADAMTS13 (rs655911), ASAH2 (rs75320537), FAT3 (rs7950344), ARHGAP42 (rs78598508), AN04 (rs11609845), C14orf37/PSMA-AS1 (rs78874632), CTD-2277K2.1 (rsl91706929), and RIN3 (rs6575274). Only the SNP at the ADAMTS13 locus was globally significant (p < 5x10−8).
Eleven intergenic loci were also associated with gestational age in the NFBC1986 cohort. These intergenic loci were located ≤116kbp from a coding exon (gene): LRPPRC (rs62135521, 73 kb upstream), HMX1 (rs145023824, 75 kb upstream), ZFR (rs66858738, 7.7 kb upstream), SSBP2 (rs2135, 31 kb upstream), GRIK2 (rs183770336, 724 kb downstream), RP11-465K16.1 (rs7013779, 40 kb upstream), RP11-289F5.1 (rs10780480, 116 kb upstream), AL671972.1 (rs10995050, 7.2 kb downstream), RP11-644L4.1 (rs72965926, 5.3 kb downstream), KCNA5 (rs2239507, 2 kb upstream), and 7SK (rs11610162, 11 kb upstream)
Of the list of genes located close to the loci that were significantly associated with post-term birth in the 1986 cohort, only ARHGAP42 has previously been associated with a developmental phenotype (age at menarche in a Japanese population, rs12800752)[21].
Validation of Results of the Discovery Phase
The six NFBC1966 loci and twenty-five NFBC1986 loci were tested for cross-validation (i.e. significance) in the opposite cohort. Of these loci, none were associated with gestational age at a p value of ≤1x10−5 in both cohorts. However, the B3GALT5, SSBP2, and TKT GWAS loci (hereafter referred to as post-term loci) were validated as significant in both the 1966 and 1986 cohorts (i.e. discovery p < 1x10−5 and validation p<0.05, Tables 1A and 1B). ADAMTS13 reached global significance (p < 5x10−8 [22]) in the NFBC1986 cohort, and was included in the post-term loci for further analyses (Table 1B).
The rs78598508 variant, which is located within ARHGAP42 and associated with gestational age in the 1986 cohort, was not measured in the 1966 cohort. Furthermore, rs78598508 was not in strong LD (> 0.9 r2) with any other variants. Therefore, rs78598508 could not be tested for cross-validation in this study (Table 1).
Identification of Spatial and Functional Connections to Post-Term Birth
Spatial Associations
The post-term associated SNPs we identified fall outside of known coding regions and, as such, there is no a priori reason to assume impact through direct disruption of protein function. Therefore, the effect of these SNPs can be better explained through a model of gene regulation that incorporates the hypothesis that long distance SNP-Gene interactions make a significant contribution to genetically determined phenotypes[23, 24]. Observations of long distance SNP-gene regulatory interactions in diabetes (variants in the FTO gene alter the expression of the IRX3 gene)[25], inflammatory bowel disease (variants strongly interact with an anti-inflammatory gene located 380 kb away, and not closer anti-inflammatory genes) [26], and growth[27] are consistent with this hypothesis.
We screened Haploreg v4.1 (1000 genomes haplotype data), to show that none of the lead SNPs in the B3GALT5, ADAMTS13, SSBP2, ARHGAP42, and TKT loci are in linkage disequilibrium (LD, r2 > 0.95 and D’ > 0.95) with any variants located within exons or critical transcriptional processing sequences (e.g. intronic branch sites, polyA signals, or transcription termination signals). Thus, there is no evidence that these SNPs directly impact on protein function through aberrant transcript processing. As such, we hypothesized that the post-term SNPs were affecting enhancer regions (i.e. short genomic regions that are bound by transcription factors) and altering the transcriptional regulation of distant genes.
Interactions between the lead SNPs at each post-term locus and distant genes were screened for using GWAS3D (Figure 3, Table 2). The B3GALT5 locus spatially connects to the Down Syndrome Cell Adhesion Molecule (DSCAM) locus. DSCAM is a member of the immunoglobulin superfamily of cell adhesion molecules (Ig-CAMs) that are involved in human central and peripheral nervous system development. The ADAMTS13 locus spatially connects to the SLC2A6 (9q34), COL5A1 (9q34.2-q34.3), and RABGAP1L (1q24) loci. This is notable as ADAMTS13, SLC2A6, COL5A1, and additional genes within 9q34 have been implicated in the coagulation process and associated with ovarian function[28]. The TKT locus spatially connects to an intergenic region in 21p11.2 which contains predicted open reading frames that encode undefined proteins.
Locus-Specific Associations with Gene Expression
Functional-regulatory roles for the SNPs we identified in this study were refined by testing the spatial SNP-gene pairs for significant eQTLs using the GTEx database (version 6). Variants in the ADAMTS13 (Tibial Nerve Tissue, p=1.50x10−8), B3GALT5 (Thyroid Tissue, p=9.0x10−5), and TKT (Left Ventricular Heart Tissue, p=2.0x10−5) loci associate with altered expression changes within these genes, confirming that these SNPs fall within loci that regulate their local gene landscape (Table 3). In addition, rs655911 (intronic, ADAMTS13) also showed eQTL associations with the expression of SLC2A6, reinforcing the significance of the spatial connection. The variant associated with the ARHGAP42 locus (rs78598508) showed no significant eQTLs.
The SNP rs2135, which was 44kb upstream of SSBP2, did not show any evidence of spatial connections to other regions. Moreover, there was no evidence of a cis-eQTL between this SNPs and the SSBP2 gene itself. A global survey of eQTL associations with the SSBP2 SNPs, within GTEx, did not identify any globally significant eQTLs (Supplemental Figure 1). Analyses indicated putative eQTLs between the SSBP2 lead SNP (rs2135; 5q14.1) and: HBG1 (11p15.5, Lung, 1.10x10−5); HLA-DRB5 (6p21.3, Whole Blood, 2.50x10−5); and FYB (5p13.1, Mucosa of the Esophagus, 4.00x10−5) (Supplemental Figure 1).
Locus-Specific Associations with Gene Expression
The lead SNPs that were proximal to B3GALT5 (rs11702173, rs560928), TKT (rs4687715), SSBP2 (rs2135), and ARHGAP42 (rs78598508) were screened for enhancer activity (Figure 4). ADAMTS13 (rs655911) was unable to be cloned and could not be tested. Cloning loci with alternate alleles allowed the measurement of the effect of genetic variation on the observed enhancer activity. Luciferase assays in Hela cells revealed a pronounced enhancer effect for the loci containing rs4687715, rs111702173, rs78598508, and rs560928 and a repressive effect for rs2135 (Figure 4). For rs4687715 and rs560928, the region shows a differential enhancer allelic effect. Therefore, for two of the five loci tested, the enhancer activity associated with these gestational age associated regions is sensitive to the identity of the haplotype at the SNP position. For the remaining three regions, the SNP tested did not have a measurable allelic effect in HeLa cells, but still showed significant enhancer/insulator capabilities.
Discussion
Previously, there has been indirect evidence of a genetic component to post-term birth, as children born post-term are more likely to have a sibling or mother born post-term[6]. This study is the first to identify specific genetic variants in proximity to the B3GALT5, ADAMTS13, SSBP2, and TKT genes as being associated with prolonged gestation. Data on the spatial connections with these loci, eQTLs and enhancer activity is consistent with these post-term variants acting as functional determinants of gestational length. Thus, the SNPs associated with the B3GALT5, ADAMTS13, SSBP2, and TKT loci may alter the expression of these genes, contributing to the post-term phenotype by affecting regulation of processes involved in human development such as growth and metabolism, and, more specifically, hematopoiesis.
Post-term Associated TKT and SSBP2 are linked to Alterations in Cellular Growth, Proliferation, and Metabolism
Alterations in cellular growth, proliferation, and metabolism pre-program biological development (e.g. pentose phosphate pathway) resulting in an amplified risk of chronic noncommunicable disease[29]. Proteins encoded by the TKT and SSBP2 genes are involved in the cellular growth, proliferation, and metabolism and are thus capable of altering developmental trajectories. For example, TKT is involved in carbohydrate metabolism[30] and could contribute to the later-in-life increased adiposity and risk of metabolic syndrome in children and adults born post-term[6, 7].
Post-term Associated TKT is linked to Alterations in Cell Cycle and Growth
TKT dysregulation has an important role in cellular growth rates. The TKT gene encodes a protein that contributes to the main carbohydrate metabolic pathways by connecting the pentose phosphate pathway (PPP) to glycolysis. This process results in NADPH synthesis. NADPH is part of the control for reactive oxygen species, which were found to be imbalanced in post-term births[31]. A critical role for TKT (highly expressed in oocytes) is in oocyte cell cycle progression and maturation[30]. Under-expression of TKT in maternal mice contributes to pregnancy resulting in fewer progeny, retarded postnatal growth, and reduced levels of adipose tissue in offspring[32]. This phenotype has obvious similarities to post-term infants, who are typically born lean[14]. Collectively, the effects of aberrant TKT expression can be interpreted as being consistent with TKT variation in humans contributing to aberrant gestational timing. Moreover, the links between variation in TKT function and metabolism may help to partially explain the observed links between post-term birth and the later development of symptoms of the metabolic syndrome [6].
Post-term associated SSBP2 and ADAMTS13 are linked with hematopoiesis and blood disorders
Alterations in SSBP2 and ADAMTS13 levels could be affecting gestation through alterations in hematopoietic pathways. There is a large developmental aspect to hematopoiesis during different phases of gestation. After the second month of human gestation, hematopoiesis occurs primarily in the liver, and by the fourth to fifth month of gestation hematopoiesis switches again, this time to the bone marrow (where it occurs for the duration of life). Maturity of the hematopoietic system occurs late in gestation and tracks with gestational age, with the proportions of fetal hemoglobin decreasing during the progression from preterm - term - post-term (83.93% to 68.59% to 60.03%, respectively) [33]. In post-term births, cord blood collected at birth shows differences in levels of polycythemia (increased concentration of hemoglobin in the blood), erythropoietin levels (increased erythropoiesis), mean corpuscular hemoglobin, red blood cell count, neutrophil count, and monocyte count [34]. And finally, primiparity (a risk factor for post-term birth), post-term delivery, or delivery by emergency caesarean section are all correlated with an increased risk of arterial ischaemic stroke (17–20 per 100 000 live births) in the first year of life [35].
Our eQTL analysis showed that the variants in SSBP2 did not show significant evidence of transcriptional regulatory effects on SSBP2 itself. However, there were putative eQTLs with HBG1, HLA-DRB5 and FYB (Supplemental Figure 1). HBG1 is a key component of the fetal hemoglobin locus that is normally expressed in the fetal liver, spleen and bone marrow, as a part of the constitution of fetal hemoglobin (HbF)[36]. In adults, the beta-globin locus is only accessible (chromatin open, DNase I hypersensitive) in adult erythroid cells, however HBG1 shows chromatin accessibility in both fetal and adult erythroid cells[36]. HLA-DRB5, is a component of the major histocompatibility complex and thus plays a central role in the immune system, specifically antigen presentation and recognition[37]. The FYB gene encodes a hematopoietic-specific protein involved in platelet activation[38]. In mice, FYB knockout affects platelet function and causes mild thrombocytopenia[39]. Therefore, our results are consistent with rs2135 being involved in dysregulation of genes that contribute to the production and development of fetal blood.
The ADAMTS13 gene encodes a protease that has previously been shown to disrupt the regulation of platelet thrombosis by cleaving Von Willebrand factor [40]. Critically, ADAMTS13 variants are also known to cause neonatal platelet disorders. These disorders include Upshaw-Schulman syndrome (hemolytic anemia) and blood hyper-coagulation (thrombophilia), the second of which is associated with fetal loss[28]. Thus, the identification here of ADAMTS13 (rs655911) as significantly associated with post-term birth is consistent with earlier findings of hematopoetic phenotypes associating with post-term birth, and could be identifying a key gene in this cause-effect association with gestational length. Notably, the identification of a significant eQTL between a postterm associated B3GALT5 SNP and B3GALTL expression reinforces the significance of the ADAMTS13 variants (Table 3) [21]. Specifically, B3GALTL encodes a beta-1,3 glucosyltransferase that is required to glycosylate the ADAMTS13 gene product as part of its pre-processing for Von Willebrand factor cleavage [41].
Therefore, the role of alterations in expression of both SSBP2 and ADAMTS13 could result in a post-term birth phenotype arising through alterations in hematopoiesis. It should be noted that the SSBP2 variant was shown to repress transcription in the luciferase assay while the ADAMTS13 locus wasn’t able to be assessed. In future work, identifying the regulatory role of these regions could help elucidate the cause-and-effect relationships between hematopoiesis and gestational length.
Post-term Birth Versus the Rise of More Intensive Obstetric Management of Birth
Analysing two cohorts from the same geographical region (Northern Finland) enabled us to control for regional and culture-specific differences in routine management of pregnancies. However, two major confounders remain between the 1966 and 1986 cohorts: 1) technological changes led to better estimation and certainty of gestational age in the 1986 cohort; and 2) management practices led to changes in the incidence of induced labor and post-term birth. Firstly, the uncertainty surrounding gestational age prediction in 1966 raises issues around phenotype definition in this cohort. The impact of this ambiguity on ours study was minimized through the use of a narrow term gestational age range of 38 0/7 to 40 0/7.
There have been significant changes to obstetric management over the last 50 years, with a shift from conservative monitoring of prolonged pregnancies through to the current recommendations to induce women who are beyond 41 weeks gestation[5, 8, 42]. The induction of labor became a therapeutic option between 1966 and 1986. Therefore, the NFBC1986 cohort contains births that would have been post-term if not for induction and/or Caesarean-section. Consistent with this, we observed a reduction in numbers of post-term births from approximately 20% in the 1966 NFBC cohort to less than 5% in the 1986 NFBC cohort. However, while the induction of labour is an improvement in obstetric management, it is possible that these individuals have “post-term potential” and carry genetic risks that were not mitigated by the act of induction. Therefore, we excluded all term-born induced births from the 1986 cohort from the analyses. It should be noted that the fact that we identified genetic variants associated with post-term birth is consistent with the idea of post-term potential and induced children having similar genetic health risks as those born post-term.
We have identified genetic variants in proximity to the B3GALT5, ADAMTS13, SSBP2, and TKT loci as being associated with post-term birth in two birth cohorts (NFBC1966 and NFBC1986). This finding is consistent with previous observations that suggested there was a genetic component to post-term birth[12, 13, 15–20, 43]. Spatial and mRNA expression analyses further showed provided novel clues about how these loci contribute to the regulation and consequences of post-term birth. This study forms a foundation for a better understanding of the genetic and long term metabolic health risks faced by induced and post-term individuals. Since nearly 20% of births in the NFBC cohort were postterm, the long-term risks for induced individuals who have a previously overlooked post-term potential may be a major issue for current health providers.
Materials and Methods
Subjects
We undertook a discovery-replication study of two successive birth cohorts from Northern Finland (i.e. NFBC 1966[10] and NFBC1986[11]). Both cohorts were recruited from the two northernmost provinces of Finland (i.e. Oulu and Lapland). Each cohort followed participants prospectively from approximately 12-16 weeks of gestation, providing one of the earliest-known cohorts with accurate gestational age determination.[10, 11]
The NFBC1966 dataset consists of 12,231 children born to 12,068 mothers. This cohort represents 96% of all children born in Oulu and Lapland in 1966 with expected delivery dates between Jan 1st and Dec 31st 1966. Blood samples of the children in this cohort were collected for genotyping at age 31 (i.e. in 1997) and genetic data was available for 5,402 individuals. Genotyping was completed using Illumina HumanCNV370DUO Analysis BeadChip and the Beadstudio 3.1 algorithm.
The NFBC1986 dataset consists of a prospectively recruited cohort containing 9,432 children born to 9,362 mothers. This cohort represents 99% of all available births in Northern Finland between the 1st of July 1985 and the 30th of June 1986. Blood samples from the children of the NFBC1986 cohort were collected for genotyping at 16 years of age (i.e. in 2002-2003). Genetic data is available for 3,739 individuals in total (~500 were selected as representing individuals with GDM, GHT, and preterm birth; the remaining represented a random sample of the cohort). Genotyping was completed using the OmniExpresse Exome Chip and the Beadstudio 3.1 algorithm.
Defining the post-term dataset
The gestational ages of the individuals in the 1966 and 1986 NFBC cohorts were calculated at their first antenatal visit. For the NFBC1966 cohort gestational age was calculated through last menstrual period. In the NFBC1986 cohort gestational age was based on ultrasound at <20 weeks of gestation or on the last menstral period, with discrepant cases reviewed in detail from medical records as previously described.[44] The control cohort was restricted to those born at full-term which was defined as between 38 0/7 to 40 0/7 weeks of gestation. Children born between 37 0/7 to 37 6/7 weeks (i.e. Early term) or 41 0/7 to 41 6/7 weeks (i.e. Late term) were excluded to reduce mischaracterization due to errors in the calculated gestational age. The post-term case cohorts included those individuals born at ≥42 0/7 weeks of gestation. To further reduce the chances of obscuring the genetic potential of gestational age, we excluded individuals from our control cohort who were born early due to induction of labour or other factors: 1) those born from multiple births; 2) those whose mother had gestational diabetes (prediabetes in 1966 cohort); and 3) those whose birth was by planned caesarean section.
Quality control of genetic data
Genetic data was vetted for quality control. Genetic data for a subject was excluded if: 1) the call rate was < 95% (99 % if the minor allele frequency < 5 %); 2) the mean heterozygosity was < 0.29; 3) there were multidimensional scaling (MDS) outliers; 4) the concordance with other DNA samples in the cohort ≥ 0.99 (risk of being duplicated sample); 5) identity by state (IBS) pairwise comparisons were > 0.99 with most other samples (suspicion of samples being contaminated); 6) IBS pairwise sharing was > 0.20; 7) consent was not given; 8) comparison to medical records identified a gender-genotype mismatch; 9) there was an elevated heterozygosity rate (4 or more standard deviations from the mean); or 10) there was significant deviation from the Hardy-Weinberg Equilibrium (p < 0.0001).
After all QC measures and exclusions were applied, 5,402 and 3,739 individuals remained in the 1966 and 1986 studies, respectively. These included 1034 post-term individuals and 2375 term-born controls from the NFBC1966 cohort and 133 post-term individuals and 1250 term-born controls from the NFBC1986 cohort.
Imputation of genetic data
Impute version 2 was used to estimate the single nucleotide polymorphisms (SNPs) that were not sampled directly by the genotyping platform for the NFBC 1966 samples. The imputation used HapMap 2 (Build 36) as the reference panel and proper_info > 0.4 as the quality metric[45, 46]. Before imputation, there were 309,948 directly genotyped SNPs. After imputation, 3,855,963 SNPs, including those directly genotyped, were available from the 1966 genotypes for analysis.
Imputation of the missing 1986 genotypes was carried out in two steps: 1) a pre-phasing step that estimated haplotypes for all available samples using the SHAPEIT program with the 1000 genomes reference panel as the guide; and 2) an imputation step (Impute version 2) that imputed the missing alleles directly onto the phased haplotypes[45, 46]. After imputation, 59,683,063 SNPs, including those directly genotyped, were available from the 1986 genotypes for analysis.
Statistical analysis
SNPTEST version 2[46] was used to perform all genetic analyses on the imputed genetic data for both cohorts. In the regression analysis, the main effects model tested the association between SNP markers and gestational age. SNP genotypes were coded as 0, 1, or 2 (according to the number of copies of the minor allele) and an additive model of genetic variance was assumed where the effect on the trait of the heterozygote was estimated to be midway between the levels of the two homozygotes. This model fits the best assumption for a post-term phenotype that is thought to have a small amount of genetic variance produced by multiple genetic variants in combination.
All genetic analyses also accounted for child’s sex, as this was the only trait that has consistently shown a large effect on post-term birth status in any previous post-term studies[15, 43, 47, 48].
The p-value for results that were suggestive of statistical significance in the discovery phase in each cohort was set at any p-value less than 1 x 10−5. In the validation phase, any finding (p < 0.05) within the significant LD block was considered validation of significance of the locus. These p-values were selected because we planned functional analyses to confirm the significant variants,
Quantile-quantile plots were generated (qqman package in R), by plotting the expected distribution of p-values versus the observed p-values (assuming a uniform distribution), to test for possible sources of p-value inflation.
Spatial analysis of the validated SNPs for putative regulatory roles within the genome
HiC spatial genomic connectivity (HiC) data was used to identify genes that SNPs connected to[24, 27]. GWAS3D[49] was used, with default parameters, to identify physical connections (as captured by proximity ligation) that occurred with the most significant GWAS SNPs.
Identification of gene expression alterations associated with the GWAS loci
eQTL analysis identifies SNPs that associate with altered expression level(s) of one or more genes[50]. The Genotype-Tissue Expression (GTEx) project database (version 6) eQTL data is powered for the global examination of larger eQTL effects[51]. Therefore, we limited false positives in our trans-eQTL results by only testing eQTLs supported by SNP-gene spatial interactions. Significance levels for this analysis were based on evidence from prior literature[27, 51, 52]: cis-eQTL (genes < 1 Mb distance from SNP, p<1x10−4), trans-eQTL (longer distance or inter-chromosomal, p<1x10−3). Thus, the identification of a SNP-gene spatial interaction that is re-inforced by SNP-gene eQTL has two independent sources of evidence verifying the long-distance transcription regulatory functions.
Luciferase assays
Enhancer activity of the post-term SNPS in proximity to loci (B3GALT5, ARHGAP42, ADAMTS13, SSBP2, and TKT) was measured by luciferase assay. Briefly, the regions spanning each SNP were PCR amplified from genomic DNA obtained from 1000 genome samples cloned into the Gateway adapted pGL4.23-GW (Addgene Plasmid #60323) [53] and sequenced to confirm the genotype. For rs11170213, no sample genotype information could be obtained from the 1000 genomes, so the region spanning the allele ‘A’ of the SNP was amplified from MCF-7 genomic DNA. The Allele ‘C’ version of the SNP was generated by site directed mutageneis (SDM) of the cloned pGL4.23 plasmid using the QuickChange mutageneis protocol (Agilent Technologies). Primers for PCR amplification and SDM are listed in Supplemental Table S1. The ADAMTS13 locus could not be amplified by PCR. HeLa cells were seeded at 5x103 cells per well in a 96 well plate, grown in DMEM media supplemented with 10% fetal bovine serum (Thermo Scientific, #11995-065) one day prior to transfection. Cells were co-transfected with the cloned pGL4.23 and renilla plasmids using Lipofectamine 3000 (Thermo Scientific, #L3000008) and the the Promega Dual Glo Luciferase Assay System (#PME2920) was used to measure luciferase activity after 48 hours. Luminescence was normalized to Renilla and expressed relative to the normalized luminescence of empty pGL4.23. Results are from four independent biological replicates.
Funding
This work was supported by a Gravida Centre of Research Excellence grant (WSC) an Auckland University Future Research Development Fund grant (3702119 to JMOS) and a University of Auckland Scholarship (WS), NFBC ACKN-ACADEMY, SS, MRJ and VK received funding from the Academy of Finland [#268336] and the European Union’s Horizon 2020 research and innovation program [under grant agreement No 633595] for the DynaHEALTH action and [under grant agreement No 733206] for the LIFECYCLE action. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Conflict of Interest Statement
The authors declare no conflicts of interest.
Acknowledgments
The authors would like to thank and acknowledge the contributions of Tuula Ylitalo, Nikman Adli Nor Hashim, Alexessander Da Silva Couto Alves, Andrianos M Yiorkas, and Minna Männikkö, each of whom played a significant role in data generation and collaboration at the NFBC.