1 Abstract
On average, educated people are healthier, wealthier and have higher life expectancy than those with less education. Numerous studies have attempted to determine whether these differences are caused by education, or are merely correlated with it and are ultimately caused by another factor. Previous studies have used a range of natural experiments to provide causal evidence. Here we exploit two natural experiments, perturbation of germline genetic variation associated with education which occurs at conception, known as Mendelian randomization, and a policy reform, the raising of the school leaving age in the UK in 1972. Previous studies have suggested that the differences in outcomes associated with education may be due to confounding. However, the two independent sources of variation we exploit largely imply consistent causal effects of education on outcomes much later in life.
Author contributions
NMD obtained funding for this study, analyzed and cleaned the data, interpreted results, wrote and revised the manuscript. MD interpreted the results, and wrote and revised the manuscript. GDS interpreted the results, wrote and revised the manuscript. FW interpreted the results, and wrote and revised the manuscript. GvdB interpreted the results, and wrote and revised the manuscript.
Acknowledgements
The Medical Research Council (MRC) and the University of Bristol support the MRC Integrative Epidemiology Unit [MC_UU_12013/1, MC_UU_12013/9]. The Economic and Social Research Council (ESRC) supports NMD via a Future Research Leaders grant [ES/N000757/1]. No funding body has influenced data collection, analysis or its interpretation. This publication is the work of the authors, who serve as the guarantors for the contents of this paper. This work was carried out using the computational facilities of the Advanced Computing Research Centre - http://www.bris.ac.uk/acrc/ and the Research Data Storage Facility of the University of Bristol - http://www.bris.ac.uk/acrc/storage/. This research was conducted using the UK Biobank Resource.
2 Introduction
Educational decisions, such as choosing to remain in school, made comparatively early in life associate with substantial differences in outcomes across the life course.(1–8) Unfortunately for researchers interested in the causal effects of education, these choices do not occur at random. For example, on average people who choose to remain in school for longer are more likely to have educated parents. Thus it is challenging to determine if education causes differences in outcomes later in life, or if other, potentially unknown, factors drive these associations. As a result, approaches such as multivariable adjustment are likely to suffer from residual confounding.(9) In contrast, instrumental variable analysis can potentially estimate the causal effects of education in the presence of unmeasured confounding of the education-outcome association. Three assumptions define instrumental variables: 1) they must associate with the risk factor of interest (the “relevance/informativeness criterion”); 2) they have no common cause with the outcome (the “independence assumption”); and 3) they have no effect on the outcome except via the risk factor of interest (the “exclusion restriction”).(10) Natural experiments, such as legal changes to school leaving ages, are potential instrumental variables for educational attainment. These changes forced people to remain in school for longer, and, because parents could not have anticipated them, are unlikely to be associated with factors that confound the association of education and other outcomes. The size of the effect of education can be estimated using instrumental variable estimators.(1)
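The logic of instrumental variable estimation can be illustrated with a minimal sketch on simulated data. The data-generating values below (a binary instrument, an unmeasured confounder, and a true effect of 0.5) are assumptions chosen purely for illustration and are not taken from this study; the Wald ratio shown is one simple instrumental variable estimator, not necessarily the estimator used in our analyses.

```python
import numpy as np


def wald_ratio(z, x, y):
    """Instrumental variable (Wald ratio) estimate: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


def ols_slope(x, y):
    """Conventional regression slope of y on x, ignoring confounding."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Simulated data; every number here is hypothetical.
rng = np.random.default_rng(0)
n = 200_000
u = rng.normal(size=n)                  # unmeasured confounder
z = rng.binomial(1, 0.5, size=n)        # instrument, e.g. exposure to a reform
x = 0.5 * z + u + rng.normal(size=n)    # education (risk factor)
y = 0.5 * x + u + rng.normal(size=n)    # outcome; true causal effect is 0.5

iv_estimate = wald_ratio(z, x, y)       # close to 0.5 in large samples
ols_estimate = ols_slope(x, y)          # biased upwards by the confounder
```

In this simulation the conventional estimate absorbs the effect of the confounder, whereas the Wald ratio recovers the true effect because the instrument is independent of the confounder and affects the outcome only via the risk factor.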
Genetic variants that are known to associate with educational attainment are another potential instrumental variable for education.(6, 11, 12) The use of genetic variants as instrumental variables is known as Mendelian randomization. This approach exploits the natural experiment that occurs at conception, when each child inherits half of each of their parents’ genomes. This process means that at each locus a child has a 50% chance of inheriting either of a given parent’s two alleles. The first instrumental variable assumption is likely to hold because large genome-wide association studies (GWAS) have discovered genetic variants that robustly associate with education. Because of the segregation of alleles at conception, these genetic variants are also independent of many confounders. While many phenotypes are far more associated with each other than would be expected by chance, genetic variants known to associate with one trait tend to be independent of other potential risk factors.(13) Furthermore, each person’s genome is set at conception and cannot be affected by their later educational choices or other outcomes.
While legal changes to school leaving ages have been widely used as an instrumental variable for education, genetic variants known to associate with education have received less attention. The instrumental variable assumptions are plausible for phenotypes whose biological pathways are relatively well understood (e.g. variants in the CRP gene for CRP levels(14) or variants in ALDH2 for alcohol(15)), but they may be less plausible for phenotypes where the mediating pathways are less well understood, such as education. For example, genetic variants that affect parents’ education may have direct effects on the offspring (so-called “dynastic effects”); parents may mate assortatively on education(16); or more educated parents may have different ancestry from those with less education. These potential sources of bias are illustrated in Supplementary Figure 1. While Mendelian randomization using samples of unrelated individuals may be a credible identification strategy for biologically proximal phenotypes such as CRP or alcohol consumption, it may be less plausible for biologically distal phenotypes such as education. Recent studies have used genetic variants known to associate with education to estimate the effects of education on coronary heart disease and dementia.(6, 12) However, unlike hypotheses that relate to biological traits, such as lipids, there are no randomized trials that can provide gold-standard evidence of the causal effects of education.
Two key questions in the scientific and policy literature are whether the effects of education are heterogeneous across individuals or at different points in the life course.(17–20) For example, does an additional year of schooling at age 16 have the same effect on everyone? Does an additional year of schooling at age 16 have the same effect as an additional year of schooling at age 20? Many of the previously investigated policy reforms affect a subset of individuals at a specific age (e.g. the effect of an additional year of education for low ability students at age 16). Policy makers may be interested in the effects of education on average across the whole population, or in the effects of obtaining a specific length of schooling (e.g. staying in school to age 18 versus 16). However, genetic variants affect educational choices across the entire lifespan. They identify the average effect of an additional year of school across the entire cohort.
Here we compare two potential instrumental variables, a policy reform and Mendelian randomization, within the same sample. We have previously reported the effects of educational attainment estimated using the raising of the mandatory minimum school leaving age in data from the UK Biobank.(21) We assess the plausibility of the Mendelian randomization assumptions for estimating the effects of educational attainment. We estimate the long-term effects of education using both genetic variants and the raising of the school leaving age.
3 Results
3.1 Descriptive statistics
The UK Biobank invited 9.2 million people aged 40 to 69 to attend 23 centres across Great Britain.(22) Of those invited, 503,325 (5.47%) were recruited to the study in 2006-2010. Of these, 315,436 met the inclusion criteria for this study. See the supplementary materials for a flowchart of the inclusion and exclusion of participants (Supplementary Figure 2). The average age when attending the assessment centre was 56.9, and 53.8% were female. On average, UK Biobank participants were more educated than the British population: 41.0%, 64.0%, and 82.1% had a degree or equivalent, post-16 education, and any academic qualifications respectively. In contrast, the UK census found that 27.9%, 61.8%, and 76.5% of the British population aged between 40 and 70 in 2011 had these qualifications respectively.(23) See Table 1 for a description of the participants included in this study. We used inverse probability weights to correct for this selection.
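The idea behind weighting for selection can be sketched using the degree-level proportions reported above. This is a simplified, hypothetical illustration using a single binary stratum (degree versus no degree); the weighting model actually used in the analysis is not described in this section and is likely to involve more covariates.

```python
def selection_weights(sample_share, population_share):
    """Weight each stratum by population share / sample share.

    This is proportional to the inverse of the probability of being
    sampled from that stratum.
    """
    return {k: population_share[k] / sample_share[k] for k in sample_share}


# Shares with and without a degree, from the figures quoted in the text.
sample_share = {"degree": 0.410, "no_degree": 0.590}
population_share = {"degree": 0.279, "no_degree": 0.721}

weights = selection_weights(sample_share, population_share)

# After weighting, the share of participants with a degree matches the census.
weighted_total = sum(sample_share[k] * weights[k] for k in sample_share)
weighted_degree_share = sample_share["degree"] * weights["degree"] / weighted_total
```

Participants with a degree are over-represented and so receive a weight below one, while those without a degree receive a weight above one.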
3.2 Testing the relevance assumption
Participants born after August 1957, who were affected by the raising of the school leaving age, were 26.6 (95% confidence interval (95%CI): 21.7 to 24.4) percentage points more likely to remain in school after age 15. We used the 74 genetic variants detected in the educational attainment GWAS to construct a weighted genetic score in the UK Biobank. Each variant was weighted by its association with educational attainment in the discovery sample of the GWAS. The educational attainment allele score was more weakly associated with educational attainment. A unit increase in the score was associated with 1.48 additional years of education (95%CI: 1.39 to 1.57) as defined by the International Standard Classification of Education (ISCED). Thus the educational attainment allele score was a strong instrument, but explained less of the variation in educational attainment than the raising of the school leaving age. Neither proposed instrument is likely to suffer from weak instrument bias. The policy reform induced fewer individuals to leave school before the age of 16 (Figure 1, top), whereas the educational attainment allele score was associated with an increased likelihood of remaining in school at all ages (Figure 1, bottom).
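The construction of a weighted genetic score can be sketched as a weighted sum of allele counts. The function name, the toy dosages and the weights below are hypothetical; in the analysis the weights come from the discovery sample of the GWAS, and any rescaling of the score is not shown here.

```python
import numpy as np


def weighted_allele_score(dosages, weights):
    """Weighted genetic score for each participant.

    dosages: (n_participants, n_variants) counts of the education-increasing
             allele at each variant (0, 1 or 2).
    weights: (n_variants,) per-allele coefficients from the discovery GWAS.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("one weight is required per variant")
    return dosages @ weights


# Toy example: three participants and three variants (hypothetical values).
dosages = np.array([[0, 1, 2],
                    [2, 2, 0],
                    [1, 0, 1]])
weights = np.array([0.02, 0.01, 0.03])
scores = weighted_allele_score(dosages, weights)
```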
3.3 Bias component plots
We were concerned that our results may be affected by selection bias or residual confounding. If there was strong selection into the study then this could induce correlations between the instruments and outcomes that are independent in the population. We evaluated this using bias component plots.(24) Bias component plots compare the relative bias of the instrumental variable and conventional estimators if an observed covariate was omitted. We assessed the bias associated with 14 non-genetic phenotypes and polygenic scores for 45 traits. The biases for the educational attainment genetic score were similar in size to those for the raising of the school leaving age (Figures 2 and 3).
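The comparison made by a bias component plot can be sketched as follows. For an omitted covariate C, the bias of the conventional estimator is proportional to cov(C, X)/var(X), and the bias of the instrumental variable estimator is proportional to cov(C, Z)/cov(X, Z), where X is education and Z is the instrument; both terms are multiplied by the same effect of C on the outcome. The function and the simulated data below are illustrative assumptions, not the code used to produce Figures 2 and 3.

```python
import numpy as np


def bias_components(z, x, c):
    """Return (conventional, instrumental variable) bias components for c."""
    ols_component = np.cov(c, x)[0, 1] / np.var(x, ddof=1)
    iv_component = np.cov(c, z)[0, 1] / np.cov(z, x)[0, 1]
    return ols_component, iv_component


# Simulated example: c confounds education but is independent of the instrument.
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                          # instrument (hypothetical)
c = rng.normal(size=n)                          # covariate that would be omitted
x = 0.5 * z + 0.8 * c + rng.normal(size=n)      # education

ols_component, iv_component = bias_components(z, x, c)
```

Here the conventional bias component is clearly non-zero, while the instrumental variable component is close to zero because the covariate is unrelated to the instrument.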
3.3.1 Phenotypic confounders
The parents of participants affected by the raising of the school leaving age were less likely to have died. These differences are likely to be due to cohort effects. On average participants affected by the reform were one year younger than those who were not affected. Offspring educational attainment may also affect parental mortality.(25–27) There was little evidence that the reform affected any of the other baseline and childhood phenotypes. There was evidence that the educational attainment genetic score was non-randomly distributed across the UK (Figure 2). On average, genetic variants associated with educational attainment were more common in the east and south of the UK. However, the magnitude of these associations was relatively small. There was evidence that the educational attainment genetic score associated with having been breastfed, birthweight, being taller than average at age 10, and whether the participants’ mother smoked in pregnancy. These associations may be driven by dynastic effects or assortative mating. Dynastic effects could occur because, on average, participants with more education-associated genetic variants will have more educated parents. If more educated parents behave differently, e.g. smoke less in pregnancy, then this could cause an association with the educational attainment genetic score. Assortative mating could induce associations if, for example, more educated parents on average choose taller spouses. Nevertheless, these covariates only weakly associate with the outcomes. There is little evidence that the covariates associate more strongly with the educational attainment genetic score than with the reform. As a result, for many outcomes the bias induced by these covariates is small; for more details see the fully adjusted sensitivity analyses below. This suggests that residual confounding due to phenotypic covariates is unlikely.
3.3.2 Genetic confounders
The educational attainment genetic score weakly associated with polygenic scores for other phenotypes including bipolar disorder, childhood intelligence, inspection time, simple reaction time and infant head circumference (Figure 3). However, there was little evidence that bias components for the educational attainment genetic score were larger than those for the raising of the school leaving age. This suggests that genotypic confounding is limited.
3.4 Effect of educational attainment on outcomes
Figure 4 plots the estimated effects of an additional year of education on each of the 25 outcomes.
3.4.1 Mortality
Each additional year of education was observationally associated with a −0.14 (95%CI: −0.16 to −0.11) percentage point difference in mortality. The Mendelian randomization estimate was in the same direction but less precise (−0.37, 95%CI: −0.80 to 0.06). This estimate is larger in magnitude than the observational association of educational attainment and mortality, but smaller than the effect of remaining in school estimated by the raising of the school leaving age (Figure 4).
3.4.2 Morbidity
Observationally, an additional year of education was generally associated with improved health. Each year of education was associated with 0.65 per 100 (95%CI: 0.58 to 0.72) fewer cases of hypertension, 0.30 (95%CI: 0.27 to 0.34) fewer diagnoses of diabetes, 0.14 (95%CI: 0.12 to 0.17) fewer strokes, 0.27 (95%CI: 0.24 to 0.30) fewer heart attacks, and 0.60 (95%CI: 0.55 to 0.66) more episodes of depression. There was little evidence of differences in rates of cancer diagnoses. The Mendelian randomization estimates suggested that each year of education reduced the likelihood of being diagnosed with hypertension by 1.04 per 100 (95%CI: −0.18 to 2.25), diabetes by 1.38 (95%CI: 0.78 to 1.97), stroke by 0.50 (95%CI: 0.14 to 0.86), and heart attack by 1.21 (95%CI: 0.70 to 1.71). However, the Mendelian randomization estimates provided little evidence of an effect on depression or cancer. The estimates based on the raising of the school leaving age were in the same direction as the Mendelian randomization results. The policy reform suggested larger effects on diabetes and stroke.
3.4.3 Health behaviours
An additional year of education was associated with 1.65 per 100 (95%CI: 1.57 to 1.73) and 1.10 (95%CI: 1.04 to 1.17) fewer ever and current smokers respectively. The Mendelian randomization analysis suggested that the causal effects of education on smoking were substantially larger, 8.25 (95%CI: 6.78 to 9.73) and 4.38 (95%CI: 3.43 to 5.34) fewer smokers per 100. The estimates based on the raising of the school leaving age were similar to those using Mendelian randomization. Each year of education was associated with a 0.07 (95%CI: 0.07 to 0.08) unit increase in alcohol consumption. The Mendelian randomization estimates implied the causal effect of an additional year of schooling was 0.19 (95%CI: 0.15 to 0.23). Each year of education was associated with watching 0.16 (95%CI: 0.15 to 0.16) fewer hours of television per day. The Mendelian randomization estimate suggested that this is likely to underestimate the causal effect (0.49, 95%CI: 0.44 to 0.54). A year of education was associated with 0.02 (95%CI: 0.02 to 0.02) fewer days per week of moderate exercise. The Mendelian randomization estimate suggested this underestimated the causal effect (0.10, 95%CI: 0.04 to 0.16). There were only very small associations between educational attainment and vigorous exercise, which were similar to the Mendelian randomization and policy reform estimates.
3.4.4 Income
Each additional year of education was associated with a higher probability of having an income above £18,000, £31,000, £52,000 and £100,000 of 3.91 (95%CI: 3.74 to 4.06), 4.59 (95%CI: 4.51 to 4.67), 3.34 (95%CI: 3.20 to 3.48), and 0.94 (95%CI: 0.88 to 1.00) per 100 participants respectively. The Mendelian randomization estimates were larger, suggesting increases of 9.42 (95%CI: 7.93 to 10.90), 11.33 (95%CI: 9.94 to 12.72), 9.22 (95%CI: 8.06 to 10.38), and 2.98 (95%CI: 2.44 to 3.53) per 100 participants. The estimates from the raising of the school leaving age were similar in direction and magnitude to the Mendelian randomization estimates, but provided little evidence that education affected the probability of having the highest income.
3.4.5 Indicators of ageing
Each year of education was associated with a 0.25 (95%CI: 0.24 to 0.26) increase in grip strength. The Mendelian randomization estimates suggest a larger causal effect of 0.42 (95%CI: 0.22 to 0.61). Education was also associated with lower arterial stiffness (0.06, 95%CI: 0.04 to 0.07). The Mendelian randomization estimate was imprecise, but in the same direction, implying each year of education reduced arterial stiffness by 0.04 (95%CI: −0.14 to 0.22). The estimates based on the raising of the school leaving age suggested a larger effect on grip strength, but similarly equivocal effects on arterial stiffness.
3.4.6 Anthropometry
Each additional year of education was observationally associated with a 0.28 (95%CI: 0.27 to 0.29) cm increase in height and a 0.18 (95%CI: 0.17 to 0.18) kg/m² reduction in BMI. The Mendelian randomization estimates suggested larger causal effects of education: a 0.99 (95%CI: 0.80 to 1.17) cm increase in height and a 0.71 (95%CI: 0.57 to 0.86) kg/m² reduction in BMI. The estimated effect on height using the raising of the school leaving age was very similar to the observational association, whereas the effects on BMI estimated using the reform were much larger than the observational associations, and very similar to the Mendelian randomization estimates. The effect of education on height is likely to be due to pleiotropy or residual population stratification. We investigated this using a negative control outcome: whether the participant reported being taller than average at age 10. Mendelian randomization implied that each additional year of education was associated with being 4.14 (95%CI: 3.03 to 5.24) percentage points more likely to report being taller than average at age 10. We investigated this finding further in the pleiotropy robust sensitivity analyses below.
3.4.7 Blood pressure
Each additional year of education was associated with lower diastolic and systolic blood pressure (0.12 mmHg, 95%CI: 0.10 to 0.14 and 0.32 mmHg, 95%CI: 0.29 to 0.35 respectively). The genetic analysis suggested the causal effects were in the same direction, but larger (0.82 mmHg, 95%CI: 0.59 to 1.08 and 1.20 mmHg, 95%CI: 0.70 to 1.71 respectively). There was little evidence the reform affected diastolic blood pressure, and some evidence that it increased systolic blood pressure. However, these estimates are likely to be biased due to age effects.(21)
3.4.8 Neurocognitive
Each year of education was associated with 0.25 (95%CI: 0.25 to 0.26) additional correct answers on the intelligence test, but there was little difference in subjective well-being. The Mendelian randomization estimates suggested that each year of educational attainment caused 0.93 (95%CI: 0.80 to 1.05) additional correct answers, but found little detectable effect on subjective well-being. The estimates of the effect on intelligence based on the raising of the school leaving age were also positive, but were slightly smaller. There was little evidence that the reform affected subjective well-being.
3.5 Sensitivity analyses
3.5.1 Weighting to account for non-random sampling
Reanalysing the data without applying inverse probability weights did not materially influence the Mendelian randomization estimates; see Supplementary Figure 3.
3.5.2 Association between the educational attainment genetic score and the outcome, the “reduced form”
We present the associations between the educational attainment genetic score and each of the 25 outcomes in Supplementary Figure 4. These associations are consistent in direction with the main instrumental variable results presented above.
3.5.3 Robustness of results to adjustment
We investigated whether the results were affected by removing the covariates, including sex, year and month of birth, and the first ten principal components of population stratification. The only estimates that were affected by this were those for grip strength and height, which attenuated towards the null. See Supplementary Figure 5 for details. There was little detectable impact of adjusting for a range of confounders; see Supplementary Figure 6. The estimated effect on height attenuated modestly. These sensitivity analyses suggest that residual confounding is unlikely to explain our results.
3.5.4 Pleiotropy robust methods
We investigated whether the results could be explained by pleiotropy using MR-Egger, weighted median and weighted mode approaches (Supplementary Figure 7). MR-Egger was highly imprecise for all outcomes and supported few inferences. The different estimators provided consistent evidence of causal effects for some of the outcomes including diabetes, heart attack, mortality, smoking, income, grip strength, BMI, blood pressure, intelligence, alcohol consumption, and exercise. There was evidence of differences in the estimates for height, which may indicate that the inverse variance weighted and two-stage least squares estimates above suffer from pleiotropy. The weighted mode estimator suggested little effect of educational attainment on height. We present the I² statistics of the heterogeneity in estimated effects of education across the 74 genetic variants in Supplementary Table 2.
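The following is a minimal sketch of how the inverse variance weighted, MR-Egger and weighted median estimators combine variant-level summary statistics. The simulated variant effects, the true effect of 0.5, and the pleiotropic effects given to 20 of the 74 variants are assumptions for illustration only; the weighted mode estimator and standard errors are omitted, and this is not the code used for Supplementary Figure 7.

```python
import numpy as np


def ivw(bx, by, se_by):
    """Inverse variance weighted estimate (regression through the origin)."""
    w = 1.0 / se_by**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)


def mr_egger(bx, by, se_by):
    """MR-Egger: weighted regression of by on bx with an intercept.

    Returns (intercept, slope). Variants are oriented so that bx >= 0.
    """
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    sw = 1.0 / se_by                       # square root of the weights
    design = np.column_stack([np.ones_like(bx), bx]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(design, by * sw, rcond=None)
    return coef[0], coef[1]


def weighted_median(bx, by, se_by):
    """Weighted median of the per-variant ratio estimates by / bx."""
    ratio = by / bx
    w = (bx / se_by) ** 2                  # inverse variance of each ratio
    order = np.argsort(ratio)
    ratio, w = ratio[order], w[order]
    cum = (np.cumsum(w) - 0.5 * w) / np.sum(w)
    return float(np.interp(0.5, cum, ratio))


# Simulated summary statistics (all values hypothetical).
rng = np.random.default_rng(1)
n_variants = 74
bx = rng.uniform(0.01, 0.05, size=n_variants)   # variant-education associations
se_by = np.full(n_variants, 0.002)              # SEs of variant-outcome associations
pleiotropy = np.zeros(n_variants)
pleiotropy[:20] = 0.02                          # 20 variants with direct effects
by = 0.5 * bx + pleiotropy + rng.normal(0, se_by)

ivw_est = ivw(bx, by, se_by)
egger_intercept, egger_slope = mr_egger(bx, by, se_by)
wm_est = weighted_median(bx, by, se_by)
```

In this simulation the inverse variance weighted estimate is pulled away from the true effect by the pleiotropic variants, whereas the weighted median remains close to it because most of the weight comes from valid variants. Disagreement of this kind between estimators is the pattern we interpret as possible pleiotropy.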
4 Discussion
Our findings suggest that the differences in many later life outcomes between educational groups are likely to be caused by education. There was evidence that genotypic perturbations in educational attainment associated with morbidity, including the risk of hypertension, diabetes, stroke, heart attack, and mortality. Furthermore, these results imply that education reduces the risk of currently or ever smoking, increases household income, lowers blood pressure and increases scores on intelligence tests. However, there was evidence that education reduced rates of moderate exercise and increased alcohol consumption. Our sensitivity analyses suggest that confounding by genotypic or phenotypic confounders, or specific forms of pleiotropy are unlikely to explain our results.
Triangulating across multiple sources of evidence can help provide stronger evidence of causal effects.(28) Here, we found that the two natural experiments gave remarkably similar results. The two sources of variation, Mendelian randomization and the raising of the school leaving age, have distinct causes and directions of bias. The similarity in results strengthens the case that education has causal effects. The raising of the school leaving age affected relatively low-ability students, who were forced to remain in school for an additional year.(1) In contrast, our Mendelian randomization results exploit variation across the entire distribution of educational attainment, estimating an average effect of an additional year of schooling for everyone from those who leave school at 15 to graduates (see Figure 1). A priori there was little reason to assume that a year of additional schooling will have the same effects on a high school leaver as on a graduate. Surprisingly, we found relatively little evidence that educational attainment had heterogeneous effects. The estimates from the two natural experiments are remarkably similar, both in direction and in many cases magnitude. There was very little evidence of heterogeneity in the effects identified by different variants. The effect of an additional year of education on smoking is comparable to other studies using natural experiments. For example, Grimard and Parent (2007) used data from the US Current Population Survey and the Vietnam draft to estimate that in 1995-99 an additional year of schooling caused a 7.97 (95%CI: 3.15 to 12.79) and 11.13 (95%CI: 5.54 to 16.72) percentage point reduction in the probability of currently or ever smoking.(29) For other outcomes, such as measured blood pressure, the results are very different. The effects of education on blood pressure estimated using the raising of the school leaving age may reflect non-linear cohort effects as previously discussed.(21)
We found some evidence that the educational attainment genetic score correlated with baseline covariates, including birth weight, being taller than average at age 10, mother smoked in pregnancy, parental mortality and geography. These associations may reflect dynastic effects or assortative mating (Figure 3). If there is assortative mating, then this could induce associations between education variants and variants for other traits. For example, if highly educated people assortatively mate with taller spouses, then the Mendelian randomization estimates of the effect of education on height would be positively biased. These effects may explain the implausible Mendelian randomization estimate of the effect of education on height. A sub-sample (N= 310,230) of the study provided information on whether they were taller than average at age 10; this variable cannot be affected by completed years of educational attainment. When we adjusted for being taller than average at age 10, the estimated effect fell from a 0.93 (95%CI: 0.78 to 1.09) to a 0.62 (95%CI: 0.46 to 0.78) cm increase in height per year of education. This result suggests that the effects on height may be induced by assortative mating, dynastic effects or population stratification.
A limitation of our study is that we used a non-representative sample. We have addressed this using inverse probability weights. The weights made little difference to the Mendelian randomization estimates. This suggests that sample selection bias is unlikely to affect our results.
The Mendelian randomization estimates can suffer from bias due to assortative mating or dynastic effects. However, except for height, adjusting for measured baseline covariates had little effect on our results (Supplementary Figures 5 and 6). Our results could reflect either direct effects of the participants’ educational attainment, assortative mating between their parents, dynastic effects of their parents’ education or differences in ancestry not accounted for by the principal components (Supplementary Figure 1). These potential explanations could be evaluated using either offspring-mother-father trios or sibling designs. Okbay and colleagues (2016) found little evidence that the effects of the genome-wide significant education variants attenuated after controlling for family structure.(30) However, these analyses may not have had sufficient power to detect dynastic effects. Kong and colleagues investigated this using a sample of parents and offspring from Iceland.(31) They found that a polygenic score for education, made up of alleles that were not inherited, was associated with offspring’s education. The association with the non-inherited polygenic score was 29% of the size of the association with the inherited genetic score. This suggests that the effects we identify are likely to represent a combination of the effect of the participants’ education and their parents’ education. Kong and colleagues’ results provide an upper bound for the contribution of parents’ education of 29%. The contribution of parents’ education to our results will be smaller if the direct effect of parents’ education on each outcome is smaller than the effect of the participant’s own education. Determining the relative contributions of parent versus offspring education will require large samples of parent-offspring data.
A further limitation is that the genetic variants may have pleiotropic, or direct, effects on the outcomes, as their biological mechanisms of effect are unknown. However, our estimates were similar when using the weighted median and weighted mode estimators. An exception to this was height, where the mode and median based estimates suggested smaller effects.
In conclusion, two independent natural experiments suggest that education has wide-ranging effects on important outcomes measured much later in life. Importantly, the two experiments affected different educational choices (one exclusively affecting those at the bottom of the distribution, the other affecting education levels across the whole distribution) and yet find effects of a similar magnitude. This suggests a common treatment effect of additional education on many health behaviours and outcomes.
Footnotes
Classification: Social Sciences (Economic Sciences) and Biological Sciences (Genetics).
Conflicts of interest: We report no conflicts of interest.