Abstract
Recombinant inbred lines (RILs) are an important resource for mapping genes controlling complex traits in many species. While RIL populations have been developed for maize, a maize RIL population with multiple teosinte inbred lines as parents has been lacking. Here, we report a teosinte nested association mapping population (TeoNAM), derived from crossing five teosinte inbreds to the maize inbred line W22. The resulting 1257 BC1S4 RILs were genotyped with 51,544 SNPs, providing a high-density genetic map with a length of 1540 cM. On average, each RIL is 15% homozygous teosinte and 8% heterozygous. We performed joint linkage mapping (JLM) and genome-wide association study (GWAS) for 22 domestication and agronomic traits. A total of 255 QTLs from JLM were identified with many of these mapping to known genes or novel candidate genes. TeoNAM is a useful resource for QTL mapping for the discovery of novel allelic variation from teosinte. TeoNAM provides the first report that PROSTRATE GROWTH1, a rice domestication gene, is also a QTL associated with tillering in teosinte and maize. We detected multiple QTLs for flowering time and other traits for which the teosinte allele contributes to a more maize-like phenotype. Such QTL could be valuable in maize improvement.
Introduction
Recombinant inbred line (RIL) populations are powerful tools for investigating the genetic architecture of traits and identifying the causal genes that underlie trait variation. RIL populations have been widely used in many organisms. In mammals, the well-known Collaborative Cross (CC), consisting of a large panel of multiparental recombinant inbred mouse lines, has been specifically designed for the analysis of complex traits (Churchill et al. 2004). Similarly, the Drosophila Synthetic Population Resource (DSPR), which consists of two sets of RILs, has been designed to combine the high mapping resolution offered by multiple generations of recombination with the high statistical power afforded by a linkage-based design (King et al. 2012). In plants, the maize nested association mapping population (NAM), which crossed 25 founders to a common parent in maize (Yu et al. 2008), has been successfully applied to a large number of traits (Buckler et al. 2009; Tian et al. 2011; Kump et al. 2011). The NAM design has also been applied to other crops such as barley (Maurer et al. 2015; Nice et al. 2016), rice (Fragoso et al. 2017), sorghum (Bouchet et al. 2017), wheat (Jordan et al. 2018), and soybean (Xavier et al. 2018). In Arabidopsis, another design, the Multiparent Advanced Generation Inter-Cross (MAGIC) population, provides high precision for detecting QTLs (Kover et al. 2009; Huang et al. 2011). This design has also been used in wheat (Huang et al. 2012; Mackay et al. 2014), rice (Bandillo et al. 2013), and maize (Dell’Acqua et al. 2015; Xiao et al. 2016).
For the study of maize domestication, many new discoveries were made using a biparental maize-teosinte BC2S3 RIL population. Shannon (2012) performed QTL mapping for 16 traits and examined the genetic architecture of domestication at the whole genome level. This RIL population has also been widely used to fine-map QTL and identify causal or candidate genes for many QTLs including ones controlling seed shattering (Lin et al. 2012), leaf number (Li et al. 2016), kernel row number (Calderón et al. 2016), shoot apical meristem morphology (Leiboff et al. 2016), vascular bundle number (Huang et al. 2016), tassel-related traits (Xu et al. 2017b), and nodal root number (Zhang et al. 2018). With this population, several QTL have been fine-mapped to single genes including grassy tillers1 (gt1) for controlling prolificacy (Wills et al. 2013), prolamin-box binding factor1 (pbf1) for kernel weight (Lang et al. 2014), glossy15 (gl15) for vegetative phase changes (Xu et al. 2017a), ZmCCT10 for photoperiod response (Hung et al. 2012), and zea agamous-like1 (zagl1) for kernel row number and flowering time (Wills et al. 2017), as well as several more genes regulating flowering time: ZmCCT9 (Huang et al. 2018), Zea mays CENTRORADIALIS8 (ZCN8) (Guo et al. 2018), and ZmMADS69 (Liang et al. 2018). In addition to phenotypic traits, the maize-teosinte BC2S3 RIL population was used for a comprehensive genome-wide eQTL analysis to study the changes in gene expression during maize domestication (Wang et al. 2018).
Despite its utility, the maize-teosinte BC2S3 RIL population has three limitations. First, there is only a single teosinte parent, which cannot broadly represent the diversity of teosinte. Second, this population had two generations of backcross, which produces a background in which some teosinte traits are suppressed and do not segregate among the RILs. Third, the teosinte parent was a wild outcrossed individual which, unlike an inbred line, could not be maintained as a permanent resource.
In this paper, we report the development of a teosinte NAM population (TeoNAM) of 1257 BC1S4 RILs using five teosinte inbred parents crossed with a common maize parent (W22) for mapping QTLs for domestication and agronomic traits. We have genotyped the RILs with 51,544 genotype-by-sequencing (GBS) markers that provide a high-density genetic map. The TeoNAM population captures a large number of recombination events for localizing QTL to genomic locations and the single generation of backcross allows enhanced expression of teosinte traits as compared to the BC2S3 RIL population. We report data for 22 traits but focus our discussion on 9 traits to illustrate the utility of TeoNAM including identifying candidate genes. TeoNAM will be a valuable resource for dissecting the genetic basis of domestication and agronomic traits.
Results
Characterization of a teosinte NAM population
We developed a teosinte NAM population (TeoNAM), which was constructed by crossing five teosinte inbred lines to the maize inbred line W22, followed by one generation of backcross to the common recurrent maize parent and four generations of selfing (Figure S1). The teosinte parents include four Zea mays ssp. parviglumis lines and one Zea mays ssp. mexicana line. As such, TeoNAM encompasses five biparental families, each with 219 to 310 BC1S4-derived recombinant inbred lines (RILs), for a total of 1257 RILs. The number of segregating SNP markers ranges from 11,395 to 16,109 per family, with over 51,000 SNP markers in total (Table 1).
The expected segregation for a BC1S4 population is 73.44% homozygous recurrent parent, 3.13% heterozygous, and 23.44% homozygous donor parent. Overall, the observed genotype percentages were 76.6% W22 homozygous, 15% teosinte homozygous and 8.1% heterozygous across all SNP sites in the TeoNAM population (Table 1). The percentage of teosinte varied among subpopulations from 14.2% to 16.2% (Table 1) and also varied across the genome in all subpopulations (Figure S2). The higher than expected heterozygosity may be due to unconscious selection for more heterozygous plants, which had hybrid vigor. The chromosomal region of highest heterozygosity is on the short arm of chromosome 4 near teosinte glume architecture1 (tga1) (Wang et al. 2005). Selection against homozygotes for the teosinte allele of tga1, which have poor ear and kernel quality, may be the cause. For a BC1S4, the expected frequency of the maize allele is 75%. All subpopulations deviate from this with an excess of the maize allele (Table 1), and the amount of excess varies across the genome (Figure S3).
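The expected BC1S4 proportions quoted above follow from simple single-locus arithmetic. The sketch below is illustrative only and is not part of the original analysis; it assumes Mendelian segregation with no selection, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_bc_self_frequencies(n_backcross, n_self):
    """Expected single-locus genotype frequencies after backcrossing an F1
    to the recurrent parent n_backcross times and then selfing n_self times.

    Assumes Mendelian segregation and no selection (an idealization).
    Returns (homozygous recurrent, heterozygous, homozygous donor).
    """
    # The F1 is fully heterozygous; each backcross halves heterozygosity.
    het = Fraction(1, 2) ** n_backcross
    hom_recurrent = 1 - het
    hom_donor = Fraction(0)
    # Each selfing generation halves heterozygosity; the lost half is
    # split evenly between the two homozygous classes.
    for _ in range(n_self):
        lost = het / 2
        het -= lost
        hom_recurrent += lost / 2
        hom_donor += lost / 2
    return hom_recurrent, het, hom_donor


rec, het, don = expected_bc_self_frequencies(n_backcross=1, n_self=4)
# Recurrent (maize) allele frequency: homozygotes plus half the heterozygotes.
maize_allele_freq = rec + het / 2
print(f"{float(rec):.4f} {float(het):.4f} {float(don):.4f} {float(maize_allele_freq):.2f}")
```

Under these assumptions the three genotype classes come out as 47/64, 1/32 and 15/64, and the maize allele frequency as 3/4, matching the expected values in the text.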
We constructed genetic linkage maps for each family and a composite linkage map based on all RILs across all families and identified and annotated 51,544 high confidence SNPs that were used to impute the SNP alleles in the RILs. The composite genetic map based on these markers is 1540 cM in length including 35,880 crossovers. We examined the relationship between genetic distance in cM and physical distance in Mb based on the composite genetic map. The mean value is 0.75 cM/Mb. However, there is a wide deviation from the mean across the genome (0 - 5.52 cM/Mb). As expected, there is suppressed recombination near the centromeres (Figure S2) and frequent recombination near the telomeres where gene density is high as well (Figure S2).
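The relationship between genetic and physical distance can be summarized as a recombination rate in cM/Mb for each marker interval. The following is a minimal sketch with made-up marker coordinates, not the procedure or data used in the study; the function name and the toy values are assumptions for illustration.

```python
def recombination_rates(markers):
    """Return the recombination rate (cM/Mb) for each adjacent marker interval.

    markers: list of (physical_position_bp, genetic_position_cM) tuples for
    one chromosome, sorted by physical position. Intervals with zero or
    negative physical length are skipped.
    """
    rates = []
    for (bp0, cm0), (bp1, cm1) in zip(markers, markers[1:]):
        interval_mb = (bp1 - bp0) / 1e6
        if interval_mb <= 0:
            continue
        rates.append((cm1 - cm0) / interval_mb)
    return rates


# Hypothetical markers: high rate near the ends, low rate in the middle.
toy_markers = [
    (1_000_000, 0.0),
    (3_000_000, 4.0),
    (13_000_000, 5.0),
    (15_000_000, 9.0),
]
rates = recombination_rates(toy_markers)
# Chromosome-wide average: total genetic length over total physical length.
average_rate = (toy_markers[-1][1] - toy_markers[0][1]) / (
    (toy_markers[-1][0] - toy_markers[0][0]) / 1e6
)
print(rates, round(average_rate, 2))
```

In this toy example the middle interval has a much lower rate than the flanking intervals, the same qualitative pattern as suppressed recombination near a centromere.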
We scored 22 traits for the TeoNAM lines, of which 15 are domestication related, including vegetative gigantism (CULM, LFLN, and LFWD), prolificacy (PROL), tillering (TILN), ear shattering (SHN), conversion of the inflorescence from staminate to pistillate (STAM), multiple ear-related traits (EB, ED, EL, KRN, KW), glume traits (GLCO and GLUM), and red pericarp color (REPE) (Table 2). Additionally, several agronomic traits were scored including flowering (ASI, DTA, and DTS), plant architecture (PLHT and TBN), barren ear base (BARE), and yellow pericarp color (YEPE). Most traits (ASI, CULM, DTA, DTS, ED, EL, KRN, KW, LFLN, LFWD, PLHT, and TBN) follow an approximately normal distribution, suggesting oligo- or polygenic genetic control of these traits, but other traits (BARE, EB, GLCO, GLUM, PROL, REPE, SHN, STAM, TILN, and YEPE) exhibited a skewed or non-normal distribution. Some of these traits are meristic or discrete traits (e.g. PROL or TILN). A few traits, like STAM, show a two-part distribution with a spike at 0 plus a continuous range of values from 0 to 2, which suggests they may be polygenic threshold traits (Figure S4). There are also substantial differences in trait means among the five subpopulations, indicating underlying differences in genetic architecture among the five teosinte inbreds (Figure S5).
QTL mapping
We used Joint Linkage Mapping (JLM) and a Genome-Wide Association Study (GWAS) as two complementary approaches for QTL detection. We also used basic interval QTL mapping for the five individual subpopulations to provide a guide for future work to fine-map the genes underlying the QTL. We detected 255 QTLs for 22 traits by JLM, which combines information across all families (Figure 1; Table S1). We detected a total of 150 QTLs by GWAS, among which 57 QTLs overlapped with QTLs detected by JLM (Table S2). Separate QTL mapping for each subpopulation detected 464 QTLs in total, among which 310 QTLs overlapped with QTLs detected by JLM (Figures S6-S27; Table S3). Below, we focus on QTL detected by JLM for our characterization of the genetic architecture and the distribution of QTL allelic effects.
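Counting overlaps between GWAS hits and JLM QTLs requires some rule for what counts as an overlap. The text does not state the rule that was used, so the sketch below assumes one plausible definition: a GWAS hit overlaps a JLM QTL when both are for the same trait and chromosome and the GWAS position falls inside the JLM support interval. All names and coordinates are hypothetical.

```python
def count_overlapping_hits(gwas_hits, jlm_intervals):
    """Count GWAS hits that fall inside a JLM support interval.

    gwas_hits: list of (trait, chromosome, position_bp).
    jlm_intervals: list of (trait, chromosome, start_bp, end_bp).
    The overlap rule (same trait, same chromosome, position within the
    interval) is an assumption made for illustration.
    """
    overlapping = 0
    for trait, chrom, pos in gwas_hits:
        if any(
            t == trait and c == chrom and start <= pos <= end
            for t, c, start, end in jlm_intervals
        ):
            overlapping += 1
    return overlapping


# Hypothetical data for two traits.
toy_jlm = [
    ("DTA", 8, 120_000_000, 130_000_000),
    ("TILN", 1, 260_000_000, 275_000_000),
]
toy_gwas = [
    ("DTA", 8, 125_500_000),   # inside the DTA interval
    ("DTA", 3, 125_500_000),   # same position, different chromosome
    ("TILN", 1, 270_000_000),  # inside the TILN interval
    ("TILN", 1, 10_000_000),   # same chromosome, outside the interval
]
n_overlap = count_overlapping_hits(toy_gwas, toy_jlm)
print(n_overlap, "of", len(toy_gwas), "GWAS hits overlap a JLM interval")
```

A different rule, for example requiring overlap between two support intervals, would give different counts, so the overlap definition should be reported alongside any such tally.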
Among the 22 traits, the number of QTL ranges from 2 to 24; the trait with the most QTL is KRN. Genetic architecture varies considerably among traits (Figure 2; Figure S28). Several traits, including BARE, GLCO, GLUM, PROL, REPE, STAM and YEPE, had relatively simple genetic architectures with two to ten QTL including one of large effect. The largest QTL for each of these traits has between 2.1 and 11.7 times the additive effect of the second largest QTL. A second class of traits has a genetic architecture that is either more polygenic (ED, KRN, KW, LFLN, TBN, and TILN) or has only a few QTL of small effect (ASI, CULM, EB, LFWD, PLHT and SHN). For these traits, there was no single large-effect QTL that accounts for the majority of the explainable variation. The largest-effect QTL for each of these traits has between 1 and 1.8 times the effect of the second largest QTL. A final class of traits has a genetic architecture with both a single QTL of large effect and multiple QTLs of small effect. These traits include DTA, DTS, and EL. The largest-effect QTL for each of these traits has between 2.4 and 3.7 times the effect of the second largest QTL.
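The comparison used above, the effect of the largest QTL relative to the second largest, can be computed as a simple ratio of absolute additive effects. The following sketch uses invented effect values and a hypothetical function name; it illustrates the ratio only and does not reproduce the classification of traits in the study.

```python
def top_effect_ratio(additive_effects):
    """Ratio of the largest to the second-largest absolute additive effect.

    additive_effects: iterable of signed additive effects for one trait.
    Signs are ignored because only effect magnitude is compared.
    """
    magnitudes = sorted((abs(e) for e in additive_effects), reverse=True)
    if len(magnitudes) < 2 or magnitudes[1] == 0:
        raise ValueError("need at least two QTL with non-zero effects")
    return magnitudes[0] / magnitudes[1]


# Hypothetical traits: one dominated by a single QTL, one without.
major_qtl_trait = [6.0, -2.0, 1.0, 0.5]
polygenic_trait = [2.0, -2.0, 1.5, 1.0, -1.0]
print(top_effect_ratio(major_qtl_trait), top_effect_ratio(polygenic_trait))
```

A ratio near 1 indicates that no single QTL stands out, while a ratio well above 2 indicates a single QTL of comparatively large effect.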
QTL for agronomic traits
DTA is a classical quantitative trait for maize, and in TeoNAM, it is controlled by a large-effect QTL plus many small-effect QTLs from JLM results. We detected 19 QTLs that explained 68% of the total variance for DTA (Figure 3). Among them, several recently cloned flowering time genes were detected. For example, QTL DTA1.1 mapped to zea agamous like1 (zagl1), which affects flowering time as well as multiple traits related to ear size with the maize allele conferring larger ears with more kernels (Wills et al. 2017). The QTL DTA3.1 mapped to MADS-box transcription factor69 (ZmMADS69), which functions as a flowering activator through the ZmRap2.7-ZCN8 regulatory module and contributes to both long-day and short-day adaptation (Liang et al. 2018). QTL DTA8.1 mapped to ZCN8, which is the maize florigen gene and has a central role in mediating flowering (Meng et al. 2011; Guo et al. 2018). QTL DTA9.1 mapped to ZmCCT9, in which a distant Harbinger-like transposon acts as a cis-regulatory element to repress its expression to promote flowering under the long days of higher latitudes (Huang et al. 2018). QTL DTA10.1 mapped to ZmCCT10, a known gene involved in photoperiod response in maize (Hung et al. 2012; Yang et al. 2013).
In addition to these genes, we also identified several other candidate genes for DTA that have not previously been characterized as genes underlying a QTL. QTL DTA3.2 mapped to Zea mays CENTRORADIALIS12 (ZCN12), which is a potential floral activator (Meng et al. 2011). QTL DTA4.1 mapped to Zea mays MADS19 (zmm19) and DTA5.1 mapped to Zea mays MADS31 (zmm31). QTL DTA6.1 mapped to silky1 (si1), which is also a MADS-box gene required for lodicule and stamen identity (Ambrose et al. 2000). QTL DTA6.2 mapped to zea agamous1 (zag1), which is known to affect maize flower development (Schmidt et al. 1993). It is well known that MADS-box genes encode transcription factors that are key regulators of plant inflorescence and flower development (Theissen et al. 2000). Other than MADS-box genes, QTL DTA7.2 mapped to delayed flowering1 (dlf1), a floral activator gene downstream of ZCN8 (Meng et al. 2011).
As expected, the teosinte alleles delayed flowering for the QTL that mapped to candidate genes. We plotted the phenotypic difference in DTA between teosinte and maize genotypes across the whole genome. The teosinte genotype is associated with late flowering over most of the genome, even where no QTL were detected, suggesting that there are many additional minor-effect QTLs that were not detected due to insufficient statistical power (Figure S29). Interestingly, chromosomes 5 and 7 are exceptions to this pattern, with the teosinte genotype being associated with early flowering at most sites (Figure S29). As expected, results for DTS are similar to those for DTA (Figure S30).
TBN is the only tassel trait we scored. We detected 12 QTLs of small effect that explained 52% of the total variance for TBN (Figure S31). Among them, several classical genes were identified. QTL TBN6.1 mapped to fasciated ear4 (fea4), a bZIP transcription factor whose mutants have fasciated ears and tassels as well as greatly enlarged vegetative and inflorescence meristems (Pautler et al. 2015). QTL TBN6.2 mapped to tasselsheath1 (tsh1), a GATA class transcription factor that promotes bract growth and reduces branching (Whipple et al. 2010). QTL TBN7.1 mapped to ramosa1 (ra1), a C2H2 zinc-finger transcription factor whose mutants have tassels with an increased number of long branches as well as branched ears (Vollbrecht et al. 2005). QTL TBN7.2 mapped to tasselsheath4 (tsh4), an SBP-box transcription factor that functions to repress lateral organ growth and also affects phyllotaxy, axillary meristem initiation and meristem determinacy within the floral phase (Chuck et al. 2010). QTL TBN8.1 mapped to barren inflorescence1 (bif1), mutants of which show decreased production of branches and spikelet pairs (Barazesh and McSteen 2008). QTL TBN10.1 mapped to zea floricaula leafy1 (zfl1), disruption of which, together with its homolog zfl2, leads to a loss of floral organ identity and patterning, as well as to defects in inflorescence architecture and in the vegetative to reproductive phase transition (Bomblies et al. 2003).
QTL for domestication traits
TILN is a classical domestication trait that measures the difference in plant architecture between maize and its wild relative, teosinte: the low apical dominance of a highly branched teosinte plant as compared to the less-branched maize plant. We detected 18 small-effect QTLs that explained 68% of the total variance for TILN (Figure S32). Among them, QTL TILN1.3 mapped to tb1, a member of the TCP family of transcriptional regulators that contributed to the increase in apical dominance during maize domestication (Doebley et al. 1997). Additionally, QTL TILN3.2 mapped to Zea AGAMOUS homolog2 (zag2), a MADS-box gene recently found to be downstream of tb1 (Studer et al. 2017). QTL TILN1.1 and TILN5.2 mapped to zmm20 and zmm26, two other MADS-box genes that were possible targets of selection during domestication (Zhao et al. 2011). QTL TILN7.1 mapped to PROSTRATE GROWTH1 (PROG1), a C2H2 zinc finger protein that controls a key change during rice domestication from prostrate to erect growth and also affects plant architecture and yield-related traits (Jin et al. 2008; Tan et al. 2008). There are 13 genes in the support interval, and the QTL peak is closest to PROG1, being ~14 kb 5’ of the start site (Figure S32). This is the first evidence that PROG1 may have had a role in maize domestication.
GLUM is a classical maize domestication trait measuring the dramatic change from the fruitcase-enveloped kernels of the teosinte ear to the naked grains of the maize ear. Previously, this trait was shown to be largely controlled by a single gene, teosinte glume architecture1 (tga1) (Wang et al. 2005). Interestingly, tga1 is a direct target of tb1. We detected 11 QTLs that explained 62% of the total variance for GLUM. These QTL include a large-effect QTL at tga1 itself plus many small-effect QTLs (Figure S33). Among the small-effect QTL, two (GLUM2.2 and GLUM7.1) mapped to MADS-box genes. In this regard, Studer et al. (2017) recently defined a maize domestication gene network in which tga1 regulates multiple MADS-box transcription factors.
PROL is also an important domestication trait that measures the difference in prolificacy between the many-eared plants of teosinte and the few-eared (one or two) plants of maize. Previously, a large-effect QTL was fine-mapped to a region 2.7 kb upstream of grassy tillers1 (gt1) (Wills et al. 2013). Interestingly, gt1 is a known target of tb1 (Whipple et al. 2011). We detected four QTLs that explained 39% of the total variance for PROL, which include a single large-effect QTL plus three small-effect QTLs (Figure S34). Concordantly, QTL PROL1.1 mapped to gt1. QTL PROL2.1 mapped to zea floricaula leafy2 (zfl2), which was shown to have a pleiotropic effect on lateral branch number (Bomblies and Doebley 2006). QTL PROL3.1 mapped to sparse inflorescence1 (spi1), mutants of which have defects in the initiation of axillary meristems and lateral organs during vegetative and inflorescence development in maize (Gallavotti et al. 2008). QTL PROL5.1 mapped to yabby9 (yab9), a member of a class of transcription factors that might have played important roles during maize domestication.
STAM measures the proportion of the terminal lateral inflorescence on the uppermost lateral branch that is staminate. Relative to domestication, this trait represents the sexual conversion of the terminal lateral inflorescence from tassel (staminate) in teosinte to ear (pistillate) in maize. Currently, teosinte branched1 (tb1) and tassel replace upper ears1 (tru1) are the only two genes that have been shown to regulate this sexual difference. We detected five QTLs that explained 27% of the total variance for STAM (Figure 4). QTL STAM1.2 mapped to tb1, which is an important domestication gene known to affect various traits (Doebley et al. 1995). QTL STAM3.1 mapped to tru1, which is a direct target of tb1 (Dong et al. 2017). QTL STAM1.1 mapped to tassel seed2 (ts2), a recessive mutant that produces pistillate spikelets in the terminal inflorescence (tassel) (Irish and Nelson 1993). QTL STAM3.2 mapped to Zea mays MADS16 (zmm16), which shows high expression in tassel and silk. QTL STAM7.1 mapped to tasselsheath4 (tsh4), a SQUAMOSA PROMOTER BINDING (SBP)-box transcription factor that regulates the differentiation of lateral primordia (Chuck et al. 2010). In addition to these QTLs, two other STAM QTLs were detected by GWAS. Notably, a QTL on chromosome 1 (AGPv4 chr1:234.4-249.9 Mb) is located upstream of tb1 and co-localized with STAM1.1 from a recent study (Yang et al. 2018). The known gene anther ear1 (an1) is a strong candidate gene for this QTL since loss of an1 function results in the development of staminate flowers in the ears (Bensen et al. 1995). The tb1 QTL region was also detected by GWAS with a strong signal for the interval AGPv4 chr1:264.1-283.1 Mb.
SHN measures ear shattering, the loss of which is a key step during crop domestication (Doebley 2006). Teosinte ears have abscission layers between the fruitcases (modified internodes) that allow the ear to shatter into single-seed units (fruitcases) at maturity. The maize ear lacks abscission layers and remains intact at maturity. Currently, only two maize orthologs (ZmSh1-1 and ZmSh1-5.1+ZmSh1-5.2) of sorghum and rice Shattering1 (Sh1) have been verified as affecting seed shattering (Lin et al. 2012). We detected six QTLs that explained 30% of the total variance for SHN (Figure S35). QTL SHN1.1 and SHN5.1 mapped to ZmSh1-1 and ZmSh1-5.1/ZmSh1-5.2, respectively, consistent with the prior identification of these maize orthologs of the sorghum shattering gene and making them strong candidates for our QTL.
KRN is a domestication trait measuring the dramatic change from the two-ranked teosinte ear to the multiple-ranked (4 or more) maize ear. We detected 24 small-effect QTLs that explained 62% of the total variance for KRN (Figure S36). Among them, QTL KRN1.3 mapped to indeterminate spikelet1 (ids1), an APETALA2-like transcription factor that specifies determinate fates by suppressing indeterminate growth within the spikelet meristem (Chuck et al. 1998). A previous fine-mapping study of KRN using a maize-teosinte BC2S3 RIL population also identified ids1 as a strong candidate for KRN (Calderón et al. 2016). QTL KRN4.2 mapped to unbranched3 (ub3), an SBP transcription factor that has been shown to regulate kernel row number in both mutant and QTL studies (Chuck et al. 2014; Liu et al. 2015).
REPE, for reddish-brown pericarp, is a trait that distinguishes teosinte kernels from those of most maize. The role of pigmentation in domestication is complex in that pigment can provide defense against molding and bird predation but can also impart bitterness and astringency (Morohashi et al. 2012). The red (or reddish-brown) pigmentation often results from the accumulation of phlobaphenes, a class of flavonoid pigments (Morohashi et al. 2012). In the absence of the reddish-brown pigment, the kernels are white unless anthocyanins (blue-purple) or carotenoids (yellow-orange) are present. Our results show that QTL REPE1.1 mapped to Pericarp color1 (P1) (Figure S37), which encodes an R2R3 Myb-like transcription factor that governs the biosynthesis of brick-red flavonoid pigments (Grotewold et al. 1994).
Results for 13 additional traits (ASI, BARE, CULM, DTS, EB, ED, EL, GLCO, KW, LFLN, LFWD, PLHT, and YEPE) are reported in supplemental figures and tables (Figure S30, S38-S49; Table S1).
QTL detection and effects
To evaluate the power of QTL mapping using TeoNAM, we summarized the distribution of QTLs detected with significant effects in the different subpopulations. Among 255 QTLs for 22 traits, 246 QTLs (96%) were detected in two or more subpopulations, 186 QTLs (73%) were detected in three or more subpopulations, 83 QTLs (33%) were detected in four or more subpopulations and 29 QTLs (11%) were detected in all five subpopulations (Figure 5A). These percentages are conservative as not all traits were scored in all five subpopulations. If one considers whether the QTL was detected in subpopulations in which it was scored, then 205 QTLs (80%) were detected in at least half of the subpopulations and 39 QTLs (15%) were detected in all subpopulations.
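The percentages above are the reported counts divided by the 255 JLM QTLs. The sketch below recomputes them and, separately, shows how a "detected in k or more subpopulations" tally could be derived from per-family significance flags. The counts come from the text; the flag table, QTL names and function names are hypothetical.

```python
def pct_of_total(count, total):
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * count / total)


def detected_in_at_least(significant_by_qtl, k):
    """Count QTLs whose allele effect is significant in k or more families.

    significant_by_qtl maps a QTL name to a list of booleans, one per family.
    """
    return sum(1 for flags in significant_by_qtl.values() if sum(flags) >= k)


# Counts reported in the text: minimum number of subpopulations -> QTL count.
TOTAL_QTL = 255
reported_counts = {2: 246, 3: 186, 4: 83, 5: 29}
percentages = {k: pct_of_total(n, TOTAL_QTL) for k, n in reported_counts.items()}
print(percentages)

# Hypothetical significance flags for three QTLs across five families.
toy_flags = {
    "qtlA": [True, True, False, False, True],
    "qtlB": [True, True, True, True, True],
    "qtlC": [False, True, False, False, False],
}
toy_tally = {k: detected_in_at_least(toy_flags, k) for k in range(1, 6)}
print(toy_tally)
```

Because the tally is cumulative, each QTL counted at a given k is also counted at every smaller k, which is why the reported counts decrease monotonically from two to five subpopulations.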
The allelic effects from different teosinte parents were estimated simultaneously by JLM. For most QTLs, the allelic effects from different subpopulations are in the same direction (Figure 5B). For seven traits (EB, GLUM, LFWD, PROL, SHN, STAM, and YEPE), the teosinte genotypes were consistently associated with a teosinte phenotype and the W22 allele with a maize phenotype at all QTLs. For all other traits, cases in which a teosinte allele was associated with the maize phenotype were detected. For example, the teosinte genotype is associated with late flowering at most QTLs for DTA except DTA5.2 and DTA7.1, for which the teosinte genotype consistently contributes to early flowering in at least three subpopulations (Figure 2). Similar results were observed for KRN and EL. The teosinte genotype is associated with a lower kernel row number (KRN) at most QTLs, but there is one QTL (KRN5.1) for which the teosinte genotype is consistently associated with a higher kernel row number in four subpopulations and also in the BC2S3 population (Figure S36). The teosinte genotype is associated with shorter ear length (EL) at most QTLs, but there are two QTLs (EL4.1 and EL9.1) for which the teosinte genotype is consistently associated with longer ear length in four and two subpopulations, respectively (Figure S43). These QTLs might be worth exploring further for use in maize improvement.
We also observed some interesting results for different teosinte parents. For KW, the teosinte genotype from different subpopulations is associated with reduced kernel weight at most QTLs. Only three QTLs (KW5.3, KW6.2 and KW9.1) are exceptions with one teosinte allele conferring heavier kernels. Interestingly for these three QTLs, the teosinte alleles with effects in the opposite direction are all from the TIL14 subpopulation (Figure S45). Similar results were observed for ED, where the teosinte genotype is associated with a decrease in ear diameter at most QTLs, but the teosinte allele from TIL03 at two QTLs (ED2.1 and ED6.1) is associated with the increase of ear diameter (Figure S42). These results suggest that there are beneficial alleles from teosinte that could be utilized for maize improvement.
Comparing and combining TeoNAM with the BC2S3
We compared TeoNAM with the previous maize-teosinte BC2S3 RIL population. The composite genetic map for TeoNAM is 1540 cM in length. The individual genetic maps based on the five subpopulations have an average length of 1461 cM with a range of 1348-1506 cM. The genetic map for the BC2S3 RIL population is 1478 cM in length. Thus, the TeoNAM subpopulations are similar to the BC2S3 RIL population in genetic map length. The median length of homozygous teosinte segments is 6 Mb in TeoNAM and 4.8 Mb in the BC2S3 population. The longer segment length for TeoNAM is expected given that it had one fewer generation of backcrossing and less opportunity for recombination. The mean number of homozygous teosinte segments in the TeoNAM subpopulations is 3502, and the number of homozygous teosinte segments in the BC2S3 is 5745. The total length of teosinte segments for the five subpopulations is 67 Gb (W22×TIL01), 87 Gb (W22×TIL03), 66 Gb (W22×TIL11), 56 Gb (W22×TIL14) and 79 Gb (W22×TIL25), and the BC2S3 (W22×8759) exceeds this range with 110 Gb.
Previously, Shannon (2012) performed a comprehensive interval QTL analysis in the BC2S3 population and identified 218 QTLs for 16 traits. Among these traits, 14 were also scored in the TeoNAM population. For the 14 common traits, 168 and 179 QTLs were detected for the TeoNAM and BC2S3 populations, respectively. The mean QTL support interval across the 14 traits is 5.7 Mb for BC2S3, which is significantly smaller than the 17.2 Mb for TeoNAM (P=2.6E-08) (Figure S50). Among these QTLs, 50 QTLs overlapped between the two populations. For the common QTLs, the mean variance explained by QTL is 3.4% and 2.9% for BC2S3 and TeoNAM, respectively. Thus, there is no significant difference in QTL effect size (P=0.3) (Figure S51).
To maximize the power to detect QTLs, we combined TeoNAM and BC2S3 to perform JLM for eight traits (DTA, ED, EL, KRN, KW, GLCO, GLUM, and TILN) that were measured in all six subpopulations by exactly the same method. Before analysis, we imputed the genotype for BC2S3 at 4578 TeoNAM SNPs according to the flanking markers using the same procedure as for TeoNAM and permuted a new p-value cutoff for statistical significance for each trait. The LSMs from the previous analysis (Shannon 2012) were used for JLM. With the combined TeoNAM-BC2S3 data, we detected 184 QTLs for these eight traits, which include 109 QTLs that overlapped with TeoNAM, 80 QTLs that overlapped with the BC2S3 and 32 novel QTLs not detected in either TeoNAM or the BC2S3 (Table S4). The QTLs with significant allele effects in multiple subpopulations will be good targets for fine-mapping. For future analysis of additional traits, one could combine TeoNAM and the BC2S3. The value of this combination is that there is one additional teosinte allele and increased QTL detection power, but the downside is that one would need to assay the BC2S3 population with 866 RILs plus TeoNAM with 1257 RILs.
Discussion
RILs are powerful tools for dissecting the complex genetic architecture of different traits and for gene discovery. RILs such as the maize NAM population have been successfully used for genetic dissection of many traits (Buckler et al. 2009; Tian et al. 2011; Kump et al. 2011). RILs with multiple parents greatly increase the power and precision to identify QTLs compared to the traditional biparental RIL population. Multi-parent RILs also enable the estimation of allele effects simultaneously from each inbred parent. Our TeoNAM RILs were created by crossing five teosinte inbred parents with a maize inbred parent, but differ from the maize NAM in that we applied a generation of backcrossing to the maize parent before four generations of selfing. The power and precision of TeoNAM can be shown with several traits. For example, we detected 19 QTLs for DTA, among which many QTLs mapped to recently cloned genes such as ZmCCT10, ZmCCT9, ZCN8, zagl1 and ZmMADS69. QTLs also mapped to some novel candidates such as dlf1, si1, zag1, ZCN12, zmm19 and zmm31, which may have an important role in flowering time regulation.
For RIL populations, both JLM and GWAS are common methods for QTL detection. In this study, we identified 255 QTLs for 22 traits by JLM, and significant peaks were detected at 57 of these QTLs by GWAS, which suggests that GWAS is less powerful than JLM for mapping QTLs in TeoNAM. Nevertheless, there are a few instances in which GWAS gave evidence of closely linked QTL that were not separated by JLM. For example, with JLM we did not identify an1, a strong candidate for a STAM QTL on chromosome 1, possibly because it is closely linked to tb1 (candidate for QTL STAM1.2), but we detected significant peaks at both an1 and tb1 through GWAS, as it tests each SNP independently.
TeoNAM has allowed us to infer distinct genetic architectures for different traits. Traits like PROL and GLUM are controlled by a major effect QTL plus several QTLs of very small effect, while traits like DTA and KRN show more classic polygenic inheritance. These contrasting genetic architectures suggest that evolution during domestication did not always follow the same path. A variant of large effect at one locus with a few other small effect genes allowed naked kernels to evolve from covered kernels, but the more quantitative increase in the number of kernel rows required a larger number of genes with no single gene of substantially greater effect than all others.
In our study, a total of 15 domestication traits and 7 agronomic traits were analyzed. Further fine-mapping and gene cloning will be required to find the causal genes underlying QTLs for these traits. TeoNAM should also be useful for investigating the genetic control of many new traits that we did not assay. Morphological traits such as root architecture, shoot apical meristem size, vasculature, and kernel shape can be explored. Molecular traits such as gene expression (eQTL), alternative splicing, grain protein content, and metabolites can also be explored to better understand the full spectrum of changes that occurred during maize domestication.
Materials and Methods
Population development
The teosinte NAM population was designed as a genetic resource for studying maize genetics and domestication. Five wild teosinte parents were chosen: four teosinte inbred lines that capture some of the diversity of Zea mays ssp. parviglumis (TIL01, TIL03, TIL11 and TIL14) and one teosinte inbred line of Zea mays ssp. mexicana (TIL25). The common parent is the modern maize inbred line W22, which has been widely used in maize genetics. The five teosinte parents were crossed to W22, followed by one generation of backcrossing and four generations of selfing (Figure S1). We obtained 1257 BC1S4 recombinant inbred lines (RILs), with 223, 270, 219, 235 and 310 lines for W22×TIL01, W22×TIL03, W22×TIL11, W22×TIL14 and W22×TIL25, respectively.
Marker Data
DNA samples from all 1257 lines were genotyped using Genotyping-by-Sequencing (GBS) technology (Elshire et al. 2011). The genotypes were called from GBS raw sequencing reads using the TASSEL5-GBS Production Pipeline based on 955,690 SNPs in the ZeaGBSv2.7 Production TagsOnPhysicalMap file (Glaubitz et al. 2014). The raw GBS markers were then filtered in each RIL subpopulation using the following steps. We first removed sites with minor allele frequencies below 5% and thinned sites to a spacing of 64 bp using “Thin Sites by Position” in TASSEL5 (Bradbury et al. 2007). We then ran FSFHap Imputation in TASSEL5 separately for each chromosome using the following parameters: backcross (bc), Phet=0.03125, Fillgaps=TRUE, and the default settings for other features. The imputed parental call files from the 10 chromosomes were then combined and passed to R/qtl (Broman et al. 2003) to estimate the genetic map. The B73 reference genome v2 was used to determine marker order, and genetic distances between markers were calculated using the Haldane mapping function as part of the est.map command, with an assumed genotyping error rate of 0.001 and taking the BC1S4 pedigree of the RILs into consideration (Shannon 2012). Bad genetic markers were identified by visual inspection of the genetic map and removed, and all filtering steps were then repeated. Finally, an average of 13,733 high-quality SNPs was obtained for each subpopulation (Table 1).
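The Phet value of 0.03125 equals 1/32, which is the heterozygosity expected at a neutral locus after one backcross and four generations of selfing. The following minimal sketch works through that arithmetic; the function name is hypothetical, and the link between this expectation and the FSFHap setting is our reading of the parameter rather than something stated above.

```python
from fractions import Fraction


def bc1_selfed_expectations(n_self):
    """Expected genotype fractions at a neutral locus after one backcross
    to the recurrent parent followed by n_self generations of selfing.

    Returns (homozygous recurrent, heterozygous, homozygous donor).
    Assumes no selection and no genotyping error.
    """
    # After BC1, half the lines are heterozygous and half are
    # homozygous for the recurrent (here W22) allele.
    hom_recurrent = Fraction(1, 2)
    het = Fraction(1, 2)
    hom_donor = Fraction(0)
    for _ in range(n_self):
        # Selfing a heterozygote gives 1/4 : 1/2 : 1/4 offspring;
        # homozygotes breed true.
        hom_recurrent += het / 4
        hom_donor += het / 4
        het = het / 2
    return hom_recurrent, het, hom_donor


w22, het, teo = bc1_selfed_expectations(4)
print(f"W22/W22: {float(w22):.6f}")
print(f"heterozygous: {float(het):.6f}")
print(f"teosinte/teosinte: {float(teo):.6f}")
```

Under these assumptions, the expected fractions for BC1S4 are 47/64 homozygous W22, 1/32 heterozygous, and 15/64 homozygous teosinte.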
Field design and phenotyping
The teosinte NAM population was planted using a randomized complete block design at the University of Wisconsin West Madison Agricultural Research Station (UW - WMARS) in different years. The subpopulations W22×TIL01, W22×TIL03 and W22×TIL11 were grown in summer 2015 and 2016, the subpopulation W22×TIL14 was grown in summer 2016 and 2017, and the subpopulation W22×TIL25 was grown in summer 2017 with two blocks. We planted one subpopulation within each block, and all lines were randomized within each block. Each row had 16 seeds planted 1 foot apart, and the spacing between rows was 30 inches.
Twenty-two traits were scored (Table 2): days to anthesis (DTA) (number of days between planting and when at least half the plants in a plot were shedding pollen); days to silk (DTS) (number of days between planting and when at least half the plants in a plot were showing silk); anthesis-silk interval (ASI) (number of days between anthesis and silk); tassel branch number (TBN) (number of tassel branches on the main stalk); culm diameter (CULM) (diameter of the narrowest plane of the main stalk right above the ground); plant height (PLHT) (distance from the ground to the topmost node on the main stalk); leaf length (LFLN) (length of a well-developed leaf, usually 4th-6th from the top); leaf width (LFWD) (width of a well-developed leaf, usually 4th-6th from the top); tiller number (TILN) (number of tillers surrounding the main stalk); prolificacy (PROL) (0 vs. 1 for absence/presence of secondary ears at the topmost branch-bearing node on the main stalk); ear branch number (EB) (number of branches on the primary lateral inflorescence); staminate spikelet (STAM) (0-3 scale for spikelet sex on the primary lateral inflorescence, where 0 indicates completely feminized and 3 indicates completely staminate); kernel row number (KRN) (number of internode columns on the primary lateral inflorescence); ear length (EL) (length of the primary lateral inflorescence); ear diameter (ED) (diameter of the primary lateral inflorescence); kernel weight (KW) (average weight of 50 random kernels from 5 ears); shattering (SHN) (number of pieces into which an ear shattered when dropped to the floor from a height of ~1.8 m); barren ear base (BARE) (0-2 scale for lack of kernels at the base of the ear, where 0 indicates kernels present at the base and 2 indicates no developed kernels at the base of the ear); glume score (GLUM) (0-3 scale for glume size, where 0 indicates small and 3 indicates large); glume color (GLCO) (0-4 scale for glume color from white through brown); red pericarp (REPE) (0-2 scale for colorless to red pericarp); yellow pericarp (YEPE) (0-2 scale for dull yellow to bright yellow pericarp). The average trait values from two years were used for QTL analysis.
Genetic map construction and marker imputation
A composite genetic map was constructed for the TeoNAM population. The markers from the five RIL subpopulations were combined into a set of 51,544 unique SNPs, and missing genotypes were imputed from the flanking markers. If the two flanking markers had the same genotype, the missing genotype was imputed as that genotype; otherwise, it was left as missing. The imputed genotypes were then passed to R/qtl to estimate the genetic map.
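The flanking-marker rule can be illustrated with a short sketch. The function name and the genotype encoding (arbitrary labels, with None for missing) are hypothetical, and the handling of missing calls at chromosome ends, which are left missing here, is an assumption not specified above.

```python
def impute_from_flanks(genotypes):
    """Fill runs of missing calls (None) along one chromosome of one line.

    A run is filled only when the nearest called markers on both sides
    carry the same genotype; otherwise the run is left as missing.
    Runs touching either end of the chromosome are left missing
    (an assumption made for this sketch).
    """
    out = list(genotypes)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing calls.
        j = i
        while j < n and out[j] is None:
            j += 1
        left = out[i - 1] if i > 0 else None
        right = out[j] if j < n else None
        if left is not None and left == right:
            for k in range(i, j):
                out[k] = left
        i = j
    return out


calls = ["A", "A", None, None, "A", "B", None, "A", None]
print(impute_from_flanks(calls))
```

In this example the first gap is filled because both flanking calls are "A", while the gap between "B" and "A" spans a possible breakpoint and stays missing.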
Since stepwise regression cannot use individuals with missing marker data, we performed a further step to impute missing data around breakpoints as previously described (Tian et al. 2011). First, we transformed the genotypes to a numeric format, in which markers homozygous for the W22 parent were coded as 0, markers homozygous for the non-W22 parent were coded as 2, and heterozygous markers were coded as 1. Missing markers within a breakpoint interval were imputed according to their genetic distances from the two flanking markers. Because stepwise regression is computationally intensive, we thinned the SNPs to a spacing of 0.1 cM. We finally obtained 4,578 markers for subsequent joint linkage analysis.
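A minimal sketch of these two steps follows. It assumes that "according to the genetic distance" means linear interpolation between the numeric codes of the two flanking markers, and that thinning keeps markers greedily from left to right on each chromosome; both are assumptions, and the function names are hypothetical.

```python
def interpolate_breakpoints(positions_cm, values):
    """Impute missing numeric genotypes (None) between called markers.

    positions_cm: marker positions on one chromosome, in ascending order.
    values: numeric codes (0 = W22, 1 = heterozygous, 2 = non-W22) or None.
    Each missing marker receives a distance-weighted average of the two
    flanking called markers (linear interpolation, assumed here).
    """
    out = list(values)
    called = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(called, called[1:]):
        span = positions_cm[b] - positions_cm[a]
        for k in range(a + 1, b):
            # Weight by relative genetic distance; fall back to the
            # midpoint if the flanking markers share a position.
            w = (positions_cm[k] - positions_cm[a]) / span if span > 0 else 0.5
            out[k] = values[a] + w * (values[b] - values[a])
    return out


def thin_markers(positions_cm, min_gap_cm=0.1):
    """Return indices of markers kept so that consecutive kept markers
    are at least min_gap_cm apart (greedy, left to right)."""
    kept = []
    last = None
    for i, pos in enumerate(positions_cm):
        if last is None or pos - last >= min_gap_cm:
            kept.append(i)
            last = pos
    return kept


positions = [0.0, 0.03, 0.09, 0.12, 0.30]
codes = [0.0, None, None, 2.0, 2.0]
print(interpolate_breakpoints(positions, codes))
print(thin_markers(positions))
```

Interpolated values are fractional, which is acceptable for regression on numeric genotype scores; markers outside any pair of called flanks remain missing in this sketch.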
Simple QTL mapping
QTL mapping was carried out using a modified version of R/qtl (Broman et al. 2003) that takes into account the BC1S4 pedigree of the RILs (Shannon 2012). For each trait, 1000 permutations were used to determine the significance threshold for declaring QTLs. The permutation tests gave an approximate LOD threshold of 4.0 at P < 0.05 across all traits. With this LOD threshold, simple interval mapping was first performed using Haley-Knott regression as implemented in the scanone command of R/qtl. A multiple-QTL model was then applied to search for additional QTLs and to refine QTL positions using refineqtl and addqtl in R/qtl. The entire process was repeated until no more significant QTLs could be added. The total phenotypic variation explained by all QTLs was calculated from a full model that fitted all QTL terms using the fitqtl function. The percentage of phenotypic variation explained by each QTL was estimated using a drop-one ANOVA implemented in the fitqtl function. The confidence interval for each QTL was defined as the 1.5-LOD support interval. To make results comparable among the five subpopulations, the composite genetic map was used for QTL mapping.
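The logic of a permutation-based genome-wide threshold can be sketched as follows. This is an illustration only: it uses single-marker regression on numeric genotype codes rather than Haley-Knott interval mapping, it ignores the BC1S4 pedigree, and all function names are hypothetical rather than part of R/qtl.

```python
import math
import random


def lod_single_marker(geno, pheno):
    """LOD score for a simple linear regression of phenotype on one marker."""
    n = len(pheno)
    mean_x = sum(geno) / n
    mean_y = sum(pheno) / n
    sxx = sum((x - mean_x) ** 2 for x in geno)
    rss0 = sum((y - mean_y) ** 2 for y in pheno)  # null model: mean only
    if sxx == 0 or rss0 == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(geno, pheno))
    # Residual sum of squares with the marker; guard against a zero value.
    rss1 = max(rss0 - sxy ** 2 / sxx, 1e-12 * rss0)
    return (n / 2) * math.log10(rss0 / rss1)


def permutation_threshold(markers, pheno, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from permutations of the phenotype.

    For each permutation, the phenotypes are shuffled across lines and the
    maximum LOD over all markers is recorded; the threshold is the value
    exceeded by a fraction alpha of those maxima.
    """
    rng = random.Random(seed)
    shuffled = list(pheno)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(lod_single_marker(g, shuffled) for g in markers))
    maxima.sort()
    n_above = round(alpha * n_perm)
    return maxima[max(0, n_perm - n_above - 1)]
```

Because each permutation records the genome-wide maximum, the resulting threshold controls the chance of any false positive across the whole scan rather than at a single marker.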
Joint linkage mapping
To map QTLs in the TeoNAM population, a joint linkage mapping (JLM) procedure was performed as previously described (Buckler et al. 2009; Tian et al. 2011). First, 1000 permutations were performed to determine the significance cutoff for each trait. JLM was then performed using a fixed-effects stepwise linear regression model implemented in the GLMSELECT procedure of SAS. The family main effect was fit first, and marker effects nested within families were then selected to enter or leave the model based on the permuted P-value threshold using a marginal F-test. After the model was fit by stepwise regression, each marker was dropped from the full model one at a time and the single best marker was refit, using the remaining QTLs as background, to improve the overall fit of the model. A threshold of α=0.05 was used to declare significant allele effects across families within each QTL identified by stepwise regression. The QTL support interval was calculated by adding each marker from the same chromosome as the QTL to the full model, one at a time. If the P-value of the marginal F-test for the QTL was no longer significant at the 0.01 level, the added flanking marker was placed in the support interval for the QTL, because it explained the QTL as well as the original marker did.
GWAS
A genome-wide association study (GWAS) approach was also used to map QTLs in the TeoNAM population. Since GBS produces relatively low-density markers, the 955,690 raw SNPs from the GBS pipeline were filtered using less conservative criteria: MAF > 0.01, missing rate < 0.75, and heterozygosity rate < 0.1. After this filtering, 181,404 GBS SNPs were used to run FSFHap Imputation in TASSEL5 separately for each chromosome and subpopulation using the following parameters: backcross (bc), Phet=0.03125, Fillgaps=TRUE, and the default settings for other features. The imputed genotypes were then combined, SNPs with a missing rate of more than 0.2 and MAF of less than 0.05 across the 1257 RILs were removed, and a total of 118,838 SNPs were kept and used for GWAS. GWAS was performed using a linear mixed model accounting for population structure (Q) and kinship (K), where Q was computed as the first five principal components and K was calculated using the centered IBS method as implemented in TASSEL (Bradbury et al. 2007). A P value below 0.00001 (LOD=5) was used as the significance threshold, following a previous study (Kremling et al. 2018).
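The pre-imputation filter can be sketched per SNP as below. The genotype encoding (0, 1 or 2 copies of one allele, None for missing) and the function names are hypothetical, and this is not the TASSEL implementation; it only illustrates how the three criteria combine.

```python
def snp_stats(calls):
    """Return (missing rate, minor allele frequency, heterozygosity rate).

    calls: one entry per line, giving the number of copies (0, 1 or 2)
    of one designated allele, or None when the genotype is missing.
    MAF and heterozygosity are computed over called genotypes only.
    """
    n = len(calls)
    called = [c for c in calls if c is not None]
    missing_rate = 1 - len(called) / n
    if not called:
        return missing_rate, 0.0, 0.0
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1 - allele_freq)
    het_rate = sum(1 for c in called if c == 1) / len(called)
    return missing_rate, maf, het_rate


def passes_prefilter(calls, min_maf=0.01, max_missing=0.75, max_het=0.1):
    """Apply the three pre-imputation criteria: MAF, missingness, heterozygosity."""
    missing_rate, maf, het_rate = snp_stats(calls)
    return maf > min_maf and missing_rate < max_missing and het_rate < max_het


print(passes_prefilter([0] * 8 + [2] * 2))         # polymorphic, fully called
print(passes_prefilter([0] * 10))                  # monomorphic
print(passes_prefilter([1] * 5 + [0] * 5))         # too heterozygous
print(passes_prefilter([None] * 8 + [0, 2]))       # too many missing calls
```

A high missing-rate ceiling is tolerable at this stage because the retained SNPs are subsequently imputed, and a stricter filter is applied after imputation.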
QTL candidate analysis
To report QTL positions on the latest genome version, we used the CrossMap software (Zhao et al. 2014) to convert the GBS SNP positions from maize B73 reference AGPv2 coordinates to AGPv4 coordinates. QTL candidates were identified by examining the annotations of genes within the QTL support intervals.
Data Availability
Seeds for all 1257 RILs are available through the Maize Genetics Cooperative Stock Center, and the SNP genotypes of TeoNAM are available at the CyVerse Discovery Environment under the directory: /iplant/home/shared/panzea/genotypes/GBS/TeosinteNAM/. The genotypes were uploaded with AGPv2 positions in the marker names.
Acknowledgements
This research was supported by the US National Science Foundation (NSF) grant IOS 1238014 and the China Postdoctoral Science Foundation (2018M640204). No conflict of interest declared. We thank Karl Broman for suggestions on the analyses, and Jesse Rucker, Elizabeth Buschert, Eric Rentmeester, Adam Mittermaier, David Sierakowski, and Brian Schaeffer for assistance with field work and phenotyping.