Abstract
Perennialism is common among the higher plants, yet we know little about its inheritance. To address this, six hybrids were made by reciprocally crossing perennial Zea diploperennis Iltis, Doebley & R. Guzman with three varieties/inbred lines of annual maize (Z. mays L. ssp. mays). We specifically focused on the plant’s ability to regrow after flowering and senescence. All the F1 plants demonstrated senescence and regrowth for several cycles, indicating a dominant effect of the Z. diploperennis alleles. The regrowth ability was stably transmitted to progeny of the hybrids in segregation ratios that suggested the trait was controlled by two dominant, complementary loci. Genome-wide screening with genotyping-by-sequencing (GBS) identified two major regrowth loci, reg1 and reg2, on chromosomes 2 and 7, respectively. GBS results were validated using a larger F2 population and PCR markers derived from the single nucleotide polymorphisms within the locus intervals. These markers will be employed to select near-isogenic lines for the two loci and to identify candidate genes in the loci in Z. diploperennis.
Significance Statement
Our study contributes to the general understanding of the inheritance of perennialism in the higher plants. Previous genetic studies of perennialism in Zea have yielded contradictory results. We take a reductionist approach by specifically focusing on the plant’s ability to regenerate new shoots after senescence without regard to associated traits, such as rhizome formation, tillering or environmental impacts. Using this criterion, perennialism in Zea appears to be dominantly and qualitatively inherited. Importantly, our data indicate that there is no major barrier to transferring this trait into maize or other grass crops for perennial crop development, which would enhance the sustainability of grain crop production in an environmentally friendly way.
Introduction
Perennialism is the phenomenon in which a plant lives for more than two years; the ability to do so is termed perenniality. Plants typically have a life cycle of growth, reproduction (sexual and/or vegetative) and senescence. Annuals and biennials have only one such cycle in their life, leaving behind seeds, bulbs, tubers, etc. to initiate another life cycle. Some perennials maintain juvenile meristematic tissues capable of regrowth after senescence. How perennials do so remains a mystery. Subterranean stems (such as rhizomes), polycarpy and tuberous roots are often cited as the means by which plants achieve perenniality. However, none of these traits is absolutely required by perennials. For instance, bamboos are essentially monocarpic perennials that regrow from rhizomes. Many perennial temperate grasses, such as switchgrass(1), cordgrass(2) and eastern gamagrass(3), regrow from the crowns instead of rhizomes. On the other hand, some annual/biennial plants, such as radish (Raphanus sativus), grow tuberous roots.
Although perennialism is common among higher plants, the study of its genetics and molecular biology is sporadic. So far, the only published research on the molecular mechanism of plant perennialism was conducted in Arabidopsis. Melzer et al. successfully mutated this annual herb to show some perennial habits, such as increased woody fiber in the stem, by down-regulating two flowering genes coding for MADS-box proteins, SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 and FRUITFULL(4). Unfortunately, this woody mutant was sterile, and no follow-up research was reported. Perennial-related genes and quantitative trait loci (QTL) have been reported in other species. Major QTL controlling rhizome development, regrowth and tiller number have been mapped on sorghum linkage groups C (chromosome 1) and D (chromosome 4)(5, 6), which are homoeologous to regions of maize chromosomes 1, 4, 5 and 9, respectively(7). Hu et al. mapped two dominant, complementary QTL, Rhz2 (Rhizomatousness 2) and Rhz3, that control rhizome production on rice chromosomes 3 and 4 at the loci homoeologous to the sorghum QTL(6). Tuberous roots in a wild perennial mungbean (Vigna radiata ssp. sublobata) are conditioned by two dominant, complementary genes(8). However, after years of effort these perennialism genes have yet to be cloned from any of the species, even though mapping data and complete rice and sorghum genomic sequences are readily available. Consequently, no further research has been reported about these perennialism loci/genes.
In the genus Zea L., most species, including maize, are annual. However, two closely related species, tetraploid Z. perennis [Hitchc.] Reeves and Mangelsdorf and diploid Z. diploperennis Iltis, Doebley & R. Guzman, are perennial. Perenniality of these two teosintes is manifested as regrowth after seed production and senescence, which includes developing juvenile basal axillary buds and rhizomes. Evergreen stalks, bulbils (highly-condensed rhizomes), basal shoot development, stiff stalks and a robust root system have all been cited as phenotypic features of perennialism in Z. diploperennis(9–11). For example, the evergreen-stalk trait, which was proposed as a component of perennialism in Z. diploperennis(9), appears to be linked to sugary 1 on the short arm of chromosome 4(12).
Conflicting conclusions have been reached in various studies on how perennialism is inherited in Zea. Shaver(13) proposed that a triple homozygous recessive genotype is needed for perenniality in Zea. In this model, pe (perennialism), interacting with gt (grassy tillers) and id (indeterminate), plays a key role in conferring totipotency to the basal axillary buds and rhizomes in the perennial teosintes(13, 14). The nature of pe remains unknown, and the Z. perennis-derived genotype from which pe was identified by Shaver(13) was lost and never recovered despite decades of intensive efforts (Shaver, personal communication). Mangelsdorf and Dunn mapped Pe*-d, the maize allele of the pe homologue in Z. diploperennis, to the long arm of maize chromosome 4(15). The gt gene (aka gt1), located on the short arm of maize chromosome 1, encodes a class I homeodomain leucine zipper that promotes lateral bud dormancy and suppresses elongation of lateral ear branches(12). It appears that gt1 depends on the activity of a major maize domestication gene, teosinte branched 1 (tb1), and is inducible by shading(16). The id gene (aka id1) alters maize’s ability to flower(17). Both tb1 and id1 are located on the long arm of maize chromosome 1 and both encode transcription factors with zinc finger motifs(16, 18). Singleton believed that id1 inhibits plantlet generation at the upper nodes of a maize stalk(17). Mangelsdorf et al. proposed that one or two dominant genes control annual growth habit in their Z. diploperennis-popcorn hybrid(19). Murray and Jessup also believed that non-senescence and rhizomatousness are must-have characteristics of perennial maize(20).
In contrast to the recessive inheritance model, Galinat proposed that perennialism in Z. diploperennis is at least partially controlled by two dominant complementary genes(12). Also, Ting and Yu obtained three perennial F1 hybrids by pollinating three Chinese field corn varieties with Z. diploperennis(21), which indicates that perennial factors are dominant. Unfortunately, there is no further report about these hybrids or their derivatives. Westerbergh and Doebley regarded perennialism in Z. diploperennis as a quantitative trait and identified a total of 38 QTL for eight perennial-habit traits from a Z. diploperennis x Z. mays ssp. parviglumis (annual) mapping population(11). Intriguingly, they did not identify any QTL that showed a singularly large effect.
The various criteria used by previous researchers for what constitutes perennialism in Zea may have contributed to the complex and contradictory observations. Traits such as rhizome formation and evergreen stalks may be important adaptive features that support the viability of perennial plants but are not key. In this study, we take a reductionist view and specifically focus on a plant’s ability to regrow after senescence. Using this criterion, we have identified two dominant, complementary loci that control this trait. Here we report the results of our genetic analysis and genome-wide screening of these regrowth loci with genotyping-by-sequencing (GBS) technology.
Results and Discussion
The production and growth of the hybrids
To study perennialism in Zea, we made reciprocal crosses of Z. diploperennis (Zd, hereafter in a cross combination) with the following three maize lines: B73, Mo17 and Rhee Flint (RF, hereafter in a cross combination). B73 and Mo17 are inbred lines and Rhee Flint is an heirloom maize variety. The first F1 was made with Rhee Flint in a greenhouse. Rhee Flint is small, fast-growing and usually has a few tillers, which allows serial plantings and thus increases the chance that a plant flowers simultaneously with Z. diploperennis. Because Rhee Flint is an open-pollinated variety, later F1s were made with B73 and Mo17 to facilitate molecular analysis. All the F1 plants are perennial and fertile (Fig. 1), and have completed multiple cycles of growth, reproduction and senescence (Supplementary Fig. S1). Regrowth (as opposed to accidental replanting from seed) of F1 plants was ensured by inspecting that new shoots were attached to the base of the F1 and confirmed by the heterozygosity of polymorphic PCR markers (examples shown in Supplementary Fig. S2). Regrowth of these F1s originates mainly from basal axillary buds after stem senescence in all the crosses (Figs. 1D, 1E, 1F), but it can also occur at upper nodes of the F1s when B73 and Mo17 were used as the parent (Fig. 2C). The plantlets regrowing from the upper nodes, however, can only survive if transplanted into soil. This indicates that the senescent stalks do not provide the necessary nutrients to the plantlets. Interestingly, some of the basal regrowth immediately developed into a female (Fig. 2A) or a male (Fig. 2B) inflorescence, or a forest of them (Fig. 2D).
Because the F1 plants and their perennial derivatives are not winter hardy, the regeneration cycles were alternated between the greenhouse and the field (Supplementary Figs. S1 & S3). Interestingly, the ears and kernels of the F1s of the six crosses all were more teosinte-like (i.e. two rows of oppositely positioned spikelets with paired kernels encased by woody rachides and glumes) when grown in the greenhouse but were more maize-like (i.e. multiple rows of naked kernels with short soft glumes and rachides around a silica-filled soft core) when grown in the field (Fig. 3). In the F2 and higher generations, ear morphology segregated even under greenhouse conditions (Fig. 3). These observations suggest that environmental factors play an important role in the preferential expression of the teosinte or the maize alleles of the genes influencing ear morphogenesis in the hybrids. These observations also indicate that it is possible to breed perennial maize with maize-like ears and kernels.
Some studies have used rhizome development as an indicator of perennialism in Zea(13, 14, 19, 22). We have not observed rhizomes in any of our F1s and the derived plants; when regrowth occurs, it is always from an axillary bud. Indeed, it is also our observation that the regrowth of Z. diploperennis is mainly from basal axillary buds, and only occasionally from rhizomes. Previous conclusions that perennialism in Zea is recessive might have resulted from the hypothesis that traits such as tiller number at tasseling (TNT) or rhizome development are indispensable components of perennialism in Zea. It is also possible that the perennial teosinte plants used in those studies were heterozygous for one or more perennialism genes. This opinion is supported by the observations of Shaver(9) and Camara-Hernandez and Mangelsdorf(18) that some of their F1 plants regrew from basal axillary buds after a period of dormancy.
TNT has been associated with perennialism in several studies(11, 14, 23, 24), so we investigated the relationship of TNT with regrowth in the Zd-RF F2s. One-way ANOVA of TNT by regrowth (Supplementary Table S1), however, revealed no significant difference of TNT (F = 0.897, p = 0.353) between the regrowth and the non-regrowth F2s. Indeed, we observed regrowth from several single-stalked hybrid derivatives (Fig. 4A) and non-regrowth of some multi-stalked plants (Fig. 4B). These results suggest that TNT is not essential to perenniality in Zea.
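The F statistic of a one-way ANOVA of TNT by regrowth class can be sketched as follows. This is only an illustration of the computation: the tiller counts are made-up values, not the data of Supplementary Table S1, and the function name one_way_anova_f is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual plants around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical TNT values (tillers per plant), for illustration only
tnt_regrowth = [3, 5, 4, 6, 2, 4]
tnt_non_regrowth = [4, 3, 5, 2, 4, 3]

f_stat, df_b, df_w = one_way_anova_f([tnt_regrowth, tnt_non_regrowth])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A small F relative to its degrees of freedom, as reported above for the Zd-RF F2s, means the between-class difference in TNT is no larger than the plant-to-plant variation within each class.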
The genetics of the hybrids
All our F1 plants are perennial and have undergone several growth cycles alternately in the greenhouse and the field, demonstrating that regrowth is a dominant trait in Zea. Brewbaker suggested cytoplasm may contribute to perennialism(25), but our reciprocal F1s performed similarly, indicating that it does not. To analyze the genetics of regrowth further, 159 Zd-RF F2s (derived from an F1 where Zd was the female) and 134 B73-Zd F2s (derived from an F1 where B73 was the female) were tested. We did not grow the Mo17-Zd F2s due to limited resources. Among the 159 Zd-RF F2s, 90 regrew after senescence and 69 did not (Supplementary Table S2). Similarly, among the 134 B73-Zd F2s, 81 regrew and 53 did not (Table 1; Supplementary Table S3). Three Zd-RF F3 populations (Supplementary Table S2) and one B73-Zd F3 population (Supplementary Table S3), each of which was derived from a single regrowth F2 plant, were also evaluated for their regrowth.
A chi square (χ2) test of goodness-of-fit suggests that both of the F2 populations and one Zd-RF F3 population best fit the 9:7 regrowth to non-regrowth ratio (Table 2), and that the B73-Zd F3 population and two Zd-RF F3 populations best fit a 3:1 ratio (Table 2). The simplest model that explains these results is that regrowth in the F1s and their derivatives is controlled by two dominant, complementary regrowth (reg) loci. The two dominant, complementary gene model parallels what has been found in other species, such as rice (Oryza sativa)(6), Johnsongrass (Sorghum halepense)(5, 6, 26), basin wildrye (Leymus cinereus)(27) and wild mungbean (Vigna radiata ssp. sublobata)(8).
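As a rough check of the goodness-of-fit reasoning, the sketch below recomputes a χ2 statistic from the F2 counts given in the text (90:69 and 81:53) against the 9:7 and 3:1 ratios. It assumes one degree of freedom and no continuity correction, which may differ from the procedure behind Table 2; the function names are hypothetical.

```python
import math


def chi_square_gof(observed, ratio):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    # For exactly 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# (regrowth, non-regrowth) counts reported for the two F2 populations
f2_counts = {"Zd-RF F2": (90, 69), "B73-Zd F2": (81, 53)}

for name, observed in f2_counts.items():
    for label, ratio in (("9:7", (9, 7)), ("3:1", (3, 1))):
        chi2 = chi_square_gof(observed, ratio)
        print(f"{name} vs {label}: chi2 = {chi2:.3f}, p = {p_value_df1(chi2):.4f}")
```

With these counts, the statistic for the 9:7 ratio stays below the 1-df critical value of 3.841 in both populations, whereas the 3:1 ratio exceeds it, in line with the two-locus interpretation.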
The Zd-RF F1 was also backcrossed to each parental line. All plants from the Zd backcross regrew, while only one of the 20 plants from the RF backcross showed regrowth. Therefore, alternative models, such as one or three dominant complementary genes, are not eliminated but are less probable (Table 2).
We noticed that the number of regrowth plants observed in any generation might be understated, because some plants initially recorded as non-regrowth eventually regrew after about two months of dormancy. It is possible, therefore, that some plants recorded as non-regrowth and discarded to open up greenhouse space may have possessed the ability to regrow. Furthermore, transplanting from the field to the greenhouse and vice versa was very stressful to the plants. It is possible that some regrowth plants were killed this way, resulting in a reduced number of regrowth plants. However, the estimated segregation ratios of regrowth to non-regrowth are reliable since they can be verified. For example, the 9:7 ratio of the Zd-RF F2s was verified by the ratios of the Zd-RF F3s derived from single regrowth F2 plants (Table 2).
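The expected ratios under the two dominant, complementary locus model can be derived by enumeration. The sketch below assumes two unlinked loci, each genotype written as the number of dominant (Z. diploperennis) alleles it carries, and regrowth requiring at least one dominant allele at both loci; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Offspring distribution at one locus after selfing a parent that carries
# 2, 1 or 0 dominant alleles at that locus.
SELF = {
    2: {2: Fraction(1)},
    1: {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)},
    0: {0: Fraction(1)},
}


def regrows(genotype):
    """Complementary action: at least one dominant allele at every locus."""
    return all(n >= 1 for n in genotype)


def selfed_regrowth_fraction(parent):
    """Expected fraction of regrowth progeny after selfing `parent`."""
    total = Fraction(0)
    for combo in product(*(SELF[n].items() for n in parent)):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        if regrows(tuple(n for n, _ in combo)):
            total += prob
    return total


f1 = (1, 1)  # doubly heterozygous F1
print("F2 regrowth fraction:", selfed_regrowth_fraction(f1))

# F3 families derived from each class of regrowth F2 plant
for f2_parent in [(2, 2), (2, 1), (1, 2), (1, 1)]:
    print(f2_parent, "->", selfed_regrowth_fraction(f2_parent))
```

Under these assumptions the F2 segregates 9/16 regrowth (9:7), and F3 families from single regrowth F2 plants are expected to be either non-segregating, 3:1 or 9:7, depending on whether the F2 parent was heterozygous at neither, one or both loci.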
The rice rhizomatousness gene Rhz2 has been mapped to rice chromosome 3(6) and to sorghum chromosome 1(5, 6, 26), which are both homoeologous to parts of maize chromosome 1(7). Also, gt1 and id1, which have been implicated in perenniality in Zea(8), and tb1, which controls gt1(16), are all on chromosome 1 in Zea(16, 18). Therefore, we investigated the allele compositions of these three genes in the B73-Zd F2s (Table 1), in 26 Zd-RF F2 plants and in the three Zd-RF F3 populations (Supplementary Table S4), and assayed their association with regrowth. Of the 131 regrowth hybrid derivatives, 5, 33 and 115 were homozygous for the maize gt1, tb1 or id1 alleles, respectively. One Zd-RF F3 family is homozygous for the gt1 allele of Z. diploperennis (Supplementary Table S4) but segregates approximately 9:7 for regrowth and non-regrowth (Table 2). Therefore, our results are inconsistent with the model of Shaver(13), and show that gt1 and id1 do not control regrowth in our F1s and their derivatives. The gt1 allele of Z. diploperennis may be helpful to regrowth because the majority of the plants that regrew had at least one copy, but it is not indispensable because many plants regrew without it.
Interestingly, we observed no heterozygosity for id1 and much less-than-expected heterozygosity for tb1 in all the hybrid derivatives that were examined, regardless of regrowth (Table 1; Supplementary Table S4). Of the 134 B73-Zd F2 plants investigated, only 16 had the Z. diploperennis id1 allele (Table 1). Similar phenomena were observed in the derivatives of the Zd-RF cross (Supplementary Table S4). It seems that the maize chromosome fragment that carries id1 was preferentially transmitted into the hybrid derivatives. Excess homozygosity of the maize id1 allele indicates some sort of selection. It could be that a deficiency or other rearrangement adjacent to the teosinte id1 allele prevents it from being transmitted efficiently, or it could be that the teosinte id1 allele prevents the plant from growing well or flowering in South Dakota.
Identifying regrowth loci with genotyping-by-sequencing assay
To identify chromosomal regions that host the two regrowth loci revealed by our genetic analysis, we conducted genome-wide mining of single nucleotide polymorphisms (SNPs) in a randomly selected sub-population of 94 (55 regrowth and 39 non-regrowth) B73-Zd F2 plants with GBS technology (Supplementary Fig. S4). A total of 2,204,834 (85.14%) Illumina sequencing tags that passed routine quality control filtrations were aligned with the B73 reference genome. A total of 714,158 SNPs were then called from 83 (46 regrowth and 37 non-regrowth, labeled in bold in Table 1) of the 94 F2 plants using the TASSEL pipeline(28, 29) (Supplementary Fig. S4). SNP calling for the 11 excluded plants failed, probably because barcode addition before sequencing failed. These SNPs covered all ten chromosomes with an average of 71,416 SNPs per chromosome (Table 4, Supplementary Fig. S5). As shown in Table 4, these SNPs were first subjected to a two-step filtration to remove those with low minor allele frequency (≤ 0.01) or high missing data rate (>20%) among the F2 plants. The SNPs that passed the two filtrations were subjected to a χ2 test for their fit to the two dominant, complementary locus model, with the null hypothesis that the observed and the expected are not significantly different (p ≤ 0.05). We hypothesized that a SNP allele associated with one of the two regrowth factors should be carried by all the regrowth F2s, whereas the alleles for one or both factors should be missing from the non-regrowth F2s. This step kept 946 SNPs. Finally, to simplify the mapping effort, the 946 SNPs were filtered once more by collapsing immediately neighboring SNPs that share the same haplotypes into one. The first SNP in such a cluster was chosen to represent the SNP cluster. This final filtering resulted in 597 SNPs with an overall average distance of 3.52 cM between them in the B73 reference genome. The distribution of these 597 SNPs in the B73 genome is shown in Supplementary Figure S5.
We then conducted locus analysis of the 597 SNPs together with an additional 1,969 simulated SNPs, using the R/qtl package (version 1.40-8). Locus intervals were estimated with the “lodint()” function using a LOD drop of 0.5 and the “expandtomarkers” argument. The results are shown in Figure 5. Using the LOD95% threshold of 4.17, two candidate reg loci were identified: one on B73 chromosome 2 in the interval from 24,244,192 bp (here and hereafter, the nucleotide position in the B73 reference sequence) to 28,975,747 bp with the peak at 27,934,739 bp, and one on B73 chromosome 7 in the interval from 2,862,253 bp to 6,681,861 bp with the peak at 5,060,739 bp (Fig. 5). This result supports the genetic model that two major genes control regrowth. Table 5 shows the two representative SNPs for the two candidate reg loci on chromosomes 2 (reg1) and 7 (reg2), and the adjacent maize genes in the B73 reference genome.
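To make the LOD score and its permutation threshold concrete, the sketch below computes a single-marker LOD for a binary trait and a permutation-based 95% threshold. It is a simplification under stated assumptions: it tests each marker separately with a plain likelihood ratio rather than the interval-mapping model fitted by R/qtl, and the genotypes, phenotypes and function names are invented for illustration.

```python
import math
import random


def binary_lod(genotypes, phenotypes):
    """log10 likelihood ratio: genotype-specific regrowth probabilities
    versus a single pooled probability, at one marker."""
    def log_lik(k, n):
        # Binomial log-likelihood at its maximum, p = k / n
        # (the 0 * log(0) terms are taken as 0).
        if n == 0 or k == 0 or k == n:
            return 0.0
        p = k / n
        return k * math.log(p) + (n - k) * math.log(1 - p)

    groups = {}
    for g, y in zip(genotypes, phenotypes):
        k, n = groups.get(g, (0, 0))
        groups[g] = (k + y, n + 1)
    full = sum(log_lik(k, n) for k, n in groups.values())
    null = log_lik(sum(phenotypes), len(phenotypes))
    return (full - null) / math.log(10)


def permutation_threshold(markers, phenotypes, n_perm=1000, quantile=0.95, seed=1):
    """Genome-wide threshold: quantile of the maximum LOD over all markers
    after shuffling phenotypes among plants."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(binary_lod(m, shuffled) for m in markers))
    max_lods.sort()
    return max_lods[int(quantile * (n_perm - 1))]


# Hypothetical data: 1 = regrowth, 0 = non-regrowth; AA/AB/BB marker genotypes
regrowth = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
marker_a = ["AA", "AB", "AB", "BB", "AA", "AB", "BB", "AB", "AA", "BB", "AB", "AB"]
marker_b = ["AB", "BB", "AA", "AB", "AB", "AA", "AB", "BB", "AB", "AA", "BB", "AB"]

threshold = permutation_threshold([marker_a, marker_b], regrowth)
for name, marker in (("marker_a", marker_a), ("marker_b", marker_b)):
    print(name, round(binary_lod(marker, regrowth), 2), "threshold", round(threshold, 2))
```

Taking the maximum LOD across all markers in each permutation is what makes the threshold genome-wide rather than per-marker.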
Genes gt1 and id1 on chromosome 1 were proposed to control perennialism in Zea(13), and our LOD analysis located two weak peaks on chromosome 1 that may contribute to regrowth (Fig. 5). One may wonder if these two loci are related to gt1 and id1, respectively. However, these loci are at 82,273,951 bp and 177,235,112 bp, far away from id1 (around 243,201,405 bp) and gt1 (around 23,625,801 bp). This observation further indicates that id1 and gt1 are irrelevant to regrowth. Previous studies reported that Z. diploperennis carried perennialism-related Pe*-d and an evergreen gene on chromosome 4(12, 15). However, our data do not support these observations, since no SNP on chromosome 4 is significantly associated with regrowth (Fig. 5).
Validation of the candidate SNPs with genetic mapping
To validate the association of the candidate SNPs with the trait of regrowth, we converted the two SNPs at the peaks of the two candidate chromosomal intervals on chromosomes 2 and 7 into PCR markers (Table 6). The markers for the peak SNPs were designated S2-2 and S7-1/S7-2, respectively (see Materials and Methods for an explanation of the names). The population of 134 B73-Zd F2 plants was screened with these PCR markers (Table 1, Supplementary Figs. S6 to S8). Because the PCR markers are hypothesized to be linked with the regrowth trait, a χ2 test of independence was used to test the null hypothesis that the markers segregate independently of regrowth (Table 3). The test results (p ≤ 0.0001) rejected independence, indicating that the SNPs are indeed associated with regrowth.
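A χ2 test of independence between marker genotype and regrowth can be sketched as below. The contingency counts are invented for illustration and are not the data of Table 1 or Table 3; the closed-form p-value used here holds only for two degrees of freedom (three genotype classes by two phenotype classes), and the function names are hypothetical.

```python
import math


def chi_square_independence(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if genotype and phenotype were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df2(chi2):
    # For exactly 2 degrees of freedom, the chi-square survival
    # function reduces to exp(-chi2 / 2).
    return math.exp(-chi2 / 2)


# Hypothetical counts. Rows: Zd/Zd, heterozygous, B73/B73 at one marker.
# Columns: regrowth, non-regrowth.
marker_by_regrowth = [
    [25, 5],
    [45, 15],
    [8, 30],
]

chi2, df = chi_square_independence(marker_by_regrowth)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value_df2(chi2):.2e}")
```

A small p-value rejects independent segregation of marker and phenotype, which is the pattern expected for a marker linked to a regrowth locus.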
If the S2-2 and the S7-1/S7-2 markers reliably mark the two dominant complementary loci that are necessary and sufficient for regrowth, then no regrowth plant should be homozygous for a maize allele at either locus and all non-regrowth plants should be homozygous recessive for at least one locus. A review of Table 1 indicates some exceptions: 17 regrowth plants are homozygous for the B73 allele at either S2-2 or S7-1/S7-2, and 16 non-regrowth plants have at least one Z. diploperennis allele at both loci. Together, these exceptions account for 26.8% of the 123 plants that could be scored for both genotype and phenotype. These exceptions do not necessarily negate the two-locus hypothesis because both genome-wide screening and genetic analyses reached the same conclusion. Three possible uncontrolled variables may have caused these exceptions: recombination between the marker and the reg locus it represents, mis-scoring of regrowth/non-regrowth phenotypes and mis-scoring of the PCR markers.
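The exception count follows from comparing each plant's phenotype with the phenotype predicted by its two marker genotypes. The sketch below uses the numeric genotype coding given under Statistical analyses (1 = homozygous Z. diploperennis, 2 = homozygous B73, 3 = heterozygous); the four example plants and the function names are hypothetical, while the 17, 16 and 123 are the tallies reported in the text.

```python
def has_zd_allele(code):
    # 1 = homozygous Z. diploperennis, 2 = homozygous B73, 3 = heterozygous
    return code in (1, 3)


def predicted_regrowth(chr2_code, chr7_code):
    """Two dominant, complementary loci: a Zd allele is needed at both."""
    return has_zd_allele(chr2_code) and has_zd_allele(chr7_code)


def count_exceptions(plants):
    """plants: iterable of (chr2_code, chr7_code, regrew) with regrew 1 or 0."""
    return sum(
        1 for c2, c7, regrew in plants
        if predicted_regrowth(c2, c7) != bool(regrew)
    )


# Hypothetical plants: two agree with the model, two are exceptions
example = [(1, 3, 1), (2, 3, 1), (3, 3, 0), (2, 2, 0)]
print("exceptions in example:", count_exceptions(example))

# Tallies reported in the text
exceptions = 17 + 16
scorable = 123
print(f"{exceptions}/{scorable} = {exceptions / scorable:.1%}")
```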
Recombination may explain some exceptions, but is unlikely to be a major contributor, considering the narrow ranges of the QTL peaks. We reviewed the SNPs among the 83 B73-Zd F2 plants that were used for the SNP discovery; the estimated maximum rates of recombination between regrowth and the peak SNP represented by S2-2 and S7-2 for the QTL are 0.01% and 0.03%, respectively (Supplementary Table S6). Therefore, recombination should not be an issue here.
Although the criterion for regrowth phenotyping was simple and reliable, there was still opportunity for mis-scoring. Some plants capable of regrowth may have been scored as non-regrowth because of the abnormality of their regrowth (Fig. 2) or because their regrowth was delayed or failed due to premature mortality. Anecdotally, at least one “non-regrowth” plant that was discarded was observed later to have emerging shoots. Alternatively, a non-regrowth plant might have been scored as regrowth because of late-developing tillers. The variability in morphology and timing of regrowth shoots indicates that modifiers influence this trait. Even so, unusual regrowth and delayed regrowth were the exceptions.
The major contributor to the exceptions is likely the limited reliability of the PCR markers. For each SNP, primer pairs were designed to amplify only one allele. In order to reduce the possibility of amplifying the alternative allele, additional mismatches were incorporated into the primers(30). While this avoids false positives, it increases the rate of false negatives. Out of 134 plants assayed, nine failed to produce a product for either allele using S2-2 (Table 1). An alternative marker for reg1 on chromosome 2, S2-1, had six failures. Disregarding those failures, the apparent genotypes of S2-2 and S2-1 differed in 43 out of 119 comparisons (36%). Most of these differences appear to be due to the failure of the marker for one allele or the other to amplify. The S7 primers were designed in a similar fashion to the S2 primers and are likely subject to the same problems. Thus, we believe that most of the genotype/phenotype exceptions are due to the imperfections of these markers.
Even so, these PCR markers will be valuable for producing and identifying a pair of near-isogenic lines (NILs), each being homozygous dominant for one regrowth locus but homozygous recessive for the other. The expectation is that neither NIL is capable of regrowth. Genetic confirmation of the two reg loci will be made by a testcross between the NILs, which is expected to produce progeny that demonstrate regrowth. These NILs will also aid in cloning the functional genes originating from the Z. diploperennis loci.
In summary, the results presented here indicate that perennialism in Zea, when defined as regrowth of shoots from basal axillary buds after senescence, is inherited dominantly and apparently qualitatively. Using this criterion, the inheritance of perennialism in Zea does not appear to be as complex as previously thought(11, 13, 14, 22). Two regrowth loci, reg1 and reg2, were mapped to chromosome 2 and chromosome 7, respectively. Even though our data point to two controlling factors, the data do not discount that perenniality in Zea is affected by modifiers and environment. Identification and the functional study of the candidate genes for reg1 and reg2 will initiate an understanding about the molecular mechanism of perenniality in Zea L.
Materials and Methods
Plant materials and phenotyping
Zea diploperennis (PI 462368) and Z. mays cv. Rhee Flint (PI 213764) were obtained from the USDA North Central Region Plant Introduction Station, Ames, IA. B73 and Mo17 inbreds were from the collection of D. Auger and are traceable back to the Maize Genetics Cooperation Stock Center, Urbana/Champaign, IL. In our designations of F1s and their derivatives, the female parent is shown first. Plants were grown and controlled pollinations were made in the greenhouse during the winter and in the field during the summer in Brookings, SD. In the greenhouse, plants were maintained with a 16 h-light/8 h-dark cycle and 20/16 °C day/night temperature except to induce the floral transition, when two-month-old plants were treated with a 10 h light/14 h dark cycle for four weeks.
Plants were scored as regrowth if they produced shoots from the basal axillary buds after the original stalks senesced. Occasionally, the hybrid-derived plants developed shoots that terminated in ears (“ear forest”) or tassels prior to senescence; these were not scored as regrowth. Rhizome and tuber development were visually investigated on plants that were dug from the soil after senescence. The number of tillers per plant (TNT) was recorded at the tasseling stage. Ear and kernel morphology was visually examined and photographed.
PCR assay
DNA samples were isolated from young leaves using the CTAB procedure(31) and used for PCR-based marker assays. PCR assays were done using GoTaq Green Master Mix (Catalog# M7505, Promega, Madison, WI) under the following conditions: 95°C, 35 cycles of 95°C for 45 sec, 55–62°C (primer dependent, see Table 6 for detail) for 1 min and 72°C for 1 min, and 72°C for 10 min. The primer sets used in the assays and their annealing temperatures are listed in Table 6. The annealing temperatures were determined using a 1°C-touchdown PCR step starting from 65°C. Several primer sets generate only a dominant marker for either the Z. diploperennis or the Z. mays allele, so two primer sets were used in combination to genotype the corresponding locus. This is especially true for the SNP-derived markers S2-1, S2-2, S7-1 and S7-2. In order to reduce the likelihood of false positives, the S2 and S7 primers were designed not to be perfectly complementary to the target sequence(30). This increases the likelihood of false negatives. For each reg locus, the peak SNP and a SNP immediately adjacent to it were chosen for marker development. The marker for the peak SNP of the QTL on the short arm of chromosome 2 is S2-2. A second marker for an adjacent SNP on chromosome 2 was also developed and named S2-1. We could not develop a single satisfactory PCR marker for the peak SNP (S7-2) of the chromosome 7 QTL, so a second one (S7-1) was designed for an immediately adjacent SNP. These two were used in combination as a single marker.
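The way two dominant, allele-specific primer sets combine into one codominant genotype call can be sketched as follows. The function name and the string labels are hypothetical; the logic only restates the scheme described above, including the case in which neither primer set yields a product.

```python
def call_genotype(zd_band, maize_band):
    """Combine two presence/absence PCR results into one genotype call.

    zd_band:    True if the Z. diploperennis-specific primer set amplified
    maize_band: True if the maize-specific primer set amplified
    """
    if zd_band and maize_band:
        return "heterozygous"
    if zd_band:
        return "Zd/Zd"
    if maize_band:
        return "maize/maize"
    return None  # no product from either set: assay failure, not a genotype


# A false negative for one allele in a true heterozygote is indistinguishable
# from a homozygote, which is how mismatch-containing primers can produce
# apparent genotype/phenotype exceptions.
for bands in [(True, True), (True, False), (False, True), (False, False)]:
    print(bands, "->", call_genotype(*bands))
```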
SNP discovery
A GBS assay was conducted according to Elshire et al.(28). The preparation and sequencing of the library were conducted by the University of Wisconsin Biotechnology Center (UWBRC). Briefly, DNA samples were digested with the ApeKI restriction enzyme (RE), and unique barcodes were annealed to each DNA fragment. A single-end 100 bp (1×100bp) sequencing run was carried out on an Illumina HiSeq 2500 platform. The raw data were pooled as a single fastq file and downloaded from UWBRC along with a quality report (FastQC version 0.11.2).
The TASSEL (Trait Analysis by Association, Evolution and Linkage) 3 pipeline was used, following the TASSEL manual(29), for the discovery of SNPs between Z. diploperennis and Z. mays B73 (Supplementary Fig. S4). TASSEL 4 and 5 pipelines were used when the command line was compatible. The barcoded sequence reads were collapsed into a set of unique sequence tags with counts. The tag count files were filtered for a minimum count threshold and merged into the master tag count file. The B73_RefGen_V4 reference genome sequence was downloaded from MaizeGDB and processed with Bowtie2 for alignment(32). Master tags were aligned to the B73 reference genome to generate a “Tags On Physical Map” (TOPM) file, which contains the genomic position of each tag with the best unique alignment. The occupancies of tags for each taxon were obtained from the barcode information in the original FASTQ files. Tag counts were stored in a “Tags by Taxa” (TBT) file. The TOPM and TBT files were used to call SNPs at the tag locations on the genome. The SNPs were filtered by minimum taxa coverage, minimum locus coverage and minimum minor allele frequency. Fastq files containing sequences of chromosomes 1 to 10 were merged by FASTX_Toolkit and indexed. All commands for SNP discovery were executed on the Ubuntu 16.04 LTS platform.
SNPs resulting from the TASSEL filters plugin with a minimum minor allele frequency of 0.01 were filtered again by removing sites that had missing data in more than 20% of the F2 plants. For the remaining SNPs, the missing genotypes were imputed as heterozygous, a moderate choice because a heterozygote carries both alleles. SNPs were filtered again with χ2 (p < 0.05). The fourth SNP filter removed SNPs that lay within 100 bp of each other and showed exactly the same haplotypes, keeping only the first SNP in the cluster. Thus, such a cluster of SNPs was treated as one locus. Removing the redundant SNPs makes the locus tests more precise, because repeated SNP sites would affect the LOD value and influence the interval estimation.
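The filtering steps other than the χ2 model-fit test can be sketched as below. The data layout is an assumption made for illustration: each SNP is a (chromosome, position, calls) tuple, with calls coded "A" (B73 homozygote), "B" (Z. diploperennis homozygote), "H" (heterozygote) or None (missing). The function names, the handling of values exactly at a cutoff, and the choice to measure the 100 bp gap from the preceding SNP in a cluster are likewise assumptions.

```python
def missing_rate(calls):
    return sum(c is None for c in calls) / len(calls)


def minor_allele_freq(calls):
    a = b = 0
    for c in calls:
        if c == "A":
            a += 2
        elif c == "B":
            b += 2
        elif c == "H":
            a += 1
            b += 1
    total = a + b
    return min(a, b) / total if total else 0.0


def filter_snps(snps, min_maf=0.01, max_missing=0.20, max_gap_bp=100):
    """snps: list of (chrom, pos, calls), sorted by chromosome and position."""
    kept = []
    prev = None  # last SNP that passed the MAF and missing-data filters
    for chrom, pos, calls in snps:
        if minor_allele_freq(calls) < min_maf or missing_rate(calls) > max_missing:
            continue
        # Impute remaining missing genotypes as heterozygous
        imputed = ["H" if c is None else c for c in calls]
        same_cluster = (
            prev is not None
            and prev[0] == chrom
            and pos - prev[1] <= max_gap_bp
            and prev[2] == imputed
        )
        prev = (chrom, pos, imputed)
        if not same_cluster:
            kept.append(prev)  # only the first SNP of a cluster is kept
    return kept


# Hypothetical toy input: five SNPs scored on five plants
toy_snps = [
    ("2", 1000, ["A", "H", "B", "H", None]),
    ("2", 1050, ["A", "H", "B", "H", "H"]),   # same haplotype within 100 bp
    ("2", 5000, ["A", "A", "A", "A", "A"]),   # monomorphic
    ("7", 1020, ["B", "H", "A", None, None]),  # 40% missing
    ("7", 2000, ["B", "H", "A", "H", "B"]),
]
for chrom, pos, calls in filter_snps(toy_snps):
    print(chrom, pos, calls)
```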
The filtered SNPs were used for candidate locus estimation. The locus analysis was executed by a standard quantitative trait loci (QTL) procedure in R using the R/qtl package (version 1.40-8)(33) to better observe the contribution of each SNP and its neighbors. The R code is listed in Supplementary Table S5. Position simulation was drawn with a maximum distance of 1.0 cM and an error probability of 1×10⁻⁴. The conditional genotype probability (calc.genoprob), as well as simulated genotypes (sim.geno with n.draw=32), were calculated.
The “haldane” function was used to convert genetic distances into recombination fractions. Genome scan with a single locus model (scanone) was performed with a binary model using the expectation-maximization algorithm(33). A permutation test with 1000 replicates was performed in scanone to visualize the LOD thresholds. We determined a locus interval by selecting the first and last SNP sites with significant LOD value. Genes within the intervals were identified by searching the corresponding region on the Gramene website.
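For reference, the Haldane map function relates a genetic distance d (in Morgans) to a recombination fraction r by r = (1 − e^(−2d))/2, assuming no crossover interference. The sketch below implements this standard formula and its inverse; it is not the R/qtl code used in the analysis, and the function names are hypothetical.

```python
import math


def haldane_cm_to_rf(distance_cm):
    """Recombination fraction for a map distance given in centimorgans."""
    d = distance_cm / 100.0  # centimorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def haldane_rf_to_cm(rf):
    """Inverse map function; defined for 0 <= rf < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * rf)


for cm in (1.0, 3.52, 10.0, 50.0):
    print(f"{cm:6.2f} cM -> r = {haldane_cm_to_rf(cm):.4f}")
```

At short distances r is close to d, whereas at long distances r approaches but never exceeds 0.5, the value for independent segregation.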
Statistical analyses
For statistical analyses, all genotypes and phenotypes were transformed into numeric values. For phenotypes, the regrowth plants were scored as “1” and the non-regrowth plants were scored as “0”. For genotypes, the plants that were homozygous for the Z. diploperennis allele were scored as “1”; those that were homozygous for the B73 allele were scored as “2”; and those that were heterozygous were scored as “3”. When conducting locus analysis, genotype “1” was transformed to “AA”, “2” to “BB” and “3” to “AB”.
A chi square goodness-of-fit test was used to find the best-fit model or linkage in the genetic analysis and reveal candidate SNPs. To determine if TNT has any correlation with regrowth, a One-Way ANOVA of TNT by regrowth was performed in JMP (JMP® 11.2.0).
Sequencing Data availability
All raw fastq data from this study are available at NCBI data deposition site (https://www.ncbi.nlm.nih.gov/bioproject/) with accession number PRJNA477673.
Author Contributions
Y.Y. designed and supervised this project and all the experiments, and drafted the manuscript; Y.Q., A.M., T.R., B.P., A.G., Y.Z., Y.Y. & D.A. performed the experiments and collected data; Y.Q., A.M., T.R., D.A. & Y.Y. analyzed the data; all authors discussed the results and communicated on and approved the final manuscript.
Competing financial interests
The authors declare no competing financial interests.
Acknowledgement
This research was partially supported by funds from USDA-NIFA via South Dakota Experiment Station and Department of Biology and Microbiology, South Dakota State University. We greatly appreciate Dr. Frank M. You of Agriculture and Agri-Food Canada for his help in statistics.