ABSTRACT
The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa L. ssp. japonica), which completes its life cycle in 9 weeks. We sequenced 1,504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. Using this mutant population, we identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line with a small segregating population. This result reveals the usefulness of the resource for efficient identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks, enabling rapid functional genomic studies of rice.
One-sentence summary: We have sequenced 1,504 mutant lines generated in the short life cycle rice variety Kitaake (9 weeks) and established a publicly available database, enabling rapid functional genomic studies of rice.
INTRODUCTION
Rice (Oryza sativa) provides food for more than half of the world’s population, making it the most important staple crop (Gross and Zhao, 2014). In addition to its critical role in global food security, rice also serves as a model for studies of monocotyledonous species including important cereals and bioenergy crops (Izawa and Shimamoto, 1996). For decades, map-based cloning has been the main strategy for isolating genes conferring agronomically important traits (Peters et al., 2003). In Arabidopsis and other model plant species (Alonso et al., 2003; Cheng et al., 2014; Li et al., 2016c), indexed mutant collections constitute highly valuable genetic resources for functional genomic studies. In rice, multiple mutant collections have been established in diverse genetic backgrounds including Nipponbare, Dong Jin, Zhonghua 11, and Hwayoung (Wang et al., 2013b; Wei et al., 2013). Rice mutants have been generated through T-DNA insertion (Jeon et al., 2000; Chen et al., 2003; Sallaud et al., 2003; Wu et al., 2003; Hsing et al., 2007), transposon/retrotransposon insertion (Miyao et al., 2003; Kolesnik et al., 2004; van Enckevort et al., 2005; Wang et al., 2013b), RNAi (Wang et al., 2013a), TALEN-based gene editing (Moscou and Bogdanove, 2009; Li et al., 2012), CRISPR/Cas9 genome editing (Jiang et al., 2013; Miao et al., 2013; Xie et al., 2015), chemical mutagenesis, such as with ethyl methanesulfonate (EMS) (Henry et al., 2014), and irradiation (Wang et al., 2013b; Wei et al., 2013). Several databases have been established to facilitate use of the mutant collections (Droc, 2006; Zhang, 2006; Wang et al., 2013b). These approaches have advanced the characterization of approximately 2,000 genes (Yamamoto et al., 2012). However, the usefulness of these rice mutant collections has been hindered by the long life cycle of the genetic backgrounds used (i.e., 6 months) and the lack of sequence information for most of the mutant lines.
To address these challenges, we recently established a fast-neutron (FN) mutagenized population in Kitaake, a model rice variety with a short life cycle (9 weeks) (Li et al., 2016b). Here we report the sequence of 1,504 individual lines. We anticipate that the availability of this mutant population will significantly accelerate rice genetic research.
FN irradiation induces a diversity of mutations that differ in size and copy number, including single base substitutions (SBSs), deletions, insertions, inversions, translocations, and duplications (Belfield et al., 2012; Bolon et al., 2014; Li et al., 2016b), in contrast to other mutagenesis approaches that mostly generate one type of mutation (Thompson et al., 2013; Wang et al., 2013b). It generates a broad spectrum of mutant alleles, including loss-of-function, partial loss-of-function and gain-of-function alleles that constitute an allelic series, highly desirable for functional genomic studies. In addition, FN irradiation induces subtle variations, such as SBSs and in-frame insertions/deletions (Indels), which facilitate the study of protein structure and domain functions (Li et al., 2016b). Finally, FN irradiation induces abundant mutations in noncoding genomic regions that may contain important functional transcription units such as microRNAs (Lan et al., 2012) and long noncoding RNAs (Ding et al., 2012). The availability of a FN-induced mutant population with these unique characteristics greatly expands the mutation spectrum relative to other collections and provides researchers the opportunity to discover novel genes and functional elements controlling diverse biological pathways.
Whole-genome sequencing (WGS) of a mutant population, and pinpointing each mutation at a single-nucleotide resolution using next-generation sequencing technologies is an efficient and cost-effective approach to characterize variants in a mutant collection, in contrast to targeting induced local lesions in genomes (TILLING) collections, for which researchers must scan amplicons from a large set of mutants for each use (McCallum et al., 2000). Another commonly used approach to characterize a genome is whole-exome sequencing (WES) (Krasileva et al., 2017). Though it is relatively low-cost, WES does not cover most noncoding regions that potentially contain important functional elements such as microRNAs. Furthermore, WES is unable to identify balanced variants, including inversions and translocations, which are commonly induced by FN irradiation (Biesecker et al., 2011; Li et al., 2016b). Finally, WGS gives more accurate and complete genome-wide variant information than WES, even for the exome (Belkadi et al., 2015). Fully sequenced mutant collections are particularly useful for crops, which have inefficient transformation, and require more time and space for genetic analyses compared to model organisms (Barampuram and Zhang, 2011). Among major crops, rice has the smallest genome (~389 Mb) (Michael and Jackson, 2013), making it the most amenable to WGS, especially with the low cost afforded by sample multiplexing.
In this study, taking advantage of the established FN mutant collection in Kitaake (Li et al., 2016b), we whole-genome sequenced 1,504 lines, identified 91,513 mutations affecting 32,307 genes (58% of all genes in the rice genome) and established the first WGS mutant collection in rice. To facilitate the use of this mutant collection, we established an open access resource called KitBase, which integrates multiple bioinformatics tools and enables users to search the mutant collection, visualize mutations, download genome sequences for functional analysis and order seed stocks.
RESULTS
Genome Sequencing
We sequenced 1,504 mutagenized lines, including 1,408 M2 lines and 96 M3 lines, using Illumina high-throughput sequencing technology, and characterized mutations in these lines. To facilitate downstream analysis, genomic DNA was isolated from a single plant of each line. High-throughput sequencing was performed using the Illumina HiSeq 2000 system, and the resultant sequence reads were mapped to the Nipponbare reference genome using the Burrows-Wheeler Aligner-Maximal Exact Match algorithm (BWA-MEM) (Li, 2013). On average, 183 million paired-end reads (18.6 Gb) were obtained for each line (Table 1 and Supplemental Data Set 1), and 170 million high-quality reads (93% of the raw reads) were mapped onto the reference genome, giving an average sequencing depth of 45.3-fold for each line. The high sequencing depth of these rice mutant lines facilitated detection of different types of variants.
Genomic Variants Detected in the 1,504 Mutant Lines
We used an established variant-calling pipeline containing multiple complementary programs to call variants in each rice line, filtering out variants present in the parental line and those found in two or more rice lines (see Methods). A total of 91,513 FN-induced mutations were detected in the 1,504 rice lines, including 43,483 single base substitutions (SBSs), 31,909 deletions, 7,929 insertions, 3,691 inversions, 4,436 translocations, and 65 tandem duplications (Figure 1 and Supplemental Data Set 2). The largest inversion is 36.8 Mb, the largest tandem duplication 4.2 Mb, and the largest deletion 1.7 Mb (Supplemental Figure 1). To assay the false positive rate, we randomly selected 10 lines and examined all of their mutations (Supplemental Data Set 3). Out of 638 mutation events, we identified 30 false positives (4.7%), indicating that our variant-calling pipeline is robust. Sixty percent of these false positives are either SBSs or small Indels (<30 bp), mostly in polynucleotide or repetitive regions. Only 4 false positives out of 638 mutation events (0.6%) are in coding regions, indicating the minimal impact of false positives on mutated genes.
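The filtering rule described above (discard variants shared with the parental line or found in two or more lines) can be illustrated with a minimal sketch. This is not the pipeline used in the study; the function name `filter_induced`, the variant-key tuple layout, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def filter_induced(calls_by_line, parental_variants):
    """Keep variants absent from the parent and called in exactly one line.

    calls_by_line: dict mapping a line ID to a set of variant keys.
    parental_variants: set of variant keys called in the parental line.
    A variant key is a hypothetical (chrom, pos, ref, alt) tuple.
    """
    # Count the number of lines in which each variant was called.
    seen_in = Counter(v for calls in calls_by_line.values() for v in calls)
    return {
        line: {v for v in calls
               if v not in parental_variants and seen_in[v] == 1}
        for line, calls in calls_by_line.items()
    }

# Toy data, invented for illustration only.
parent = {("Chr1", 100, "A", "G")}
calls = {
    "line_A": {("Chr1", 100, "A", "G"),   # also in the parent: removed
               ("Chr2", 500, "C", "T"),   # shared with line_B: removed
               ("Chr3", 42, "G", "A")},   # unique: kept
    "line_B": {("Chr2", 500, "C", "T"),
               ("Chr5", 900, "T", "-")},  # unique: kept
}
induced = filter_induced(calls, parent)
print(induced)
```

A set-based key works for this toy case; a real pipeline would also need to reconcile structural variants whose breakpoints differ by a few bases between callers, which this sketch does not attempt.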
Among the 91,513 mutations, SBSs are the most abundant variants, accounting for 48% of mutation events. We identified 48,030 non-SBS mutations, of which deletions account for 66%. Small deletions make up the majority of all deletion events: deletions smaller than 100 bp account for nearly 90% of all deletions (Table 2). There are 7,469 single base deletions, accounting for 23% of all deletion events. The average deletion size is 8.8 kb.
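The proportions quoted above follow directly from the reported counts. The short check below uses only numbers given in the text; the variable names are ours.

```python
# Counts per mutation type as reported in the text.
counts = {
    "SBS": 43_483,
    "deletion": 31_909,
    "insertion": 7_929,
    "inversion": 3_691,
    "translocation": 4_436,
    "tandem_duplication": 65,
}
single_base_deletions = 7_469

total = sum(counts.values())
non_sbs = total - counts["SBS"]

sbs_pct = 100 * counts["SBS"] / total               # share of all mutations
del_pct_of_non_sbs = 100 * counts["deletion"] / non_sbs
single_del_pct = 100 * single_base_deletions / counts["deletion"]

print(total, non_sbs)
print(round(sbs_pct), round(del_pct_of_non_sbs), round(single_del_pct))
```

The six categories sum to the reported total of 91,513, and the rounded percentages match the 48%, 66%, and 23% given in the text.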
To analyze the distribution of mutations in the genome, all mutations from the sequenced lines were mapped to the reference genome (Figure 2). We found that the FN-induced mutations are distributed evenly across the genome, except for some repetitive regions with low mapping quality reads or no read coverage caused by the inability to confidently align the reads to the reference. Many translocations were identified in the mutant population, shown by the connecting lines (Figure 2E). The density of translocations is similar on each chromosome, ranging from 20.4/Mb to 26.8/Mb (Supplemental Table 1). The genome-wide mutation rate of the Kitaake rice mutant population is 245 mutations/Mb. The even distribution of FN-induced mutations is similar to the distribution of mutations generated through chemical mutagenesis of sorghum and Caenorhabditis elegans (Thompson et al., 2013; Jiao et al., 2016).
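As a rough consistency check on the genome-wide rate, the reported total and rate can be used to back out the genome size they imply. The assembly size used below (about 373.2 Mb for the Nipponbare reference) is our assumption and is not stated in the text.

```python
TOTAL_MUTATIONS = 91_513          # reported in the text
REPORTED_RATE_PER_MB = 245        # reported in the text

# Genome size implied by the two reported figures.
implied_size_mb = TOTAL_MUTATIONS / REPORTED_RATE_PER_MB

# Assumption: assembled reference size of roughly 373.2 Mb (not from the text).
ASSUMED_ASSEMBLY_MB = 373.2
rate_per_mb = TOTAL_MUTATIONS / ASSUMED_ASSEMBLY_MB

print(f"implied genome size: {implied_size_mb:.1f} Mb")
print(f"rate with assumed assembly size: {rate_per_mb:.1f} mutations/Mb")
```

Under this assumption the rate is a population-wide density (all 1,504 lines pooled), not a per-line rate.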
Genes Affected in 1,504 Mutant Lines
Genes affected by FN-induced mutation were identified using an established pipeline (see Methods). A total of 32,307 genes, 58% of all 55,986 rice genes (Kawahara et al., 2013) are affected by different types of mutations (Figure 1 and Supplemental Data Set 4). Deletions affect the greatest number of genes, 27,614, accounting for 70% of the total number of affected genes. SBSs, constituting the most abundant mutation, only affect 4,378 genes (11%). Inversions, translocations, and duplications affect 2,230, 2,218, and 2,378 genes, respectively.
To test whether the affected genes are biased with respect to a particular biological process, we used gene ontology (GO) analysis to classify all affected genes into major functional categories (Ashburner et al., 2000; Du et al., 2010). As expected, the selected biological process categories “DNA metabolic process”, “protein modification process”, and “transcription” have the most hits and show similar percentages to the mutation saturation (58%) (Supplemental Table 2 and Supplemental Figure 2). We observed that the terms “DNA metabolic process” and “cellular component organization” show slightly higher percentages within the biological process category, whereas “photosynthesis” and “transcription” show much lower percentages (Supplemental Table 2). Core eukaryotic genes are highly conserved and are recalcitrant to modifications (Parra et al., 2008). We analyzed a set of core eukaryotic genes and showed that 40% of those analyzed are affected, mostly by heterozygous mutations (Supplemental Data Set 5). Taken together, these results suggest that, although FN-induced mutations are evenly distributed across the genome in the mutant population, the affected genes are biased against mutations in core gene functions.
FN-Induced Mutations in Each Rice Line
To assess the overall effect of FN irradiation in each sequenced line, the mutations and genes affected in each line were calculated (Supplemental Data Set 1). On average, each line contains 61 mutations. The distribution of the number of mutations per line corresponds to a normal distribution (Figure 3). Of the 1,504 lines, 90% have fewer than 83 mutations per line (Figure 3). The average number of genes affected per line is 43 (Supplemental Data Set 1). The variation of affected genes per line is greater than that of mutations per line (Table 3), due to the presence of large mutation events (Supplemental Data Set 4). For example, line FN-259 has the most genes affected (681 genes) in this mutant population, largely due to the 4.2 Mb tandem duplication that affects 667 genes (Supplemental Data Set 4). However, 76% of the mutated lines contain no more than 50 mutated genes per line (Table 3). Only 10% of the mutated lines contain more than 100 affected genes. The relatively low number of mutations per line for most lines in the Kitaake rice mutant population facilitates downstream cosegregation assays.
Loss-of-Function Mutations
A large number of loss-of-function mutations were identified in this mutant population. Loss-of-function mutations completely disrupt genes. They are of considerable value in functional genomics because they often clearly indicate the function of a gene (MacArthur et al., 2012). To identify loss-of-function mutations from the Kitaake rice mutant population, we adopted the definition as described (MacArthur et al., 2012) with minor modifications: we included mutations affecting start/stop codons and intron splice sites as well as mutations causing frameshifts, gene knockouts or truncations (See Methods). There are 28,860 genes affected by loss-of-function mutations (Figure 4 and Supplemental Data Set 6), accounting for 89% of the genes affected in this mutant population and 52% of all rice genes in the genome. The 344 genes affected by loss-of-function SBSs account for 1% of all genes mutated by all loss-of-function mutations. In contrast, loss-of-function deletions disrupt 26,822 genes, accounting for 84% of genes mutated by loss-of-function mutations. Inversions and translocations disrupt 2,230 and 2,218 genes, respectively. These results explicitly show that FN irradiation induces a high percentage of loss-of-function mutations and that deletions are the main cause.
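The loss-of-function definition above amounts to a membership test on the predicted effect of each mutation. The sketch below is only illustrative: the effect labels, the record layout, and the function name `lof_genes` are hypothetical, since the annotation vocabulary of the actual pipeline is not given in the text.

```python
# Hypothetical effect labels mirroring the categories named in the text:
# start/stop codon changes, splice-site changes, frameshifts,
# gene knockouts, and truncations.
LOF_EFFECTS = frozenset({
    "start_codon", "stop_codon", "splice_site",
    "frameshift", "knockout", "truncation",
})

def lof_genes(annotations):
    """Return the set of genes hit by at least one loss-of-function mutation.

    annotations: iterable of (mutation_id, gene_id, effect) records.
    """
    return {gene for _, gene, effect in annotations if effect in LOF_EFFECTS}

# Toy records, invented for illustration only.
records = [
    ("m1", "geneA", "missense"),
    ("m2", "geneA", "frameshift"),
    ("m3", "geneB", "synonymous"),
    ("m4", "geneC", "truncation"),
]
affected = lof_genes(records)
print(sorted(affected))

# Shares reported in the text: 28,860 of 32,307 affected genes,
# and 28,860 of 55,986 annotated rice genes.
print(round(100 * 28_860 / 32_307), round(100 * 28_860 / 55_986))
```

Counting genes rather than mutations, as done here, is what allows one large deletion to contribute many loss-of-function genes at once.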
Loss-of-function mutations affecting a single gene allow straightforward functional genomic analysis. We analyzed genes affected by these mutations and cataloged them according to the effect of the mutation, and identified 8,221 such genes (Table 4 and Supplemental Data Set 7). Frameshifts and truncations, mostly a result of deletions, inversions and translocations, account for 96% of the genes, which indicates the importance of these non-SBS variants.
FN-Induced Single Base Substitutions
To draw comparisons between the FN-induced and EMS-induced mutant populations, we conducted a detailed analysis of SBSs. There is an average of 29 SBSs per line (Supplemental Figure 3). Ninety percent of our lines contain between 10 and 50 SBSs per line. There are 118 SBSs in mutant FN1423-S, the highest number of SBSs per line in the mutant population. SBSs are evenly distributed in the genome (Supplemental Figure 4), similar to the EMS-induced mutant populations (Thompson et al., 2013; Jiao et al., 2016). 37.9% of SBSs map within genes and 62.1% to intergenic regions (Supplemental Table 3). Of the genic SBSs, 17.3% are within exons, 17.4% within introns, 3.2% within untranslated regions (UTRs), and 0.1% at canonical splice sites (GT/AG). Non-synonymous SBSs, which represent 12.4% of all SBSs, are found in 4,378 genes (Supplemental Data Set 4). Of these, 11.5% cause missense mutations, 0.8% cause nonsense mutations, and 0.1% result in readthrough mutations (Supplemental Table 3).
The amino acid changes of the three mutant populations were further analyzed using heat maps (Figure 5A). The amino acid changes of the FN-induced Kitaake rice mutant population are relatively evenly distributed, compared to the two EMS-induced mutant populations (Figure 5B, C). The differences are due to the less biased nucleotide changes of the FN-induced mutant population compared to the two EMS-induced mutant populations (Figure 5D). The frequency of the most common GC>AT nucleotide changes in the FN-induced mutant population is 42.5%, half that in the EMS-induced population (88.3%) (Henry et al., 2014) (Figure 5D). All possible amino acid changes caused by a single nucleotide change are present in the FN-induced mutant population (Figure 5A). Alanine to threonine or valine changes show a much higher frequency, 4.5% and 4.3%, respectively, compared to the average amino acid change frequency of 0.7%. Alanine to threonine or valine changes occur so often because these three amino acids are all encoded by four codons, and a single nucleotide change (GC>AT), the most common nucleotide change in the mutant population, is enough to change the amino acid (Figure 5E). Similar patterns are found in the two EMS-induced mutant populations (Thompson et al., 2013; Jiao et al., 2016). Some amino acid changes occur infrequently, because the occurrence frequency of these amino acids is low in rice (Itoh et al., 2007) and/or a single GC>AT change may not be sufficient to cause the amino acid change. The results demonstrate that FN irradiation induces diverse amino acid changes at higher frequencies than EMS treatment and that FN irradiation can result in amino acid mutations rarely achieved by chemical mutagens.
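The codon argument for alanine can be checked directly against the standard genetic code. The sketch below applies single G-to-A and C-to-T transitions (the two strand-equivalent forms of a G:C to A:T change) to each alanine codon; the helper name `transition_outcomes` is ours, and the table is the standard nuclear code written in the usual TCAG order.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}

# G:C to A:T transitions as seen on the coding strand.
TRANSITIONS = {"G": "A", "C": "T"}

def transition_outcomes(codon):
    """List (position, mutant codon, amino acid) for each single transition."""
    outcomes = []
    for i, base in enumerate(codon):
        if base in TRANSITIONS:
            mutant = codon[:i] + TRANSITIONS[base] + codon[i + 1:]
            outcomes.append((i + 1, mutant, CODON_TABLE[mutant]))
    return outcomes

ala_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "A"]
for codon in ala_codons:
    print(codon, transition_outcomes(codon))
```

For every alanine codon (GCN), the first-position change gives a threonine codon (ACN) and the second-position change gives a valine codon (GTN), while third-position changes are synonymous.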
An Inversion in Mutant FN1535 Cosegregates with the Short Grain Phenotype
Grain shape is a key determinant of rice yield (Huang et al., 2013). When growing the mutated lines, we observed that line FN1535 produces significantly shorter grains compared to the parental line (Figure 6). The mutant is also dwarfed and shows a much shorter panicle. In a segregating population, we observed 34 normal plants and 13 short-grain plants, a 3:1 ratio. A goodness-of-fit test based on χ2 analysis of the phenotypic ratio revealed that the observed values are statistically similar to the expected values, indicating that the short-grain phenotype is likely caused by a recessive mutation. Next, we identified all mutations in line FN1535. We identified 76 mutations, including 26 SBSs, 38 deletions, 10 insertions, and 2 inversions (Supplemental Data Set 2). These mutations affect seven non-transposable element (TE) genes (Supplemental Table 4). To identify which mutation is responsible for the short-grain phenotype, we prioritized them based on their putative loss-of-function effects and predicted functions of the affected genes. We prioritized a 37 kb deletion on chromosome 7 that affects 5 genes, an inversion on chromosome 5 affecting one gene, and a SBS on chromosome 6 that affects one gene. Using the segregating population of 50 plants, we found that the inversion on chromosome 5, not the chromosome 7 deletion or the chromosome 6 SBS, cosegregates with the phenotype (Figure 6D and Supplemental Figure 6). We analyzed the causative inversion in detail. One breakpoint of the inversion is in the fourth exon of gene LOC_Os05g26890, which truncates the gene (Figure 6E). The other breakpoint of the inversion is not in the genic region. This gene, named Dwarf 1/RGA1, was previously isolated using a map-based cloning strategy (Ashikari et al., 1999). Gene Dwarf 1/RGA1 encodes a Gα protein, which is involved in gibberellin signal transduction (Ueguchi-Tanaka et al., 2000). 
Mutations in gene Dwarf 1/RGA1 cause the dwarf and short-grain phenotypes (Ashikari et al., 1999). Identical phenotypes were observed in line FN1535 (Figure 6). These results demonstrate that we can rapidly pinpoint the genetic lesion and gene conferring a specific phenotype using a small segregating population of the mutant line.
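The goodness-of-fit test for the 3:1 segregation ratio can be reproduced from the plant counts reported above (34 normal, 13 short-grain). This is a minimal sketch using only the standard library; the function name is ours, and the p-value uses the closed form of the chi-square survival function for one degree of freedom.

```python
import math

def chi_square_3_to_1(n_normal, n_mutant):
    """Chi-square goodness-of-fit test against a 3:1 ratio (1 degree of freedom)."""
    total = n_normal + n_mutant
    observed = (n_normal, n_mutant)
    expected = (0.75 * total, 0.25 * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_square_3_to_1(34, 13)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

The statistic is well below 3.841, the critical value at the 0.05 level for one degree of freedom, so the 3:1 hypothesis is not rejected, in line with the recessive single-locus interpretation in the text.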
Access to Mutations, Sequence Data and Seed Stocks
Public access to high-throughput resources is essential for advancing science (McCouch et al., 2016). To make the mutant collection and associated data available to users, we established an open access web resource named KitBase (http://kitbase.ucdavis.edu/) (Figure 7). KitBase provides the mutant collection information, including sequence data, mutation data, and seed information for each rice line. Users can use different inputs, including gene IDs, mutant IDs, and DNA or protein sequences, to search and browse KitBase (Figure 7A). Searches with DNA or protein sequences are carried out with the standalone BLAST tool (Deng et al., 2007). Both MSU LOC gene IDs and RAP-DB gene IDs (Kawahara et al., 2013; Sakai et al., 2013) can be used in searching the database. Mutations are visualized using the web-based interactive JBrowse genome browser, in which different symbols are used to indicate different types of mutations at the corresponding locations. Users interested in a particular region of the genome can browse all the mutations from KitBase in that region (Figure 7B). This visual approach enables users to identify multiple allelic mutations and elucidate gene function quickly. Mutation information for each line can be downloaded from KitBase. The original sequence data and primary mutation data of lines in KitBase can be accessed through the National Center for Biotechnology Information (NCBI) and the Joint Genome Institute (JGI) (Supplemental Data Set 1). A seed request webpage was set up for seed distribution with a minimal handling fee. The seed distribution is currently subsidized by the Department of Energy via the Joint BioEnergy Institute. The user-friendly genetic resources and tools in this open access platform will facilitate rice functional genomic studies.
DISCUSSION
We describe a new resource that facilitates functional genomic studies of rice. A key technical feature of our mutant collection is the low level of mutagenesis (Li et al., 2016b). There is an average of 61 mutations per line (Figure 3), which means that only a small segregating population is needed to identify the causative mutation, for example, 50 plants as demonstrated by our study of the short-grain phenotype. Similar approaches have been used in Arabidopsis and other organisms to clone genes from WGS lines with a small population (Schneeberger, 2014; Li et al., 2016a). In contrast, a large segregating population is required to identify the causative mutation using conventional genetic mapping approaches. Our population requires zero or one round of backcrossing. In contrast, some heavily mutagenized populations that carry thousands of SBSs in each mutant line require multiple rounds of time-consuming backcrosses to clean up the background of the line (Jiao et al., 2016). Because we sequenced a single plant instead of pooled samples, users can readily identify segregating populations to pinpoint the mutation responsible for the phenotype, often without carrying out backcrossing. We estimate that 67% of all mutations in the M2 sequenced lines are heterozygous. For these heterozygous mutations, the progeny seeds available in KitBase can be directly used for cosegregation analysis. For homozygous mutations (33% of detected mutations), the sibling plants of the sequenced lines or progeny of their sibling plants that carry the corresponding heterozygous mutations can be used for cosegregation analysis (Figure 7), which significantly expedites genetic analysis. Users can also backcross the mutant to the parental line to create segregating progeny if needed.
Compared to other sequence-indexed mutant populations including the T-DNA or Tos17 populations, WGS detects all possible variants, regardless of whether the variant is induced or spontaneous, tagged or not, which avoids the problem of somatic variants going undetected even when the tag is clearly identified in some mutant populations (Wang et al., 2013b). The public availability of the mutant population in the early flowering, photoperiod insensitive Kitaake variety will lower the threshold for researchers outside the rice community to examine functions of their gene of interest in rice.
FN irradiation induces a high proportion of loss-of-function mutations, which means that a relatively small population is needed to mutate all the genes in the genome. In 1,504 mutated lines, 89.3% of all the affected genes are mutated by loss-of-function mutations (Figure 4). In comparison, only 0.2% of the EMS-induced mutations are annotated as loss-of-function mutations in the sequenced sorghum population (Jiao et al., 2016). 80,000 T-DNA insertion rice lines are needed to reach the same mutation saturation level (58%), without taking into account that T-DNA insertions are biased to certain genomic regions (Wang et al., 2013b). Many screens can only be performed when plants are mature, such as yield-related traits (Figure 7A); this means a serious delay when a variety with a long life cycle is used. The Kitaake rice mutant population enables researchers to do studies and complete screens on a relatively small population in a much shorter time. These features make it easier for researchers to conduct studies on complex traits like yield and stress tolerance, which were once too time- and labor-intensive. In addition, with FN-induced loss-of-function mutations, researchers also avoid the variation in knockdown efficiency or off-target issues with approaches such as RNAi or CRISPR-Cas9 (Peng et al., 2016).
Structural variants (variants >1 kb) are known to be the cause of some human diseases, such as the well-known Down and Turner syndromes, and are associated with several cancers (Weischenfeldt et al., 2013; Carvalho and Lupski, 2016). Limited studies in plants show that structural variants contribute to important agricultural and biological traits, like plant height, stress responses, crop domestication, speciation, and genome diversity and evolution (Lowry and Willis, 2010; Huang et al., 2012; Saxena et al., 2014; Zmienko et al., 2014; Zhang et al., 2015; Zhang et al., 2016). However, the study of structural variants in plants is still challenging because they are often identified in different plant varieties/accessions, and the numerous variants between varieties/accessions complicate the study of the function of a specific structural variant (Saxena et al., 2014; Zhang et al., 2016). Our Kitaake rice mutant population provides structural variants in the same genetic background, with only a few structural variants per line, significantly facilitating the study of the function and formation of structural variants in plants (Supplemental Data Set 2).
One limitation of this Kitaake rice mutant population is that large deletions cause loss of function of many genes at once. Although such large deletions are important in achieving saturation of the genome and are valuable in screens, they also pose challenges. A large deletion is likely homozygous lethal, and lethality makes it hard to study genes in the large deletion. In addition, if a large deletion is identified as the causative mutation, determining which gene causes the phenotype requires multiple complementation tests (Wei et al., 2013; Chern et al., 2016). However, as more mutagenized rice lines are collected, multiple lines carrying independent mutations of the same gene will allow researchers to quickly identify the gene associated with the phenotype (Henry et al., 2014). Another approach is to search other mutant collections to identify mutations in individual genes and connect the gene with the phenotype. Another deficit of the current mutant population is the lack of enough mutant alleles in core eukaryotic genes and genes involved in “photosynthesis” and “developmental process” (Supplemental Table 2 and Supplemental Data Set 5), which is likely due to the lethality of mutations in these genes and the high proportion of loss-of-function mutations induced by FN irradiation. Other rice mutant collections, for example, the EMS-induced mutant populations, would be complementary in this respect by providing alleles with less severe effects on these genes (Krishnan et al., 2009; Henry et al., 2014). Though we have sequenced the rice lines at a high depth (45-fold), it is still challenging to accurately call dispersed duplications that might result from imbalanced translocations; therefore we include only tandem duplications. Owing to the nature of variant calls made by the algorithms we used, the genotype (homozygosity/heterozygosity) of large structural variants is not included.
However, users can use tools such as IGV (Robinson et al., 2011) to obtain the genotype information with available mutant files from KitBase (Figure 6). Cost is another factor to consider when using WGS in profiling variants in a population, though this consideration is not specific to the Kitaake mutant population. It still initially requires a considerable investment when establishing a WGS population but the price of sequencing has dropped dramatically with the technological improvement (Goodwin et al., 2016). One approach to alleviate the financial challenge is through community collaboration, as a WGS population greatly benefits every researcher in that community.
A systematically phenotyped WGS mutant population is highly desirable for functional genomic studies and can rapidly bridge the genotype-to-phenotype knowledge gap. The Kitaake rice mutant population we describe in this study paves the way toward the genomics-phenomics approach in functional genomics. The recently developed high-throughput phenotyping platform makes it feasible to conduct large-scale phenotyping in rice (Yang et al., 2014). We anticipate that adding systematic phenotypic data to these WGS lines will significantly boost the utilization of the mutant collection in this model rice variety. Pairing our genomics resource with a high-throughput phenomics platform will greatly expand the capacity of researchers in rice functional genomic studies.
This study provides a cost-efficient and time-saving open access resource for gene discovery in a short life cycle rice variety by integrating physical mutagenesis, WGS, and a publicly available online database. With the WGS approach, crops are advantageous compared to some mammalian systems, because a sufficiently large mutagenized population can be easily generated and maintained as seed stocks at a low cost, and the mutagenized lines can be directly planted and screened on a large scale in the field. Furthermore, as physical mutagenesis is not considered a transgenic approach, mutants with elite traits from the screens can be directly used in breeding. Given the close phylogenetic relationship of rice to other grasses (Devos and Gale, 2000), this resource will also facilitate functional studies of other grasses, such as cereals and candidate bioenergy crops (Yuan et al., 2008).
Supplemental Data
The following materials are available in the online version of this article.
Supplemental Data Set 1. Genome Sequencing Summary of Rice Plants Used in This Study.
Supplemental Data Set 2. Mutations Identified in the Kitaake Rice Mutant Population.
Supplemental Data Set 3. Mutations Selected for Validation.
Supplemental Data Set 4. Genes Affected in the Kitaake Rice Mutant Population.
Supplemental Data Set 5. Core Eukaryotic Genes Affected in the Kitaake Rice Mutant Population.
Supplemental Data Set 6. Genes Mutated by Loss-of-Function Mutations.
Supplemental Data Set 7. Genes Mutated by Loss-of-Function Mutations Affecting a Single Gene.
AUTHOR CONTRIBUTIONS
GL, MC, and PR participated in the design of the project, coordination of the project, and data interpretation. GL, RJ and PR drafted and revised the manuscript. MC developed and maintained the mutagenized population. GL, RJ, NP, MC, JM, TW, WS, AL, KJ, JL, PD, RR, DR, DB, YP, KB, and JS performed the sample preparation and sequencing and participated in in-house script development and statistical analyses. All authors read and approved the final manuscript.
ACKNOWLEDGMENTS
We thank Patrick E. Canlas, Shuwen Xu, Li Pan, Kira H. Lin, Rick A. Rios, Anton D. Rotter-Sieren, Hans A. Vasquez-Gross, Maria E. Hernandez, Furong Liu, Anna Joe, and Natasha Brown for assistance in genomic DNA isolation and submission, seed organization and data processing, and Drs. Catherine Nelson, Jenny C Mortimer and Brittany Anderton for critical reading of the manuscript. We also thank Drs. Chongyun Fu, Jiandi Xu, and other Ronald lab members for insightful discussions. This work was part of the DOE Joint BioEnergy Institute (http://www.jbei.org) supported by the U. S. Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U. S. Department of Energy. The work conducted by the US Department of Energy Joint Genome Institute (JGI) was supported by the Office of Science of the US Department of Energy under Contract no. DE-AC02-05CH11231. This work was also supported by NIH (GM59962) and NSF (IOS-1237975) to PCR.