Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models

Marcelo Mollinari; Antonio Augusto Franco Garcia

doi:10.1101/415232

Abstract

Modern SNP genotyping technologies make it possible to measure the relative abundance of different alleles at a given locus and, consequently, to estimate their allele dosage, opening a new road for genetic studies in autopolyploids. Despite advances in genetic linkage analysis in autotetraploids, there is a lack of statistical models to perform linkage analysis in organisms with higher ploidy levels. In this paper, we present a statistical method that uses hidden Markov models to estimate recombination fractions and infer linkage phases for a sequence of SNP markers in full-sib populations of autopolyploid species with even ploidy levels. Our method uses efficient two-point procedures to reduce the search space for the best linkage phase configuration and re-estimates the final parameters by maximum likelihood under the hidden Markov model. To evaluate the method and demonstrate its properties, we rely on simulations of autotetraploid, autohexaploid and autooctaploid populations. The results show the reliability of our approach, including situations with complex linkage phase scenarios in hexaploid and octaploid populations.

Author summary

In this paper, we present a complete multilocus solution based on hidden Markov models to estimate recombination fractions and infer the linkage phase configuration in full-sib mapping populations with even ploidy levels under random chromosome segregation. We also present an efficient pairwise (two-point) analysis for cases where the multilocus analysis becomes computationally intensive.

Introduction

Polyploids are organisms with more than two sets of chromosomes. They are very important in agriculture and play a fundamental role in evolutionary processes, such as the differentiation of species [1]. The number of sets of chromosomes in an organism is called its ploidy level. The multiple sets of chromosomes in a polyploid can originate from the combination of chromosomes from different but related species, or from the duplication of chromosomes from the same species [2, 3]. In the first scenario, they are called allopolyploids; in the second, autopolyploids. Polyploid organisms can also be characterized by their pattern of inheritance. In general, allopolyploids exhibit disomic segregation, since homologous chromosomes have more affinity than homeologous chromosomes and tend to form preferential bivalents within each sub-genome [4]. Autopolyploids, however, have more than two homologous chromosomes per homology group. Thus, during meiosis, they can form either bivalents or multivalents [4, 5]. The expected segregation ratios in autopolyploids vary depending on the chromosome configuration that the organism presents during meiosis. If the chromosomes pair randomly, the segregation is called polysomic [6-9]. Alternatively, homologous chromosomes may pair preferentially, to a degree that can range from complete preferential pairing (disomic segregation) to completely random pairing (polysomic segregation). Since the molecular mechanisms of polyploid organisms are quite complex, this rigid dichotomy is often broken, and organisms can exhibit intermediate modes of inheritance [4, 10]. Throughout this paper, the term autopolyploid (or autotetraploid, autohexaploid, etc.) refers to polyploid organisms that exhibit polysomic segregation.

Despite all advances in genetic studies in autotetraploids [11-21], there is still a shortage of statistical methods to address organisms with higher ploidy levels, such as sweet potato [22-24], sugarcane [25, 26], and some ornamental flowers and forage crops (reviewed in [27]). In this work, we use the term high-level autopolyploids for autopolyploid organisms with a ploidy level greater than four. A fundamental class of statistical methods that has lagged behind in high-level autopolyploid studies is the construction of genetic maps. A reliable genetic map is a crucial step in quantitative trait loci (QTL) analysis, as well as in the assembly of reference genomes and the study of evolutionary processes [28-30]. Although the concept of genetic mapping is rather easy to understand, the construction of such maps in high-level autopolyploids is challenging. Even under bivalent pairing, there is a large number of possible pairing configurations during meiosis, and this number grows very rapidly as the ploidy level increases. Denoting the ploidy level by m, it is possible to find up to m different alleles for a locus in one individual. Furthermore, if some of those alleles are not distinguishable, it is necessary to consider the number of copies of each allelic form, also known as allele dosage. Finally, depending on the marker system used to assess the genotypic information, in the vast majority of cases it is not possible to obtain complete information about a particular locus.

The construction of a genetic map in a full-sib population can be summarized in five basic steps: i) estimation of pairwise recombination fractions and associated LOD Scores; ii) separation of markers into linkage groups; iii) ordering of markers within each linkage group using an optimization technique; iv) parental phasing, recombination fraction update and likelihood computation; and v) if the order is optimal, the map is complete; otherwise, return to step iii. Historically, genetic maps in high-level autopolyploids have been constructed using only alleles present in one homologous chromosome, called single-dose markers [31, 32]. In a full-sib population, these markers segregate in a 1:1 ratio (if they are present in only one parent) or in a 3:1 ratio (if present in both parents). Given this level of simplification, it is possible to use the five-step procedure coupled with standard software suitable for backcross diploid populations. Nevertheless, it is well accepted that the use of single-dose markers imposes limitations on the construction of adequate genetic maps. These approaches sub-sample the genome [19, 26], which precludes further consideration of multiallelic effects in models for QTL mapping and subsequent studies. Moreover, there is low statistical power to detect linkage when markers are in repulsion phase configurations [31, 33]. Although some authors have addressed this problem by including multiple-dose markers when constructing genetic maps and performing QTL mapping [33, 34], the limitations of the genotyping technologies at the time required the allelic dosage to be inferred from expected segregation ratios. Because of the large amount of hidden information imposed by the marker systems used in those studies [31, 33], the estimation of recombination fractions between multi-dose markers was highly impaired.

Quantitative genotyping technologies for single nucleotide polymorphisms (SNPs) have opened the door for further genetic mapping studies in high-level autopolyploids. It is now possible to measure the abundance of specific alleles within a locus in a polyploid genome [19, 26, 36-39]. This technology, combined with the genotypic distribution in the population [37], makes it possible to infer the allelic dosage from the ratio between the abundances of the two alternative alleles. Once the dosage of the markers is estimated, the construction of linkage maps can be significantly improved by taking this information into account. [19] and [40] presented methods that take the dosage of quantitative SNP data into account in both linkage studies and QTL mapping for autotetraploids.

Genetic linkage maps can be constructed based on two-point or multipoint estimates of the recombination fraction. Two-point methods use information from pairs of markers; even though they are less computationally demanding than multipoint methods, they require more informative markers to provide reliable results. Multipoint approaches, instead, use information from multiple markers in a linkage group, increasing the statistical efficiency of the analysis [17, 41, 42, 53]. This feature is particularly important in polyploid linkage analysis, where markers are mostly partially informative. One widely used procedure to obtain multipoint estimates is the hidden Markov model (HMM) [41]. Constructing the genetic map with this method provides estimates of the recombination fractions between all adjacent markers in a linkage group, as well as the multipoint likelihood, which has been shown to be an excellent criterion to evaluate and compare linkage phase configurations and marker orders [42]. [17] presented a statistical framework in which HMMs were applied to reconstruct genetic linkage maps, but it was limited to autotetraploids. Recently, [35] constructed an ultra-dense integrated linkage map for hexaploid chrysanthemum using two-point analysis. However, there is a lack of multipoint procedures that can handle cases where less marker information is available at high ploidy levels.

The main challenges we address in this paper are the inference of the haplotypes of the multiple homologous chromosomes and the multipoint estimation of recombination fractions in high-level polyploids. Although [21] proposed a probabilistic multilocus haplotype reconstruction model for autotetraploids considering double reduction, this remains an open question for organisms with higher ploidy levels. Our method relies on an HMM and is developed for species with even ploidy levels under random chromosome segregation (complete polysomic inheritance). We also present a two-point method that is capable of dealing with hundreds of markers even in high ploidy level scenarios. Hence, we propose solutions for steps i and iv in high-level autopolyploids. Step ii follows directly from step i using clustering algorithms, as proposed by [50]. Even though step iii is a challenging task in genetic mapping, it can be addressed using pairwise recombination fractions or the resulting likelihood of the Markov model, as proposed by several studies [43-49]. To evaluate our method and show its properties, we rely on simulations of autotetraploid, autohexaploid, and autooctaploid data. The R code to reproduce all simulations and analyses is publicly available.

Methods

In this section, we define the notation used throughout this article and present the probabilistic model for gamete formation in autopolyploids. Then, we derive the transition probabilities between adjacent marker loci (Eq 6), followed by the initial state (Eq 7) and emission (Eqs 8 and 9) probability distributions, which are the fundamental components of an HMM. We conclude by explaining the complexity of estimating linkage phases between markers and presenting an efficient two-point algorithm that simplifies the problem so that the phasing can be inferred from real data.

Notation

Consider one homology group in a mapping population derived from a cross between two autopolyploid individuals P and Q with the same ploidy level (full-sib family). The ploidy level is denoted by m and can be any even number greater than zero. Let the vectors $\mathcal{P}_k = (P_k^1, \dots, P_k^m)$ and $\mathcal{P}_{k+1} = (P_{k+1}^1, \dots, P_{k+1}^m)$, and $\mathcal{Q}_k = (Q_k^1, \dots, Q_k^m)$ and $\mathcal{Q}_{k+1} = (Q_{k+1}^1, \dots, Q_{k+1}^m)$, denote the genotypes of two adjacent multiallelic loci k and k + 1 in P and Q, respectively. The superscript i = 1, …, m indicates one of the possible alleles for the locus, and each locus has m different alleles in each parent. For example, for a cross between two autohexaploid individuals, $\mathcal{P}_k = (P_k^1, P_k^2, P_k^3, P_k^4, P_k^5, P_k^6)$; the same can be done for $\mathcal{P}_{k+1}$, $\mathcal{Q}_k$ and $\mathcal{Q}_{k+1}$. All alleles denoted by the same superscript are in the same homologous chromosome (e.g., $P_k^1$ and $P_{k+1}^1$ are in homologous chromosome 1).

The following assumptions are made to ensure random chromosome segregation [6, 8] and no double reduction [51]: i) only bivalents are formed during meiosis; ii) there is no preferential pairing during the formation of bivalents; iii) all bivalents have the same recombination fraction between loci k and k + 1; iv) bivalents are independent; and v) sister chromatids separate during meiosis II. The consequences of violating these assumptions will be addressed later using simulations.

Bivalent formation

Bivalent formation occurs during meiosis I (more specifically, at the pachytene stage of prophase I). In diploid cells, there is only one possible pairing configuration: the two duplicated homologous chromosomes of a homology group pair to form one bivalent. However, in autopolyploid cells, given the previous assumptions, the number of possible pairing configurations, i.e., the number of possible bivalent chromosomal pairings for a given homology group during meiosis, is

$$w_m = \frac{m!}{(m/2)!\,2^{m/2}} \tag{1}$$

The orientation of the bivalents does not affect the expected frequencies of each gamete type and therefore will not be considered. For example, an autotetraploid individual has two bivalents and three possible bivalent configurations: homologous chromosomes pair as 1 with 2 and 3 with 4; 1 with 3 and 2 with 4; or 1 with 4 and 2 with 3 [52]. We denote by Ψ = {ψ_j}, j = 1, …, w_m, the set of all bivalent configurations for a given ploidy level.
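As a quick check of the count of bivalent configurations, the sketch below enumerates by brute force every way of pairing homologs 1, …, m into m/2 unordered bivalents and compares the count with $m!/((m/2)!\,2^{m/2})$. The helper names `bivalent_configurations` and `w` are hypothetical and written only for illustration; this is not the authors' R code.

```python
from math import factorial


def bivalent_configurations(m):
    """Enumerate all ways of pairing homologs 1..m into m/2 unordered bivalents."""
    def pairings(items):
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        # pair the lowest remaining homolog with each possible partner in turn
        for idx, partner in enumerate(rest):
            for tail in pairings(rest[:idx] + rest[idx + 1:]):
                yield [(first, partner)] + tail
    return list(pairings(list(range(1, m + 1))))


def w(m):
    """Closed form for the number of bivalent configurations: m! / ((m/2)! 2^(m/2))."""
    return factorial(m) // (factorial(m // 2) * 2 ** (m // 2))


for m in (4, 6, 8):
    print(m, len(bivalent_configurations(m)), w(m))
```

For m = 4, 6 and 8, both counts are 3, 15 and 105, and for m = 4 the enumeration returns the three configurations listed above, in the same order.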

Expected gametic frequency for a given bivalent configuration

We present the expected gametic frequencies for parent P. Since parent Q undergoes a similar process, the expected gametic frequencies of both parents can be combined to obtain the expected genotypic frequencies in the full-sib population. Each bivalent in a given configuration ψ_j can produce two types of chromosomes for loci k and k + 1: parental, which result from bivalents with zero or any other even number of recombinations between k and k + 1; and recombinant, which result from bivalents with any odd number of recombinations. As presented by [34], the probabilities of all chromosome types for any single bivalent formed by homologs i and i′ (i ≠ i′) can always be represented as

$$\mathbf{V}_{ii'} = \frac{1}{2}\begin{pmatrix} 1 - r_k & r_k \\ r_k & 1 - r_k \end{pmatrix}$$

where the rows correspond to alleles $P_k^i$ and $P_k^{i'}$, the columns correspond to alleles $P_{k+1}^i$ and $P_{k+1}^{i'}$, and r_k is the recombination fraction between k and k + 1. For a given configuration $\psi_j = \{(i_1, i_1'), \dots, (i_{m/2}, i_{m/2}')\}$, the expected frequencies of all possible gametes derived from that configuration are given by

$$\mathbf{V}_{\psi_j} = \mathbf{V}_{i_1 i_1'} \otimes \mathbf{V}_{i_2 i_2'} \otimes \cdots \otimes \mathbf{V}_{i_{m/2} i_{m/2}'}$$

where ⊗ denotes the Kronecker product of matrices and the subscripts in V indicate the corresponding bivalent. All elements of this product are of the form

$$\frac{1}{2^{m/2}}\, r_k^{\,l}(1 - r_k)^{m/2 - l}$$

where l ∈ {0, …, m/2} denotes the total number of recombinant bivalents between loci k and k + 1. From this, we can define the probability of observing any gamete (for two loci) given a bivalent configuration ψ_j as

$$\Pr(\mathbf{p}_k, \mathbf{p}_{k+1} \mid \psi_j) = \begin{cases} \dfrac{1}{2^{m/2}}\, r_k^{\,l}(1 - r_k)^{m/2 - l}, & \text{if } \{\mathbf{p}_k, \mathbf{p}_{k+1}\} \text{ is consistent with } \psi_j \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where the vectors p_k and p_{k+1} denote subsets of m/2 alleles present in $\mathcal{P}_k$ and $\mathcal{P}_{k+1}$, respectively, and {p_k, p_{k+1}} indicates a gamete for loci k and k + 1 from parent P. Consistent means that the gamete can be produced from bivalent configuration ψ_j. Notice that some gametes cannot be obtained from ψ_j once the bivalents are formed.

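Since we assume that alleles with the same superscript are in the same homologous chromosome, l can be obtained by simply examining the superscripts of the elements contained in p_k and p_{k+1}. Consider, for example, ψ₁ = {(1, 2), (3, 4), (5, 6)} (m = 6, Fig 1). If one observes $\mathbf{p}_k = (P_k^1, P_k^3, P_k^5)$ and $\mathbf{p}_{k+1} = (P_{k+1}^1, P_{k+1}^4, P_{k+1}^6)$, the number of recombinant chromosomes is l = 2 (bivalents (3, 4) and (5, 6) are recombinant). Therefore, $\Pr(\mathbf{p}_k, \mathbf{p}_{k+1} \mid \psi_1) = r_k^2(1 - r_k)/8$. On the other hand, the probability is zero for any gamete with $\mathbf{p}_k = (P_k^1, P_k^2, P_k^3)$, since it is impossible to obtain this gamete from configuration ψ₁, i.e., it is not consistent with ψ₁.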
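The rule behind Eq (2) can be sketched in a few lines of Python. A gamete is consistent with ψ_j only if every bivalent passes exactly one homolog to it at each locus, and l counts the bivalents whose homolog switches between k and k + 1. Gametes are represented as sets of superscripts. The helper names are hypothetical, and the returned value is $r_k^l(1-r_k)^{m/2-l}/2^{m/2}$.

```python
from itertools import combinations
from math import isclose


def recombinant_bivalents(psi, gam_k, gam_k1):
    """Return l, the number of recombinant bivalents, or None if the gamete is inconsistent.

    psi: list of bivalents (i, i'); gam_k, gam_k1: sets of homolog superscripts carried
    by the gamete at loci k and k + 1.
    """
    l = 0
    for i, i2 in psi:
        at_k, at_k1 = {i, i2} & gam_k, {i, i2} & gam_k1
        if len(at_k) != 1 or len(at_k1) != 1:  # each bivalent contributes exactly one homolog
            return None
        if at_k != at_k1:  # the homolog switches between k and k + 1
            l += 1
    return l


def gamete_prob_given_psi(psi, gam_k, gam_k1, r):
    """Eq (2): r^l (1 - r)^(m/2 - l) / 2^(m/2) if consistent with psi, 0 otherwise."""
    half = len(psi)
    l = recombinant_bivalents(psi, gam_k, gam_k1)
    if l is None:
        return 0.0
    return r ** l * (1 - r) ** (half - l) / 2 ** half


psi1 = [(1, 2), (3, 4), (5, 6)]
print(recombinant_bivalents(psi1, {1, 3, 5}, {1, 4, 6}))      # 2
print(gamete_prob_given_psi(psi1, {1, 2, 3}, {1, 2, 3}, 0.1))  # 0.0: 1 and 2 share a bivalent
gametes = [set(c) for c in combinations(range(1, 7), 3)]
total = sum(gamete_prob_given_psi(psi1, a, b, 0.2) for a in gametes for b in gametes)
print(isclose(total, 1.0))  # probabilities under one configuration sum to 1
```

The last line mirrors the Kronecker-product argument: each bivalent's 2 × 2 block sums to one, so the probabilities of all gametes under a single configuration also sum to one.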

Figure 1.

One possible pairing configuration in an autohexaploid, namely ψ₁. $P_k^i$ denotes one allele present in homologous chromosome i for locus k in parent P. Notice that some allelic configurations, such as $(P_k^1, P_k^2, \cdot)$, are impossible to obtain under this bivalent pairing. In this case, the homologous chromosomes containing alleles $P_k^1$ and $P_k^2$ migrate to opposite poles of the cell during meiosis I. Therefore, $P_k^1$ and $P_k^2$ cannot be present in the same gamete.

Gametic frequency unconditional to bivalent configurations

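In reality, ψ_j is unknown; thus, the conditional probability given by Eq (2) must be considered for all possible ψ_j. The probability of observing a gamete {p_k, p_{k+1}}, unconditional on ψ_j, can be expressed as

$$\Pr(\mathbf{p}_k, \mathbf{p}_{k+1}) = \sum_{j=1}^{w_m} \Pr(\mathbf{p}_k, \mathbf{p}_{k+1} \mid \psi_j)\,\Pr(\psi_j) \tag{3}$$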

It is important to notice that only a subset of Ψ is consistent with the observed gamete; consequently, Pr(p_k, p_{k+1} | ψ_j) > 0 only for some ψ_j. Fig 2 shows a graphical representation of Eqs 2 and 3 for autohexaploid gametes.

Figure 2.

Graphical representation of Eqs 2 and 3 for autohexaploid gametes. The first 15 tables represent the gametic probabilities given the different bivalent configurations ψ (Eq 2). The rows and columns indicate gametic configurations for loci k and k + 1, respectively. For simplicity, only the superscripts of the gametic configurations are shown; for example, row 123, column 123, represents the gamete $\{(P_k^1, P_k^2, P_k^3), (P_{k+1}^1, P_{k+1}^2, P_{k+1}^3)\}$. Colored cells indicate the probability of gametic configurations consistent with the bivalent configuration ψ. The color scale indicates the number of recombinant bivalents associated with the gametic probability, varying from 0 (dark blue) to 3 (light blue). Blank cells indicate non-consistent configurations. The full table on the far right represents the sum over all ψ configurations, weighted by their probabilities (Eq 3).

The probability of observing a specific gamete is the same for every ψ_j in this consistent subset (Eq 2). Therefore, under random pairing (assumption ii), where Pr(ψ_j) = 1/w_m, our task reduces to finding the number of elements in this subset and multiplying Pr(p_k, p_{k+1} | ψ_j) Pr(ψ_j) by this number. The result is the probability of observing a gamete unconditional on the bivalent configuration.

For every gamete, l can range from zero to m/2 recombinant homologous chromosomes. The observed gamete results from the homologous chromosomes that migrate to one pole of the cell at anaphase I. Since we assume that sister chromatids separate during anaphase II, if l = 0 (all chromosomes are of parental type), there is no information about how the homologous chromosomes in the gamete paired with those that migrated to the opposite pole of the cell. In this situation, each of the m/2 homologous chromosomes in the gamete could have paired with any of the m/2 homologous chromosomes at the opposite pole, so the number of possible ψ_j that can produce a gamete with l = 0 is (m/2)!. Similarly, for l > 0, there are (m/2 − l)! possible pairing configurations for the m/2 − l parental chromosomes. For the remaining l recombinant chromosomes, the number of possible pairing configurations is l!. Thus, the total number of possible pairing configurations that can produce a specific gamete is l!(m/2 − l)!. This is precisely the number of elements in the subset of Ψ consistent with the observed gamete. Given the assumption of no preferential pairing during the formation of bivalents, Pr(ψ_j) = 1/w_m, and the probability of a gamete {p_k, p_{k+1}}, unconditional on ψ_j, simplifies to

$$\Pr(\mathbf{p}_k, \mathbf{p}_{k+1}) = \frac{l!\,(m/2 - l)!}{w_m}\cdot\frac{r_k^{\,l}(1 - r_k)^{m/2 - l}}{2^{m/2}} \tag{4}$$
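The counting argument above can be verified numerically. This sketch uses our own helpers, not the authors' code. It enumerates Ψ for m = 4 and m = 6 and checks three things for every gamete: that the number of configurations consistent with the gamete is $l!(m/2-l)!$; that averaging Eq (2) over Ψ with $\Pr(\psi_j) = 1/w_m$ (Eq 3) matches the closed form $\frac{l!(m/2-l)!}{w_m}\,\frac{r_k^l(1-r_k)^{m/2-l}}{2^{m/2}}$; and that the closed-form probabilities sum to one.

```python
from itertools import combinations
from math import factorial, isclose


def bivalent_configurations(m):
    """All pairings of homologs 1..m into m/2 bivalents (the set Psi)."""
    def pairings(items):
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for idx, partner in enumerate(rest):
            for tail in pairings(rest[:idx] + rest[idx + 1:]):
                yield [(first, partner)] + tail
    return list(pairings(list(range(1, m + 1))))


def n_consistent(configs, gam_k, gam_k1):
    """Number of configurations in which every bivalent passes one homolog at k and one at k + 1."""
    return sum(all(len({i, i2} & gam_k) == 1 and len({i, i2} & gam_k1) == 1 for i, i2 in psi)
               for psi in configs)


def eq4(gam_k, gam_k1, m, r):
    """Closed form: l!(m/2 - l)!/w_m * r^l (1 - r)^(m/2 - l) / 2^(m/2)."""
    h = m // 2
    l = h - len(gam_k & gam_k1)
    w_m = factorial(m) // (factorial(h) * 2 ** h)
    return factorial(l) * factorial(h - l) / w_m * r ** l * (1 - r) ** (h - l) / 2 ** h


def check(m, r=0.2):
    """Compare the brute-force average over Psi (Eq 3) with the closed form, for every gamete."""
    configs = bivalent_configurations(m)
    h = m // 2
    gametes = [set(c) for c in combinations(range(1, m + 1), h)]
    total = 0.0
    for a in gametes:
        for b in gametes:
            l = h - len(a & b)
            n_c = n_consistent(configs, a, b)
            assert n_c == factorial(l) * factorial(h - l)
            # every consistent psi contributes the same Eq (2) value
            eq3 = n_c / len(configs) * r ** l * (1 - r) ** (h - l) / 2 ** h
            assert isclose(eq3, eq4(a, b, m, r))
            total += eq4(a, b, m, r)
    return isclose(total, 1.0)


print(check(4), check(6))
```

A consistent configuration must pair each homolog shared by p_k and p_{k+1} with a homolog absent from both loci, and each homolog present only at k with one present only at k + 1. This is where the factor l!(m/2 − l)! comes from.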

Map reconstruction via hidden Markov model

The construction of a genetic map involves estimating the genetic distances between markers within linkage groups and their order. If the origin of the haplotypes (i.e., the linkage phase) of the parents of the mapping population is unknown, it also needs to be estimated. For several years, hidden Markov models have proven to be an excellent avenue for obtaining these estimates [17, 41, 42, 53]. The multipoint likelihood obtained using HMMs can be used as a criterion to compare marker orders and judge which one is best, and HMMs also provide reliable estimates of recombination fractions and linkage phases. [54] defines an HMM as a generative process composed of three well-defined probability distributions: transition, initial state and emission. In the genetic mapping context, the transition probability distribution is defined as the probability of having a particular genotype at position k + 1, given the genotype at position k. Using Eq (4), the gametic transition probability Pr(p_{k+1} | p_k), i.e., the conditional probability of a gamete genotype at locus k + 1 given the gamete genotype at locus k, is simply

$$\Pr(\mathbf{p}_{k+1} \mid \mathbf{p}_k) = \frac{\Pr(\mathbf{p}_k, \mathbf{p}_{k+1})}{\Pr(\mathbf{p}_k)}$$

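Under random chromosome segregation, both p_k and p_{k+1} can assume $\binom{m}{m/2}$ different genotypes, each with probability $\Pr(\mathbf{p}_k) = 1/\binom{m}{m/2}$. Let $\mathcal{G}_k^P = \{\mathbf{p}_k^1, \dots, \mathbf{p}_k^{\binom{m}{m/2}}\}$ denote all possible genotypes that p_k can assume for locus k, and assume that the genotypes in $\mathcal{G}_k^P$ are arranged according to the lexicographical order of their superscripts. For example, in an autotetraploid, $\mathcal{G}_k^P = \{P_k^1P_k^2, P_k^1P_k^3, P_k^1P_k^4, P_k^2P_k^3, P_k^2P_k^4, P_k^3P_k^4\}$ for locus k. After some simplifications (see S1 Appendix), the transition probability, i.e., the conditional probability of a gametic genotype at locus k + 1 given the gametic genotype at locus k, is

$$\Pr(\mathbf{p}_{k+1}^{j'} \mid \mathbf{p}_k^{j}) = \frac{r_k^{\,l}(1 - r_k)^{m/2 - l}}{\binom{m/2}{l}} \tag{5}$$

where l, the number of recombinant bivalents, is m/2 minus the number of superscripts shared by $\mathbf{p}_k^j$ and $\mathbf{p}_{k+1}^{j'}$. The initial state and the emission probability distributions will be addressed in the next section (Eqs 7 to 9).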
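Dividing the closed form of Eq (4) by $\Pr(\mathbf{p}_k) = 1/\binom{m}{m/2}$ gives $r_k^l(1-r_k)^{m/2-l}/\binom{m/2}{l}$, because $\binom{m}{m/2}/w_m = 2^{m/2}/(m/2)!$. The minimal sketch below builds the resulting gametic transition matrix, with gametes in lexicographic order of superscripts, and checks that each row sums to one. The function name is hypothetical.

```python
from itertools import combinations
from math import comb, isclose


def gametic_transition(m, r):
    """Gametes in lexicographic order and the matrix Pr(p_{k+1} | p_k) = r^l (1-r)^(m/2-l) / C(m/2, l)."""
    h = m // 2
    states = list(combinations(range(1, m + 1), h))

    def prob(a, b):
        l = h - len(set(a) & set(b))  # recombinant bivalents implied by the pair of gametes
        return r ** l * (1 - r) ** (h - l) / comb(h, l)

    return states, [[prob(a, b) for b in states] for a in states]


states, T = gametic_transition(4, 0.1)
print(states)
print(all(isclose(sum(row), 1.0) for row in T))
```

Each row sums to one because, for a fixed p_k, there are $\binom{m/2}{l}^2$ gametes at distance l, so l follows a Binomial(m/2, r_k) distribution.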

Including information of both parents

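Any given individual in a full-sib population results from the union of gametes from both parents, P and Q. Each parent can form $\binom{m}{m/2}$ different gametes for locus k. Let $g_m = \binom{m}{m/2}^2$ denote the number of possible genotypes derived from the cross between individuals P and Q, and, for simplification and without loss of generality, let $x_k^j = \mathbf{p}_k^{j_P} \cdot \mathbf{q}_k^{j_Q}$, j = 1, …, g_m, denote these genotypes, where · denotes the genotype of an individual derived from the union of gametes $\mathbf{p}_k^{j_P}$ and $\mathbf{q}_k^{j_Q}$ at locus k. The same reasoning applies to $x_{k+1}^{j'}$. Since the formation of gametes in both parents is independent, the genotypic transition probability distribution can be written as

$$t_k(j, j') = \Pr(x_{k+1}^{j'} \mid x_k^{j}) = \Pr(\mathbf{p}_{k+1}^{j'_P} \mid \mathbf{p}_k^{j_P})\,\Pr(\mathbf{q}_{k+1}^{j'_Q} \mid \mathbf{q}_k^{j_Q}) = \frac{r_k^{\,l_P + l_Q}(1 - r_k)^{m - l_P - l_Q}}{\binom{m/2}{l_P}\binom{m/2}{l_Q}} \tag{6}$$

where l_P and l_Q denote the number of recombinant bivalents between loci k and k + 1 in parents P and Q, respectively. For a comprehensive example of the transition probabilities and the indexation used in Eq 6, see Table 8 in S3 Appendix.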

Given a ploidy level m and a recombination fraction r_k, the only information required to obtain t_k(j, j′) in Eq (6) is l_P and l_Q. Since the genotypes in $\mathcal{G}_k^P$ and $\mathcal{G}_k^Q$ are arranged according to the lexicographical order of their superscripts, it is possible to obtain (l_P, l_Q) for any given pair (j, j′) using the algorithm presented in S2 Appendix. Although the number of possible transitions between positions k and k + 1 is (g_m)², which can be very large even for modest ploidy levels, the transition between any two specific genotypes j and j′ can be obtained without computing the entire transition space.
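The sketch below illustrates how a single entry t_k(j, j′) of Eq (6) can be computed without building the (g_m)² transition space: it maps a 0-based genotype index to its two gametes by unranking lexicographic combinations. The indexing j = j_P · C(m, m/2) + j_Q and the `unrank` routine are our assumptions for illustration; the authors' own procedure is the one described in S2 Appendix.

```python
from math import comb, isclose


def unrank(index, n, k):
    """Return the index-th (0-based) k-subset of {1, ..., n} in lexicographic order."""
    subset, x = [], 1
    while k > 0:
        c = comb(n - x, k - 1)  # number of ways to complete the subset if x is the next element
        if index < c:
            subset.append(x)
            k -= 1
        else:
            index -= c
        x += 1
    return subset


def genotype_transition(j, j2, m, r):
    """t_k(j, j') for 0-based genotype indices, assuming j = jP * C(m, m/2) + jQ."""
    h, n_gam = m // 2, comb(m, m // 2)
    (jP, jQ), (jP2, jQ2) = divmod(j, n_gam), divmod(j2, n_gam)
    lP = h - len(set(unrank(jP, m, h)) & set(unrank(jP2, m, h)))
    lQ = h - len(set(unrank(jQ, m, h)) & set(unrank(jQ2, m, h)))
    return r ** (lP + lQ) * (1 - r) ** (m - lP - lQ) / (comb(h, lP) * comb(h, lQ))


print(isclose(sum(genotype_transition(7, j2, 4, 0.1) for j2 in range(36)), 1.0))
# one octoploid entry (g_m = 4900) without building the 4900 x 4900 matrix
print(genotype_transition(0, 4899, 8, 0.05))
```

Only two subsets per parent are materialized for each query, so the cost of one entry grows with m rather than with g_m.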

The initial state distribution is the probability of observing a specific genotype. Given the assumption that there is no preferential pairing during the formation of bivalents, a uniform distribution can be employed as the initial state probability function

$$\Pr(x_k^j) = \frac{1}{g_m}, \quad j = 1, \dots, g_m \tag{7}$$

where $x_k^j$ is the j-th genotype at locus k.

So far, both the transition and the initial state distributions consider different allelic variants for all m homologous chromosomes in both parents. This scenario can only be achieved with fully informative markers. In reality, autopolyploid species may carry the same allelic variant in several homologous chromosomes. Moreover, even if all homologous chromosomes carried different allelic forms, modern genotyping platforms usually detect polymorphisms at the nucleotide level (SNPs), which are essentially biallelic. Because of this mismatch between the observed data and the full transition space, we use an emission function, defined as the probability of observing a molecular phenotype $O_k$ given a genotype $x_k^j$.

The detection of allelic variants in modern genotyping platforms is based on the abundance of the alternative nucleotides. In the autopolyploid setting, this translates into the dosage of a SNP at a specific locus, which can be estimated from the ratio between the abundances of its two allelic forms. Several methods have been proposed for this task, including [36], [37] and [38]. Here we introduce a biallelic derivation of the emission probability distribution. Although the function presented here uses biallelic information, other distributions can be derived for partially informative multiallelic marker systems following the same reasoning.

Let $d_k^P$ and $d_k^Q$ denote the observed dosage of one allelic form at locus k for parents P and Q, respectively. The choice of the allelic form counted in $d_k^P$ is arbitrary, as long as the same allelic form is counted in $d_k^Q$. The dosage observed in parent P can originate from alleles present in $d_k^P$ of the m homologous chromosomes. Let $\mathcal{S}_k^P$ denote a set of size $\#\{\mathcal{S}_k^P\} = \binom{m}{d_k^P}$ containing all possible subsets of $\mathcal{P}_k$ that originate the observed dosage $d_k^P$, where the operator #{·} is the cardinality of a set. The same reasoning applies to $\mathcal{S}_k^Q$. For instance, in an autotetraploid, if $d_k^P = 3$, the three doses present at locus k can be derived from four distinct subsets, $\mathcal{S}_k^P = \{P_k^1P_k^2P_k^3, P_k^1P_k^2P_k^4, P_k^1P_k^3P_k^4, P_k^2P_k^3P_k^4\}$. Given two particular subsets $s_k^P \in \mathcal{S}_k^P$ and $s_k^Q \in \mathcal{S}_k^Q$, each of the g_m genotypic states in the full transition space can be associated with a dosage. The dosage associated with the j-th state, $x_k^j = \mathbf{p}_k^{j_P} \cdot \mathbf{q}_k^{j_Q}$, is obtained by counting the alleles in the intersection between the parental allelic sets and the state, $\delta(k, j) = \#\{s_k^P \cap \mathbf{p}_k^{j_P}\} + \#\{s_k^Q \cap \mathbf{q}_k^{j_Q}\}$. Thus, the emission function can be defined as

$$\Pr(O_k \mid x_k^j) = \begin{cases} 1 - \epsilon, & \text{if } O_k = \delta(k, j) \\ \epsilon/m, & \text{otherwise} \end{cases} \tag{8}$$

where $O_k \in \{0, \dots, m\}$ is the observed dosage and ϵ denotes the global genotyping error rate. In addition to the point estimate of the dosage, the genotype-calling methods cited above also provide the probability distribution of the dosages of a particular marker for all individuals of the biparental population. If this information is available, a more general emission function can be derived: instead of modeling a global error rate ϵ, we use the prior information provided by the genotype-calling procedure. Let $\boldsymbol{\pi}_k = (\pi_k(0), \dots, \pi_k(m))$ denote the probability distribution vector associated with the dosages 0, …, m at position k for a particular individual in the biparental population. For example, $\boldsymbol{\pi}_k = (0, \pi_k(1), \pi_k(2), \pi_k(3), 0)$ denotes a tetraploid individual with probabilities π_k(1), π_k(2) and π_k(3) of having one, two and three doses, respectively, and zero for the remaining ones. Then, the emission probability function can be written as

$$\Pr(O_k \mid x_k^j, \boldsymbol{\pi}_k) = \pi_k(\delta(k, j)) \tag{9}$$

In this case, the observation O_k can be any dosage from 0 to m, and the information about the genotypes is contained in the probability distribution of the dosages π_k. Thus, the probability of observing any dosage given a genotype associated with a particular dosage δ(k, j) is obtained by simply looking up the corresponding value in the probability distribution provided by the genotype-calling procedure. Notice that Eq 8 is a special case of Eq 9, obtained with an appropriate π_k. For example, in autotetraploids, when the observed dosage at locus k is one, $\boldsymbol{\pi}_k = (\epsilon/4, 1 - \epsilon, \epsilon/4, \epsilon/4, \epsilon/4)$. Moreover, for missing values, it is possible to use the probability distribution of the genotypic classes under polysomic segregation, as presented by [37].
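A small sketch of the dosage bookkeeping δ(k, j) and the two emission functions follows. It assumes, as our reading of Eq (8), that the error mass ϵ is spread evenly over the m dosages that differ from δ(k, j). All function names and the example phase are hypothetical. The final line checks that Eq (8) is recovered from Eq (9) by a π_k that puts 1 − ϵ on the observed dosage.

```python
from math import isclose


def state_dosage(p_gam, q_gam, sP, sQ):
    """delta(k, j): copies of the counted allele in genotype p.q, given parental allele sets sP, sQ."""
    return len(set(p_gam) & set(sP)) + len(set(q_gam) & set(sQ))


def emission_eq8(obs, delta, m, eps):
    """Eq (8) (assumed form): 1 - eps if the observed dosage equals delta(k, j), eps / m otherwise."""
    return 1.0 - eps if obs == delta else eps / m


def emission_eq9(pi_k, delta):
    """Eq (9): read the genotype-calling probability of dosage delta(k, j) from pi_k."""
    return pi_k[delta]


# Hypothetical autotetraploid marker: allele on homologs 1, 2, 3 in P (d = 3) and on 1 in Q (d = 1)
sP, sQ = (1, 2, 3), (1,)
delta = state_dosage((1, 4), (2, 3), sP, sQ)  # 1 copy from P, 0 from Q
eps, m = 0.05, 4
# Eq (8) as a special case of Eq (9): pi_k puts 1 - eps on the observed dosage, eps / m elsewhere
pi_k = [eps / m if d != 1 else 1.0 - eps for d in range(m + 1)]
print(delta, all(isclose(emission_eq8(1, d, m, eps), emission_eq9(pi_k, d)) for d in range(m + 1)))
```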

Multipoint likelihood and the estimation of recombination fraction

Suppose there are z markers in a homology group in a known order, represented by M₁, …, M_k, …, M_z. Let r = (r₁, …, r_k, …, r_{z−1}) denote the vector of recombination fractions between all marker intervals in this sequence. Also, assume linkage phase configurations in parents P and Q denoted by $\Phi_P = (s_1^P, \dots, s_z^P)$ and $\Phi_Q = (s_1^Q, \dots, s_z^Q)$, respectively, where $s_k^P$ is the subset of homologs of P carrying the counted allele at marker k. The sequence of observations for the z markers is denoted by (O₁, …, O_k, …, O_z), and its underlying probability distributions by Π = (π₁, …, π_k, …, π_z). The likelihood of M₁, …, M_k, …, M_z can be obtained using Eqs (6), (7) and (9) following the classical forward procedure [54]. Let $\alpha_k(j) = \Pr(O_1, \dots, O_k, x_k^j \mid \mathbf{r}, \Phi_P, \Phi_Q, \Pi)$ denote the probability of the partial observation sequence (O₁, …, O_k) and genotype $x_k^j$ (the j-th genotypic state at marker k), given the sequence of recombination fractions r, the linkage phase configurations Φ_P and Φ_Q and the probability distributions for the sequence of observations Π. The forward procedure follows the steps below:

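1. Initialization:
$$\alpha_1(j) = \Pr(x_1^j)\,\pi_1(\delta(1, j)) = \frac{1}{g_m}\,\pi_1(\delta(1, j)), \quad j = 1, \dots, g_m \tag{10}$$

2. Induction:
$$\alpha_{k+1}(j') = \left[\sum_{j=1}^{g_m} \alpha_k(j)\, t_k(j, j')\right] \pi_{k+1}(\delta(k+1, j')) \tag{11}$$
where k = 1, …, z − 1 and j′ = 1, …, g_m

3. Termination:
$$\Pr(O_1, \dots, O_z \mid \mathbf{r}, \Phi_P, \Phi_Q, \Pi) = \sum_{j=1}^{g_m} \alpha_z(j) \tag{12}$$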
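The forward recursion can be sketched in plain Python for a small autotetraploid example (g_m = 36 states). The code assumes genotypic states ordered as (p, q) gamete pairs. It uses the uniform initial distribution of Eq (7) and the emission of Eq (9) with one-hot π_k. The marker phases and observed dosages are hypothetical. The sketch only shows the structure of the initialization, induction and termination steps and of the log-likelihood over individuals; it makes no attempt at efficiency.

```python
from itertools import combinations, product
from math import comb, isclose, log


def transition_matrix(m, r):
    """States as (p, q) gamete pairs in lexicographic order and the full matrix of Eq (6)."""
    h = m // 2
    gametes = list(combinations(range(1, m + 1), h))
    states = list(product(gametes, gametes))

    def tg(a, b):  # gametic transition r^l (1 - r)^(h - l) / C(h, l)
        l = h - len(set(a) & set(b))
        return r ** l * (1 - r) ** (h - l) / comb(h, l)

    return states, [[tg(p, p2) * tg(q, q2) for (p2, q2) in states] for (p, q) in states]


def forward(emis, trans):
    """Forward procedure for one individual; returns Pr(O_1, ..., O_z).

    emis[k][j] = Pr(O_k | x_k^j); trans[k][j][j2] = t_k(j, j2) for the z - 1 intervals.
    """
    g = len(emis[0])
    alpha = [e / g for e in emis[0]]  # initialization with the uniform initial state
    for k, T_k in enumerate(trans):  # induction over intervals
        alpha = [sum(alpha[j] * T_k[j][j2] for j in range(g)) * emis[k + 1][j2]
                 for j2 in range(g)]
    return sum(alpha)  # termination


def log_likelihood(emis_per_individual, trans):
    """Natural log of the product over individuals of Pr(O_1, ..., O_z)."""
    return sum(log(forward(e, trans)) for e in emis_per_individual)


m = 4
states, T = transition_matrix(m, 0.1)
phases = [((1, 2), (1,)), ((1, 2), (2,)), ((1,), (2,))]  # hypothetical (sP, sQ) per marker


def emissions(obs):
    """Eq (9) with one-hot pi_k at the observed dosages (no genotyping uncertainty)."""
    return [[1.0 if len(set(p) & set(sP)) + len(set(q) & set(sQ)) == d else 0.0
             for (p, q) in states] for d, (sP, sQ) in zip(obs, phases)]


data = [emissions(o) for o in ([2, 1, 1], [1, 1, 0], [3, 2, 1])]  # hypothetical dosages
print(isclose(forward([[1.0] * len(states)] * 3, [T, T]), 1.0))  # uninformative data
print(log_likelihood(data, [T, T]))
```

With uninformative emissions the forward probability is exactly one, because every row of the transition matrix sums to one. This gives a quick sanity check of the recursion.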

Then, the likelihood of the model is defined as

$$L(\mathbf{r}) = \prod_{i=1}^{n} \Pr(O_{1,i}, \dots, O_{z,i} \mid \mathbf{r}, \Phi_P, \Phi_Q, \Pi_i) \tag{13}$$

where n is the number of individuals in the full-sib population, O_{1,i}, …, O_{z,i} is the sequence of marker observations for individual i, and Π_i is an (m + 1) × z matrix whose k-th column is the probability distribution associated with marker M_k for individual i. The multipoint maximum likelihood estimate of r can be obtained using the forward-backward procedure coupled with the EM algorithm [54]. For the backward procedure, consider the variable $\beta_k(j) = \Pr(O_{k+1}, \dots, O_z \mid x_k^j, \mathbf{r}, \Phi_P, \Phi_Q, \Pi)$, the probability of the partial observation sequence from k + 1 to z, given the genotype $x_k^j$, the recombination fraction vector r, the linkage phase configurations Φ_P and Φ_Q and the probability distributions for the sequence of observations Π. The solution for β_k(j) was also described by [54] as follows:

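1. Initialization:
$$\beta_z(j) = 1, \quad j = 1, \dots, g_m \tag{14}$$

2. Induction:
$$\beta_k(j) = \sum_{j'=1}^{g_m} t_k(j, j')\, \pi_{k+1}(\delta(k+1, j'))\, \beta_{k+1}(j') \tag{15}$$
where k = z − 1, z − 2, …, 1 and j = 1, …, g_m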

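To estimate the recombination fraction for all intervals in the marker sequence, we define ξ_k(j, j′) as the probability of state $x_k^j$ at position k and state $x_{k+1}^{j'}$ at position k + 1, given the sequence of observations O₁, …, O_z and their underlying probability distributions Π, the recombination fraction vector r and the linkage phase configurations Φ_P and Φ_Q:

$$\xi_k(j, j') = \frac{\alpha_k(j)\, t_k(j, j')\, \pi_{k+1}(\delta(k+1, j'))\, \beta_{k+1}(j')}{\sum_{u=1}^{g_m}\sum_{u'=1}^{g_m} \alpha_k(u)\, t_k(u, u')\, \pi_{k+1}(\delta(k+1, u'))\, \beta_{k+1}(u')} \tag{16}$$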

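The recombination fraction r_k can be estimated through an iterative process using

$$r_k^{(s+1)} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{g_m}\sum_{j'=1}^{g_m} \xi_k^{(i)}(j, j' \mid \mathbf{r}^{(s)})\, \frac{l_P(j, j') + l_Q(j, j')}{m} \tag{17}$$

where $\xi_k^{(i)}(j, j' \mid \mathbf{r}^{(s)})$ is calculated for individual i, $[l_P(j, j') + l_Q(j, j')]/m$ is the proportion of recombinant chromosomes between markers k and k + 1 for individuals with genotypes $x_k^j$ and $x_{k+1}^{j'}$, $\mathbf{r}^{(s)}$ is the vector of recombination fractions at iteration s, and $\mathbf{r}^{(s+1)}$ is the updated recombination fraction vector [55].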
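Combining the forward and backward variables into ξ_k(j, j′), and then averaging ξ_k(j, j′) · (l_P + l_Q)/m over individuals and state pairs, gives one EM (Baum-Welch) update of r_k. The sketch below is a plain-Python illustration for small m, with hypothetical helper names and hypothetical data. It stores (l_P, l_Q) for every pair of states, so that both Eq (6) and the proportion of recombinant chromosomes can be read from the same table.

```python
from itertools import combinations, product
from math import comb, isclose


def recombinant_counts(m):
    """States (p, q) in lexicographic order and the pair (l_P, l_Q) for every pair of states."""
    h = m // 2
    gametes = list(combinations(range(1, m + 1), h))
    states = list(product(gametes, gametes))
    L = [[(h - len(set(p) & set(p2)), h - len(set(q) & set(q2))) for (p2, q2) in states]
         for (p, q) in states]
    return states, L


def transition(L, m, r):
    """Eq (6) read off the (l_P, l_Q) table."""
    h = m // 2
    return [[r ** (a + b) * (1 - r) ** (m - a - b) / (comb(h, a) * comb(h, b)) for a, b in row]
            for row in L]


def em_update(emis_all, L, m, r_vec):
    """One EM iteration over n individuals; emis_all[i][k][j] = Pr(O_k | x_k^j) for individual i."""
    z, g = len(r_vec) + 1, len(L)
    Ts = [transition(L, m, r) for r in r_vec]
    acc = [0.0] * (z - 1)
    for emis in emis_all:
        alpha = [[e / g for e in emis[0]]]  # forward variables
        for k in range(z - 1):
            alpha.append([sum(alpha[k][j] * Ts[k][j][j2] for j in range(g)) * emis[k + 1][j2]
                          for j2 in range(g)])
        beta = [[1.0] * g]  # backward variables, built from position z down to 1
        for k in range(z - 2, -1, -1):
            nxt = beta[0]
            beta.insert(0, [sum(Ts[k][j][j2] * emis[k + 1][j2] * nxt[j2] for j2 in range(g))
                            for j in range(g)])
        like = sum(alpha[-1])
        for k in range(z - 1):
            for j in range(g):
                for j2 in range(g):
                    # xi_k(j, j'): posterior probability of the transition j -> j'
                    xi = alpha[k][j] * Ts[k][j][j2] * emis[k + 1][j2] * beta[k + 1][j2] / like
                    acc[k] += xi * (L[j][j2][0] + L[j][j2][1]) / m  # recombinant proportion
    return [a / len(emis_all) for a in acc]


m = 4
states, L = recombinant_counts(m)
phase = ((1, 2), (1,))  # hypothetical: same homologs carry the allele at both markers
dos = [len(set(p) & set(phase[0])) + len(set(q) & set(phase[1])) for (p, q) in states]
obs = [(2, 2), (1, 1), (3, 3), (0, 0), (2, 1), (1, 1)]  # hypothetical dosage pairs
emis_all = [[[1.0 if d == o else 0.0 for d in dos] for o in pair] for pair in obs]
r = [0.25]
for _ in range(20):
    r = em_update(emis_all, L, m, r)
print(r)
print(isclose(em_update([[[1.0] * len(states)] * 2], L, m, [0.2])[0], 0.2))
```

With uninformative emissions, ξ_k reduces to the prior transition, and the update returns the current r_k unchanged. This happens because the expected number of recombinant bivalents per parent is (m/2) r_k, which makes a convenient sanity check.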

Estimation of linkage phase

Let the Cartesian product $\Omega_P = \mathcal{S}_1^P \times \cdots \times \mathcal{S}_z^P$, where $\mathcal{S}_k^P$ is the set of allele subsets of $\mathcal{P}_k$ compatible with the observed dosage at marker k, denote the set of all possible linkage phase configurations in parent P. Also, let $\Omega = \Omega_P \times \Omega_Q$, with elements $\Phi^u = (\Phi_P^u, \Phi_Q^u)$, u = 1, …, #{Ω}, denote the set of all possible linkage phase configurations in both parents. The probability of a linkage phase configuration can be obtained using Bayes’ rule

$$\Pr(\Phi^u \mid \mathbf{O}, \Pi) = \frac{\Pr(\mathbf{O} \mid \Phi^u, \Pi)\,\Pr(\Phi^u)}{\sum_{v=1}^{\#\{\Omega\}} \Pr(\mathbf{O} \mid \Phi^v, \Pi)\,\Pr(\Phi^v)} \tag{18}$$

where O is an array containing the observations for z markers in n individuals, and Π is the underlying probability distribution for all marker observations. Since the prior probability Pr(Φ^u) can be assumed to be uniform, the posterior probability is proportional to the likelihood of the model, which can be used to select the best linkage phase configuration. Depending on the dosage and number of markers, some of these configurations are equivalent and result in the same likelihood. The search space for the best linkage phase configuration can be unwieldy depending on the ploidy level, dosage and number of markers. Moreover, the transition space of the HMM gets larger as the ploidy level increases. To circumvent these problems, we propose a very efficient two-point procedure to reduce the search space for linkage phases.
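Two pieces of bookkeeping for this step are sketched in Python below. The first is the size of the unpruned search space, assuming each marker contributes $\binom{m}{d_k}$ allele subsets per parent. The second is the posterior of Eq (18) under a uniform prior, computed from natural-log likelihoods with the usual max-subtraction to avoid underflow. The log-likelihood values are made up for illustration, and the helper names are hypothetical.

```python
from math import comb, exp, isclose, prod


def search_space_size(m, dosages_P, dosages_Q):
    """Number of phase configurations in both parents before any pruning."""
    return prod(comb(m, d) for d in dosages_P) * prod(comb(m, d) for d in dosages_Q)


def phase_posterior(loglik):
    """Posterior over phase configurations under a uniform prior, from natural-log likelihoods."""
    top = max(loglik.values())  # subtract the maximum so exp() does not underflow
    weights = {u: exp(v - top) for u, v in loglik.items()}
    total = sum(weights.values())
    return {u: wt / total for u, wt in weights.items()}


# ten hexaploid markers, triplex in P and duplex in Q: 20**10 * 15**10 configurations
print(search_space_size(6, [3] * 10, [2] * 10))
post = phase_posterior({"Phi1": -100.0, "Phi2": -100.0, "Phi3": -104.0})  # made-up values
print(post, isclose(sum(post.values()), 1.0))
```

Equivalent configurations (Phi1 and Phi2 above) receive identical posterior probabilities, which is why ties are expected when phases are compared.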

Two-point algorithm for high-level autopolyploids

When the linkage analysis is conducted on only two markers (two-point analysis), the information contained in these markers does not propagate into the rest of the chain. Thus, based on the dosage and linkage phase configuration of the markers involved in the analysis, the g_m genotypic states in the full transition space can be collapsed into a small number of states, and a straightforward likelihood function can be derived. It is worth mentioning that the estimates obtained using the two-point procedure are the same as those obtained using the multipoint algorithm for two markers; however, the computation is much faster.

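Consider a biallelic marker in an autopolyploid biparental cross with ploidy m. The number of possible genotypic (dosage) classes in the progeny for a given locus at position k is $c_k = \zeta(d_k^P) + \zeta(d_k^Q) + 1$, where $d_k^P$ and $d_k^Q$ are the parental dosages, the operator $\zeta(d) = m/2 - |m/2 - d|$, and |·| denotes the absolute value. For example, in an autohexaploid biparental cross, if the dosage of the marker at position k in parent P is two ($d_k^P = 2$) and in parent Q is three ($d_k^Q = 3$), the number of possible genotypic classes expected in the progeny is six. Depending on the linkage phase configuration, each of the g_m genotypic states in the full transition space corresponds to one of these expected genotypic classes, as presented in the emission function (Eqs 8 and 9). Thus, in the previous example, all g_m states can be collapsed into six different classes. To perform this reduction of dimensionality, let $y_k \in \{0, \dots, m\}$ denote one of the possible genotypes, based on dosage, of one individual in the progeny of an autopolyploid biparental cross at position k with ploidy m. The joint probability of $y_k$ and $y_{k'}$, for a given genotypic configuration at positions k and k′, can be written as

$$\Pr(y_k, y_{k'}) = \sum_{j=1}^{g_m}\sum_{j'=1}^{g_m} \Pr(x_k^j)\, t_k(j, j')\, T_k(j, y_k)\, T_{k'}(j', y_{k'}) \tag{19}$$

where $T_k(j, y_k) = 1$ if $\delta(k, j) = y_k$ and 0 otherwise, and δ(k, j) was defined in Eq 8; the same applies to $T_{k'}$. Since in a two-point analysis the probability distribution of the genotypic states at locus k can be assumed to be uniform, i.e., $\Pr(x_k^j) = 1/g_m$, Eq (19) can be rewritten as a sum of weighted terms from Eq (6):

$$\Pr(y_k, y_{k'}) = \frac{1}{g_m}\sum_{l_P=0}^{m/2}\sum_{l_Q=0}^{m/2} \frac{r_k^{\,l_P+l_Q}(1 - r_k)^{m-l_P-l_Q}}{\binom{m/2}{l_P}\binom{m/2}{l_Q}} \sum_{j=1}^{g_m}\sum_{j'=1}^{g_m} h(j, j'; l_P, l_Q)\, T_k(j, y_k)\, T_{k'}(j', y_{k'}) \tag{20}$$

where h(j, j′; l_P, l_Q) is 1 if (j, j′) corresponds to (l_P, l_Q) according to the procedure described in S2 Appendix and zero otherwise. Eq 20 can be expressed in matrix form as

$$\mathbf{M}_{kk'} = \frac{1}{g_m}\sum_{l_P=0}^{m/2}\sum_{l_Q=0}^{m/2} \frac{r_k^{\,l_P+l_Q}(1 - r_k)^{m-l_P-l_Q}}{\binom{m/2}{l_P}\binom{m/2}{l_Q}}\, \mathbf{T}_k^{\top}\mathbf{H}_{l_P l_Q}\mathbf{T}_{k'} \tag{21}$$

where $\mathbf{T}_k$ is the g_m × (m + 1) matrix with entries $T_k(j, y_k)$, $\mathbf{H}_{l_P l_Q}$ is the g_m × g_m matrix with entries h(j, j′; l_P, l_Q), and $\mathbf{M}_{kk'}$ is an (m + 1) × (m + 1) matrix whose $(y_k, y_{k'})$ entry is $\Pr(y_k, y_{k'})$. Moreover, in a two-point analysis with biallelic markers, the linkage phase configuration can be summarized by an ordered pair $(\sigma_{kk'}^P, \sigma_{kk'}^Q)$ indicating the number of homologous chromosomes that share allelic variants for loci k and k′ in parents P and Q, respectively. For a given pair $(s_k^P, s_{k'}^P)$ of allele subsets carrying the observed dosages at loci k and k′, $\sigma_{kk'}^P = \#\{\mathcal{I}_k^P \cap \mathcal{I}_{k'}^P\}$, where $\mathcal{I}_k^P$ and $\mathcal{I}_{k'}^P$ denote the sets of homologous chromosomes of parent P carrying the allelic variant at positions k and k′, which can be assessed using the superscripts in $s_k^P$ and $s_{k'}^P$; #{·} indicates the cardinality of the set.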
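The count of progeny dosage classes can be checked by brute force. A parent with dosage d transmits between max(0, d − m/2) and min(d, m/2) copies, so the number of progeny classes is ζ(d_k^P) + ζ(d_k^Q) + 1 with ζ(d) = m/2 − |m/2 − d|. The notation ζ and the helper names below are ours. The brute-force function enumerates all gametes for one arbitrary phase in each parent.

```python
from itertools import combinations


def zeta(d, m):
    """Width of the range of copies a parent with dosage d can transmit: m/2 - |m/2 - d|."""
    return m // 2 - abs(m // 2 - d)


def n_dosage_classes(dP, dQ, m):
    return zeta(dP, m) + zeta(dQ, m) + 1


def brute_force_classes(dP, dQ, m):
    """Enumerate the progeny dosages for one arbitrary phase in each parent."""
    gametes = [set(g) for g in combinations(range(1, m + 1), m // 2)]
    sP, sQ = set(range(1, dP + 1)), set(range(1, dQ + 1))
    from_P = {len(sP & g) for g in gametes}  # copies transmitted by P
    from_Q = {len(sQ & g) for g in gametes}  # copies transmitted by Q
    return len({a + b for a in from_P for b in from_Q})


print(n_dosage_classes(2, 3, 6), brute_force_classes(2, 3, 6))
```

For the hexaploid example in the text (dosages two and three), both functions give six classes.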
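Notice that many different pairs of homolog sets for markers k and k′ result in the same number of shared homologous chromosomes. Consider the set containing all possible pairs of homolog sets for a given pair of dosages. This set can be divided into $\min(d_k, d_{k'}) - \max(0, d_k + d_{k'} - m) + 1$ partitions, where $d_k$ and $d_{k'}$ are the dosages of the two markers in the parent, each partition corresponding to a different number of shared homologous chromosomes. Fig 3 shows an example of this set for two markers with dosage two in an autotetraploid homology group. The size of the set is 36 (6 × 6), and it can be subdivided into three partitions, corresponding to zero, one and two shared homologous chromosomes.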

Figure 3.

Example of the set of linkage phase configurations for an autotetraploid homology group with two markers of dosage two. In this case, the set of homologous chromosomes carrying the allelic variant at locus k is one of the six possible subsets of size two of the four homologous chromosomes; the same reasoning applies to locus k′. The horizontal bars represent homologous chromosomes forming a homology group and the dots represent allelic variants of a biallelic marker. The number below each homology group represents the number of homologous chromosomes that share allelic variants. This defines three partitions, with zero, one and two shared homologous chromosomes. Notice that, from a homology group within a specific partition, it is possible to obtain the linkage phase configuration of any other homology group within that partition by permuting its homologous chromosomes.
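The partition sizes in the Fig 3 example can be reproduced by brute-force enumeration. This minimal sketch considers a single parent, represents the homologs carrying the allelic variant at each locus as a subset of {0, …, m − 1}, and groups all pairs of subsets by the number w of shared homologs; the function name is ours.

```python
from collections import Counter
from itertools import combinations


def phase_partitions(m: int, d_k: int, d_k2: int) -> dict:
    """Group all pairs of homolog sets for two markers by the number w of
    homologs carrying the allelic variant at both loci (one parent only)."""
    counts = Counter()
    for s_k in combinations(range(m), d_k):
        for s_k2 in combinations(range(m), d_k2):
            counts[len(set(s_k) & set(s_k2))] += 1
    return dict(sorted(counts.items()))


print(phase_partitions(4, 2, 2))  # {0: 6, 1: 24, 2: 6}
```

The call for two dosage-2 markers in a tetraploid returns 36 pairs split 6/24/6 over w = 0, 1, 2. For two triple-dose markers in a tetraploid, only w = 2 and w = 3 are possible, so just two partitions appear.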

In a two-point context, the likelihood function derived from any of the configurations belonging to the same partition (same number of shared homologous chromosomes) is the same. Thus, any of them can be used to obtain the likelihood function for a given partition. Let denote one of the possible pairs that correspond to . The same reasoning applies to parent Q. Without loss of generality, the two-point likelihood function of biallelic observed molecular phenotypes for markers k and k′ given and is where n is the number of individuals and T denotes transposition of a vector. In Eq (22), r_k can be estimated using iterative procedures such as EM or Newton-Raphson. As in Eq (18), it is possible to list all linkage phase configurations and evaluate them based on their likelihood. Here we use the LOD Score (base-10 logarithm of the likelihood ratio) relative to the highest likelihood; thus, the most likely models yield LOD Scores close to zero. We also use a LOD Score to assess the evidence for linkage between the two markers, given a linkage phase configuration: the ratio between the likelihood of the model under the estimated recombination fraction and under the null hypothesis of no linkage, H_0: r = 0.5.
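Eq 22 is not reproduced here, so the following sketch uses a deliberately simple stand-in: a diploid (m = 2) backcross-type likelihood $L(r) = r^{R}(1 - r)^{n - R}$ with R recombinants among n meioses, where the two linkage phases (coupling and repulsion) play the role of the partitions. It shows how the phase LOD relative to the best configuration and the linkage LOD against H_0: r = 0.5 are computed. In this toy model the maximum likelihood estimate has a closed form; for Eq 22 the text relies on EM or Newton-Raphson instead.

```python
import math


def log10_lik(r: float, n_rec: int, n: int) -> float:
    """log10 likelihood of n_rec recombinants among n meioses (toy model)."""
    ll = 0.0
    if n_rec:
        ll += n_rec * math.log10(r)
    if n - n_rec:
        ll += (n - n_rec) * math.log10(1 - r)
    return ll


def two_point(n_rec: int, n: int) -> dict:
    """Estimate r and LOD Scores for the two phases of a diploid toy cross.

    In repulsion, the classes counted as recombinant in coupling become
    non-recombinant, so the recombinant count is n - n_rec.
    """
    out = {}
    for phase, k in (("coupling", n_rec), ("repulsion", n - n_rec)):
        r_hat = min(k / n, 0.5)  # closed-form MLE, restricted to [0, 0.5]
        ll = log10_lik(r_hat, k, n)
        out[phase] = {"r": r_hat, "ll": ll,
                      "lod_linkage": ll - log10_lik(0.5, k, n)}  # vs H0: r = 0.5
    best = max(v["ll"] for v in out.values())
    for v in out.values():
        v["lod_phase"] = v["ll"] - best  # 0 for the most likely phase
    return out


res = two_point(n_rec=10, n=100)
print(res["coupling"]["r"], round(res["coupling"]["lod_linkage"], 2))  # 0.1 15.98
```

Here the repulsion phase has a phase LOD of about −16 relative to coupling, so it would be discarded by any reasonable threshold.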

As previously shown, it is possible to enumerate all linkage phase configurations for parent P using the Cartesian product . To reduce this Cartesian space based on two-point analysis, we add the restriction that all pairs in a sequence of configurations must be contained in , where is the subset of all partitions in for which the associated LOD Score is smaller than η. Thus, a reduced subset of linkage phases in parent P based on two-point analysis can be obtained using

Note that it is not necessary to represent the whole Cartesian space {Φ_P} to restrict the linkage phase configurations to this condition. The restriction can be applied through the sequential addition of markers from M₁ to M_z. For each marker M_k′ added to the end of the chain (k′ = 2, …, z), the ordered pairs (k, k′), k = k′ − 1, …, 1, are evaluated, and only linkage phase configurations that meet the condition are retained.
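The sequential restriction can be sketched for one parent as follows. This is an illustrative reconstruction, not MAPPoly's implementation: it assumes the two-point step has already produced, for every marker pair (k, k′), the set of admissible numbers of shared homologs w (those whose LOD relative to the best partition is below η in absolute value), and it extends configurations one marker at a time so that the full Cartesian product is never built. All names are hypothetical.

```python
from itertools import combinations


def admissible_w(phase_lods: dict, eta: float) -> set:
    """Keep partitions w whose two-point LOD, relative to the best one
    (0 for the best, negative otherwise), is smaller than eta in absolute value."""
    return {w for w, lod in phase_lods.items() if abs(lod) < eta}


def sequential_configs(m: int, dosages: list, admissible: dict) -> list:
    """Build one parent's phase configurations marker by marker.

    admissible[(k, k2)] (k < k2) is the set of allowed w for that marker pair.
    A configuration is a tuple of frozensets; entry k holds the homologs that
    carry the allelic variant of marker k.
    """
    configs = [(frozenset(s),) for s in combinations(range(m), dosages[0])]
    for k2 in range(1, len(dosages)):
        extended = []
        for conf in configs:
            for s in map(frozenset, combinations(range(m), dosages[k2])):
                # check the new marker against every marker already in the chain
                if all(len(conf[k] & s) in admissible[(k, k2)] for k in range(k2)):
                    extended.append(conf + (s,))
        configs = extended
    return configs


coupling_only = {(0, 1): {1}, (0, 2): {1}, (1, 2): {1}}
print(len(sequential_configs(4, [1, 1, 1], coupling_only)))  # 4
```

With three single-dose markers in a tetraploid and only coupling (w = 1) admissible for every pair, only the 4 configurations placing all variants on the same homolog survive; allowing w ∈ {0, 1} everywhere keeps all 4³ = 64.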

Some of the configurations selected using the previous procedure can be equivalent, since they are permutations of the same set of homologous chromosomes. To remove this redundancy, let each of the selected configurations be represented as a binary matrix of dimensions (m × k′) such that where u ∈ {1, …, U}, U is the number of selected linkage phase configurations, and k′ indicates that M_k′ was the last marker inserted in the chain. The rows of the matrix represent the homologous chromosomes of the u-th linkage phase configuration after the insertion of the k′-th marker at the end of the chain; 1 denotes the presence of an allelic variant, and 0 denotes its absence. If one such matrix can be obtained from another just by permuting its rows (i.e., permuting the order of the homologous chromosomes), the two linkage phase configurations yield the same likelihood, and one of them should be excluded from consideration. The same reasoning applies to parent Q. This procedure can be applied recursively until all redundancy is eliminated. The reduced search space of linkage phase configurations for both parents is obtained as Φ(η) = Φ_P(η) × Φ_Q(η), such that #{Φ(η)} ≪ #{Φ}, combined with the redundancy elimination for homology groups. This sequential procedure results in a set of linkage phase configurations containing markers up to M_k′, which are evaluated using the HMM likelihood. A LOD Score threshold relative to the most likely configuration determines which configurations are carried over to the next round of marker inclusion (Fig 4).
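The redundancy elimination can be sketched with a canonical form instead of explicitly generating all m! row permutations: two binary matrices are row permutations of each other exactly when their sorted lists of rows coincide. Using sorted rows as a dictionary key is our implementation choice, and the function names are hypothetical.

```python
from itertools import combinations


def to_matrix(config, m: int) -> tuple:
    """Binary m x k' matrix: rows are homologs, columns are markers."""
    return tuple(tuple(int(h in s) for s in config) for h in range(m))


def remove_permutation_duplicates(configs, m: int) -> list:
    """Keep one configuration per class of row-permutation-equivalent matrices.

    Two matrices are row permutations of each other exactly when their sorted
    rows coincide, so sorted rows serve as a canonical key.
    """
    kept = {}
    for conf in configs:
        key = tuple(sorted(to_matrix(conf, m)))
        kept.setdefault(key, conf)  # first configuration seen is the representative
    return list(kept.values())


# all 36 configurations of two dosage-2 markers in a tetraploid parent
pairs = [(frozenset(a), frozenset(b))
         for a in combinations(range(4), 2) for b in combinations(range(4), 2)]
unique = remove_permutation_duplicates(pairs, 4)
print(len(unique))  # 3
```

For these 36 two-marker configurations, one representative survives per partition (w = 0, 1, 2), i.e., three configurations.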

Figure 4.

Example of linkage phase configuration estimation using two-point-based sequential space reduction and HMM evaluation. Only one parent is presented. The two-point search reduction has two parts: the first evaluates the LOD Scores obtained from pairwise recombination fraction likelihoods; the second detects equivalent configurations by performing all possible permutations of the homologous chromosomes. The remaining configurations are evaluated using the HMM-based likelihood. In the first step, linkage phase configurations of M₁ and M₂ are evaluated using the two-point analysis. Color shades indicate different linkage phase configurations provided by the two-point analysis. In this example, there are two possible linkage phases, represented by two shades of red. These configurations are not evaluated using the HMM, since the outcome would be the same as that obtained using the two-point analysis. In the second step, we evaluate the linkage phases between markers M₃ and M₂, and between M₃ and M₁. Configurations with LOD Scores smaller than η are retained for evaluation by the HMM. There are two possible linkage phases given a certain η, represented by two shades of blue. These two configurations are combined with the configurations from the previous step, resulting in four configurations evaluated using the HMM likelihood. Given a likelihood threshold, only configurations 1 and 4 are eligible for the next step. The same reasoning applies to the remaining markers. A final linkage phase configuration is obtained after inserting the last marker and choosing the one that yields the highest HMM-based likelihood.

Finally, with all markers inserted, the multipoint likelihood of the whole map is used to find the best configuration among the remaining ones, and the recombination fractions are reestimated. To demonstrate the mechanics of the two-point analysis coupled with the multipoint procedure, a simple example is presented in S3 Appendix. All methods and procedures described here are available in the software MAPPoly, which can be accessed at https://github.com/mmollina/mappoly.

Simulations

Simulation 1 - local performance under random bivalent pairing

The aim of this simulation study was to evaluate the local performance of the algorithm for three ploidy levels (m = 4, m = 6 and m = 8) under the mapping model assumptions (i.e., random pairing and bivalent formation). To be consistent with the data produced by current sequencing-based genotyping technologies, we simulated biallelic markers observed in terms of dosage in parents and progeny. Three different linkage phase scenarios were simulated. In scenario A, for each marker with dosage greater than zero, one of the allelic variants was assigned to the first homologous chromosome in the homology group and the remaining variants of the same type were assigned to the subsequent homologous chromosomes. In scenario B, the first allelic variant was randomly assigned to one of the homologous chromosomes and the remaining variants were assigned to the subsequent homologous chromosomes. In scenario C, the allelic variants were randomly assigned to the m homologous chromosomes. Thus, detecting recombination events is expected to become increasingly difficult from scenario A, where the allelic variants are concentrated in the same homologous chromosomes, to scenario C, where they are randomly distributed. Consequently, phasing and recombination fraction estimation become more challenging from scenario A to scenario C. In real situations, scenarios A and B could occur locally due to a lack of recombination between homologous chromosomes since the formation of the polyploid, whereas scenario C represents regions with higher recombination rates.

For each combination of ploidy level and linkage phase scenario, we simulated five different parental haplotypes. In total, 45 parental configurations were considered (3 × 3 × 5, S4 Figure). For autotetraploid and autohexaploid configurations, we simulated 1000 full-sib populations. For autooctaploids, this number was reduced to 200 due to the high computational demand of reconstructing such maps. Each population comprised 200 individuals with one linkage group containing 10 markers spaced 1 cM apart. For each combination, the percentage of correctly estimated linkage phase configurations in each parent was recorded. Also, for the cases where the linkage phases were correctly estimated, we calculated the average Euclidean distance between the estimated and simulated maps as $\sqrt{\frac{(\hat{\mathbf{d}} - \mathbf{d})^{T}(\hat{\mathbf{d}} - \mathbf{d})}{z - 1}}$, where $\hat{\mathbf{d}}$ is the vector of distances between adjacent markers in an estimated map, $\mathbf{d}$ is the corresponding vector for the simulated map, z is the number of markers and T indicates vector transposition. For example, a value of 1 cM indicates that the maps differ by 1 cM on average [42]. We used the sequential two-point procedure to reduce the search space, assuming that linkage phase configurations with associated LOD < 3.0 should be investigated using HMM multipoint strategies (η = 3). Among the configurations evaluated using the HMM, we kept those with LOD < 10.0 for evaluation in the next round of marker insertion. Notice that, although the likelihood obtained for each map could be used as a criterion to evaluate the order of the markers, this was not considered in this simulation due to the computational demand of multiple simulations combined with high ploidy levels, especially m = 8.
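A small helper for the map-comparison statistic is sketched below. It assumes the statistic is the root-mean-square difference over the z − 1 inter-marker distances, $\sqrt{(\hat{\mathbf d} - \mathbf d)^T(\hat{\mathbf d} - \mathbf d)/(z - 1)}$, which matches the interpretation that a value of 1 cM means the maps differ by 1 cM on average; the function name is ours.

```python
import math


def avg_euclidean_distance(d_hat, d):
    """Root-mean-square difference (cM) between estimated and simulated
    inter-marker distances; z markers give z - 1 distances."""
    if len(d_hat) != len(d) or not d:
        raise ValueError("need two non-empty distance vectors of equal length")
    diff = [a - b for a, b in zip(d_hat, d)]
    return math.sqrt(sum(x * x for x in diff) / len(diff))


simulated = [1.0] * 9   # 10 markers spaced 1 cM apart
estimated = [2.0] * 9   # every interval overestimated by 1 cM
print(avg_euclidean_distance(estimated, simulated))  # 1.0
```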

Simulation 2 - chromosome-wise performance under preferential pairing and multivalent formation

In this simulation study, we evaluated the performance of the algorithm in dense maps, allowing for multivalent formation and preferential pairing. We used scenario C from the previous study as a template to simulate five tetraploid and five hexaploid parental haplotypic configurations, each comprising 200 equally spaced markers with a total length of 100.0 cM (S5 Figure). For each parental configuration, we simulated 200 full-sib populations of 200 offspring, considering combinations of three levels of preferential pairing (0.00, 0.25 and 0.50) and three levels of cross-like quadrivalent formation proportion (0.00, 0.25 and 0.50). No hexavalents were simulated in this study. For autohexaploids, the multivalent configurations were always composed of a cross-like quadrivalent plus a bivalent. The centromere was positioned at 20.0 cM from the beginning of the chromosome (subtelocentric centromere with arm ratio 1:4) to study the effect of double reduction at the distal ends of both chromosome arms. All simulations were conducted using the software PedigreeSim [56]. In addition to the statistics recorded in Simulation 1, we computed the rate of double reduction observed at each marker for all constructed maps using the “founderalleles” file provided by PedigreeSim. We also evaluated two values for the LOD Score threshold associated with the two-point analysis (η = 3 and η = 5). We used a multipoint LOD Score threshold of 10.0. The R scripts to perform the simulations presented here can be accessed at https://go.ncsu.edu/mappoly-support-info.

Simulation results

Simulation 1

Table 1 shows the percentage of data sets where the linkage phase configuration was correctly estimated in both parents P and Q. In scenario A, the method recovered the correct linkage phase configuration in all situations for all ploidy levels. In scenarios B and C, there was a slight decrease in the ability to correctly estimate the linkage phase configuration, especially for m = 6 and m = 8. Although the percentage of correctly estimated linkage phases was lower in these cases, it remained high, ranging from 88.8% to 100%. This indicates a very good performance in estimating linkage phase configurations, even when using the two-point procedure to narrow the search space.


Table 1. Percentage of data sets where linkage phase configuration was correctly estimated for parents P and Q in simulation 1.

Fig 5 shows the distributions of the average Euclidean distances between the estimated and simulated distance vectors for the correctly estimated linkage phase configurations. In all cases, most recombination fractions were consistently estimated, since the medians of all distributions were very close to 0.5 cM, which poses no practical problems for map construction. These results show that, apart from a relatively small percentage of entangled linkage phase configurations, the method successfully performed the phasing and estimated the recombination fractions of the 10 markers in all situations evaluated.

Figure 5.

Distributions of the average Euclidean distances between the estimated and simulated distance vectors considering correctly estimated linkage phase configurations. The order of boxplots is the same as the order of haplotypes in S4 Figure. Each column indicates the results for different linkage phase configuration scenarios, namely, A, B and C, and each row indicates a different haplotypic configuration within three ploidy levels.

Simulation 2

The proportion of correctly estimated linkage phase configurations for the dense chromosome-wise map is shown in Table 2. In general, results for tetraploid maps were superior to those for hexaploid maps. Performance was also better for the threshold level η = 5 than for η = 3. Similarly to Simulation 1, maps resulting from configurations with no preferential pairing or quadrivalent formation showed a high proportion of correctly estimated linkage phase configurations, ranging from 99% to 100% for tetraploid maps and from 84% to 100% for hexaploid maps. Different quadrivalent formation rates had no substantial influence on estimating the correct linkage phase configurations in tetraploids. Within the preferential pairing level 0.0, the percentage of maps with correct linkage phases varied from 90% to 100%. For hexaploids, this percentage decreased as the quadrivalent formation rate increased from 0.0 to 0.50, with proportions varying from 70.5% to 100%. Especially for autohexaploids, there was considerable variation among the five simulated configurations. This occurred because the effect of quadrivalent formation can be more pronounced depending on the amount of information contained in a particular configuration. Also, the use of a higher two-point threshold, η = 5, improved the performance of the phasing algorithm.


Table 2. Percentage of data sets where linkage phase configuration was correctly estimated for parents P and Q in simulation 2.

Within the preferential pairing level 0.25, the proportion of correctly estimated linkage phases decreased, more markedly for hexaploid cases with threshold level η = 3, reaching a minimum value of 52.5% for parent Q in configuration 1. Again, the use of a higher two-point threshold level, η = 5, improved this number to 68.5%. For preferential pairing level 0.50, there was a clear distinction between the results in tetraploid and hexaploid cases. In the former, the effect was not as pronounced as in the latter, where in several cases the proportion of correctly estimated linkage phases was close to zero. As expected, the use of a higher threshold level of η = 5 increased the number of correctly estimated linkage phase configurations. Interestingly, for both levels of preferential pairing (0.25 and 0.50), the formation of quadrivalents had an overall tendency to improve the algorithm’s performance. This improvement was expected because, when a quadrivalent is formed, each chromosome involved can exchange segments with two others, providing more information regarding their phase configuration.

Given a correctly estimated linkage phase, the recombination fractions were consistently estimated for all levels of preferential pairing with no quadrivalent formation. However, they were overestimated in the presence of quadrivalent formation. This effect was mainly observed at the terminal regions of the chromosome, especially in the long arm, where double reduction is more pronounced (Fig. 6). In this case, tetraploid maps were the most affected. This is in agreement with our expectations, since in autohexaploid simulations a bivalent not involved in the double reduction process was always formed (although the rates of double reduction were very similar in both ploidy levels, Fig. 6). In addition to the quadrivalent, this bivalent serves as an extra source of information to assess the recombination events.

Figure 6.

Comparison of estimated versus simulated maps given a correct estimation of linkage phases in simulation 2. Smoothed conditional means of the observed average rate of double reduction are presented along the simulated chromosome. The centromere was positioned at 20 cM from the beginning of the chromosome (vertical dashed line). Upper panels show the results for tetraploid simulations, while lower panels show the results for hexaploid simulations. Three levels of preferential pairing (0.00, 0.25, 0.50) and three levels of quadrivalent formation rate (0.00, 0.25, 0.50) were simulated. The lines superimposed on the scatter plots are smoothed conditional means of the distances obtained using a generalized additive model. Both two-point thresholds were considered, since they only affect the phasing procedure.

The average Euclidean distances reflect the overestimation of recombination fractions in cases with quadrivalent formation, with distributions showing higher medians and interquartile ranges in tetraploid cases than in hexaploid cases (S6 Figure). Nevertheless, all distributions of Euclidean distances were located relatively close to zero, with a maximum value of 1.41 cM, indicating that, although the recombination fractions were overestimated towards the terminal ends of the chromosome, this overestimation was evenly distributed and caused no severe disturbances in the final map. S7 Figure shows an example of the effect of an increasing quadrivalent formation rate on autotetraploid and autohexaploid maps: as the markers get further away from the centromere, the recombination fractions become overestimated.

Discussion

Although the concept of linkage mapping is relatively simple, the combinatorial properties and the increasing amount of missing information that arise from multiple sets of chromosomes make the construction of genetic maps in high-level autopolyploids extremely challenging. In this work, we frame and solve two fundamental steps towards the construction of such maps, namely multipoint recombination fraction estimation and linkage phase estimation. Our method can be applied to biallelic codominant markers and, due to the flexibility of the HMM framework upon which it was derived, it can be extended to any type of molecular marker. The HMM used in this work takes into account the linkage phase configuration of the whole linkage group to estimate the recombination fractions between adjacent markers. An efficient two-point approach was also presented to reduce the search space of linkage phase configurations. As a result, our method provides the likelihood of the model, which can be used as an objective function to compare different map configurations, including linkage phases and marker order. When considering experimental populations, our method is a generalization, for any even ploidy level, of well-established genetic linkage mapping methods. For diploid (m = 2) populations derived from biparental crosses, our method is equivalent to the influential Lander and Green algorithm [41]; for full-sib phase-unknown crosses, it is equivalent to [57]. For tetraploids (m = 4), the method is equivalent to [17], disregarding double reduction. Thus, it encapsulates the essence of HMM-based genetic mapping methods in a single framework.

To assess the performance of our method, we conducted two simulation studies. Simulation 1 comprised three ploidy levels and three linkage phase configuration scenarios with ten markers. We demonstrated that our model was capable of correctly estimating most parental linkage phase configurations and recombination fractions, even for complex linkage phase configurations and high ploidy levels. These well-assembled regions could function as multiallelic codominant markers that propagate their information through the HMM to the rest of the chain, improving the quality of the final map. In Simulation 2, we analyzed a sequence of 200 markers under combinations of different levels of preferential pairing and rates of quadrivalent formation. In this situation, the quadrivalent formation rate had a marginal effect on the phasing procedure, whereas preferential pairing reduced its performance, especially for autohexaploids. The use of a higher two-point threshold (η) improved the linkage phase estimation in all cases. Since a higher η passes more candidate configurations on to the HMM-based evaluation, this indicates that haplotype phasing is more accurate when the HMM-based likelihood is used as the objective function to evaluate linkage phases. We also observed that quadrivalent formation yielded overestimated recombination fractions between adjacent markers located further away from the centromere. This behavior was expected, since our model disregards double reduction and, consequently, could not correctly estimate the number of crossover events when this phenomenon was present. Although our model is robust enough to cope with low levels of preferential pairing and quadrivalent formation, it is possible to include both phenomena at specific points of its derivation. Preferential pairing can be included in Eq 4 by not treating Pr(ψ_j) as uniformly distributed. Double reduction can be included in the definition of the genotypic states in the full transition space (Eq 5).
These two phenomena add extra layers of complexity to the genetic mapping of polyploid organisms with high ploidy levels and should be addressed in future studies.

The difficulty in correctly estimating entangled linkage phase configurations lies in two major aspects of the experiments studied here: (i) the outbred nature of the experimental crosses and (ii) the incomplete information of dosage-based markers (i.e., they are not multiallelic). In experimental populations derived from inbred lines, the origin of the haplotypes can be easily inferred from the genetic design. However, obtaining pure inbred lines in high-level autopolyploids has proven impractical due to the high number of crosses and generations necessary to achieve homozygous genotypes and to the inbreeding depression that some species undergo [61]. In our method, the linkage phase configuration is obtained by comparing the likelihoods of a set of models with different linkage phase configurations (Eq 18). The ability to estimate the correct configuration is directly related to the information contained in the marker data. Some of these limitations can be overcome through the use of HMMs, which take into account the information of a whole linkage group.

HMMs provide an excellent avenue to assemble genetic maps in complex scenarios, but they are remarkably computationally demanding and, in some cases, infeasible to use. Apart from parallel computing, which can greatly speed up the estimation process and is ubiquitous nowadays, two-point approaches are a viable option to efficiently reduce the dimension of the original problem. The dimension reduction is achieved by collapsing genotypic states in the full transition space according to the marker information. However, in several cases, the two-point method can have low statistical power, which is related to the amount of information contained in markers for certain combinations of allelic dosage and linkage phase configuration. This lack of information is exacerbated as markers get farther apart. Fig 7 shows eight possible configurations of pairs of markers in one autohexaploid parent. Considering the other parent as non-informative, we computed Fisher’s information equations based on the likelihood in Eq (22) [15, 33, 62] and plotted them as a function of the recombination fraction. The information profiles are related to the number of different haplotypes present in the parental configuration for a given marker dosage. For instance, for two single-dose markers (Fig 7, panel I), when the alleles share the same homologous chromosome (w_k = 1), it is always possible to detect whether the gamete contains at least one recombinant chromosome. However, when the alleles are on different homologous chromosomes (w_k = 0), the detection of recombination events is limited to meiotic configurations containing a bivalent in which these two chromosomes paired with each other. Additionally, the model proposed here considers both parents in the analysis, leading to more complicated linkage phase configurations and information equations.

Figure 7.

Fisher’s information for the two-point maximum likelihood estimators in different combinations of dosages and linkage phase configurations, considering one informative hexaploid parent. (I) Single-dose markers; alleles share 1 and 0 homologous chromosomes. (II) Double-dose markers; alleles share 2, 1 and 0 homologous chromosomes. (III) Triple-dose markers; alleles share 3, 2, 1 and 0 homologous chromosomes.

The multipoint procedure improves the power to detect genetic linkage, since the information on a marker depends not only on the observed molecular phenotype for the locus in question but also on the information accumulated along the Markov chain. Fig 7(I) shows that maps using only single-dose markers are limited to the detection of markers whose allelic variants are on the same homologous chromosome (w_k = 1). Thus, the homologous chromosomes are treated as separate entities instead of as members of a homology group, and it is not possible to assemble parental haplotypes considering all homologous chromosomes (i.e., linkage phase estimation). Due to the lack of appropriate statistical methods, diploid approximations considering single-dose markers have been the method of choice to build genetic maps in high-level autopolyploids. Based on our experience with the construction of genetic maps in sugarcane [63-66], we anticipate a considerable gain in the quality of those maps when using the method proposed in this work. We also expect a similar improvement for other high-level autopolyploid species.

The intrinsic lack of information in biallelic markers can be circumvented by using multiple markers clustered in linkage disequilibrium (LD) blocks to assemble multiallelic marker data. Two different approaches can be used. The first relies on high-throughput molecular data and the subsequent estimation of pairwise recombination fractions between the markers. In this case, due to the density of the data, closely linked markers are expected, and the Fisher’s information for the two-point maximum likelihood estimator is high (Fig 7). Thus, the linkage phase configurations between markers in small blocks can be successfully determined using two-point methods (for a detailed example, see S3 Appendix). Once these LD blocks are well assembled, including the correct linkage phase configuration of both parents, they can be regarded as multiallelic markers. Simulation 1 showed that two-point procedures coupled with the multipoint analysis are a trustworthy way to assemble haplotypes with closely linked markers. The second approach relies on a priori information about markers belonging to the same genomic region, where recombination events can be neglected. This information can be obtained from references such as genomic or transcriptomic resources. In this case, the recombination fraction can be assumed to be r = 0 for any pair of markers belonging to the LD block, and the linkage phase configuration can be obtained using a trivial Markovian process with transition probabilities t_k(j, j′) = 1 if j = j′ and t_k(j, j′) = 0 otherwise. Therefore, the biallelic information contained in SNP markers can be combined to assemble haplotypes that represent alleles allocated to different homologous chromosomes.
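The second approach can be illustrated with a minimal sketch, assuming the parental phase of an LD block is already known and that r = 0 within the block: each homolog then carries one haplotype across the block's SNPs, and each distinct haplotype can be treated as one allele of a multiallelic marker. The function name and the input layout (rows as homologs, columns as SNPs) are assumptions for illustration.

```python
def multiallelic_from_block(phase_matrix):
    """Collapse an LD block of biallelic SNPs into one multiallelic marker.

    phase_matrix: rows are homologs, columns are SNPs (1 = allelic variant).
    With r = 0 inside the block, each homolog carries a single haplotype, and
    each distinct haplotype is treated as a distinct allele.
    """
    labels = {}
    alleles = []
    for row in phase_matrix:
        hap = tuple(row)
        if hap not in labels:
            labels[hap] = len(labels)  # next unused allele label
        alleles.append(labels[hap])
    return alleles, labels


block = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]]  # tetraploid parent, 3 SNPs
alleles, labels = multiallelic_from_block(block)
print(alleles)  # [0, 1, 2, 0]
```

In this example, three biallelic SNPs yield a three-allele marker in which homologs 1 and 4 remain indistinguishable, illustrating why more SNPs per block increase the information available to the HMM.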

The multipoint method proposed herein relies on biallelic marker information. However, the emission function (Eq 9) can be modified to incorporate multiallelic observations. When using multiallelic markers, the number of states that must be visited in the Markov model can be significantly reduced, making the HMM procedure much more efficient. Ideally, in a full-sib population, the number of different alleles would be twice the ploidy level (fully informative). In this case, the Markov model would be fully observed, and estimating the recombination fraction reduces to counting the number of recombination events given a linkage phase configuration. Since our algorithm does not need the entire transition space to work, only a subset of states needs to be visited, making the calculation much faster than in the biallelic case.

It is worth mentioning that in this paper we do not address step (iii) mentioned in the Introduction, namely, the ordering of genetic markers. The genetic mapping literature has an extensive body of methods to address this problem. Several works evaluated some of these methods [42, 67, 68], and others have been proposed since then [47-49]. A fundamental lesson from these works is that, in complex linkage phase configurations with partially informative markers, methods based on multipoint likelihood provide better results than two-point based methods. However, multipoint procedures are highly compute-intensive. In the case of high-level autopolyploids, while it is important to rely on multipoint estimates to recover the information lacking in biallelic markers, it is also essential that the method be fast enough to cope with hundreds of markers per linkage group. One possible solution is to use two-point information to build marker blocks with a small number of SNPs in high linkage disequilibrium using some clustering procedure. The linkage phase within these blocks can be estimated using a combination of two-point and HMM procedures. These marker blocks can then be used as multiallelic markers to reduce the number of states that need to be visited in the HMM. The more informative the assembled marker blocks are, the faster the map reconstruction using the HMM. Moreover, in several situations, genomic and transcriptomic references are available and often provide at least the local physical order of SNPs. Thus, instead of using two-point information to cluster the SNPs into marker blocks, the blocks can be assembled using genomic or transcriptomic references.
While this paper provides fundamental steps towards the construction of complete genetic maps in high-level autopolyploids using both multipoint and two-point procedures, the practical aspects and implications will be addressed in future studies.

Once the map is assembled, it is straightforward to obtain the probability of a specific genotype at any map position, conditional on the data of the whole linkage group. Using this information, it is possible to compute the probability of any unobserved genotype given the genetic map. These conditional probabilities are the basis for answering a series of fundamental questions in quantitative trait loci (QTL) analysis of high-level autopolyploids, such as the effect of allele dosage on the variation of quantitative traits and the interactions of alleles within loci (dominance effects) and between loci (epistatic effects). Therefore, the present study provides a sound basis for the next steps of genetic studies in high-level autopolyploids, helping to unveil the complex structure of autopolyploid genomes through genetic mapping and genome assembly, and to study the genetic architecture of quantitative traits through QTL mapping.
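Conditional genotype probabilities of this kind are what the standard forward-backward (smoothing) recursion of a hidden Markov model [54] delivers. The Python sketch below is a generic version of that recursion, not the authors' code; the function name `posterior_states` and the toy two-state chain are assumptions for illustration and stand in for the much larger autopolyploid state space. A map position between markers can be handled by inserting a pseudo-position whose emission vector is all ones, as done for the middle position in the example.

```python
import numpy as np


def posterior_states(init, transitions, emissions):
    """Forward-backward smoothing for a discrete hidden Markov model.

    init: (S,) prior probabilities of the hidden states at the first position.
    transitions: K-1 matrices of shape (S, S); transitions[k][i, j] is the
        probability of moving from state i at position k to state j at k+1.
    emissions: (K, S) array; emissions[k, s] is the probability of the data
        observed at position k given state s (all ones if nothing observed).
    Returns a (K, S) array whose row k is P(state at k | data at all K positions).
    """
    init = np.asarray(init, dtype=float)
    emissions = np.asarray(emissions, dtype=float)
    trans = [np.asarray(t, dtype=float) for t in transitions]
    K, S = emissions.shape
    alpha = np.zeros((K, S))
    beta = np.ones((K, S))
    # Forward pass; each row is rescaled to sum to 1 to avoid underflow.
    alpha[0] = init * emissions[0]
    alpha[0] /= alpha[0].sum()
    for k in range(1, K):
        alpha[k] = (alpha[k - 1] @ trans[k - 1]) * emissions[k]
        alpha[k] /= alpha[k].sum()
    # Backward pass, rescaled the same way (the scale factors cancel below).
    for k in range(K - 2, -1, -1):
        beta[k] = trans[k] @ (emissions[k + 1] * beta[k + 1])
        beta[k] /= beta[k].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


# Toy two-state chain (not the autopolyploid state space): recombination
# fraction 0.1 between adjacent positions, state 0 observed at both flanking
# markers, nothing observed at the middle position.
r = 0.1
T = np.array([[1 - r, r], [r, 1 - r]])
post = posterior_states([0.5, 0.5], [T, T], [[1, 0], [1, 1], [1, 0]])
print(post[1])  # approximately [0.9878 0.0122]
```

With state 0 observed on both sides, the middle position is in state 0 with probability (1 - r)^2 / ((1 - r)^2 + r^2) = 0.81/0.82 ≈ 0.988, which is the value the recursion returns.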

Supporting information

S1 Appendix. Algebraic simplifications for transition probabilities.

S2 Appendix. Algorithm for obtaining l_P and l_Q given two genotypic indices.

S3 Appendix. Example of usage of the two-point and multipoint procedures. To show the mechanics of map reconstruction using the combination of two-point and multipoint strategies, we present a simple example with a full-sib autotetraploid mapping population. This example is easily extended to higher ploidy levels, since it does not involve matrices whose high dimensions would make the operations infeasible.

S4 Figure. Haplotypes for simulation study 1. Simulated haplotypes with 10 markers for three ploidy levels: autotetraploid (m = 4), autohexaploid (m = 6) and autooctaploid (m = 8).

S5 Figure. Haplotypes for simulation study 2. Simulated haplotypes with 200 markers for two ploidy levels: autotetraploid (m = 4) and autohexaploid (m = 6).

S6 Figure. Boxplots of the average Euclidean distances between the estimated and simulated distance vectors for simulation study 2.

S7 Figure. Examples of autotetraploid and autohexaploid maps estimated from datasets with three quadrivalent formation rates: 0.00, 0.25 and 0.50.

Acknowledgments

The authors wish to thank Dr. Guilherme da Silva Pereira and Dr. Zhao-Bang Zeng for their invaluable suggestions during the preparation of the manuscript.

Footnotes

* mmollin{at}ncsu.edu (MM), augusto.garcia{at}usp.br (AAFG)

References

1. Soltis DE, Segovia-Salcedo MC, Jordon-Thaden I, Majure L, Miles NM, Mavrodiev EV, et al. Are polyploids really evolutionary dead-ends (again)? A critical reappraisal of Mayrose et al. (2011). New Phytol. 2014;202(4):1105–1117.
2. Birchler JA. Genetic Consequences of Polyploidy in Plants. In: Soltis PS, Soltis DE, editors. Polyploidy and Genome Evolution. Berlin: Springer-Verlag; 2012. p. 21–32.
3. Comai L. The advantages and disadvantages of being polyploid. Nat Rev Genet. 2005;6(11):836–846.
4. Osborn TC, Pires JC, Birchler JA, Auger DL, Chen ZJ, Lee HS, et al. Understanding mechanisms of novel gene expression in polyploids. Trends Genet. 2003;19(3):141–147.
5. Sybenga J. Meiotic configurations. Berlin: Springer; 1975.
6. Muller HJ. A New Mode of Segregation in Gregory's Tetraploid Primulas. Am Nat. 1914;48(572):508–512.
7. Soltis DE, Soltis PS. Molecular Data and the Dynamic Nature of Polyploidy. Crit Rev Plant Sci. 1993;12(3):243–273.
8. Haldane JBS. Theoretical Genetics of Autopolyploids. J Genet. 1930;22(3):359–372.
9. Parisod C, Holderegger R, Brochmann C. Evolutionary consequences of autopolyploidy. New Phytol. 2010;186(1):5–17.
10. Otto SP, Whitton J. Polyploid incidence and evolution. Annu Rev Genet. 2000;34:401–437.
11. Mather K. Segregation and Linkage in Autotetraploids. J Genet. 1936;32(2):287–314.
12. Fisher RA. The Theory of Linkage in Polysomic Inheritance. Philos Trans R Soc Lond B Biol Sci. 1947;233(594):55–87.
13. Fisher RA. Allowance for double reduction in the calculation of genotype frequencies with polysomic inheritance. Ann Eugen. 1954;12:169–171.
14. Hackett CA, Bradshaw JE, McNicol JW. Interval mapping of quantitative trait loci in autotetraploid species. Genetics. 2001;159(4):1819–1832.
15. Luo ZW, Zhang RM, Kearsey MJ. Theoretical basis for genetic linkage analysis in autotetraploid species. Proc Natl Acad Sci USA. 2004;101:7040–7045.
16. Wu R, Ma CX, Casella G. A Bivalent Polyploid Model for Mapping Quantitative Trait Loci in Outcrossing Tetraploids. Genetics. 2004;166(1):581–595.
17. Leach LJ, Wang L, Kearsey MJ, Luo Z. Multilocus tetrasomic linkage analysis using hidden Markov chain model. Proc Natl Acad Sci USA. 2010;107:4270–4274.
18. Li J, Das K, Fu G, Tong C, Li Y, Tobias C, et al. EM Algorithm for Mapping Quantitative Trait Loci in Multivalent Tetraploids. Int J Plant Genomics. 2010;2010:216547.
19. Hackett CA, McLean K, Bryan GJ. Linkage analysis and QTL mapping using SNP dosage data in a tetraploid potato mapping population. PLoS One. 2013;8(5):e63939.
20. Xu F, Lyu Y, Tong C, Wu W, Zhu X, Yin D, et al. A statistical model for QTL mapping in polysomic autotetraploids underlying double reduction. Brief Bioinform. 2013;15(6):1044–1056.
21. Zheng C, Voorrips RE, Jansen J, Hackett CA, Ho J, Bink MCAM. Probabilistic Multilocus Haplotype Reconstruction in Outcrossing Tetraploids. Genetics. 2016;203:119–131.
22. Kriegner A, Cervantes JC, Burg K, Mwanga ROM, Zhang D. A genetic linkage map of sweetpotato [Ipomoea batatas (L.) Lam.] based on AFLP markers. Mol Breed. 2003;11(3):169–185.
23. Arizio CM, Costa Tartara SM, Manifesto MM. Carotenoids gene markers for sweetpotato (Ipomoea batatas L. Lam): applications in genetic mapping, diversity evaluation and cross-species transference. Mol Genet Genomics. 2014;289(2):237–251.
24. Shirasawa K, Tanaka M, Takahata Y, Ma D, Cao Q, Liu Q, et al. A high-density SNP genetic map consisting of a complete set of homologous groups in autohexaploid sweetpotato (Ipomoea batatas). Sci Rep. 2017;7:44207.
25. Wang J, Roe B, Macmil S, Yu Q, Murray JE, Tang H, et al. Microcollinearity between autopolyploid sugarcane and diploid sorghum genomes. BMC Genomics. 2010;11(1):261.
26. Garcia AAF, Mollinari M, Marconi TG, Serang OR, Silva RR, Vieira MLC, et al. SNP genotyping allows an in-depth characterisation of the genome of sugarcane and other complex autopolyploids. Sci Rep. 2013;3(1).
27. Soltis DE, Visger CJ, Soltis PS. The polyploidy revolution then… and now: Stebbins revisited. Am J Bot. 2014;101(7):1057–1078.
28. Lewin HA, Larkin DM, Pontius J, O'Brien SJ. Every genome sequence needs a good map. Genome Res. 2009;19(11):1925–1928.
29. Luo MC, Gu YQ, You FM, Deal KR, Ma Y, Hu Y, et al. A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor. Proc Natl Acad Sci USA. 2013;110(19):7940–7945.
30. Lemmon ZH, Doebley JF. Genetic Dissection of a Genomic Region with Pleiotropic Effects on Domestication Traits in Maize Reveals Multiple Linked QTL. Genetics. 2014;198:345–353.
31. Wu KK, Burnquist W, Sorrells ME, Tew TL, Moore PH, Tanksley SD. The detection and estimation of linkage in polyploids using single-dose restriction fragments. Theor Appl Genet. 1992;83(3):294–300.
32. Sorrells ME. Development and Application of RFLPs in Polyploids. Crop Sci. 1992;32(5):1086.
33. Ripol MI, Churchill GA, Silva JAGD, Sorrells M. Statistical aspects of genetic mapping in autopolyploids. Gene. 1999;235:31–41.
34. Doerge RW, Craig BA. Model selection for quantitative trait locus analysis in polyploids. Proc Natl Acad Sci USA. 2000;97(14):7951–7956.
35. van Geest G, Bourke PM, Voorrips RE, et al. An ultra-dense integrated linkage map for hexaploid chrysanthemum enables multi-allelic QTL analysis. Theor Appl Genet. 2017;130:2527–2541.
36. Voorrips RE, Gort G, Vosman B. Genotype calling in tetraploid species from bi-allelic marker data using mixture models. BMC Bioinformatics. 2011;12(1):172.
37. Serang O, Mollinari M, Garcia AA. Efficient Exact Maximum a Posteriori Computation for Bayesian SNP Genotyping in Polyploids. PLoS One. 2012;7(2):e30906.
38. Bargary N, Hinde J, Garcia AAF. Finite Mixture Model Clustering of SNP Data. In: MacKenzie G, Peng D, editors. Statistical Modeling in Biostatistics and Bioinformatics. Switzerland: Springer; 2014. p. 139–157.
39. Mollinari M, Serang O. Quantitative SNP Genotyping of Polyploids with MassARRAY and Other Platforms. In: Batley J, editor. Plant genotyping: methods and protocols. New York: Springer; 2015. p. 215–241.
40. Hackett CA, Bradshaw JE, Bryan GJ. QTL mapping in autotetraploids using SNP dosage information. Theor Appl Genet. 2014;127:1885–1904.
41. Lander ES, Green P. Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci USA. 1987;84:2363–2367.
42. Mollinari M, Margarido GRA, Vencovsky R, Garcia AAF. Evaluation of algorithms used to order markers on genetic maps. Heredity. 2009;103:494–502.
43. Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, et al. MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics. 1987;1(2):174–181.
44. Buetow KH, Chakravarti A. Multipoint Gene Mapping Using Seriation. I. General Methods. Am J Hum Genet. 1987;41:180–188.
45. Doerge RW. Constructing Genetic Maps By Rapid Chain Delineation. J Quant Trait Loci. 1996;2:1–14.
46. Van Os H, Stam P, Visser RG, Van Eck HJ. RECORD: a novel method for ordering loci on a genetic linkage map. Theor Appl Genet. 2005;112(1):30–40.
47. Wu Y, Bhat PR, Close TJ, Lonardi S. Efficient and Accurate Construction of Genetic Linkage Maps from the Minimum Spanning Tree of a Graph. PLoS Genet. 2008;4(10):1–11.
48. Preedy KF, Hackett CA. A rapid marker ordering approach for high-density genetic linkage maps in experimental autotetraploid populations using multidimensional scaling. Theor Appl Genet. 2016;129(11):2117–2132.
49. Wang H, van Eeuwijk FA, Jansen J. The potential of probabilistic graphical models in linkage map construction. Theor Appl Genet. 2016;130:1–12.
50. Van Ooijen JW, Jansen J. Genetic Mapping in Experimental Populations. Cambridge University Press; 2013.
51. Burnham CR. Discussions in cytogenetics. Minneapolis: Burgess Publishing; 1962.
52. Hackett CA. A comment on Xie and Xu: 'Mapping quantitative trait loci in tetraploid species'. Genet Res. 2001;78(2):187–189.
53. Jiang C, Zeng ZB. Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica. 1997;101:47–58.
54. Rabiner L. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77(2):257–286.
55. Broman K, Sen S. A Guide to QTL Mapping with R/qtl. New York: Springer; 2009.
56. Voorrips RE, Maliepaard CA. The simulation of meiosis in diploid and tetraploid organisms using various genetic models. BMC Bioinformatics. 2012;13(1):248.
57. Wu R, Ma CX, Painter I, Zeng ZB. Simultaneous Maximum Likelihood Estimation of Linkage and Linkage Phases in Outcrossing Species. Theor Popul Biol. 2002;61(3):349–363.
58. Cao D, Craig BA, Doerge R. A model selection-based interval-mapping method for autopolyploids. Genetics. 2005;169(4):2371–2382.
59. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011;6(5):e19379.
60. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008;3(10):e3376.
61. Gallais A. Quantitative genetics and breeding methods in autopolyploid plants. Paris: INRA; 2003.
62. Mather K. The measurement of linkage in heredity. London: Methuen & Co; 1957.
63. Garcia AAF, Kido EA, Meza AN, Souza HMB, Pinto LR, Pastina MM, et al. Development of an integrated genetic map of a sugarcane (Saccharum spp.) commercial cross, based on a maximum-likelihood approach for estimation of linkage and linkage phases. Theor Appl Genet. 2006;112(2):298–314.
64. Oliveira KM, Pinto LR, Marconi TG, Margarido GRA, Pastina MM, Teixeira LHM, et al. Functional integrated genetic linkage map based on EST-markers for a sugarcane (Saccharum spp.) commercial cross. Mol Breed. 2007;20(3):189–208.
65. Pastina MM, Malosetti M, Gazaffi R, Mollinari M, Margarido GRA, Oliveira KM, et al. A mixed model QTL analysis for sugarcane multiple-harvest-location trial data. Theor Appl Genet. 2012;124(5):835–849.
66. Palhares AC, Rodrigues-Morais TB, Van Sluys MA, Domingues DS, Maccheroni W, Jordao H, et al. A novel linkage map of sugarcane with evidence for clustering of retrotransposon-based markers. BMC Genet. 2012;13(1):51.
67. Hackett CA, Broadfoot LB. Effects of genotyping errors, missing values and segregation distortion in molecular marker data on the construction of linkage maps. Heredity. 2003;90(1):33–38.
68. Wu J, Jenkins J, Zhu J, McCarty J, Watson C. Monte Carlo simulations on marker grouping and ordering. Theor Appl Genet. 2003;107:568–573.


Posted September 12, 2018.

Subject Area

Genetics

Subject Areas

All Articles

Animal Behavior and Cognition (5197)
Biochemistry (11697)
Bioengineering (8714)
Bioinformatics (29116)
Biophysics (14924)
Cancer Biology (12047)
Cell Biology (17347)
Clinical Trials (138)
Developmental Biology (9405)
Ecology (14136)
Epidemiology (2067)
Evolutionary Biology (18260)
Genetics (12214)
Genomics (16758)
Immunology (11838)
Microbiology (27986)
Molecular Biology (11544)
Neuroscience (60776)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3228)
Physiology (4936)
Plant Biology (10381)
Scientific Communication and Education (1679)
Synthetic Biology (2876)
Systems Biology (7331)
Zoology (1642)

[1] 1.↵
Soltis DE, Segovia-Salcedo MC, Jordon-Thaden I, Majure L, Miles NM, Mavrodiev EV, et al. Are polyploids really evolutionary dead-ends (again)? A critical reappraisal of Mayrose et al. (2011). New Phytologist. 2014;202(4):1105–1117.
OpenUrl

[2] 2.↵
Soltis PS,
Soltis DE
Birchler JA. Genetic Consequences of Polyploidy in Plants. In: Soltis PS, Soltis DE, editors. Polyploidy and Genome Evolution. Berlin: Springer-Verlag; 2012. p. 21–32.

[3] Soltis PS,

[4] Soltis DE

[5] 3.↵
Comai L. The advantages and disadvantages of being polyploid. Nature Rev Genet. 2005;6(11):836–846.
OpenUrl CrossRef PubMed Web of Science

[6] 4.↵
Osborn TC, Pires JC, Birchler JA, Auger DL, Chen ZJ, Lee HS, et al. Understanding mechanisms of novel gene expression in polyploids. Trends in Genetics. 2003;19(3):141 –147.
OpenUrl CrossRef PubMed Web of Science

[7] 5.↵
Sybenga J. Meiotic configurations. Berlin: Springer; 1975.

[8] 6.↵
Muller HJ. A New Mode of Segregation in Gregory’s Tetraploid Primulas. Am Nat. 1914;48(572):508–512.
OpenUrl CrossRef Web of Science

[9] 7.
Soltis DE, Soltis PS. Molecular Data and the Dynamic Nature of Polyploidy Molecular Data and the Dynamic Nature of Polyploidy. Crit Rev Plant Sci. 1993;12(3):243–273.
OpenUrl CrossRef Web of Science

[10] 8.↵
Haldane J. Theoretical Genetics of Autopolyploids. J Genet. 1930;22(3):359–372.
OpenUrl CrossRef Web of Science

[11] 9.↵
Parisod C, Holderegger R, Brochmann C. Evolutionary consequences of autopolyploidy. New Phytol. 2010;186(1):5–17.
OpenUrl CrossRef PubMed Web of Science

[12] 10.↵
Otto SP, Whitton J. Polyploid incidence and evolution. Annu Rev Genet. 2000;34:401–437.
OpenUrl CrossRef PubMed Web of Science

[13] 11.↵
Mather K. Segregation and Linkage in Autotetraploids. J Genet. 1936;32(2):287–314.
OpenUrl CrossRef Web of Science

[14] 12.
Fisher RA. The Theory of Linkage in Polysomic Inheritance. Philos Trans R Soc Lond B Biol Sci. 1947;233(594):55–87.
OpenUrl CrossRef

[15] 13.
Fisher RA. Allowance for double reduction in the calculation of genotype frequencies with polysomic inheritance. Ann of Eugen. 1954;12:169-171.
OpenUrl

[16] 14.
Hackett Ca, Bradshaw JE, McNicol JW. Interval mapping of quantitative trait loci in autotetraploid species. Genetics. 2001;159(4):1819–1832.
OpenUrl Abstract/FREE Full Text

[17] 15.↵
Luo ZW, Zhang RM, Kearsey MJ. Theoretical basis for genetic linkage analysis in autotetraploid species. Proc Natl Acad Sci USA. 2004;101:7040–7045.
OpenUrl Abstract/FREE Full Text

[18] 16.
Wu R, Ma CX, Casella G. A Bivalent Polyploid Model for Mapping Quantitative Trait Loci in Outcrossing Tetraploids. Genetics. 2004;166(1):581–595.
OpenUrl Abstract/FREE Full Text

[19] 17.↵
Leach LJ, Wang L, Kearsey MJ, Luo Z. Multilocus tetrasomic linkage analysis using hidden Markov chain model. Proc Natl Acad Sci USA. 2010;107:4270–4274.
OpenUrl Abstract/FREE Full Text

[20] 18.
Li J, Das K, Fu G, Tong C, Li Y, Tobias C, et al. EM Algorithm for Mapping Quantitative Trait Loci in Multivalent Tetraploids. Int J Plant Genomics. 2010;2010:216547.
OpenUrl PubMed

[21] 19.↵
Hackett CA, McLean K, Bryan GJ. Linkage analysis and QTL mapping using SNP dosage data in a tetraploid potato mapping population. PLoS One. 2013;8(5):e63939.
OpenUrl CrossRef PubMed

[22] 20.
Xu F, Lyu Y, Tong C, Wu W, Zhu X, Yin D, et al. A statistical model for QTL mapping in polysomic autotetraploids underlying double reduction. Brief Bioinform. 2013;15(6):1044–1056.
OpenUrl

[23] 21.↵
Zheng C, Voorrips RE, Jansen J, Hackett CA, Ho J, Bink MCAM. Probabilistic Multilocus Haplotype Reconstruction in Outcrossing Tetraploids. Genetics. 2016;203:119–131.
OpenUrl Abstract/FREE Full Text

[24] 22.↵
Kriegner A, Cervantes JC, Burg K, Mwanga ROM, Zhang D. A genetic linkage map of sweetpotato [Ipomoea batatas(L.) Lam.] based on AFLP markers. Mol Breed. 2003;11(3):169–185.
OpenUrl

[25] 23.
Arizio CM, Costa Tartara SM, Manifesto MM. Carotenoids gene markers for sweetpotato (Ipomoea batatas L. Lam): applications in genetic mapping, diversity evaluation and cross-species transference. Mol Genet Genomics. 2014;289(2):237–251.
OpenUrl PubMed

[26] 24.↵
Shirasawa K, Tanaka M, Takahata Y, Ma D, Cao Q, Liu Q, et al. A high-density SNP genetic map consisting of a complete set of homologous groups in autohexaploid sweetpotato (Ipomoea batatas). Sci Rep. 2017;7(February):44207.
OpenUrl CrossRef

[27] 25.↵
Wang J, Roe B, Macmil S, Yu Q, Murray JE, Tang H, et al. Microcollinearity between autopolyploid sugarcane and diploid sorghum genomes. BMC Genomics. 2010;11(1):261.
OpenUrl CrossRef PubMed

[28] 26.↵
Garcia AAF, Mollinari M, Marconi TG, Serang OR, Silva RR, Vieira MLC, et al. SNP genotyping allows an in-depth characterisation of the genome of sugarcane and other complex autopolyploids. Sci Rep. 2013;3(1).

[29] 27.↵
Soltis DE, Visger CJ, Soltis PS. The polyploidy revolution then… and now: Stebbins revisited. Am J Bot. 2014;101(7):1057–1078.
OpenUrl Abstract/FREE Full Text

[30] 28.↵
Lewin HA, Larkin DM, Pontius J, O’Brien SJ. Every genome sequence needs a good map. Genome Res. 2009;19(11):1925–1928.
OpenUrl FREE Full Text

[31] 29.
Luo MC, Gu YQ, You FM, Deal KR, Ma Y, Hu Y, et al. A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor. Proc Natl Acad Sci USA. 2013;110(19):7940–7945.
OpenUrl Abstract/FREE Full Text

[32] 30.↵
Lemmon ZH, Doebley JF. Genetic Dissection of a Genomic Region with Pleiotropic Effects on Domestication Traits in Maize Reveals Multiple Linked QTL. Genetics. 2014;198:345–353.
OpenUrl Abstract/FREE Full Text

[33] 31.↵
Wu KK, Burnquist W, Sorrells ME, Tew TL, Moore PH, Tanksley SD. The detection and estimation of linkage in polyploids using single-dose restriction fragments. Theor Appl Genet. 1992;83(3):294–300.
OpenUrl CrossRef PubMed Web of Science

[34] 32.↵
Sorrells ME. Development and Application of RFLPs in Polyploids. Crop Sci. 1992;32(5):1086.
OpenUrl

[35] 33.↵
Ripol MI, Churchill GA, Silva JAGD, Sorrells M. Statistical aspects of genetic mapping in autopolyploids. Gene. 1999;235:31–41.
OpenUrl CrossRef PubMed Web of Science

[36] 34.↵
Doerge RW, Craig BA. Model selection for quantitative trait locus analysis in polyploids. Proc Natl Acad Sci USA. 2000;97(14):7951–7956.
OpenUrl Abstract/FREE Full Text

[37] 35.↵
van Geest G, Bourke PM, Voorrips RE, et al. An ultra-dense integrated linkage map for hexaploid chrysanthemum enables multi-allelic QTL analysis Theor Appl Genet. 2017;130:2527–2541.
OpenUrl

[38] 36.↵
Voorrips RE, Gort G, Vosman B. Genotype calling in tetraploid species from bi-allelic marker data using mixture models. BMC Bioinformatics. 2011;12(1):172.
OpenUrl CrossRef PubMed

[39] 37.↵
Serang O, Mollinari M, Garcia AA. Efficient Exact Maximum a Posteriori Computation for Bayesian SNP Genotyping in Polyploids. PLoS One. 2012;7(2):e30906.
OpenUrl CrossRef PubMed

[40] 38.↵
MacKenzie G,
Peng D
Bargary N, Hinde J, Garcia AAF. Finite Mixture Model Clustering of SNP Data. In: MacKenzie G, Peng D, editors. Statistical Modeling in Biostatistics and Bioinformatics. Switzerland: Springer; 2014. p. 139–157.

[41] MacKenzie G,

[42] Peng D

[43] 39.↵
Batley J
Mollinari M, Serang O. Quantitative SNP Genotyping of Polyploids with MassARRAY and Other Platforms. In: Batley J, editor. Plant genotyping: methods and protocols. New York: Springer; 2015. p. 215–241.

[44] Batley J

[45] 40.↵
Hackett Ca, Bradshaw JE, Bryan GJ. QTL mapping in autotetraploids using SNP dosage information. Theor Appl Genet. 2014;127:1885–1904.
OpenUrl CrossRef PubMed

[46] 41.↵
Lander ES, Green P. Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci USA. 1987;84:2363–2367.
OpenUrl Abstract/FREE Full Text

[47] 42.↵
Mollinari M, Margarido GRA, Vencovsky R, Garcia AAF. Evaluation of algorithms used to order markers on genetic maps. Heredity. 2009;103:494–502.
OpenUrl CrossRef PubMed

[48] 43.↵
Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, et al. MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics. 1987;1(2):174–181.
OpenUrl CrossRef PubMed

[49] 44.
Buetow KH, Chakravarti A. Multipoint Gene Mapping Using Seriation. I. General Methods. Am J Hum Genet. 1987;41:180–188.
OpenUrl PubMed Web of Science

[50] 45.
Doerge RW. Constructing Genetic Maps By Rapid Chain Delineation. J Quant Trait Loci. 1996;2:1–14.
OpenUrl

[51] 46.
Van Os H, Stam P, Visser RG, Van Eck HJ. RECORD: a novel method for ordering loci on a genetic linkage map. Theor Appl Genet. 2005;112(1):30–40.
OpenUrl CrossRef PubMed

[52] 47.↵
Wu Y, Bhat PR, Close TJ, Lonardi S. Efficient and Accurate Construction of Genetic Linkage Maps from the Minimum Spanning Tree of a Graph. PLoS Genet. 2008;4(10):1–11.
OpenUrl CrossRef

[53] 48.
Preedy KF, Hackett CA. A rapid marker ordering approach for high-density genetic linkage maps in experimental autotetraploid populations using multidimensional scaling. Theor Appl Genet. 2016;129(11):2117–2132.
OpenUrl

[54] 49.↵
Wang H, van Eeuwijk FA, Jansen J. The potential of probabilistic graphical models in linkage map construction. Theor Appl Genet. 2016;130:1–12.
OpenUrl CrossRef

[55] 50.↵
Van Ooijen JW, Jansen J. Genetic Mapping in Experimental Populations. Cambridge University Press; 2013.

[56] 51.↵
Burnham CR. Discussions in cytogenetics. Mineapolis: Burgess Publishing; 1962.

[57] 52.↵
Hackett CA. A comment on Xie and Xu: ‘Mapping quantitative trait loci in tetraploid species’. Genet Res. 2001;78(02):187–189.
OpenUrl PubMed

[58] 53.↵
Jiang C, Zeng ZB. Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica. 1997;101(1997):47–58.
OpenUrl CrossRef PubMed Web of Science

[59] 54.↵
Rabiner L. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77(2):257–286.
OpenUrl CrossRef

[60] 55.↵
Broman K, Sen S. A Guide to QTL Mapping with R/qtl. New York: Springer; 2009.

[61] 56.↵
Voorrips RE, Maliepaard CA. The simulation of meiosis in diploid and tetraploid organisms using various genetic models. BMC Bioinformatics. 2012;13(1):248.
OpenUrl CrossRef PubMed

[62] 57.↵
Wu R, Ma CX, Painter I, Zeng ZB. Simultaneous Maximum Likelihood Estimation of Linkage and Linkage Phases in Outcrossing Species. Theor Popul Biol. 2002;61(3):349–363.
OpenUrl CrossRef PubMed Web of Science

[63] 58.
Cao D, Craig BA, Doerge R. A model selection-based interval-mapping method for autopolyploids. Genetics. 2005;169(4):2371–2382.
OpenUrl Abstract/FREE Full Text

[64] 59.
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011;6(5):e19379.
OpenUrl CrossRef PubMed

[65] 60.
Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008;3(10):e3376.
OpenUrl CrossRef PubMed

[66] 61.↵
Gallais A. Quantitative genetics and breeding methods in autopolyploids plants. Paris: INRA; 2003.

[67] 62.↵
Mather K. The mesurement of linkage in heredity. London: Methuen & Co; 1957.

[68] 63.↵
Garcia AAF, Kido EA, Meza AN, Souza HMB, Pinto LR, Pastina MM, et al. Development of an integrated genetic map of a sugarcane (Saccharum spp.) commercial cross, based on a maximum-likelihood approach for estimation of linkage and linkage phases. Theor Appl Genet. 2006;112(2):298–314.
OpenUrl PubMed

[69] 64.
Oliveira KM, Pinto LR, Marconi TG, Margarido GRA, Pastina MM, Teixeira LHM, et al. Functional integrated genetic linkage map based on EST-markers for a sugarcane (Saccharum spp.) commercial cross. Mol Breed. 2007;20(3):189–208.
OpenUrl

[70] 65.
Pastina MM, Malosetti M, Gazaffi R, Mollinari M, Margarido GRA, Oliveira KM, et al. A mixed model QTL analysis for sugarcane multiple-harvest-location trial data. Theor Appl Genet. 2012;124(5):835–849.
OpenUrl CrossRef PubMed

[71] 66.↵
Palhares AC, Rodrigues-Morais TB, Van Sluys MA, Domingues DS, Maccheroni W, Jordao H, et al. A novel linkage map of sugarcane with evidence for clustering of retrotransposon-based markers. BMC Genet. 2012;13(1):51.
OpenUrl PubMed

[72] 67.↵
Hackett CA, Broadfoot LB. Effects of genotyping errors, missing values and segregation distortion in molecular marker data on the construction of linkage maps. Heredity. 2003;90(1):33–38.
OpenUrl CrossRef PubMed Web of Science

[73] 68.↵
Wu J, Jenkins J, Zhu J, McCarty J, Watson C. Monte Carlo simulations on marker grouping and ordering. Theor Appl Genet. 2003;107:568–573.
OpenUrl CrossRef PubMed Web of Science