AN ACCURATE GENETIC CLOCK

D. H. Hamilton

doi:10.1101/019414

Abstract

Molecular clocks give the “time to most recent common ancestor” (TMRCA) of genetic trees. By the Watson-Galton process¹⁷ most lineages terminate, leaving a few overrepresented singular lineages generated by W. D. Hamilton’s “kin selection”¹³. Applying current methods to this non-uniform branching produces greatly exaggerated TMRCA. We introduce an inhomogeneous stochastic process which detects singular lineages by their asymmetries; reducing these lineages gives the true TMRCA. This also implies a new method for computing mutation rates. Although our rates are low, similar to those from mitosis data, reduction yields younger TMRCA with smaller errors. We establish accuracy by comparisons across a wide range of times; indeed this is the only clock giving consistent results for both short and long time spans. In particular we show that the dominant European y-haplotypes R1a1a & R1b1a2 expand from c3700BC, not reaching Anatolia before c3300BC. While this contradicts current clocks, which date R1b1a2 either to the Neolithic Near East⁴ or to Paleo-Europe²⁰, our dates support the recent genetic analysis of ancient skeletons by Reich²³.

The genetic clock, computing TMRCA by measuring genetic mutations, was conceived by Emile Zuckerkandl and Linus Pauling³²,³³ on empirical grounds. However, work on neutral mutations by Motoo Kimura¹⁶ gave it a theoretical basis and formula. While our theory applies to general molecular evolution, we focus on the Y-chromosome, using DYS regions (DNA Y-chromosome Segments) and counting the number of “short tandem repeats” (STR) of a microsatellite. In fact one uses many DYS sites, labelled j = 1, …, N; each individual i, i = 1, …, n, has STR number x_i,j. The Y-chromosome is passed unchanged from father to son, except for mutations x_i,j → x_i,j ± 1 occurring at rate μ_j per generation at DYS j; in the symmetric model each of the steps x_i,j → x_i,j + 1 and x_i,j → x_i,j − 1 occurs with probability μ_j/2 per generation.

The fundamental assumption is that the sample population has a single patriarch at time t = TMRCA (in generations) ago. Now suppose the present (sample) population has mode m_j at DYS j. This is taken to be the STR value of the original patriarch. A calculation shows that the present population has variance μ_j t about this mode, so the sum of squared deviations V_j = Σ_i (x_i,j − m_j)² has expected value n μ_j t. Averaging over the markers then gives TMRCA = Σ_j V_j / (n Σ_j μ_j). We call this variance method and its variations KAPZ, after its originators.
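A minimal Python sketch of the KAPZ estimate as written above. It reads V_j as the sum over the n individuals of squared deviations from the modal value m_j, which is what the factor n in the formula implies, and it breaks ties for the mode arbitrarily. The function name and the toy data are hypothetical.

```python
from collections import Counter


def kapz_tmrca(haplotypes, mu):
    """KAPZ-style TMRCA in generations.

    haplotypes: list of n individuals, each a list of STR counts x[i][j].
    mu: per-marker mutation rates mu_j per generation.
    Returns sum_j V_j / (n * sum_j mu_j), where V_j is the sum of squared
    deviations of marker j from its modal value m_j.
    """
    n = len(haplotypes)
    total_v = 0.0
    for j in range(len(mu)):
        column = [h[j] for h in haplotypes]
        # The modal value is taken as the patriarch's STR count (ties broken arbitrarily).
        m_j = Counter(column).most_common(1)[0][0]
        total_v += sum((x - m_j) ** 2 for x in column)
    return total_v / (n * sum(mu))


# Toy example: 4 individuals, 2 markers (hypothetical data).
sample = [[13, 24], [13, 25], [14, 24], [13, 24]]
print(kapz_tmrca(sample, mu=[0.002, 0.004]))  # about 83.3 generations
```

This is the uniform "star" model, in which every line descends independently from the single patriarch.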

In practice, problems soon arose. Mutation rates could be computed from mitosis, but the sample sizes are too small to give great accuracy. Using such rates, a KAPZ variant due to Zhivotovsky³¹ was applied to R1b1a2 by Myres²⁰, giving 9000BC (σ = 2000) for L23* in Turkey.

Mutation rates could also be estimated from large family groups with genealogical data. However, there are significant discrepancies in rates between different family groups. Also these “pedigree” rates are much larger than those from mitosis. A similar phenomenon in the mitochondrial clock suggested high short-term rates and lower long-term rates¹⁴,¹⁵. So very low long-term rates of 0.00069 were suggested³¹ for the Y-clock. We show this is unnecessary.

Another problem is that KAPZ is designed for large populations, whereas ancient populations were small and modern samples can be tiny, e.g. n < 20. This led to the introduction of Bayesian methods such as BATWING²⁹, which considers all possible genealogical trees consistent with the present sample data and then searches for the tree of maximum likelihood. But the BATWING TMRCA is often greater than that of KAPZ. For example, in the Cinnioglu⁸ study of Anatolian DNA both methods were applied to the same data and mutation rates, and for R1b1a2 KAPZ gave TMRCA = 9800BC compared with 18,000BC for BATWING. Balaresque⁴ used BATWING to place the origin of R1b1a2 in Neolithic Anatolia c6000BC, but their statistics were disputed by Busby⁶. All of this was contradicted by Reich²³, who found R1b1a2 in skeletons c3300BC from Yamnaya cemeteries.

Figure 1: A singular lineage increases variance and apparent TMRCA.

Singular Lineages

A fundamental problem is that present populations have highly overrepresented branches, which we call singular lineages. A well-known example is the SNP L21, a branch of R1b1a2. Individuals identified as L21 are often excluded from R1b1a2 analyses because they skew the results. Such a singular lineage makes the variance much greater even though the original TMRCA is unchanged, see Figure 1. For Bayesian methods such lineages are very unlikely, giving an even greater apparent TMRCA. However, one cannot deal with singular branches by excluding them. For one thing, our method will show that 50% of DYS markers show evidence of singular side branches, i.e. deviate by more than one SD from expectation. Excluding them would also remove some of the oldest branches and produce a TMRCA which is too young. Now these singular lineages are (mathematically) very unlikely to arise from the stochastic system that is the mathematical basis of KAPZ (or from the equivalent Monte-Carlo process underlying BATWING). We believe that the standard stochastic process is perturbed by other improbable events, which are then amplified by biological processes.

First, the Watson-Galton process¹⁷ implies that most lineages die out. Conversely, the “kin selection” of W. D. Hamilton¹³ shows that kin co-operation gives genetic advantages. Consider three examples with well-developed DNA projects. Group A of the Hamiltons has approximately 100,000 descendants of Walter Fitzgilbert, c1300AD. Group A of the Macdonalds has about 700,000 descendants of Somerled, c1100AD, and Group A of the O’Niall has over 6 million descendants of Niall of the Nine Hostages, c300AD. These are elite groups with all the social advantages: one sees lines of chieftains, often polygamous. Our model has many extinct twigs with a few successful branches, whereas current models assume a uniform “star radiation” (see below).
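The extinction of most lineages, and the dominance of a few survivors, can be illustrated with a toy Galton-Watson (Watson-Galton) branching process. The sketch assumes, purely for illustration, that every man independently has 0, 1 or 2 sons with probabilities 1/4, 1/2 and 1/4, so the population is stable on average. It contains no kin selection, which in the argument above would further favour a few elite lineages. The function name and parameters are hypothetical.

```python
import random


def surviving_lineages(founders=1000, generations=30, seed=1):
    """Toy Galton-Watson process with one lineage per founder.

    Each man independently has 0, 1 or 2 sons (two fair coin flips), so the
    mean number of sons is exactly 1. Returns the sizes of the lineages that
    are still alive after `generations` generations.
    """
    rng = random.Random(seed)
    sizes = [1] * founders
    for _ in range(generations):
        sizes = [sum((rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(s))
                 for s in sizes]
    return [s for s in sizes if s > 0]


alive = surviving_lineages()
print(len(alive), "of 1000 founder lineages survive")
print("largest surviving lineage:", max(alive), "men")
print("share of all men in the largest lineage:", round(max(alive) / sum(alive), 2))
```

Under these assumptions only about a tenth of the founder lineages survive 30 generations, and the survivors differ greatly in size, even though the expected total population stays at 1000.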

Reduction of Singular Lineages

Modelling singular lineages requires a new stochastic system in which, instead of a single patriarch, we imagine many “virtual patriarchs”, each originating at time t_k ago and each giving a proportion 0 ≤ ρ_k ≤ 1 of the present population. So we now have an inhomogeneous expansion. Furthermore, the symmetric model for mutations has to be changed to an asymmetric one: x_i,j → x_i,j + 1 at rate μ_j,1 and x_i,j → x_i,j − 1 at rate μ_j,−1, with μ_j,1 + μ_j,−1 = μ_j.

We introduce asymmetric mutations and show how to estimate the asymmetry. Asymmetry will play a very important role in detecting singular lineages. This inhomogeneous asymmetric system is mathematically equivalent to a mixed population, and computing its solution is an “inverse problem”. Unfortunately, inversion is unstable for such systems, and there is no unique solution. However, it turns out that, within one standard deviation (SD), most DYS markers show at most one singular branch, which is found from asymmetries in the distribution. These singular branches are then reduced, revealing the original lineage. We then compute a branching time t_j for each marker j. The effect of reduction is dramatic, see Figure 3. Now the non-uniform branching process causes the t_j to be randomly distributed, so their mean is not the TMRCA. Large errors in mutation rates mean one cannot simply take the maximum t_j to be the TMRCA. Instead, stochastic simulations of the branching process, using robust statistics to avoid outliers, find the most likely TMRCA; see Supplementary Material 1 (SM1) for full mathematical details.
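As a rough illustration of detecting a singular branch from asymmetry, the sketch below compares, for one marker, the share of non-modal values lying above the mode with an expected share. It assumes, hypothetically and for illustration only, that without a singular branch this share is about the marker's asymmetry (0.5 when symmetric), and it uses a normal approximation to the binomial. The one-SD default threshold echoes the criterion quoted earlier. This is not the inversion procedure of SM1, and all names are hypothetical.

```python
import math
from collections import Counter


def asymmetry_z(values, a_expected=0.5):
    """z-score of the share of non-modal STR values lying above the mode.

    values: STR counts of one marker across the sample.
    a_expected: assumed share above the mode without a singular branch (0 < a_expected < 1).
    """
    mode = Counter(values).most_common(1)[0][0]
    above = sum(1 for x in values if x > mode)
    below = sum(1 for x in values if x < mode)
    k = above + below
    if k == 0:
        return 0.0
    se = math.sqrt(a_expected * (1 - a_expected) / k)  # binomial standard error
    return (above / k - a_expected) / se


def flags_singular_branch(values, a_expected=0.5, threshold=1.0):
    """True when the asymmetry exceeds `threshold` standard errors."""
    return abs(asymmetry_z(values, a_expected)) > threshold


skewed = [13] * 50 + [14] * 30 + [12] * 10    # excess of 14s
balanced = [13] * 50 + [14] * 20 + [12] * 20
print(flags_singular_branch(skewed), flags_singular_branch(balanced))  # True False
```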

These methods also imply a new way of computing mutation rates, see SM2. Previously there were methods based on mitosis data or on pedigree studies of family DNA projects (which gave quite different rates). We begin with 8 very large SNP projects from FTDNA using 37 markers, whose TMRCA are of course unknown, and find the mutation rates as the fixed point of a stochastic process. The iteration takes about 3 steps to converge. After discarding markers with mutation-rate SD > 33% we are left with 29 markers. We find mutation rates close to those obtained from mitosis and nearly 1/3 of the values obtained from pedigrees. Despite our mutation rates being lower than in most studies, reduction of singular lineages produces more recent TMRCA than current models.

Examples

Our clock is the only one with consistent results across the board:

Table 1: TMRCA for medieval groups.

Archeological finds convinced Marija Gimbutas¹¹ to attribute Proto-Indo-European (PIE) to the Yamnaya Culture of the Russian Steppes, c3500BC, see Anthony². This is consistent with mainstream linguistic theory; some even wrote of linguistic DNA. But the actual genetics was ignored because it contradicted current genetic clocks. Now the dominant European y-haplotypes are R1b1a2 & R1a1a (each of which, like other y-haplotypes, is marked by a unique single nucleotide polymorphism (SNP) mutation). Table 2 shows expansion times of c3700BC, similar for the regions Russia, Poland, Germany and Scandinavia. The times are so close that only Scandinavia is significantly later. For each region X the data come from FTDNA projects, using only individuals with a named ancestor from X. These independent results agree within the standard deviation (SD), with dates matching the Corded Ware Culture, a semi-nomadic people with wagons and horses who expanded west from Ukraine c3000BC. This is consistent with the oldest R1b1a2 and R1a1a skeletons being from the Yamnaya Culture²³.

Table 2: R1b1a2, R1a1a independent comparison.

An interesting intermediate step occurs between the medieval and the Eneolithic. The mythical Irish Chronicles relate that the O’Niall descend directly from the first Gaelic High Kings, whom tradition dated c1300-1600BC. The O’Niall have the unique mutation M222, which is a branch of the haplotype L21. For L21 (n = 1029) we compute TMRCA = 1600BC with SD σ = 320 years. These are dates for proto-Celtic, i.e. what archeologists call the pre-Urnfelder Cultures, c1300-1600BC, see SM5. Furthermore, L21 is in turn a branch of haplotype P312, which we date to 2300BC. This date suggests the Bell Beaker Culture of Western Europe. Indeed the only known²³ Bell Beaker genome is P312, with ¹⁴C date 2300BC.

Our method requires large data sets and many markers, which means we have to rely on data from FTDNA, finding 29 usable markers out of the standard 37 they use. In fact many researchers⁴ have used FTDNA data. We think our method of reduction with robust statistics solves any problems with this data. To test this, we compared our results with R1a1a1 data obtained from Underhill²⁶, with n = 974 (after excluding his four M420 individuals and others with missing markers) and 15 usable markers. The result was 2550BC, σ = 400, within the CI of our R1a1a results. Table 5 shows the results of extensive simulations using random subsets of our FTDNA data, for 29, 15 and 7 markers. For the same 15 markers as Underhill²⁶, the different FTDNA data give a very similar 3300BC, σ = 840, for R1a1a, verifying the correctness of using FTDNA data. However, once you get down to 7 markers the confidence interval becomes large, e.g. R1a1a gives 3400BC, σ = 1500. It also becomes difficult to deal with outliers.

An example with few markers is the R1b1a2 data of Balaresque⁴. Our method (this time with 7 usable markers) gave SD > 30%, see Table 6. Now Balaresque⁴ used the Bayesian method BATWING²⁹ to suggest a Neolithic origin in Anatolia. With the same Cinnioglu⁸ data, our method gives for Turkish R1b1a2 (n = 75) a TMRCA of 5300BC, σ = 3100, i.e. anytime from the Ice Age to the Iron Age. Fortunately, once again we find good data from FTDNA: the Armenian DNA project, see Table 3. By tradition the Armenians entered Anatolia from the Balkans c1000BC, so they might not seem a good example of ancient Anatolian DNA. But some 100 generations of genetic diffusion have produced an Armenian distribution of haplotypes J, G and R1b1a2 closely matching that of all Anatolians, which is therefore representative of typical Anatolian DNA. We see that Anatolian R1b1a2 arrived after c3300BC, ruling out a Neolithic expansion c6000BC. When dealing with regional haplotypes, e.g. R1b1a2 in Anatolia, the TMRCA is only an upper bound for the arrival time, since the genetic spread may be carried by movements of whole peoples from some other region.

Observe that our TMRCA for Armenian G2a2b (formerly G2a3) and J2 show them to be the first Neolithic farmers from Anatolia, i.e. older than 7000BC. In Table 4 we compare J2 and G2a2b for all of Western Europe (non-Armenian data). Our dates show J2 was expanding at the end of the Ice Age. Modern J2 is still concentrated in the Fertile Crescent, but also in disconnected regions across the Mediterranean. The old genetic model predicted a continuous wave of Neolithic farmers settling Europe. But maritime settlement cannot be continuous: it must be leap-frog. Also, repeated resettlement from the Eastern Mediterranean has mixed ancient J2 populations, and our method gives the oldest date. On the other hand, G2a2b shows exactly the dates expected from a continuous wave of Neolithic farmers across Central Europe, consistent with Neolithic skeletons showing G2a2b (e.g. the famous Iceman).

Discussion

History, archeology, evolutionary biology, not to mention epidemiology (e.g. dating HIV), forensic criminology and genealogy are just some of the applications of molecular clocks. Unfortunately, current clocks have been found to give only “ballpark” estimates. Our method is the only one giving accurate times, at least for the human Y-chromosome, verified over the period 500 to 15,000 ybp. Our methods should also give accurate times for mitochondrial and other clocks.

Many geneticists thought natural selection makes mutation rates too variable to be useful. The problem is a confusion between the actual biochemical process giving mutations and superimposed processes, like kin selection, that produce apparently greater rates. Notice that the SD of our mutation rates is on average 14%, which is much smaller than the discrepancies between previously published rates. We believe this small SD proves the reality of the neutral mutation rates of Motoo Kimura¹⁶.

While our method is accurate for “big data”, many applications in genetics, forensics and genealogy require the TMRCA of just two individuals, or of two species. For this “2-body problem” we cannot determine which singular lineages the branching has passed through, so mutations may be either exaggerated or suppressed. Thus previous methods for small samples are at best unreliable. It is an important problem to find what accuracy is possible for small samples.

In checking accuracy we ran into the question of the origins of PIE. Although there are genes for language there is certainly none for any Indo-European language. Thus inferences have to be indirect. Marija Gimbutas saw patterns in symbolism and burial rituals suggesting the Yamnaya Culture was the cradle of Proto Indo-European. Also their physiology was robustly Europeanoid unlike the gracile skeletons of Neolithic Europe, but this could be nutrition and not genetic. So it was an open question whether the spread of this robust type into Western Europe in the late Neolithic marked an influx of Steppe nomads or a revolution in diet.

Reich²³ observed that all 6 skeletons from Yamnaya sites, dated c3300BC by ¹⁴C, are either R1b1a2 or R1a1a. But that method could not date the origin of R1b1a2 and R1a1a. Our TMRCA show both these haplotypes expanding at essentially the same time, c3700BC. This, together with our later date for Anatolia, implies that R1b1a2 and R1a1a must have originated in the Yamnaya Culture, c3700BC. Furthermore, considering the correlation of haplotypes R1b1a2 and R1a1a with Indo-European languages (all countries with R1b1a2 & R1a1a frequency > 50% speak Indo-European languages), this provides powerful evidence for the origin of Proto-Indo-European.

Supplementary Material 2: Accurate Mutation rates

Any genetic clock depends on reasonably accurate mutation rates. The mitosis method looks for mutations in sperm samples; forensics uses father-son studies. However, a typical rate of μ = 0.002 would require nearly 50,000 pairs to get an SD of 10%, and small samples have meant large errors. The pedigree approach is to study large family groups with well-developed DNA/genealogy data, where inverting the KAPZ formula would in principle yield accurate rates. However, singular lineages make this problematic: genealogical data might give mutation rates much greater than the biochemical rates, because kin selection and similar effects tend to exaggerate the apparent mutation rate. An inspection of 10 different sources finds mutation rates claiming SD ∼10%, yet differing from each other by up to 100%. We describe a new method.
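The figure of nearly 50,000 pairs follows from Poisson counting: if N father-son pairs are screened at a marker with rate μ, the number K of observed mutations is roughly Poisson with mean Nμ, so the estimate μ̂ = K/N has relative SD 1/√(Nμ). A minimal check, with an illustrative function name:

```python
def pairs_needed(mu, rel_sd):
    """Father-son pairs N needed so that 1 / sqrt(N * mu), the relative SD of the
    estimated rate under a Poisson approximation, equals rel_sd."""
    return round(1.0 / (mu * rel_sd ** 2))


print(pairs_needed(0.002, 0.10))  # 50000
```

Halving the target error to 5% quadruples the requirement to about 200,000 pairs.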

To compute our rates we apply our theory to the large DNA projects for the SNPs M222, L21, P312, U106, R1b1a2, I1 and R1a1a. This avoids populations such as family DNA projects, which are self-selecting, i.e. include only those with the correct surname and so neglect distant branches. Also we have very large samples, with average n > 1000. Greater accuracy should come from more generations and individuals. The problem is that we do not know their TMRCA.

Asymmetric Mutation

However, before computing mutation rates we must consider asymmetric mutations, i.e. left and right mutation rates μ_j,−1 ≠ μ_j,1. For a uniform stochastic process we again use the asymmetric ratio to define the asymmetric constant A_j ∈ [0, 1] for marker j; for example, A_j = 0.5 is complete symmetry. Of course singularities will affect this ratio; however, these occur in fewer than 50% of markers. Thus for each marker and each SNP we compute this ratio. We find the SD for each SNP is relatively small, while the difference between SNPs can be large. However, for each marker, using 8 SNPs enables outliers to be easily removed, allowing us to use a simple estimate: the average of the A_j over the remaining SNP groups. We see that asymmetry is a real effect: 50% of the A_j are more than two SD from symmetry, A_j = 0.5.
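A sketch of the per-marker estimate of A_j described above: pool the ratios from the 8 SNP groups, drop outliers, and average the rest. The outlier rule used here (more than three median absolute deviations from the median) and the use of the standard error when comparing with symmetry A_j = 0.5 are assumptions for illustration; the text does not specify either. The function name and data are hypothetical.

```python
import math
import statistics


def pooled_asymmetry(ratios, k=3.0):
    """Pool one marker's asymmetric ratios over SNP groups after removing outliers.

    Outliers: values more than k median absolute deviations (MAD) from the median.
    Returns (A_j, standard error, z-score against symmetry A_j = 0.5, kept ratios).
    """
    med = statistics.median(ratios)
    mad = statistics.median(abs(r - med) for r in ratios)
    kept = [r for r in ratios if mad == 0 or abs(r - med) <= k * mad]
    a = statistics.fmean(kept)
    se = statistics.stdev(kept) / math.sqrt(len(kept)) if len(kept) > 1 else math.inf
    if se == 0:
        z = 0.0 if a == 0.5 else math.copysign(math.inf, a - 0.5)
    else:
        z = (a - 0.5) / se
    return a, se, z, kept


# Hypothetical ratios for one marker in 8 SNP groups; one group carries a singular branch.
ratios = [0.60, 0.62, 0.58, 0.61, 0.59, 0.60, 0.95, 0.60]
a, se, z, kept = pooled_asymmetry(ratios)
print(round(a, 3), len(kept), z > 2)  # 0.6 7 True
```

A median-based rule is used because, with only 8 groups, a single singular branch can distort the mean and SD enough to hide itself from a mean-based cut.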

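This asymmetry is significant. The total second moment about the patriarch's value is

E[(x_i,j − m_j)²] ≈ μ_j t + τ_j t², where τ_j = (μ_j,1 − μ_j,−1)² is the squared mean drift per generation.

So, using all 33 of our DYS markers with our μ_j, we compute the constants τ_j.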

The KAPZ formula gives variance V = μt, compared with the corrected formula μt + τt². The uncorrected KAPZ gives an overestimate of > 400% for > 200 generations. This effect can be nullified by using the mean instead of the mode, i.e. the variance instead of the second moment; failing to do so gives a large error. Furthermore, other methods that assume symmetric mutations will also be inaccurate. Estimates of the asymmetry are essential to our method because we find singular lineages by looking for asymmetry in the data: any such anomaly needs to be significantly greater than the natural asymmetry.
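The gap between the second moment about the founder's value (which KAPZ takes to be the mode) and the variance about the sample mean can be checked by simulating the asymmetric stepwise model. For a line that steps up with probability μ_j,1 and down with probability μ_j,−1 per generation, the drift is d = μ_j,1 − μ_j,−1, the variance after t generations is t(μ_j − d²) ≈ μ_j t, and the second moment about the starting value is t(μ_j − d²) + d²t², i.e. approximately μ_j t + τ t² with τ = d². The sketch assumes independent lines (a star genealogy) and uses exaggerated illustrative rates so the effect is visible; all names are hypothetical.

```python
import random
from statistics import fmean


def simulate_lines(n, t, mu_up, mu_down, seed=0):
    """Displacements from the founder's STR value after t generations for n
    independent lines, each stepping +1 w.p. mu_up and -1 w.p. mu_down per generation."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = 0
        for _ in range(t):
            u = rng.random()
            if u < mu_up:
                x += 1
            elif u < mu_up + mu_down:
                x -= 1
        out.append(x)
    return out


n, t, mu_up, mu_down = 4000, 200, 0.008, 0.002   # illustrative rates only
xs = simulate_lines(n, t, mu_up, mu_down)
mean = fmean(xs)
second_moment = fmean(x * x for x in xs)          # about the founder's value
variance = fmean((x - mean) ** 2 for x in xs)     # about the sample mean
mu, d = mu_up + mu_down, mu_up - mu_down
print(second_moment, t * (mu - d * d) + (d * t) ** 2)  # simulated vs about 3.43
print(variance, t * (mu - d * d))                      # simulated vs about 1.99
```

Here KAPZ would expect μt = 2.0 for both quantities; only the variance about the mean stays close to it.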

Mutation Rates as a fixed Point

Next we compute mutation rates using 8 very large SNP groups. First, using the asymmetric constants, we find singular lineages and reduce their effect. We take account of the error in the A_j by a bootstrap technique, which gives the variance of each frequency f(j, 0). For a given SNP k, if all markers j started their expansion at the same time TMRCA T_k, we could calculate the mutation rates μ_j from equation (1) with t_j = T_k, and average the 8 different μ_j we would obtain. However, because of branching caused by the extinction of lineages, the different markers do not originate at the same time but at different times t_j. In this case we expect the t_j to be randomly distributed about their log mean over a middle set of markers. So, for each SNP group k = 1, …, 8, we define the mean time T_k, not as the TMRCA but as the log mean of the t_j over a middle set of markers, which is smaller. We find that this is very stable. So, for a fixed marker j, the data τ_k,j = t_j − T_k should be randomly distributed about zero over the different SNPs k = 1, …, 8. However, a wrong choice of μ_j would introduce a bias. In fact this is what we see if the mutation rates μ_j = 0.002 are chosen: the graphs in the appendix show the τ_k,j, k = 1, …, 8, bunched around a nonzero point. Thus we seek μ_j such that the τ_k,j, k = 1, …, 8, have mean zero. However, the τ_k,j depend nonlinearly on the rates μ_j, as do the means T_k. We find that this nonlinear regression problem is solved by an iterative scheme: any reasonable starting set of rates iterates to the same final answer, so we begin with μ_j = 0.002. Suppose at some stage we have apparent mutation rates μ_j. Then, for each SNP and each marker, we solve equation (1) to obtain the apparent t_j, and for each SNP k = 1, …, 8 we compute the mean log time T_k. At the next step we obtain, for each SNP k, a new rate μ_j^(k) by solving equation (1) for the rate with t_j set equal to T_k.

Averaging the μ_j^(k), k = 1, …, 8, gives our next set of mutation rates μ_j. However, this step would be affected by a marker showing a singular lineage. Fortunately these are few in number, and by comparison between the different SNPs we remove the outliers. We then repeat the process, computing T_k again with the new rates and obtaining another set of mutation rates. So we have an iterative process.

One problem is that the iterates could decrease to zero or increase to ∞, as we are only calculating relative rates. To prevent this we renormalize after each iteration so that the total Σ_j μ_j is constant. We found that the iterative scheme quickly converges to a fixed set of mutation rates, unique up to a constant factor. The CI is computed by a bootstrap parametrized by the uncertainties in the data and in the asymmetric constants. The figures below show the distribution of τ_k,1, k = 1, …, 8, before and after the first iteration.
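A minimal sketch of the fixed-point iteration, under a simplifying assumption that is not the paper's: the apparent time for marker j in SNP group k is taken to be KAPZ-linear, t_k,j = V_k,j / μ_j with V_k,j a per-individual variance, instead of coming from equation (1) with reduction and bootstrap. T_k is a trimmed mean of log t_k,j (the "middle set" here is the central half of the markers), each group proposes the rate that would put marker j at exp(T_k), the groups are combined by a median, and the rates are renormalized so Σ_j μ_j stays constant. All names and numbers are hypothetical.

```python
import math
from statistics import median


def trimmed_mean(values, trim=0.25):
    """Mean of the sorted values after dropping a `trim` fraction at each end."""
    v = sorted(values)
    k = int(len(v) * trim)
    middle = v[k:len(v) - k] or v
    return sum(middle) / len(middle)


def fixed_point_rates(V, mu0=0.002, iterations=10, trim=0.25):
    """Relative mutation rates as a fixed point (illustrative sketch).

    V[k][j]: hypothetical per-individual variance of marker j in SNP group k,
    assumed to satisfy the KAPZ-style relation t = V / mu (not equation (1)).
    Returns rates renormalized so that their sum stays at len(V[0]) * mu0.
    """
    n_groups, n_markers = len(V), len(V[0])
    mu = [mu0] * n_markers
    total = sum(mu)
    for _ in range(iterations):
        # Apparent log branching time of every marker in every SNP group.
        log_t = [[math.log(V[k][j] / mu[j]) for j in range(n_markers)]
                 for k in range(n_groups)]
        # T_k: mean log time over the middle set of markers.
        T = [trimmed_mean(row, trim) for row in log_t]
        # Rate that would put marker j at time exp(T_k), combined over groups by a median.
        new_mu = [math.exp(median(math.log(V[k][j]) - T[k] for k in range(n_groups)))
                  for j in range(n_markers)]
        # Renormalize: only relative rates are determined.
        scale = total / sum(new_mu)
        mu = [m * scale for m in new_mu]
    return mu


# Synthetic check (hypothetical numbers): V[k][j] = mu_true[j] * T_true[k] exactly.
mu_true = [0.001, 0.002, 0.004, 0.003, 0.0025, 0.0015]
T_true = [100, 150, 200, 250, 300, 120, 180, 220]
V = [[m * t for m in mu_true] for t in T_true]
est = fixed_point_rates(V)
print([round(e / m, 6) for e, m in zip(est, mu_true)])  # the same factor for every marker
```

Because the synthetic data contain no noise, the rates are recovered up to the common factor fixed by the normalization. With real data, the trimming and the median are what limit the influence of markers showing singular lineages.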

The generation factor γ

This method does not give absolute mutation rates but relative rates μ_j γ, where γ is a universal time-scale constant. To find γ we apply our method to compute T = TMRCA for three famous DNA projects and choose γ so that the scaled T/γ best fits the historical record. We choose the DNA projects for the O’Niall (M222), Group A of the Macdonalds (R1a1a) and Group A of the Hamiltons (I1). These are large groups with characteristic DNA and fairly accurate times of origin. Of course, finding one constant γ from three projects is inherently more accurate than using one project to find 33 different mutation rates. In fact, assuming a generation of 27 years, these three projects yield γ = 1 with about 5% error, i.e. there is no actual need for this correction. Any remaining error is a constant one (like uncalibrated ¹⁴C dating).
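One way to make "best fits the historical record" concrete is a least-squares fit of the single factor: choose x = 1/γ to minimize Σ_i (x T_i − H_i)², where T_i is the computed TMRCA and H_i the historical age of project i, which gives x = Σ T_i H_i / Σ T_i². This criterion is an assumption; the text does not say which fit was used. The historical ages below come from the dates quoted in the main text, measured from an assumed present of 2000AD, and the computed ages are hypothetical.

```python
def fit_gamma(computed, historical):
    """Least-squares time-scale factor: choose x = 1/gamma minimizing
    sum_i (x * computed_i - historical_i) ** 2, and return gamma = 1 / x."""
    x = sum(t * h for t, h in zip(computed, historical)) / sum(t * t for t in computed)
    return 1.0 / x


historical = [700, 900, 1700]   # Hamiltons, Macdonalds, O'Niall: years before 2000AD
computed = [735, 945, 1785]     # hypothetical TMRCA, each 5% too old
print(fit_gamma(computed, historical))  # about 1.05
```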

Thus γ is related to the length of a generation. Most researchers use 25 years for t > 500 ybp and 27 years for t < 500 ybp. Balaresque et al. used 30 years, based on Fenner, who finds a 30-year generation for modern hunter-gatherers (although for most of the time R1b1a2 were subsistence farmers, not hunter-gatherers). At first glance our theory allows any nominal generation length, since it is absorbed into the γ factor, which we compute in years, not generations. Actually it is not as simple as that. While our three DNA projects, being post-1000AD elites, have a 27-year generation, the problem is what to do for t > 2000 ybp. Now 25 years may be appropriate for subsistence farmers, but we found that singular lineages of the elite have an exaggerated effect, so 27 years seems appropriate.

Mutation Rates: Hamilton vs Mitosis and pedigree

Asymmetric rates

The log distribution of τ_k,1, k = 1, …, 8, at marker j = 1, i.e. DYS393, before iteration (μ_j = 0.002) but after reduction¹. The SNP groups are colored.

After just one iteration we get:

So 5 of our τ_k,1, k = 1, …, 8, bunch around zero; the outliers are U106 and I1.

The iterative scheme converges to stable values very fast: 7 iterations are enough.

References

[1].
Burgarella, C et al. Mutation rate estimates for 110 Y-chromosome STRs combining population and father-son pair data, European Journal of Human Genetics (2011) 19, 70–75; doi:10.1038/ejhg.2010.154
[2].
George B. J. Busby et al. The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269, Proc Biol Sci. 2012 Mar 7; 279(1730):884–92. doi: 10.1098/rspb.2011.1044
[3].
Fenner, J.N. Cross-Cultural Estimation of the Human Generation Interval for Use in Genetics-Based Population Divergence Studies, American Journal of Physical Anthropology (2005) 128:415–428.
[4].
Ho S.Y.W, Phillips M.J, Cooper A., Drummond A.J. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Molecular Biology & Evolution 22 (7): 1561–1568 (2005).
[5].
Kimura, M. Evolutionary rate at the molecular level. Nature 217: 624–626 (1968); doi:10.1038/217624a0
[6].
Zhivotovsky LA, Underhill PA, Cinnioglu C et al. The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time, Am J Hum Genet (2004) 74: 50–61.

Supplementary Material 3: Reduction of Singular Lineages vs KAPZ

We compare results for our method with KAPZ, using the same data, the same 29 markers and our mutation rates.

First we compare groups with medieval expansions:

Next we compare SNP G2a2b, R1b1a2, R1a1a, I1, L21, U106, J2, P312:

RSL (reduction of singular lineages) and KAPZ will give similar results if there is a fast expansion, and thus insignificant singular lineages and branching. This is sometimes to be expected: it is not surprising that the RSL and KAPZ results for the O’Niall, R1a1a and U106 are very similar.

However, in other cases the KAPZ results are about 30% too old; in the case of the Hamiltons and Macdonalds absurdly so. For R1b1a2 KAPZ gives an early Neolithic age, compared with an Eneolithic age for R1a1a, yet both have been dated to the same Yamnaya times. The KAPZ date for L21 (“Celtic”) is nearly 2000 years before the Urnfelder Culture.

Of course one might try to “improve” KAPZ by increasing the mutation rates by 33% so the KAPZ times are decreased by 25%. Then the medieval dates look reasonable but we find 3100BC for G2a2b which is too late. For R1a1a we would get 2300BC which is not only too late but significantly different from the 3600BC for R1b1a2. Also G2a2b would be predated by R1b1a2 even though the latter has never been found in Neolithic sites of Europe. Getting consistent results across the span of history was a problem of previous clocks.
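The 33% / 25% trade-off follows directly from the KAPZ formula TMRCA = Σ_j V_j / (n Σ_j μ_j), in which the time estimate is inversely proportional to the summed mutation rate:

```latex
t' = \frac{\sum_j V_j}{n \sum_j 1.33\,\mu_j}
   = \frac{1}{1.33}\,\frac{\sum_j V_j}{n \sum_j \mu_j}
   = \frac{t}{1.33} \approx 0.75\,t
```

So raising every rate by a third shortens every KAPZ time by about a quarter, whatever the data, which is why such a rescaling cannot fix medieval and Neolithic dates at the same time.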

Footnotes

¹ The calculations and figures for all 33 markers are shown in SM

References

[1].
Ammerman, A. J. and Cavalli-Sforza, L. L. The Neolithic Transition and the Genetics of Populations in Europe, Princeton Univ. Press, Princeton (1984).
[2].
Anthony D.W., The Horse, the Wheel and Language, Princeton Univ. Press, Princeton (2007).
[3].
Arredi B., Poloni E., Tyler-Smith C. The Peopling of Europe. Anthropological Genetics: Theory, Methods and Applications. Cambridge University Press: Cambridge (2010).
[4].
Balaresque P. et al. A predominantly Neolithic Origin for European paternal Lineages, PLoS Biol (2010); 8: e1000285.
[5].
Burgarella, C et al. Mutation rate estimates for 110 Y-chromosome STRs combining population and father-son pair data, European Journal of Human Genetics (2011) 19, 70–75; doi:10.1038/ejhg.2010.154
[6].
George B. J. Busby et al. The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269, Proc Biol Sci. 2012 Mar 7; 279(1730):884–92. doi: 10.1098/rspb.2011.1044
[7].
Chiaroni J, Underhill PA, Cavalli-Sforza LL. Y chromosome diversity, human expansion, drift, and cultural evolution, Proc Natl Acad Sci USA (2009); 106: 20174–20179.
[8].
Cinnioglu C, King R, Kivisild T, Kalfoglu E, Atasoy S, et al. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet. (2004); 114: 127–148.
[9].
Edmonds C., Lillie A., Cavalli-Sforza L.L. Mutations arising in the wave front of an expanding population, Proc Natl Acad Sci USA (2004); 101: 975–979.
[10].
Fenner, J.N. Cross-Cultural Estimation of the Human Generation Interval for Use in Genetics-Based Population Divergence Studies, American Journal of Physical Anthropology (2005) 128:415–428.
[11].
Gimbutas M. The Prehistory of Eastern Europe, Part 1 (1956), Cambridge: American School of Prehistoric Research #20.
[12].
Goldstein D. B, Linares A. R, Cavalli-Sforza L. L, Feldman M. W. An evaluation of genetic distances for use with microsatellite loci. Genetics (1995); 139: 463–471.
[13].
Hamilton, W.D. The genetical evolution of social behaviour. I, II. Journal of Theoretical Biology 7 (1) (1964): 1–52.
[14].
Ho S.Y.W, Phillips M.J, Cooper A., Drummond A.J. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Molecular Biology & Evolution 22 (7): 1561–1568 (2005).
[15].
Hunt, J.S., Bermingham, E., and Ricklefs, R.E. Molecular systematics and biogeography of Antillean thrashers, tremblers, and mockingbirds (Aves: Mimidae). Auk 118 (1): 35–55; doi:10.1642/0004-8038(2001)118
[15].
Lee, E.J. et al. Emerging Genetic Patterns of the European Neolithic: Perspectives From a Late Neolithic Bell Beaker Burial Site in Germany (2012), American J. of Physical Anthropology 148: 571–579
[16].
Kimura, M. Evolutionary rate at the molecular level. Nature 217: 624–626 (1968); doi:10.1038/217624a0
[17].
Kendall, D. G. The Genealogy of Genealogy: Branching Processes before (and after) 1873. (1975) Bulletin of the London Mathematical Society 7 (3): 225–253.
[18].
Mallory, J. P. In Search of the Indo-Europeans: Language, Archaeology and Myth (1989) London: Thames & Hudson.
[19].
Menozzi P, Piazza A, Cavalli-Sforza L. L. Synthetic maps of human gene frequencies in Europeans. Science (1978) 201: 786–792
[20].
Myres, N. et al. A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe, European Journal of Human Genetics (2010), 1–7, 1018-4813/10
[21].
Ohta, T., and Kimura, M. The model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a genetic population. Genet. Res. (1973) 22: 201–204.
[22].
Olver, F. W. J. Bessel functions of integer order, pp. 355–434 in Handbook of Mathematical Functions (1964), edited by M. Abramowitz, National Bureau of Standards, Washington.
[23].
Reich, D, et al. Massive migration from the steppe was a source for Indo-European languages in Europe, Nature (2015) doi:10.1038/nature14317
[24].
Renfrew, A.C., Archaeology and Language: The Puzzle of Indo-European Origins (1987) London: Pimlico.
[25].
Rootsi S. et al. Distinguishing the co-ancestries of haplogroup G Y-chromosomes in the populations of Europe and the Caucasus, Eur J Hum Genet. 2012 Dec; 20(12):1275–82. doi: 10.1038/ejhg.2012.86.
[26].
Peter A Underhill et al., The phylogenetic and geographic structure of Y-chromosome haplogroup R1a, European Journal of Human Genetics (2015) 23, 124–131; doi:10.1038/ejhg.2014.50; published online 26 March 2014
[27].
Walsh, B. Estimating the Time to the Most Recent Common Ancestor for the Y chromosome or Mitochondrial DNA for a Pair of Individuals, Genetics 158: 897–912 (2001)
[28].
Wehrhahn, C. F. The evolution of selectively similar electrophoretically detectable alleles in finite natural populations, Genetics 80 (1975) 375–394.
[29].
Wilson I. J, Weale M. E, Balding D. J. Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities. J Roy Statist Soc A. (2003); 166: 1–33.
[30].
Zhivotovsky, L.A. Estimating Divergence Time with the Use of Microsatellite Genetic Distances, Molecular Biology and Evolution (2001) 18: 700–709.
[31].
Zhivotovsky LA, Underhill PA, Cinnioglu C et al. The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time, Am J Hum Genet (2004) 74: 50–61.
[32].
Zuckerkandl, E. and Pauling, L.B. (1962) Molecular disease, evolution, and genic heterogeneity. In Kasha, M. and Pullman, B. (editors). Horizons in Biochemistry. Academic Press, New York. pp. 189–225.
[33].
Zuckerkandl, E. and Pauling, L.B. (1965) Evolutionary divergence and convergence in proteins. In Bryson, V. and Vogel, H.J. (editors). Evolving Genes and Proteins. Academic Press, New York. pp. 97–166.