PHYLOGENETIC LINGUISTIC EVIDENCE AND THE DENE-YENISEIAN HOMELAND

IGOR YANOVICH
Universität Tübingen



Abstract. [Sicoli and Holton, 2014] (PLOS ONE 9:3, e91722) use computational phylogenetics to argue that linguistic data from the putative, but likely Dene-Yeniseian macro-family are better compatible with a homeland in Beringia (i.e. northeastern Siberia plus northwestern Alaska) than with one in central Siberia or deeper Asia. I show that a more careful examination invalidates that conclusion: in fact, linguistic data do not support Beringia as the homeland. In the course of showing that, I discuss, without requiring a deep mathematical background, a number of methodological issues concerning computational phylogenetic analyses of linguistic data and drawing inferences from them. I conclude with a brief overview of the current evidence bearing on the Dene-Yeniseian homeland from linguistics, archaeology, folklore studies and genetics, and suggest current best practices for linguistic phylogenetics which would have helped to avoid some of the problems in Sicoli and Holton’s Dene-Yeniseian study.

The Dene-Yeniseian language macro-family, [Vajda, 2011], [Vajda, 2013], is argued to consist of the Yeniseian languages in central Siberia (with the only living representative Ket) and Na-Dene languages in North America, including Athabaskan languages, Eyak and Tlingit. The macro-family is still only putative, as many open questions remain (see [Campbell, 2011], [Starostin, 2012], as well as the reply in [Vajda, 2012]). However, the family does appear to be quite likely, and has been widely accepted as such (for instance, [Kiparsky, 2015]). [Vajda, 2018] overviews recent work on Dene-Yeniseian, and assesses the current state of research in a balanced way.1

[Sicoli and Holton, 2014] apply computational phylogenetic methods to typological data from Dene-Yeniseian languages in order to address the question of where their homeland was. The test that they apply is very simple: Sicoli and Holton examine the shape of the obtained family trees or networks, and determine whether they support a basal split into separate Yeniseian and Na-Dene branches. They find that their analysis does not support such a split, which effectively amounts to saying that there was no Proto-Na-Dene stage that is ancestral to all Na-Dene languages but excludes Yeniseian. In other words, Sicoli and Holton’s analysis says that the basal split was not between Yeniseian and Na-Dene, but between some Na-Dene I and [Na-Dene II + Yeniseian].

(Anticipating the discussion in Section 3: such a grouping has no basis in the careful historical-linguistic work on the Na-Dene and Yeniseian families. Even a brief examination of any aspect of the data, be it cognacy sharing, sound changes, or morphological structures, will demonstrate that Yeniseian cannot be in a grouping with some part of Na-Dene to the exclusion of other Na-Dene languages. Sicoli and Holton do not discuss or even explicitly note that their analysis contradicts a large body of research on the languages in question.) From this inferred history of splitting, Sicoli and Holton conclude that the homeland of the Dene-Yeniseian family must have been in Beringia rather than in Siberia, or generally in Asia excluding Beringia.

Date: April 14, 2019.

1This paper has greatly benefitted from discussions with and comments by Chris Bentz, Johannes Dellert, Gerhard Jäger, Alexei Kassian, Taraka Rama, Yonatan Sahle, Johannes Wahle, and Joseph Wilson; from the very helpful suggestions by the four reviewers for Diachronica and the editors of the journal; from the help in locating some of the relevant literature by Alexei Kassian and Elena Krjukova; and from presentations at the EVOLAEMP project group (http://www.evolaemp.uni-tuebingen.de/) and the DFG Center for Advanced Study “Words, Bones, Genes, Tools”. Special thanks are due to Phillip Endicott for our discussions of genetic evidence in [Flegontov et al., 2016] and [Flegontov et al., 2017]. Of course, all possible errors in the article are my responsibility only. Research reported here was supported by DFG under project FOR 2237, establishing the said Center for Advanced Study, which is hereby gratefully acknowledged.

Importantly, the conclusions of [Sicoli and Holton, 2014] have been taken up as valid by specialists outside linguistics. The so-called Beringian standstill hypothesis is an important issue in current studies of the peopling of the Americas that use genetic data. That hypothesis argues that there was a single (though likely structured) human group that later rapidly colonized the Americas, and that it had been isolated for several thousand years from other human populations even before entering the two new continents. Such a scenario appears likely given current results from genetics. A reasonable place where that isolation period could have happened would be Beringia: northeastern Siberia, western Alaska and the Bering land bridge, which is currently under water. [Watson, 2017] provides a popular review of the history and the evidence for the Beringian standstill hypothesis.

That review cites [Sicoli and Holton, 2014] as implying that humans occupied Beringia during the Last Glacial Maximum (LGM), the period of maximum glacier extent around 26-20 thousand years ago [Clark et al., 2009]. [Watson, 2017] cites a p.c. by Gary Holton, one of the authors of [Sicoli and Holton, 2014], saying that their study supports “at least a period of occupation and diversification within the Beringian area, and probably somewhere within the southwestern Alaskan area”. Similarly, [Hoffecker et al., 2016], a careful review examining multiple lines of evidence for the Beringian standstill, also relies on [Sicoli and Holton, 2014] for linguistics, stating that “a recent analysis of the Na-Dene and Yeniseian languages indicates a back-migration from Beringia into Siberia and central Asia rather than the reverse.” [Hoffecker et al., 2016] conclude their paper saying that “many questions remain unanswered regarding the complicated movements of people and/or genes into and out of Beringia after the LGM. Some of the answers have been documented with archeological, linguistic, and genetic data, but others are problematic or disputed”, where “linguistic data” refers primarily to [Sicoli and Holton, 2014]’s work. In other words, Sicoli and Holton’s conclusion that linguistics firmly supports human occupation of Beringia has become very popular outside linguistics.

While aiming to contribute linguistic evidence to the Beringian debate is commendable, there are, unfortunately, several problems with the argument of [Sicoli and Holton, 2014]. First, Sicoli and Holton’s assessment that the shape of the Dene-Yeniseian language-family tree bears on the Beringian question is overly optimistic, as I discuss below in Section 1. Secondly, the tree structure that Sicoli and Holton obtained in their Bayesian phylogenetic analysis is not robust to the choice of tree priors: with a different tree prior than the one used by Sicoli and Holton, we obtain “strong evidence” for the traditional phylogeny of the macro-family, where the basal split is into the Yeniseian and the Na-Dene group. This means that the resulting shape of the tree crucially depends on a technical choice. It also means, importantly, that the linguistic data in Sicoli and Holton’s dataset are insufficient to infer the true tree of the family: with large amounts of data, the linguistic information should in principle override the preferences induced by the tree prior. I discuss the general logic behind Bayesian Markov Chain Monte Carlo (MCMC) inference of linguistic phylogenies (the computational method used by [Sicoli and Holton, 2014]) as well as the specific problem with tree priors in Section 2. Finally, even though computational methods are of no help in this case (due to insufficient data) for deciding the general shape of the tree, there is plenty of historical-linguistic information that supports the traditional phylogeny against Sicoli and Holton’s novel proposal, Section 3. In particular, the Yeniseian and Na-Dene language families are different enough that historical linguists still express caution regarding whether they can be considered a macro-family, see e.g. [Campbell, 2011]. On the other hand, there is no question that the Athabaskan subfamily of Na-Dene is a genuine genetic grouping. The data firmly rule out that a subset of the Athabaskan languages within Na-Dene could be more closely related to Yeniseian than to the rest of Athabaskan — that is, the phylogeny that Sicoli and Holton defend.

The Dene-Yeniseian analysis by [Sicoli and Holton, 2014], despite being an innovative take on the issue, thus suffers from three serious problems: (i) different homeland and migration hypotheses are compatible with both types of linguistic phylogenies, so the latter cannot help us decide between the former (Section 1); (ii) the family-tree structure inferred by Sicoli and Holton is not robust to the choice of the tree prior: under a different reasonable prior, we infer the alternative tree structure, which the authors thought they could reject with certainty (Section 2); (iii) while Sicoli and Holton’s dataset is insufficient for phylogenetic methods to decide between the tree structures they consider, the overall linguistic evidence strongly points to the tree which Sicoli and Holton rejected (Section 3). Each of these three problems alone would have made [Sicoli and Holton, 2014]’s Beringia inference invalid. Section 4 summarizes the argument in this paper, and discusses the overall state of the Beringian homeland hypothesis given the evidence from linguistics, archaeology, folklore studies and genetics. It concludes that the available evidence currently cannot resolve the Beringia debate, though based mainly on archaeology, a deeper Asian homeland is more likely than a Beringian one. Finally, Section 5 provides methodological suggestions for obtaining and interpreting computational phylogenetic inferences from linguistic data that would have helped to avoid the pitfalls that the pioneering study of [Sicoli and Holton, 2014] fell victim to.

It should be particularly stressed that the technical problems [Sicoli and Holton, 2014] ran into were likely due to treating the employed phylogenetic methodology more or less as a black box.

The authors were completely transparent about what they did: they provided complete logs of their analyses, which is precisely what allowed me to identify some of their technical mistakes. There is no question that [Sicoli and Holton, 2014] worked in good faith. I hope that the extensive but relatively informal discussion of not-very-transparent technical issues in this paper, including the online appendices, will contribute to computational phylogenetics becoming less of a black box for the field of historical linguistics.

1. [Sicoli and Holton, 2014]’s argument for the Beringia homeland of Dene-Yeniseian

[Sicoli and Holton, 2014] use computationally inferred phylogenetic trees based on linguistic evidence in an argument which, according to them, shows that the spread of the Dene-Yeniseian macro-family proceeded from Beringia. Their linguistic data are 116 binary “typological” features, described briefly in their Supplementary Materials 1. Conventionally, the presence of a feature is coded as 1 and its absence as 0, but the phylogenetic-inference model that the authors use does not make an internal distinction between presences and absences, being sensitive only to match and mismatch between the same feature in different languages.

Sicoli and Holton’s features (which can also synonymously be called “characters” or “sites” in the context of phylogenetic inference) are highly correlated with each other. For example, features 1-18 concern the shape of the vowel system of the respective languages. Features 1, 4, 8, 12, and 16 are mutually exclusive, as they count how many vowels the system has overall: three (feature 1), four (feature 4), and so on. The 13 other features in the group 1-18 describe the more exact shape of the vowel system, and are conditional on the number of vowels in it. Thus for the 3-vowel systems, there are two binary features, “1-1-1” (feature 2) and “2-1” (feature 3). Obviously, these can have value 1 only if feature 1 (= having exactly 3 vowels) is 1. Similarly, only one of those two features can be 1 at the same time. Finally, if the system has three vowels overall, then all features corresponding to systems with a different number of vowels, that is features 4-18, must be 0. Summing up, features 1 and 2 and features 1 and 3 are positively correlated, while features 1-3 and 4-18 are all pairwise negatively correlated. These 18 features are arguably the most correlated subset in Sicoli and Holton’s dataset, but similar problems occur on a smaller scale in the rest of the data as well. The evolutionary model Sicoli and Holton use for their phylogenetic inference assumes feature independence, which is not the case. To be fair, however, the same problem affects any linguistic phylogenetic study that uses lexical cognacy data and codes them as binary characters, which is currently a very common practice in the field. I am not aware of a full-scale quantification of how serious the problem might be. But perhaps more importantly in our case, dependence between features means that the genuine amount of data in the dataset is effectively smaller than 116 binary characters.2 This is important, because 116 binary characters is not much data to start with as far as computational phylogenetics goes. Since the effective number was even smaller, it should come as no surprise that Sicoli and Holton’s phylogenetic results depend heavily on the choice of prior distributions, as we will see in the next section. With large amounts of data, the signal in the data can often overwhelm the biases of the prior, but the less data we have, the more influence our prior assumptions will have on the final result.
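The dependency structure just described can be made concrete in a short sketch. The feature numbering follows the description above, but the helper function and the example value assignments are purely illustrative and not part of Sicoli and Holton’s materials:

```python
def consistent(feats):
    """Check the logical dependencies among the vowel features."""
    size_feats = [1, 4, 8, 12, 16]          # "has exactly 3/4/... vowels"
    # Exactly one system size can hold at a time.
    if sum(feats[f] for f in size_feats) != 1:
        return False
    # The shape features 2 ("1-1-1") and 3 ("2-1") presuppose feature 1
    # (a 3-vowel system) and exclude each other.
    if feats[2] + feats[3] > 1:
        return False
    if (feats[2] or feats[3]) and not feats[1]:
        return False
    return True

# A legal 3-vowel system: feature 1 and exactly one shape feature are 1,
# and all features for larger systems (4-18) are 0.
three_vowel = {1: 1, 2: 1, 3: 0, **{f: 0 for f in range(4, 19)}}
print(consistent(three_vowel))       # True

# An impossible combination: a 3-vowel and a 4-vowel system at once.
contradictory = {**three_vowel, 4: 1}
print(consistent(contradictory))     # False
```

Because the value of one character logically constrains the values of others in this way, the characters carry overlapping information, which is why the effective amount of data is smaller than the naive count of 116 suggests.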

Sicoli and Holton’s specific goal in their phylogenetic analysis is to compare two hypotheses about the shape of the Dene-Yeniseian tree. (Here, “hypothesis” is meant in the statistical sense, namely, as a theoretical possibility that we can study statistically; this sense is different from the general scientific sense of “hypothesis” as a possibility that is formulated to explain the present evidence.) One hypothesis says that the Dene-Yeniseian tree has the shape [[Yeniseian], [Na-Dene]], with the first split separating the two traditionally postulated language families. The other hypothesis says that the tree does not have that shape, and instead that some Na-Dene languages branch out before the Yeniseian languages branch out from the “stem” of the tree. In other words, the second hypothesis says that the tree has the shape [[Na-Dene I], [Na-Dene II, Yeniseian]]. This hypothesis explicitly contradicts the traditional linguistic classification of the relevant languages — an issue we will discuss below in Section 3.
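The two hypothesized shapes can be written down directly. The nested-tuple representation and the helper below are merely an illustration of the contrast, with placeholder labels for the language groups:

```python
# The two statistical hypotheses about the basal Dene-Yeniseian split,
# as nested tuples (group labels are placeholders, not actual clades).
h1 = ("Yeniseian", ("Na-Dene I", "Na-Dene II"))   # [[Yeniseian], [Na-Dene]]
h2 = ("Na-Dene I", ("Na-Dene II", "Yeniseian"))   # [[Na-Dene I], [Na-Dene II, Yeniseian]]

def basal_split_separates_yeniseian(tree):
    """True iff one of the two basal daughters is Yeniseian alone."""
    left, right = tree
    return left == "Yeniseian" or right == "Yeniseian"

print(basal_split_separates_yeniseian(h1))   # True
print(basal_split_separates_yeniseian(h2))   # False
```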

Sicoli and Holton argue that the two different topologies correspond to different migration scenarios. As this is a crucial step in their argument, it merits a full citation:

We expect the two different migration hypotheses to exhibit different tree topologies. The out of central/western Asia hypothesis assumes that the Yeniseian languages (and potentially their extinct relatives) branched off of the Dene-Yeniseian family with Na-Dene subsequently diversifying. The tree topology for this hypothesis would place the Yeniseian languages outside of Na-Dene: [Yeniseian[Na-Dene]]. The radiation out of Beringia hypothesis does not assume that Yeniseian necessarily branched first.

[Sicoli and Holton, 2014, p. 4]

In this passage, Sicoli and Holton claim that an origin of Dene-Yeniseian in central/western Asia requires the topology of the family to have the shape [Yeniseian, [Na-Dene]]. This is clearly a false assumption. For example, consider a hypothetical scenario where Proto-Dene-Yeniseian is spoken in Siberia; then splits into Na-Dene I vs. the rest at a later time; then finally Yeniseian and Na-Dene II split. Whether alone or together, Na-Dene I and Na-Dene II move into North America, while Yeniseian stays in Siberia. This is of course not the most economical scenario, but it is not a priori impossible. [Sicoli and Holton, 2014] do not discuss why exactly they assume that such scenarios should be ruled out.

2Sicoli and Holton also note that 26 out of their 116 binary characters have the same value for all languages, and say that they are therefore uninformative for phylogenetic inference. The latter statement is not completely true. What is true is that a uniform feature does not give us any information about the groupings within the family: obviously, only shared innovations and retentions that affect a part of the family are useful for determining the family’s structure. However, in the statistical framework Sicoli and Holton use, uniform features do contribute information about the rates of change. Through that, they can even affect tree topology, albeit not as directly as non-uniform characters. In the main text, I only report analyses including all the features, unlike Sicoli and Holton, who excluded uniform ones. (I discuss the technical aspects of the issue a bit further in Section 2.) I checked whether this difference would affect tree topologies by running one analysis in two variants. As there was no significant difference in tree topologies, this choice was not particularly consequential in this case. With other datasets, however, it can be, as I discussed elsewhere [Author, 2018].


But without that assumption, their argument about the location of the homeland breaks down. Both a homeland in “central/western Asia” and a homeland in Beringia are in principle compatible with both the [Yeniseian, [Na-Dene]] and the [Na-Dene I, [Na-Dene II, Yeniseian]] topology. So by finding out the true topology of the family alone, we cannot resolve the homeland question.

For the sake of the argument, let’s assume with Sicoli and Holton that their assumption is correct, namely that an inner Asian homeland is only compatible with the [Yeniseian, [Na-Dene]] topology, and that the only other homeland option is Beringia. If this were the case, then testing for tree topologies could indeed resolve the homeland question. [Sicoli and Holton, 2014] claim that they did resolve it because they established that the true topology of Dene-Yeniseian is [Na-Dene I, [Na-Dene II, Yeniseian]]. In the next two sections, we will examine whether they actually established this topology. Section 2 discusses the computational-phylogenetic side of the issue, and Section 3, the linguistic side.

2. The dependence of phylogenetic results on the choice of tree prior

The bottom line of this section is very simple: while [Sicoli and Holton, 2014]’s original analysis did not support a basal split between Yeniseian and Na-Dene, if we change one of the settings of the computational analysis, namely the tree prior, the results actually show exactly such a split. The two analyses both use a priori reasonable settings, but their results disagree.

If you are only interested in the general structure of the argument for the Dene-Yeniseian homeland, that information, illustrated by the consensus trees in Fig. 1, is already enough, and you can skip to Section 3. The rest of this section explains the computational analysis involved and its results. Using computational phylogenetic software can be daunting, as the manuals and help pages often presuppose a great deal of knowledge about the technical details, and are written for biologists and geneticists, not linguists. One of the goals of this section and the accompanying Online Appendices A-C is to somewhat demystify the process, and at the same time explain the problems with [Sicoli and Holton, 2014]’s analysis, so that one can avoid running into similar problems in the future.

One should distinguish between the general principles of computational phylogenetics and their implementation in specific software packages. In this section, we mostly discuss the software-neutral principles, while Online Appendix A explains how to apply them within the software MrBayes [Ronquist et al., 2012b], used by Sicoli and Holton and also for my replications of their computations and for the additional analyses performed for this paper.

Introductions to phylogenetic methods are often oriented toward a specific piece of software, though they also cover general topics that apply to any implementation. [Ronquist et al., 2009] is a short methodological introduction that gradually works its way towards a mathematical presentation, and is centered on MrBayes. The manual [Ronquist et al., 2011] and the command reference for that program, both provided with its installation, are also useful resources. [Drummond and Bouckaert, 2015] is a book-length introduction to doing phylogenetics with another popular software package, BEAST [Bouckaert et al., 2014]. [Maurits et al., 2017] describes a software tool intended to make it easier to build linguistic-phylogenetic analyses for BEAST.

In Section 2.1, we describe the principles behind Bayesian Markov Chain Monte Carlo (MCMC) phylogenetics; Section 2.2 discusses the results Sicoli and Holton obtained and those we add in this paper, and what they mean for the Dene-Yeniseian case.

2.1. How Bayesian MCMC works. We start with a brief and informal discussion of the statistical method that Sicoli and Holton used to obtain their trees for the Dene-Yeniseian macro-family. That method is called Bayesian MCMC (with MCMC abbreviating “Markov Chain Monte Carlo”). Inferring a language-family tree from observed data intuitively involves finding the tree or trees that are best compatible with the linguistic data.


To completely describe any given tree, we need many parameters: the topology of the tree and the length of every single branch. The topology is a so-called categorical parameter: there are many alternative topologies, and we want to infer which of those alternative values is true. Branch lengths, in contrast, are numerical parameters. A tree itself is not enough to determine how well it is compatible with the data: we also need to describe how linguistic data change as they get inherited along the tree. In particular, we need some assumptions about the rates of linguistic change across different characters (recall that “character” simply means a recorded/observed linguistic feature in the phylogenetic context) and across different branches of the tree. Describing those rates requires introducing further numerical parameters.
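To make the parameter inventory concrete, here is a minimal sketch of one complete hypothesis. The topology, the branch lengths, and the rates are all hypothetical values chosen for illustration (the language labels are real Dene-Yeniseian languages, but the groupings and numbers are not inferred from data):

```python
# Topology: the categorical parameter, here as nested tuples.
topology = (("Ket", "Kott"), (("Eyak", "Navajo"), "Tlingit"))

# Branch lengths: one numerical parameter per branch
# (terminal branches keyed by language, internal branches by their clade).
branch_lengths = {
    "Ket": 0.8, "Kott": 0.6, ("Ket", "Kott"): 1.2,
    "Eyak": 0.5, "Navajo": 0.7, ("Eyak", "Navajo"): 0.4,
    "Tlingit": 0.9, (("Eyak", "Navajo"), "Tlingit"): 0.3,
}

# Rates of change for binary characters: further numerical parameters.
rates = {"0->1": 0.1, "1->0": 0.1}

# A full hypothesis bundles all of these together; MCMC explores
# the space of such bundles.
hypothesis = {"topology": topology,
              "branch_lengths": branch_lengths,
              "rates": rates}
```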

Inferring the optimal tree(s) and the evolutionary parameters is a very complex task: the search space of possible tree topologies alone is astronomically large, and we need to infer the numerical parameters in addition to the topology. Moreover, different parameters are not independent from one another, making the problem even harder for computational statistics. Furthermore, rather than there being one unique absolutely best tree, in this type of model there are usually very many different trees each of which explains the data relatively well. Bayesian MCMC is precisely the kind of method designed for such complex situations. It is able to search through very hard-to-analyze parameter spaces, and it outputs not a single tree as its answer, but a sample of trees that are each reasonably well compatible with the data.

Here is how Markov Chain Monte Carlo works. The algorithm defines a Markov chain (hence the first MC in the name), a mathematical construct that moves through the search space of possible trees and evolutionary parameters according to certain rules, but retaining a degree of randomness (hence the second MC, “Monte Carlo”, metaphorically referring to the randomness component through association with casinos). The randomness part is crucial to ensure the correctness of the algorithm. At each step of the chain, a new tree is picked together with new evolutionary parameters, essentially as a guess.3 This is our new hypothesis. We compute the probability that language change would have generated exactly the data that we observed assuming that our new hypothesis is correct. That probability is called the likelihood of our hypothesis in statistical parlance. Normally, that probability will be very low even for the best cases, because there are very many ways in which language change can proceed along any given tree. That is why that probability is normally described on a log scale: it is much harder to work with numbers like 10^-1000 than with log10(10^-1000), which is just -1000. We also compute the prior probability of our hypothesis. For trees, for example, their prior probabilities characterize how likely we (or rather our algorithm) consider them to be the true tree a priori, without looking at any actual data. A distribution determining this probability is called a tree prior. Generally, prior probabilities are different from the likelihood because the likelihood depends on the observed data as well as on our current hypothesis, while prior probabilities only depend on the hypothesis.

For each new hypothesis about the tree and the evolutionary parameters, it is the product likelihood * prior that is relevant for the Markov chain. (When reading the output of phylogenetic software, recall that a product on the normal scale corresponds to a simple sum on the log scale.) In technical terms, that product is proportional to the probability of our hypothesis given the data, which is what makes it a very useful quantity. Even though the absolute value of likelihood * prior will be quite low for any hypothesis, there will still be an enormous difference between more likely and less likely outcomes of language change. So we want to see how exactly our new hypothesis fares compared to others. For that, we compare the product likelihood * prior generated assuming that our hypothesis were true with the same quantity computed for our previous hypothesis. The fact that we only ever compare two hypotheses may seem unintuitive at first: don’t we need to compare our hypothesis with all possible others? The beauty of MCMC is that even though we only use pairwise comparison, the resulting sample that we obtain in the end contains the information about how all possible different hypotheses fare comparatively. The MCMC pairwise comparison uses a special rule to decide whether to keep the old hypothesis or to adopt the new one instead: the higher the product likelihood * prior of our new hypothesis, the more likely we are to adopt it, discarding the previous one. Importantly, the adopted hypothesis is not necessarily better than the old one (with goodness here measured by likelihood * prior). The point of the algorithm is crucially not monotonic improvement. This again might seem strange at first, but in fact it is needed to obtain the mathematical guarantee that in the end, we will have a sample from the true posterior distribution of our model. The posterior is the probability distribution over the space of our hypotheses (that is, trees and evolutionary parameters) conditional on the observed data. In other words, MCMC allows us to determine which trees are more likely given our data — intuitively, that is exactly what we want as scientists. The interesting thing about the MCMC chain is thus not the final hypothesis that we observe, but the whole sequence of hypotheses that the chain passes through.

3For the algorithm to be efficient, that guess has to be somewhat informed, but the details of that are not relevant for our purposes here.

Our Markov chain is defined in such a way that it can run literally forever. In practice, of course, we are interested in actually getting the results out, so we want to stop the computation at some point and examine what we’ve got. The mathematics of the chain guarantees an astonishing fact: if we run the MCMC algorithm long enough, we are bound to start sampling from the true posterior at some point. This means that our sample will really show which hypotheses are more likely given the data. However, this will only happen after a certain moment. In a run-of-the-mill MCMC analysis, the chain will start with some random hypothesis. Chances are that that random hypothesis will be pretty bad. Technically, it will have a low likelihood, which means it explains the observed data very poorly, and the product likelihood * prior will correspondingly also be very low. However, the setup of our chain is such that it will start quickly moving towards better and better hypotheses, and at some point we will usually see that the likelihoods of our hypotheses are not climbing up anymore, but rather stay at roughly the same level. When we reach that plateau, we are likely to have started sampling from the true posterior; another wording is to say that the MCMC has likely converged. In Online Appendix A, we discuss how to diagnose convergence in practice. The initial portion of our chain, which is presumably not yet sampling from the posterior, is discarded from the final analysis, and is called the burnin.
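The accept/reject logic and the burnin can be captured in a few lines. The sketch below is a generic Metropolis sampler over a tiny discrete hypothesis space, not the actual machinery of MrBayes; all function names are mine, and everything is computed on the log scale, so the product likelihood * prior becomes a sum:

```python
import math
import random

def mcmc(log_likelihood, log_prior, propose, init, n_steps, burnin):
    """A minimal Metropolis sampler illustrating the logic in the text."""
    current = init
    current_lp = log_likelihood(current) + log_prior(current)
    sample = []
    for step in range(n_steps):
        candidate = propose(current)              # the (partly random) guess
        cand_lp = log_likelihood(candidate) + log_prior(candidate)
        # Accept with probability min(1, exp(cand_lp - current_lp)):
        # better hypotheses are always adopted, worse ones only sometimes.
        if math.log(random.random()) < cand_lp - current_lp:
            current, current_lp = candidate, cand_lp
        if step >= burnin:                        # discard the burnin portion
            sample.append(current)
    return sample

# Toy example: two hypotheses whose likelihoods stand in a 7:3 ratio,
# a flat prior, and a symmetric proposal flipping between them.
random.seed(0)
loglik = {"H1": math.log(0.7), "H2": math.log(0.3)}
sample = mcmc(loglik.__getitem__, lambda h: math.log(0.5),
              lambda h: "H2" if h == "H1" else "H1",
              "H1", n_steps=20000, burnin=2000)
print(sample.count("H1") / len(sample))   # close to 0.7
```

Note that the chain sometimes moves to the worse hypothesis H2; exactly this non-monotonic behavior is what makes the long-run sample reflect the posterior proportions.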

Even though we start sampling from the posterior as the MCMC progresses, this does not mean that all hypotheses that we sample are equal. Because many parameters in our hypotheses are continuous quantities, it is theoretically impossible to sample precisely the same hypothesis twice. In that uninteresting sense, all hypotheses are the same. However, some hypotheses may come from regions in the tree space and the parameter space that are “densely populated” with good hypotheses, while others may come from regions less likely on the whole. In the end, we are more interested in this density at the level of regions than in individual hypotheses. For example, we may be interested in the tree structure of a clade of languages A, B, and C: there are three logical possibilities, [[A,B],C], [[A,C],B], and [A,[B,C]]. It could be that in the true posterior, shape [[A,B],C] occurs 30% of the time, shape [A,[B,C]] 70% of the time, and shape [[A,C],B] never occurs. If this is the case, then our MCMC samples should also be roughly 30% [[A,B],C] and roughly 70% [A,[B,C]], starting from when the MCMC converged.4 It is such distributions — over clade shapes, over specific evolutionary parameters, and so on — that are ultimately interesting for us as analysts.
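Summarizing a posterior sample at the level of clade shapes is then a simple counting exercise. The sample below is fabricated to mirror the 30%/70% split just described:

```python
from collections import Counter

# A made-up posterior sample of rooted shapes for the clade {A, B, C}.
sample = ["[[A,B],C]"] * 300 + ["[A,[B,C]]"] * 700

support = Counter(sample)
for shape, count in support.most_common():
    print(shape, count / len(sample))
# [A,[B,C]] 0.7
# [[A,B],C] 0.3
```

Phylogenetic software reports such frequencies as clade support values on the consensus tree it builds from the posterior sample.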

4Why only roughly? If we ran our MCMC for an infinite amount of time, the numbers would be exactly as in the true posterior. But because in practice we only run MCMC for a finite time, we obtain a sample from the posterior rather than the full posterior. If you draw 10000 samples from an infinite can with green and red balls with a 30% share of greens, you will get very close to 30% green in your sample, but probably not exactly 30%.

To run an analysis leading to a sample from the posterior, we need to select the desired settings, including the models to be used and the prior distributions. The specific ways in which we set up the analysis differ by phylogenetic software, but the general principles are the same, and it is often possible to replicate one and the same analysis in different software packages.5 The settings can be divided into three groups: (i) model settings governing the computation of the likelihood; (ii) prior distributions governing the prior probability; and (iii) technical settings governing the behavior of the MCMC chain that samples our parameter space. Here, we informally summarize the settings relevant for an analysis like Sicoli and Holton’s. Online Appendix A provides a more in-depth technical overview, explaining how the relevant settings are represented in MrBayes.

The model settings are:

• The type of evolutionary model of change, which is often also referred to (somewhat confusingly) as data type. In the Dene-Yeniseian dataset of Sicoli and Holton, all characters are binary, with states 0 and 1. There are different models of varying complexity for such data. For example, one question we need to answer is: do we allow the rates of change 0→1 and 1→0 to differ? ([Sicoli and Holton, 2014]'s analyses answer "no", but "yes" is no less legitimate, and leads to slightly different results.6)
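For intuition, the behavior of a two-state continuous-time Markov model behind such binary characters can be sketched with the standard textbook transition-probability formula (this is a generic formula, not MrBayes's internal code; the function name and rate values are mine):

```python
import math

def p_change(t, rate01, rate10):
    """Probability of being in state 1 after branch length t, starting
    in state 0, for a two-state continuous-time Markov model with
    separate rates for 0->1 and 1->0 changes."""
    total = rate01 + rate10
    return (rate01 / total) * (1.0 - math.exp(-total * t))

# Equal rates: gaining and losing a feature are symmetric.
p_equal = p_change(0.5, 1.0, 1.0)
# Unequal rates: here a feature is lost three times faster than gained.
p_unequal = p_change(0.5, 1.0, 3.0)
```

With equal rates the model treats gain and loss symmetrically; with the unequal rates above, p_unequal is smaller than p_equal because state 1 is left more quickly, illustrating why the two settings yield (somewhat) different likelihoods.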

• The coding scheme for our characters. Did we record all the available data according to a list of features prepared in advance? Or did we only record characters that had different values in some of our languages (thus excluding characters where all languages agree)? Depending on that, the likelihood model changes, so it is crucial to set this option correctly (see e.g. [Lewis, 2001] for theory). As Sicoli and Holton used a pre-compiled list of typological features, the correct choice here is to say we recorded all characters.

[Sicoli and Holton, 2014] mistakenly say in their analysis file that they only recorded variable characters, which contradicts what they actually did.
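The flavor of the correction behind variable-characters-only coding (in the spirit of [Lewis, 2001]) can be sketched as follows, under the simplifying and unrealistic assumption that a single probability of an all-constant pattern applies to every character; all numbers are invented for illustration:

```python
def conditioned_likelihood(char_likelihoods, p_constant):
    """Likelihood under variable-characters-only coding: each
    character's likelihood is conditioned on the character being
    variable, i.e. divided by (1 - P(all-constant pattern)).
    Simplification: one p_constant is assumed for every character."""
    total = 1.0
    for lik in char_likelihoods:
        total *= lik / (1.0 - p_constant)
    return total

# Two characters with invented likelihoods 0.02 and 0.05, and an
# invented 60% chance of a constant pattern under the model:
uncorrected = 0.02 * 0.05
corrected = conditioned_likelihood([0.02, 0.05], p_constant=0.6)
```

In real models P(constant) depends on the tree and the rates, but the sketch shows why declaring the wrong coding scheme changes every character's likelihood and thus the whole analysis.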

The prior distributions include:

• A prior for the probabilities of different states at the root. In MrBayes, when we allow the base rates of change 0→1 and 1→0 to differ, the probabilities at the root are automatically fixed. But when we use equal rates of change, we may let the root probabilities vary with a particular probabilistically controlled amplitude. In no phylogenetic software known to me are we allowed to supply specific values for our characters at the root node (which is unfortunate for linguistic applications, where we may have good evidence for particular linguistic features being present in the proto-language corresponding to the root).

• A tree prior, which can be decomposed into a prior for the topology of the tree (i.e. the shape of the tree without regard to branch lengths) and a prior for the branch lengths.

In many common tree priors, the prior for the topology is simply flat, meaning that all topologies are equiprobable, and it is only the branch-lengths part of the prior that is non-trivial. Because of that, what is called a tree prior in BEAST is called a branch-length prior in MrBayes. Tree priors are discussed in more detail in Online Appendices A and especially B.

5This is not always possible for a practical, not a principled reason: the sets of available settings implemented in different programs are not identical, so one program may e.g. lack an evolutionary model that another program implements.

6For Dene-Yeniseian data, using the restriction data type makes the Yeniseian clade more distinct from Na-Dene than in Sicoli and Holton's original analysis, but this effect is mild compared to that of changing the tree prior, described in the next section and Fig. 1. See analyses DY-clocked-uniform-strict-restriction vs. DY-clocked-uniform-strict in Supplementary materials. I report in the main text only analyses using the standard data type. First, this makes the comparison more favorable for Sicoli and Holton's results, which I am arguing against. Second, since 0s and 1s in different characters of Sicoli and Holton's do not actually represent identical states, because they refer to very different linguistic entities, it is far from obvious that unequal rates are any better than equal rates; neither setting describes the reality well, as most characters in the dataset would presumably each have their unique true rates of change between 0 and 1.


• For some tree priors (e.g. the "clocked uniform" prior used by Sicoli and Holton, and the birth-death family of priors), it is also required to have a prior on the tree height. For others (e.g. the coalescent tree prior), the tree height is already implicitly encoded within the tree prior itself.

• A molecular clock prior. Branch lengths must be expressed in some units. The molecular clock relates those units to the only "natural" underlying unit of phylogenetics: the expected number of changes along the branch in a hypothetical character with the "base rate" of change. If we are not interested in connecting our tree to some real-world units of time, the most sensible setting for the molecular clock prior is to simply fix it at 1. This means, for example, that on a branch of length 0.2, a base-rate character is expected to undergo 0.2 changes. These units of expected changes are what both Sicoli and Holton and I used, and are thus what the lengths in Fig. 1 below represent.

If instead we want to express branch lengths in e.g. calendar years, we need to somehow define how to translate changes in our characters into real-world time. This is usually done via specifying calibration points: assigning to some splits in the tree a certain real-world temporal interval when we think the corresponding split really happened. However, trying to get calendar-year estimates for the nodes in linguistic-phylogenetic trees usually results in very wide inferred intervals, on the order of many thousands of years, and is therefore of little practical use.7

• While the molecular clock determines the base rate of change along a branch, we may wish to use a relaxed molecular clock on top of it. The idea is that language change (just like biological DNA change) is not guaranteed to have the same pace along the whole tree: there could be languages changing faster or slower than others. A relaxed clock allows each branch to have its own multiplier adjusting the effective rate of change relative to the base rate.

If we do not have a relaxed clock, we are said to have a strict clock. There are two broad classes of relaxed clocks: autocorrelated and uncorrelated (see [Ho and Duchêne, 2014] for an overview, in the biological context). Both types select for each branch a multiplier at random according to some prior probability distribution. In autocorrelated clocks, multipliers on adjacent branches are required to correlate (so, for example, sister languages are likely to have similar effective rates of change), while in uncorrelated clocks, each multiplier is independent of its neighbors. [Sicoli and Holton, 2014] compare the performance of the strict clock with one type of autocorrelated relaxed clock, to which we will return below.

Whichever relaxed clock we use, we need to parametrize it with some prior governing how different the branch multipliers are allowed to be.
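The idea of per-branch multipliers can be sketched as follows. This sketch uses an uncorrelated lognormal clock, a different member of the relaxed-clock family than the autocorrelated TK02 discussed here; function name and values are mine:

```python
import random

def relaxed_branch_lengths(base_lengths, sigma, seed=0):
    """Uncorrelated relaxed-clock sketch: each branch independently
    receives a lognormal rate multiplier (median 1); sigma controls how
    different the multipliers may be, and sigma = 0 recovers the
    strict clock (all multipliers exactly 1)."""
    rng = random.Random(seed)
    return [b * (1.0 if sigma == 0 else rng.lognormvariate(0.0, sigma))
            for b in base_lengths]

strict = relaxed_branch_lengths([0.1, 0.2, 0.3], sigma=0.0)
relaxed = relaxed_branch_lengths([0.1, 0.2, 0.3], sigma=0.5)
```

The sigma parameter plays the role of the prior governing how different the branch multipliers are allowed to be: the larger it is, the more the effective rates may deviate from the base rate.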

7It is unfortunately a common practice in linguistic-phylogenetic papers to focus on the mean or the median of a very wide posterior distribution of dates for e.g. proto-language nodes. This is highly misleading, though the practice continues even in recent publications. For example, [Kolipakam et al., 2018] claim in the abstract that "[their] results indicate that the Dravidian language family is approximately 4500 years old", while their actual analyses place the date of the Proto-Dravidian split between around 2800 and more than 7000 years ago with 95% posterior probability. Perhaps the correct "our results indicate that the Dravidian language family is approximately 2800-7000 years old" does not sound as interesting a claim anymore.

In the case of Dene-Yeniseian, due to Sicoli and Holton's dataset being very small, the uncertainty about the root age is huge even in terms of expected changes along the branch, as can be examined in the consensus-tree .tre files in Suppl. Mat. 1 by visualizing the attribute height 95%HPD in e.g. the FigTree tree-viewing software, http://tree.bio.ed.ac.uk/software/figtree/. Translating those wide intervals in terms of expected changes into intervals in calendar years would only add more uncertainty. On the general theory of non-eliminable statistical uncertainty in time estimation see [Dos Reis and Yang, 2013].


• A prior for the rate heterogeneity across characters. Just as different languages can change at different rates, different linguistic features (i.e. characters) also can. Gamma-rate heterogeneity [Yang, 1993] is a method allowing us to correct for that. It uses a gamma distribution whose parameter is governed by a prior we need to supply.

As can be gleaned from the above, determining the rate of change for a particular linguistic feature on a particular branch of the tree involves combining many different parameters: (a) the base rate of change from the molecular clock (in case we use the units of expected number of changes, the base rate is strictly 1); (b) if the 0→1 and 1→0 rates are allowed to differ, a multiplier determined by whether we have a 0 or a 1;8 (c) if we have a relaxed clock, a multiplier corresponding to the current branch; (d) if we use gamma rate heterogeneity between characters, a rate multiplier from one of the categories of the discretized gamma distribution used. What this complexity underscores is that modern phylogenetic methods allow for many distinct forms of rate heterogeneity.
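Schematically, the combination of factors (a)-(d) is just a product of multipliers. The values below are invented for illustration and are not taken from any real analysis:

```python
def effective_rate(clock_rate, branch_multiplier, gamma_rate,
                   state_multiplier=1.0):
    """Effective rate of change for one character on one branch: the
    product of factors (a)-(d) from the text. All inputs here are
    illustrative placeholders, not values computed by MrBayes."""
    return clock_rate * branch_multiplier * gamma_rate * state_multiplier

# Clock fixed at 1 (units of expected changes), a branch evolving 20%
# faster than average, and a slow gamma category at half the mean rate:
rate = effective_rate(1.0, 1.2, 0.5)
```

The product structure is why the individual settings can be switched on and off independently: fixing any factor at 1 removes that form of rate heterogeneity.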

It is important to note that in most cases when we use priors, we try to use very permissive ones, also called non-informative. The idea is that the prior should be wide enough for our data to override it. Generally, if we see in a Bayesian analysis that our results strongly depend on the prior (as we will in the Dene-Yeniseian case in the next section), this means that our results are significantly driven by our a priori assumptions, as opposed to by our data. This is usually not what a scientist is after, especially when it is hard to independently argue that particular priors are more realistic than others, as is often the case in linguistic phylogenetics.

The technical settings include how many MCMC steps we ask for, how frequently to sample the current hypothesis, etc. They are discussed in Online Appendix A.

2.2. MCMC inference on Sicoli and Holton's Dene-Yeniseian data. In this section, we discuss three issues: (i) Sicoli and Holton's test between the strict and relaxed molecular clocks, and how one conducts such tests; (ii) Sicoli and Holton's test for the favored tree topology under their chosen analysis settings; and (iii) a different topology that results from replacing the tree prior used by Sicoli and Holton with a different one. The last point ultimately means that phylogenetics alone is not sufficient to resolve the Dene-Yeniseian tree topology based on the very small dataset [Sicoli and Holton, 2014] use.

Before starting their main analysis, [Sicoli and Holton, 2014] wanted to determine whether the strict clock or the TK02 relaxed clock was a better model, where TK02 [Thorne and Kishino, 2002] is a particular model from the family of autocorrelated relaxed clocks implemented in MrBayes.

Whether to apply computational methods of model selection in this case depends on the researcher’s judgement: strictly speaking, the strict molecular clock is a special case of the relaxed TK02 clock, arising in the limit of no rate variation. So in one legitimate sense, TK02 cannot be worse than the strict clock. However, we can, equally legitimately, ask how the two models compare on average.

Even though a relaxed clock is by definition more flexible, perhaps on average it performs worse than the simpler strict clock.

A popular approach to model selection that implicitly favors simpler models involves comparing their Bayes factors, and was used by [Sicoli and Holton, 2014] for the strict vs. relaxed autocorrelated clock. The definition of the Bayes factor is deceptively simple: it is the ratio of the marginal likelihoods derived under the two models. The key here is the word marginal: it means that we are comparing not the best possible values of likelihood, but the likelihood averaged over all possible parameter settings. Online Appendix C discusses this in a bit more detail, but the crucial point is that Bayes

8This simplifies slightly; there can be several changes 0→1 and 1→0 along one and the same branch, so it is not as if we have a unique rate multiplier for the whole branch. Instead, we integrate over the possible histories along the branch, which uses both rates, but does depend on the initial state.


factors look into average rather than best performance. We are thus comparing the performance of the two models under a wide range of parameters.9

Though theoretically taking the ratio of marginal likelihoods (i.e. computing the Bayes factor) is straightforward, in practice it is not easy to obtain marginal likelihoods. Fully accurate averaging across the whole parameter space is out of the question for all but the simplest and most well-behaved models, and phylogenetic models are definitely not in that class. Because of that, most Bayes factors reported in the literature are only estimates of the true Bayes factors. Sicoli and Holton try two different estimation methods: the harmonic-mean and the stepping-stone methods.

The harmonic-mean method computes the harmonic mean of the likelihood values across the posterior samples obtained via MCMC. That quantity is easy to compute, and is in fact reported in MrBayes's standard output. However, it is well known to statisticians to be an absolutely terrible estimator of the marginal likelihood we are seeking. One way to show it is to simply note that the variance of that estimator may be infinite (e.g., [Raftery et al., 2007]). In informal terms, this means that we have no idea how far our estimate is from the true value. Another way to explain the problem (requiring more mathematical background than the current paper assumes, but very convincing for those who can make it through) may be found in a blog post by statistician Radford Neal [Neal, 2008]. Intuitively, the issue is that to accurately estimate the marginal likelihood, we need to gather likelihood values from regions of the tree space and evolutionary-parameter space that do not explain the data particularly well. By design, our MCMC chain will pass through such regions only rarely. So our MCMC sample, where the harmonic mean comes from, would normally lack information crucial for the accurate computation of the marginal likelihood. The true marginal likelihood is usually much lower than the harmonic-mean estimate.
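The instability of the harmonic-mean estimator is easy to demonstrate on a toy model where the true marginal likelihood is known exactly. This is a conjugate-normal example of my own construction, not a phylogenetic one:

```python
import math
import random

def true_marginal(y, tau):
    """Exact marginal likelihood p(y) for the toy model
    y ~ Normal(theta, 1) with prior theta ~ Normal(0, tau^2)."""
    var = 1.0 + tau**2
    return math.exp(-0.5 * y**2 / var) / math.sqrt(2.0 * math.pi * var)

def harmonic_mean_estimate(y, tau, n, seed):
    """Harmonic-mean estimator of p(y), fed with exact posterior draws
    of theta, so any error is the estimator's fault, not the sampler's."""
    rng = random.Random(seed)
    post_mean = tau**2 * y / (1.0 + tau**2)
    post_sd = math.sqrt(tau**2 / (1.0 + tau**2))
    inv_sum = 0.0
    for _ in range(n):
        theta = rng.gauss(post_mean, post_sd)
        lik = math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2.0 * math.pi)
        inv_sum += 1.0 / lik
    return n / inv_sum

y, tau = 2.0, 10.0
truth = true_marginal(y, tau)
estimates = [harmonic_mean_estimate(y, tau, 5000, seed) for seed in range(5)]
# With this wide prior the estimator's variance is infinite: the five
# estimates scatter widely from seed to seed, and typically sit well
# above the fixed true value, so no single run can be trusted.
```

Even with exact posterior draws (no MCMC error at all), the runs disagree with each other, which is exactly the "we have no idea how far our estimate is from the true value" problem.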

Hence the second method that [Sicoli and Holton, 2014] use: stepping-stone sampling [Xie et al., 2011], conveniently implemented in MrBayes. The mathematical idea of the stepping-stone method is not too difficult, but nevertheless lies beyond the level of the current largely informal discussion. What is important is that stepping-stone estimation (when defined) is far more accurate than the harmonic-mean method. In particular, when the two disagree, the stepping-stone results are very likely to be the more accurate ones.

[Sicoli and Holton, 2014] obtained very close stepping-stone estimates for the strict and the TK02 relaxed clock. This means that allowing the language-change rate to vary along the tree did not on average help to explain the data better, though it did not do harm either. Based on that, we can conclude that the simpler model with the strict clock should be used on practical grounds (as there is no gain in using the more flexible model). Sicoli and Holton, however, observe that the harmonic-mean estimate for the strict clock was substantially higher than the harmonic-mean estimate for TK02. Based on that difference, they declare that the strict clock model better fitted the data.

This is incorrect: as explained above, when stepping-stone and harmonic-mean estimates disagree, the stepping-stone ones should be accepted. Sicoli and Holton's decision to use the strict clock is justifiable, but not by the argument that they employ.

The main computational question of [Sicoli and Holton, 2014] is: which tree structure is more likely for the Dene-Yeniseian macro-family given their typological data, [[Yeniseian], [Na-Dene]]

or [[Na-Dene I], [Na-Dene II, Yeniseian]]? To answer that question, it is sufficient to look at the posterior distribution of Sicoli and Holton's baseline analysis. It is summarized in the majority-rule consensus tree in Fig. 1(a), based on my replica of Sicoli and Holton's analysis. The majority-rule consensus tree contains only clades that are present in more than 50% of the trees in the posterior sample. The fact that the Na-Dene family is not shown as one clade means that in more than

9It follows from this that the result of Bayes factor comparisons depends on the range of parameters we deem acceptable for each model: for instance, if we exclude beforehand some parameter values that we know to be terrible for one model, the resulting narrower model will have better likelihoods on average. It is thus important to bear in mind that Bayes factors are sensitive to how we define the priors.


half of the trees, Na-Dene do not form a single sub-family to the exclusion of Yeniseian. In other words, the posterior probability of the topology [[Yeniseian], [Na-Dene]] is less than 50%. In fact, examining MrBayes’s output in more detail, we can determine that the posterior probability of that topology is only about 18%. Just as [Sicoli and Holton, 2014] argued, family structure [[Yeniseian], [Na-Dene]] is not strongly supported by their analysis. The alternative structure with the most support, namely with 58% posterior probability, is [[Yeniseian, Californian Athabaskan], [other Na-Dene]].10

We have reviewed what [Sicoli and Holton, 2014] did in their computational analysis. Now we get to the crucial point of this section: something they did not. In particular, [Sicoli and Holton, 2014]

did not check the robustness of their analysis to different choices of tree prior. When the dataset is large, evidence from the data would usually overwhelm the tree prior, though this always needs to be tested empirically by checking how different priors work on one’s data. But the Dene-Yeniseian typological-feature dataset is very small, with only 116 binary features and 84 unique patterns of their distribution over the languages. To put this into perspective: [Ritchie et al., 2017] examine the effect of tree-prior choice on three empirical biological datasets. In the smallest dataset, they find that the tree prior strongly and adversely affected the results, while the other two were relatively fine. The problematic dataset contained 14K DNA letters, with 188 unique patterns across the taxa. The two larger datasets contained respectively 14K DNA letters with 5765 unique patterns, and 21K DNA letters with 6355 unique patterns. The Dene-Yeniseian dataset is obviously much smaller than even the smallest, problematic dataset of [Ritchie et al., 2017]. It is therefore quite likely that the Dene-Yeniseian analysis might be sensitive to the choice of tree prior.
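Counting unique patterns of this kind, the measure used above to compare dataset sizes, is a one-liner (shown on a toy matrix of my own, with languages as rows and binary characters as columns):

```python
def unique_patterns(matrix):
    """Number of distinct character patterns (columns) in a data matrix
    given as rows = taxa (languages), columns = binary characters."""
    return len(set(zip(*matrix)))

# Toy matrix: 3 languages, 4 characters; characters 0 and 2 carry the
# same pattern of values across the languages, so only 3 are unique.
toy = [
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
n_unique = unique_patterns(toy)  # 3
```

Characters with identical patterns contribute identical likelihood terms, which is why the number of unique patterns, not the raw character count, is the relevant measure of how informative a dataset is.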

The clocked uniform prior that [Sicoli and Holton, 2014] used is a mathematical abstraction that does not arise from any known evolutionary process. It takes all branching times to be uniformly distributed within the admissible intervals, which in our case simply means the interval between the root of the tree and the present, when the living languages are sampled. On the surface, this might seem like a nice choice that does not bring any dangerous assumptions into our analysis. As we are about to see, this is not quite true.

I ran exactly the same analysis as Sicoli and Holton, but with the birth-death tree prior.11 Unlike the clocked uniform prior, the birth-death prior arises from a specific interpretable model of the tree-generating process (see [Gernhard, 2008] for mathematical analysis as well as references to earlier work). The birth-death model assumes that all languages (or species) are always equally likely to split into two, with constant rate λ, and also equally likely to die off, with constant rate µ.

These assumptions lead to specific predictions regarding the probability of the timing of branching events in the tree. (The model accounts for the fact that there would often exist branches of the family that left no living descendants, in which case we can only detect a subset of all branching events.) One of the model’s predictions is that branching events become more common as we approach the present. This is intuitively appealing: language families would usually contain more

10Sicoli and Holton themselves argue for the [[Na-Dene I], [Na-Dene II, Yeniseian]] topology using a different method, namely the Bayes factor comparison between marginal likelihoods, as described above for the choice between two clock models. Unfortunately, their application of the method in that case was affected by a mathematical mistake: their stepping-stone likelihood estimation actually favored the [[Yeniseian], [Na-Dene]] topology instead. However, their stepping-stone comparison involved arguably wrong alternatives, and in Online Appendix C, I report a more careful stepping-stone comparison that actually supports the conclusion from [Sicoli and Holton, 2014], even though their own stepping-stone comparison did not.

11For the birth-death tree prior, MrBayes implicitly enforces the improper uniform root-age prior from 0 to infinity.

Even if the user tries to set a different prior, this is overridden by the program without any message informing her of that. This is all right for exploring the posterior, but makes MrBayes stepping-stone estimates undefined for birth-death priors. As a practical consequence, the stepping-stone technique should not be applied to analyses with birth-death priors in MrBayes. Importantly, MrBayes cannot and does not check this and therefore would report such estimates without error when asked; the responsibility to avoid the mistake is completely on the user. This serious issue is one of the hidden perils of computational phylogenetic analysis.


[Figure 1 here: two majority-rule consensus tree diagrams, panels (a) and (b), each with a scale bar of 0.02 expected changes per character; tip labels are three-letter language codes and clade labels are posterior probabilities.]

Figure 1. Majority-rule consensus trees. Two Yeniseian languages, Ket (kto) and Kott (zko), highlighted in blue; four Californian Athabaskan languages (Na-Dene) in orange. Numbers on clades show posterior probabilities in percent. (a) Modified replica of [Sicoli and Holton, 2014]'s analysis of Dene-Yeniseian excluding the unrelated language Haida: gamma rate heterogeneity with 4 categories, strict molecular clock, branch lengths represent expected number of changes per character. The replica differs from the original analysis in assuming that all sampled characters were included in the data, whereas [Sicoli and Holton, 2014] mistakenly coded the data as only containing non-uniform characters. Tree prior: [Ronquist et al., 2012a]'s "clocked uniform" prior. (b) The same, but with a different tree prior: birth-death. Trees prepared in FigTree, free software by Andrew Rambaut, available at http://tree.bio.ed.ac.uk/software/figtree/.

languages closer to the present, and it is natural to assume that among 10 languages, a single split will happen sooner than if we have only 1 language. The birth-death prior prefers to have branching events closer to the tips, other things being equal. This differs from the clocked uniform prior, where branchings are distributed uniformly over time, and it is as probable to have one split among 10


languages in a unit of time as in 1 language. In effect, the clocked uniform prior implicitly assumes that splitting probabilities per language were greater in the distant past than in the recent past.
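The birth-death prediction that splits crowd toward the present can be illustrated with its pure-birth (Yule) special case, where µ = 0. This is a forward simulation of my own, not a sample from the actual tree prior, which additionally conditions on the sampled languages:

```python
import random

def yule_split_times(n_tips, lam, seed):
    """Forward simulation of a pure-birth (Yule) tree, the mu = 0
    special case of birth-death: with k lineages present, the waiting
    time to the next split is Exponential(k * lam).
    Returns the absolute times of the n_tips - 1 splits."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for k in range(1, n_tips):
        t += rng.expovariate(k * lam)
        times.append(t)
    return times

times = yule_split_times(100, 1.0, seed=7)
height = times[-1]
# Since the expected waiting time with k lineages shrinks as 1/k, splits
# crowd toward the present: far more than half of them fall in the
# recent half of the tree's history.
frac_recent = sum(t > height / 2 for t in times) / len(times)
```

Under the clocked uniform prior, by contrast, split times would be spread evenly over the tree's history, which is exactly the difference visible in the node heights of Fig. 1.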

If anything, the birth-death prior is arguably a better a priori option for language evolution than the clocked uniform prior, though I stress that no existing prior can be expected to model language evolution precisely. I also stress that generally, we want our results to be driven by the observed data, not by the assumptions implicit in our prior, even if the prior itself is reasonable (I discuss common tree priors and the assumptions behind them in more detail in Online Appendix B).

Fig. 1(b) shows the majority-rule consensus tree from an analysis that is exactly like Fig. 1(a), but with the birth-death tree prior. It is evident that the tree prior greatly affects the results.

While the clocked uniform prior, Fig. 1(a), did not support the [[Yeniseian], [Na-Dene]] topology, the birth-death prior does, Fig. 1(b). In fact, under the birth-death prior, the posterior support for a Na-Dene clade, and consequently for the existence of Proto-Na-Dene, is 99%, so the results under the birth-death prior are much more “statistically certain” than they were under the clocked uniform prior.

Some of the differences in the consensus trees are not hard to trace back to the assumptions induced by the tree prior, highlighting its strong influence. Under the clocked uniform prior, language splits are evenly distributed through time, and indeed we can see in Fig. 1(a) that the nodes of the tree occur at very different heights.12 The birth-death prior, on the other hand, favors trees where most splits are relatively recent, and in accordance with that, most nodes in Fig. 1(b) occur relatively close to the leaves. However, this general difference in branching times does not by itself explain the topological difference between Fig. 1(a) and Fig. 1(b): it could well have been that a tree with the topology of Fig. 1(a) had a different distribution of node heights, on average closer to the present. This illustrates that the choice of the tree prior can affect our inferences about the language family's topology in non-trivial ways, which are hard to predict without actually running the analysis.

To reiterate, in the good case when we have a lot of data, the data should to a large extent override any preferences induced by the tree prior [Ritchie et al., 2017]. The fact that we get very different results under these two tree priors simply means that our data are very scarce. Fig. 1 shows that we cannot rely on phylogenetics to derive the right topology from Sicoli and Holton's dataset.13 Importantly, this is not to say that phylogenetics is generally useless: with more data, it could well give us a good answer.

Let us sum up. Sicoli and Holton’s argument for the Beringian homeland of the Dene-Yeniseian macro-family was based on their computational-phylogenetic analysis not supporting the [[Yeniseian], [Na-Dene]] topology. However, it turns out that their computational result depends on the choice of the tree prior. This in turn means that we do not have enough linguistic data in the dataset for phylogenetics alone to decide which Dene-Yeniseian topology is closer to the truth.

3. Lexical and morphological evidence regarding the shape of the Dene-Yeniseian tree

For the sake of the argument, let's suppose that (i) computational analyses provided strong evidence against the [[Yeniseian], [Na-Dene]] topology (which is actually false, as we just discussed in Section 2), and furthermore that (ii) the alternative tree topology [[Na-Dene I], [Na-Dene

12One should be cautious interpreting the heights of nodes in consensus trees, as they represent the mean values of usually very wide distributions. In the text, I only use them to make an informal illustration. For any formal argument, one would need to examine the heights in each individual tree in the posterior.

13In principle, one could try to apply standard model selection techniques to determine which tree prior is a

“better fit” for the Dene-Yeniseian data, just as with the clock models we discussed above. However, this would not be advisable. The fact that the results are so sensitive to the choice of tree prior is an important indicator: it shows that we have too small an amount of data. It is not very meaningful to answer the question of which prior results in a better fit for a dataset that is simply not informative enough.


II, Yeniseian]] strongly supported the Beringian homeland hypothesis (which is also false, as we showed in Section 1). At least in this hypothetical case, could we conclude that the Dene-Yeniseian homeland was in Beringia? Arguably, still not: the topology computationally obtained by Sicoli and Holton is contradicted by the large body of research on the Yeniseian and Na-Dene families.

Sicoli and Holton’s result implied that there was no Proto-Na-Dene that did not include the Yeniseian languages. As in any statistical investigation, that result should be cross-validated:

we need to check whether it agrees with data external to the analysis. In the case at hand, it is obvious that it does not. Yeniseian languages and Na-Dene languages are quite different from each other, and there is no comparative-method reconstruction that supports the notion of Yeniseian being a closer relative to some Na-Dene subfamily than the rest of Na-Dene is. In fact, the very reconstruction by [Vajda, 2011] that defends the Dene-Yeniseian hypothesis employed Proto-Athabaskan forms as reconstructed by [Leer, 2008] and Vajda himself, and crucially not the material of individual Athabaskan languages.

Table 1 illustrates the unity of Athabaskan within Na-Dene using lexical data. One should note that the amount of likely lexical correspondences between Yeniseian on the one hand and Na-Dene on the other is quite limited in the first place: overall ca. 100 sets at present [Campbell, 2011].

This can be compared with ca. 800 cognate sets used by [Nikolaev, 2014] for the Na-Dene family.

But even looking only at the sound shapes of the proposed Dene-Yeniseian cognates in Table 1, ignoring those Na-Dene correspondences that currently have no parallel in Yeniseian, is enough to demonstrate the point. Consider the Yeniseian and Athabaskan words for 'fire' and 'foot' in Table 1. Within Athabaskan, the initial consonants of these two words represent one and the same pattern: Hupa x, Kato khw, Ahtena qh, Degexit'an qh. Such regular sound correspondences are a hallmark of convincing arguments for lexical cognacy (that is, for words descending from the same ancestral word). In contrast to that, the correspondences between the Athabaskan words and their likely Yeniseian cognates are much less transparent. The initial consonants belong to the same series in Athabaskan in the two sets of words, but correspond to q in 'fire' and to k in 'foot' in Yeniseian. The coda correspondences are not unproblematic either, as Vajda himself discusses, and the meanings in Yeniseian also deviate from the meanings in Athabaskan. Of course, this does not by itself mean that the words are not related: meanings may shift, and as for the initial consonants, [Vajda, 2011] proposes that the k sound in Ket ki's is due to a differential development of uvulars before front vowels in early Yeniseian, so the different reflex of the same proto-consonant may well be due to a difference in phonological environments. However, when such explanations are based on a small number of examples (and that number naturally depends on the overall number of known likely cognates), they have much less explanatory power than when we observe exact sound correspondences as we do in the Athabaskan words. Similar points can be made with respect to the two other likely correspondences in Table 1. But the bottom line is that the Athabaskan languages, including Californian Athabaskan, clearly function as a group, while Yeniseian is more different.

Morphology provides similar evidence. One particularly striking example, suggested by an anonymous reviewer, is valency-changing/voice morphology. The Athabaskan languages have a rich and generally highly productive system of marking valency/transitivity changes, implemented morphologically via the so-called "classifier" morphology in the Na-Dene verb (see [Rice, 2000] for an overview; also [Krauss, 1968], [Kibrik, 1993], a.o.). "Classifier" morphemes in Athabaskan are obviously cognate, judging from their phonological shape (e.g., [Kibrik, 1993] lists Hupa di, Navajo d, Sarcee d, Slave di for the "*d classifier"), their positioning within the verbal morphological template (generally immediately preceding the verbal stem), and their valency/transitivity effects (e.g., replacing the zero classifier with the *l classifier often creates a causative across Athabaskan). [Krauss, 1969] establishes the connections between Athabaskan classifiers and their Eyak and Tlingit counterparts, thus covering the whole of Na-Dene. In contrast to this complex and productive system of valency-related operations, the Yeniseian language Ket is remarkable in hardly having any valency-changing morphemes at all. Its verbs are instead "lexically atomistic",

(16)

Table 1. Selected likely lexical correspondences between Yeniseian and Na-Dene. All rows represent items considered Dene-Yeniseian cognates in [Vajda, 2011]; transliterations and cognacy judgements within Na-Dene according to [Kassian, 2016], “–” indicates the language is not reported to have the relevant cognate. For simplicity, only the first variant of the word is provided where several related items are available. Ket is a Yeniseian language.

Hupa and Kato are Californian Athabaskan, declared likely to be close relatives of Yeniseian by Sicoli and Holton’s analysis. Central Ahtena and Degexit’an are Athabaskan languages spoken in Alaska. Under [Sicoli and Holton, 2014]’s analysis, Hupa and Kato are expected to be closer to Ket and Kott than to Ahtena and Degexit’an.

Ket Hupa Kato Cent. Ahtena Degexit’an

‘fire’ q`oN ‘daytime’ xoNP khwo:NP qh˜oP qhUnP

‘foot’ ki’s ‘leg’ =xe-P =khweP =qhe-P qha:-P

‘cloud’ q´on ‘dark’ – – q’os q’UT

‘earth’ ba’N ninP neP – –

largely not organized into derivational schemes, [Vajda, 2015]. The whole of Na-Dene, and the Athabaskan languages in particular, are starkly different from the Yeniseian Ket.

More generally, one of the difficulties in the Dene-Yeniseian reconstruction is that, on the surface, the Na-Dene and Yeniseian complex verbal templates, though similar in their general principles of organization, differ in a number of important respects which a successful reconstruction needs to resolve, [Vajda, 2017b]. It suffices to examine the Yeniseian and Athabaskan verbal morphological templates to see that Sicoli and Holton's [Californian Athabaskan, Yeniseian] clade can hardly be tenable.

Let’s assess the situation. The bulk of linguistic evidence points to the existence of the Na-Dene language family on the one hand, and the Yeniseian family on the other. Specialists arguing for the Dene-Yeniseian connection on classical historical-linguistic grounds accept this as fact, and conse- quently compare the reconstructed proto-languages of the two families, not the modern languages.

There is nothing in the published linguistic evidence known to me that would suggest that Californian Athabaskan (or any other Athabaskan subgroup) is more closely related to Yeniseian than to the rest of Athabaskan, whereas evidence to the contrary is plentiful, as the examples above illustrate.

The comparative method, by which these conclusions are reached, has been tested since the 19th century on multiple language families of the world, and rightly remains the scientific standard for proving language relationship.

At the same time, [Sicoli and Holton, 2014] come to a different conclusion based on a computational phylogenetic analysis. Their data consist of only 116 binary features per language, often highly correlated with each other. Note that if we recode the words in Table 1 into the binary format, we will easily obtain more binary features than Sicoli and Holton used, though this represents only a tiny bit of the available data!14 On the basis of their limited dataset, Sicoli and Holton obtain the best support for topologies with a [Yeniseian, Californian Athabaskan] clade. But as we have shown in the previous section, this happens under the clocked uniform prior but not under the birth-death prior, so their result is prior-dependent. In other words, with their insufficient data, computational phylogenetics cannot resolve the family topology. Finally, even if Sicoli and Holton's dataset were larger, and therefore sufficient for robust topology inference, we have no guarantee that their evolutionary model would have uncovered the true history of the family. This point affects all phylogenetic methods, as the models employed in them are very simple and sometimes dictated by

14 It is a different question how reliable such data would be for tracing the language-family history, but then again we have no reason to believe that all of Sicoli and Holton's features are particularly informative about language genealogies either.
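The standard binary recoding mentioned above can be sketched in a few lines: each cognate class within a meaning slot becomes one presence/absence character per language, so even a handful of meaning slots yields a sizeable character matrix. The cognacy labels below follow Table 1 only loosely (all attested forms in a row coded as one class, "–" coded as missing data); this is an illustration of the recoding scheme, not of anyone's actual character matrix.

```python
# Hypothetical sketch of binary recoding of lexical cognacy data for
# phylogenetics: meaning -> {language: cognate-class label, None = unattested}.
# Labels are illustrative simplifications of Table 1, not real judgements.

COGNACY = {
    "fire":  {"Ket": "A", "Hupa": "A", "Kato": "A", "Ahtena": "A", "Degexitan": "A"},
    "foot":  {"Ket": "A", "Hupa": "A", "Kato": "A", "Ahtena": "A", "Degexitan": "A"},
    "cloud": {"Ket": "A", "Hupa": None, "Kato": None, "Ahtena": "A", "Degexitan": "A"},
    "earth": {"Ket": "A", "Hupa": "A", "Kato": "A", "Ahtena": None, "Degexitan": None},
}

def binary_characters(cognacy):
    """One 0/1 character per (meaning, cognate class); '?' marks missing data."""
    languages = sorted({lg for slot in cognacy.values() for lg in slot})
    chars = {}
    for meaning, slot in sorted(cognacy.items()):
        for cls in sorted({c for c in slot.values() if c is not None}):
            chars[f"{meaning}:{cls}"] = {
                lg: "?" if slot[lg] is None else ("1" if slot[lg] == cls else "0")
                for lg in languages
            }
    return chars

chars = binary_characters(COGNACY)
assert len(chars) == 4  # one class per meaning slot here
```

If some language expressed a meaning with a non-cognate word, that word would open an additional cognate class, and hence an additional 0/1 column, which is how the character count from even a modest wordlist quickly exceeds 116.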
