
Implications
Of the proposal here
If it is the case, as argued in the proposal here, that human language evolved over a period of something in the region of a million years or more, unlike the very short and recent period proposed by Berwick and Chomsky (2017), there are a number of implications.
- Pathway. There is an evolutionary pathway from the first human words to the competence needed to explain to an apprentice some skill (such as flint knapping) and a corresponding developmental pathway from the modern child’s first words to his or her fully-mature, adult speaker’s competence nine or so years later. The evidence here confirms Darwin’s hunch that modern humans share a common ancestor with African apes. By a general consensus, the evidence of DNA shows human ancestors diverging from chimpanzees at some point between six and seven million years ago. I assume here that at the point when the two ancestral populations diverged, they shared a system of communication essentially similar to that of modern chimpanzees. Some chimpanzee calls, like the shriek of pain as an individual is attacked, vary in intensity according to the severity of the attack, and crucially, are understood that way by other chimpanzees. But an infinity of calls by the grading of fear, pain and distress is quite different from the infinity considered here. It may be that in some species, particularly birds, different calls can be combined, sometimes one inside another. Observations to this effect have sometimes been interpreted as countering the claim that recursion in the system is human-specific. Similar interpretations have been made of variations in the calls by chimpanzees, vervet monkeys, and others. It may be that some species, particularly chimpanzees, are better able to remember a large repertoire of calls. And other species are better able to understand a variation of them separately, and to combine a given sign and its variation. But a combination of two calls is limited to the square of the number of calls. The output is finite. The interpretation that either of these evolutions is a step on the evolutionary path to human speech and language misses the point that the end-state of linguistic competence involves the simultaneous variation of both form and meaning.
It is this, I propose, which allows the human-specific property of free compositionality and a similarly human-specific pathway to linguistic competence in normally developed human adults, as taken for granted in all human cultures. But this leaves the core question, how did this human specificity emerge? By the proposal here, it can only have emerged by minimal steps in a population with a well developed communication system mapping calls to meanings, but making a fundamental cognitive break by defining the relation here. In the newly diverged human ancestors, the number of calls may have increased in response to the extreme dangers and opportunities of a ground-based environment. If that happened, the capacity to remember and understand different calls may have been pushed to the limit, perhaps beyond the limit. At some point, at least one call was reconfigured in such a way that it could constitute the first step to modern speech and language. From the fact that there are corresponding phenomena in all languages, it is reasonable to suppose that all of the steps proposed here were made separately, as an evolutionary sequence, in a sequence of populations from which all humans alive today descend. At each point when a necessarily very obvious and visible step was taken, this was valued throughout and across a population. It had to be, or it wouldn’t have diffused and fixated. But while the capacity for speech and language clearly distinguishes humans from any other animal, and mostly develops naturally without any active intervention, this is obviously not the case for all, with 1 child in 10 having minor problems with speech and language, 1 in 1,000 having major problems, and perhaps 1 in 100,000 being unintelligible in adulthood other than to close family and friends, if at all.
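The arithmetic behind the contrast drawn above, between a finite stock of two-call combinations and an unbounded output, can be sketched as follows. This is only an illustration of the counting, not a model of any animal or human system. The function names, the three-call repertoire, and the counting rule (every structure is either an atom or a binary pairing of two already-built structures) are assumptions introduced here for the example.

```python
from itertools import product


def pair_combinations(calls):
    """All ordered two-call sequences from a fixed repertoire.

    With n calls there are exactly n * n such sequences, so the
    output is finite however the calls are chosen.
    """
    return list(product(calls, repeat=2))


def count_structures(n_atoms, depth):
    """Count binary-branching structures of nesting depth <= depth.

    Assumed rule (hypothetical, for illustration only): a structure is
    either one of the n_atoms atoms or an ordered pair of two structures
    built at the previous level. The count therefore follows
    c_0 = n_atoms and c_k = n_atoms + c_(k-1) ** 2.
    """
    count = n_atoms
    for _ in range(depth):
        count = n_atoms + count * count
    return count


if __name__ == "__main__":
    calls = ["alarm", "food", "contact"]  # hypothetical repertoire
    print(len(pair_combinations(calls)))  # fixed at n squared
    for depth in range(4):
        print(depth, count_structures(len(calls), depth))
```

With pairing applied once, the count is bounded by the size of the repertoire. Only when the pairing can reapply to its own output does the count grow without limit as the depth increases. The sketch says nothing about the simultaneous variation of form and meaning, which is the point at issue in the text.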
- Interfaces. By conceptual necessity, there are two interfaces, one involving physical expression (either by speech or by sign), the other involving the analysis of meaning. Information has to be sent to these interfaces in suitable, necessarily-different forms. The distinctively human lexicon is defined by the way this relation is defined in the brain at the point of acquisition. Both interfaces are limited by general factors of human cognition and the physical universe, including the acoustic phenomenon of sounds dying away as the energy is absorbed by the atmosphere or the fact that once a sign is replaced by another, the first is gone forever. The proposal here makes no particular claim about how the cognitive evolution of speech and language connected up with cognition itself or with any of the physical changes. We just have to note that these changes were in one species, and cognition, language, the physical apparatus, and lifetime experience, must have complemented one another.
- Steps. The Faculty of Language, FL, as it currently exists, could not have evolved its precise character other than by discrete, necessarily ordered steps, each one unconscious, most of them originally proposed by Chomsky. This is understanding FL in a very broad way, distinguishing all the various uses to which language is put, including the case of irony where the intended meaning is quite different from the literal meaning. The steps proposed here provide a framework for the much more narrowly defined structures of what is known as Universal Grammar, UG, and a foundation for language acquisition. The seven steps postulated here were taken by a species which had forsaken the safety of the trees for a much more dangerous life on the ground, after making at least seven significant precursor adaptations. This was a population which either lived on its wits or died, as many sub-populations did. The over-all population of human ancestors remained very small, but ranged across Africa while what is now the Sahara desert was forested and well-watered. Within this population, individuals or groups of individuals must have started restructuring some of their expressions in detectably advantageous ways, but by no more than one term at a time, so that, over the course of thousands of generations, the innovation could diffuse across the population, and (separately) become part of the genome. Following Progovac (2015), there must have been a series of protolanguages, each likely to have left fossils. As the linguistic genome evolved, the effects of the steps interacted with one another, giving the complex variations which Roberts (2022) characterises as ‘building blocks’. An abstract Universal Grammar UG is derived from the evolution of the human species. 
But there is cross-linguistic variation in how it is used – for example in which parts of the sentence structure are projected where – with global effects on word order and other aspects of what is commonly characterised as ‘grammar’. While a language may not express one or more parts of UG, all languages, spoken and signed, are built from it. The universality here is only very partially expressed at birth, just as the abilities of particular bird species to hover, stoop, dive and soar, are expressed only as the fledgling develops. But all of these steps are entirely unconscious. In a way more complex than the particularities of bird flight, the full integration of UG elements into FL continues until at least the age of ten or so for most children. The evidence for the evolutionary sequence proposed here is from the acquisition of language, language disorders, the differences between languages, creoles, signed languages, the commonalities of unrelated languages, the detailed examination of any one language – for our purposes here, English, and the special, possibly world-unique, case of Nicaraguan Sign Language. The acquisition evidence is from the similarities between the examples given and the fact that they occur in a matching sequence across all members of a sample of children. By the reasoning here, modern speech and language acquisition replicates its evolution, in a way unlike other areas of comparative biology. In the absence of any evidence that acquisition proceeds differently from evolution, acquisition and the Nicaraguan evidence may provide the closest approach to direct evidence of the possible, probable, or even necessary course of speech and language evolution.
- Timescales, tools and talk. The evolution of speech and language began earlier and proceeded more gradually than by Berwick and Chomsky’s proposal, but still very briefly for a change of such complexity and significance. From paleo-anthropology, it is possible to define an earliest plausible beginning of the evolution of speech and language – no earlier, I submit, than the first manufactured tools intended to last (by the results of Sonia Harmand and others (2015) about 3.3 million years ago), and no later than about 135,000 years ago, by the results of Shigeru Miyagawa and his colleagues (2025). I propose that human style stone tool making must have preceded the evolution of language because knapping flint requires an awareness of geometry, quite different from and cognitively far beyond the various skills exhibited by chimpanzees and other non-humans. This was plainly an enormously difficult cognitive step, taking at least three million years from the point at which human ancestors diverged from chimpanzees. The likely time scale of human language evolution is by hundreds of thousands of years or thousands or tens of thousands of generations, in contrast to normal, modern-child development over months and single years. This says nothing about the exact time scale here for each step or about how quickly the steps in language evolution fixated across the ancestral population. If it took modern human ancestors at least 3 million years to learn the essential geometry of knapping, it would seem reasonable to assume that the incomparably more subtle process of encoding UG on a spine must have been similarly challenging.
- Encodability. By each step, by evolution and by modern acquisition, the human organism’s sensitivity grows to particular, mathematically defined, degrees. The sequencing of the steps is by five factors, the internal logic of the steps themselves, the dictates of discourse and conversation, general human cognition, the criterion of heritability, and the mathematical representation of biology. The factors are justified by the clear evidence of biology in disorders of all sorts, including stammering and problems with the articulation of words and putting them together in grammatical structures. All of these things run in families in ways not accountable by immediate contact. For instance, a child can sound like an uncle or aunt at the same age or a close relative brought up in another language. Such sorts of genetic evidence are found in around 30 percent of all disorders. But biology does not, cannot, operate with any properties defined solely on linguistics, such as consonants, vowels, or sentences. The binary branchedness assumed here (on the basis of 40 years of research on this point) has to be definable in a way that can be encoded mathematically, as set out by Matilde Marcolli (2022). This biological ‘encodability’ allows linguistic structures to be entered into a computation in a way applying to any natural language, whether spoken or signed, now fixated as a defining genomic character of our species, anatomically-modern Homo sapiens. As shown by Sandiway Fong (2023), this genomic factor is limited by neuro-physiology; synapses take around a millisecond to transmit from one nerve cell to another, and much longer to recover. Given the complexity of what has to be transmitted, this is a potential bottleneck.
- The reverse of discourse. Language, as characterised by UG, is defined on structures, which are put together so as to lay the basis for a potentially infinite output. UG contrasts with discourse, anchored in the here-and-now of conversation, expressing the use of language to relate utterances to the context in which they are uttered, to express emotions, to interest, to entertain, to elicit information, or to be ironic by reversing the overt sense of an utterance. Even at the very beginning of the evolution, it is possible to imagine discourse functions as soon as there are distinct meanings in particular expressions. Modern language is both used for discourse and subject to UG. UG is structured in such a way that meanings can be both shared and defined. But the structure of UG is quite different from the recognition of other speakers and other points of view in discourse. Discourse and UG are separate domains. Neither makes sense without the other. By the proposal here, both are separately articulated in relation to grammar. There is interplay between the two systems in both directions, with syntactic expressions becoming curses and attitudinal expressions getting turned into words. Plainly, the first words are not exclusively defined by either system. As a child’s language develops, these articulations of discourse and UG become increasingly well-defined.
- Reducing the infinity. In a rather surprising way, each of the steps postulated here in the evolution of UG with its infinite generative capacity, involves the step-wise reduction of the elements involved at any one point in the derivation. The first step, characterised here as Lexicon, first decomposes, then recomposes, two sorts of unlike atom, one expressive, the other semantic, and defines this relation for what it is. Lexicon gives a starting point for this aspect of evolution and individual development or ontogeny. It allowed what is known as Universal Grammar, UG, to start evolving. By the proposal of Martina Wiltschko (2014), UG is defined on a ‘spine’ or a headed decision-tree, with binary branches, rather than on a set of grammatical functionalities such as passives, as in “She was hit by a falling tree”. Language-specific variations, such as the form of the passive, are defined on derivations from the spine, interactions between these derivations, and the ways that these things are implemented in speech. These variations are part of the learnability space – what has to be learnt in different ways according to the language being learnt, as finitely varying points of variation, often known as ‘parameters’. By the proposal of Chomsky (2000) and much subsequent work by him and others, the derivation is factored into ‘phases’, each defined on a minimal set of elements, such that at any given phase, much of the content of previous phases is no longer accessible to the computation. The ‘workspace’, as this is referred to by Chomsky and others (2023), is minimised. The force of a structure is treated separately from its propositional content. By the proposal here, the spine is itself a mathematical structure. This reduction of the workspace is arguably what makes the grammar finitely learnable, as it patently is. By conceptual necessity, this has to be the last of the steps proposed here.
Both the notion of a phase-based grammar and the term ‘spine’ are now widely accepted. By the proposal here, the spine itself is phased.
- The recognition of fitness. Each step must have been noticed and recognised by potential mates for what it was, a greater fitness, leading to a consistent bias in mate selection, ensuring that it eventually became inheritable. In terms of statistical dynamics, the greater fitness may have been marginal. But a slight bias applying consistently over one or more thousands of generations can effect a change in the genome.
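The claim that a slight but consistent bias can change a population over thousands of generations can be illustrated with a standard textbook calculation. The sketch below is a minimal one under stated assumptions: a deterministic haploid selection model with no genetic drift, a hypothetical starting frequency of 1 in 1,000, and a hypothetical 1 percent advantage. None of these figures or function names comes from the proposal here; they are chosen only to show the shape of the arithmetic.

```python
def next_frequency(p, s):
    """Frequency of a favoured variant after one generation.

    Assumed model: deterministic haploid selection, where carriers
    leave (1 + s) offspring for every 1 left by non-carriers.
    """
    return p * (1 + s) / (1 + s * p)


def generations_to_reach(p0, s, target, max_generations=1_000_000):
    """Number of generations for the frequency to rise from p0 to target.

    Returns None if the target is not reached within max_generations
    (for example when s is zero and there is no bias at all).
    """
    p = p0
    for generation in range(1, max_generations + 1):
        p = next_frequency(p, s)
        if p >= target:
            return generation
    return None


if __name__ == "__main__":
    # Hypothetical values: 1 carrier in 1,000, a 1 percent advantage,
    # and a target of 99 percent of the population.
    print(generations_to_reach(p0=0.001, s=0.01, target=0.99))
```

Under these assumptions the variant goes from rare to near-universal in somewhat over a thousand generations. A real population would also be subject to drift, migration, and varying selection, so the figure indicates only an order of magnitude.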
- A single sequence of steps conferring a single faculty. Resurrecting the approach of Chomsky and Halle (1968), the seven steps here give speech as well as language. Crucially this evolution provided a grammatical apparatus which was, and is, freely used in assembling words together and in the building of speech sounds, in ways that the child has to learn. While the phonology of this falls outside the main scope of the evidence here, the apparatus may be over-used in the process of speech acquisition so that children often use devices in the building of words which should be used only in the assembling of words into sentences. The phonological apparatus is such that speech-disordered children from different generations or parts of a family often have recognisably similar issues. By virtue of the sequence, the grammar becomes available in parts. In terms of syntax, the topic of most of the evidence here, a normally developing child of two and three quarters can say “A clock tells you what time it is” displaying the first evidence of Phase long before the full functionality of the grammar has emerged, as it mostly has around seven years later. All that can be said is that the encoding is separate from the communicative advantages. For instance the convenience of using pronouns has no obvious relation to the unobvious adjacency of levels on the spine. If the proposal here is on the right lines, one or more of the last evolutionary steps may have occurred after the divergence between modern human ancestors and Neanderthals, and before more or less anatomically modern humans appeared in what is now Western Morocco around 300,000 years ago. Inheritors of the epigenetic changes by the last step in particular would have learnt to talk faster, more accurately, more reliably, and crucially more completely.
We often refer to someone having ‘the gift of the gab’ as a characteristic talent of chat show hosts and comics, uncommonly able to spot and develop a double entendre and more. The first inheritors of Phase would have sounded even more talented, standing out even more sharply in competition for mates. By the results of Miyagawa and his colleagues (2025) Phase must have fixated across the ancestral stem of anatomically modern Homo sapiens by around 135,000 years ago. It seems likely that Phase critically reduces the learnability space, making speech and language finitely learnable for the overwhelming majority in what Eric Lenneberg in 1967 called the ‘critical period’ for language acquisition, normally ending around the age of ten. This linguistic advance would have marked Homo sapiens apart from any pre-existing human species, including Neanderthals already established in Europe and central Asia. It thus may be that it is Phase which makes language finitely learnable as it demonstrably is for the overwhelming majority, that language was not finitely learnable without it, that without Phase, there was a wide range of linguistic competence across the ancestral population, with only a minority having access to anything resembling the complexity of modern grammar. How small this minority may have been, how grievous the effects of relative incompetence were, and in what proportions, where the grammatical defects may have appeared, are all impossible to guess. As Phase, and conjecturally finite learnability, spread across the population, communication between conspecifics sharing this faculty became critically more reliable. In dealing with everyday emergencies, at critical points in hunting dangerous prey, in disseminating advances in technique and technology, reliable communication became a decisive asset.
Finite learnability and the consequential reliability of communication between conspecifics would seem to have given a great advantage at the point of population survival to those having a phase-based grammar in relation to any group not having it. The difference may have been critical, with Neanderthal mastery of speech and language uneven, with no expectation of common understanding. Neanderthals may have been stuck at the point when only a fortunate minority had a full mastery of their linguistic inheritance, whatever that may have been, and the rest of the population had only varying degrees of competence and little or no metalinguistic ability. In competition for scarce resources, the reliability of communication between conspecifics may have enabled modern Homo sapiens to prevail decisively over the established Neanderthal population in a few thousand years, gradually developing the first indications of modern culture in jewelry, wall-paintings, sculpture, musical instruments, not to mention stone tools. While there is a learning process which is normally completed across the whole population, this is not so for all. As Carol Chomsky (1969) showed, many ten year olds are still misunderstanding sentences like “I’m asking you what to feed the dog” as “I’m telling you what to feed the dog”. She suspected that some individuals may not proceed to a full understanding of this point. On a phase-theoretic analysis of the error here, the subject of the ‘feed the dog’ phrase is incorrectly not projected up to the topmost phase, represented in this case by the first person pronoun, I.
- Infinite generative capacity. Despite the seemingly obvious progress towards the infinite productive capacity of UG by a succession of small increments, the basis for the infinity is already there in the normally developing one-year-old’s first word.
- Clinical effects. In relation to less than fully competent speech and language, diagnosed as delayed or disordered, the proposal here effects a conceptual economy. Rather than postulating a series of separate disorders, it is possible in principle for parts of UG to be incompletely specified in some individuals. Most developmental disorders involving those aspects of speech and language which are necessarily learned, phonology, syntax and morphology, are by the effect of failures in the specification of a genomically defined UG which makes it possible for humans to learn to talk the way they do without needing to be helped, other than to learn what not to say. This makes it unnecessary to postulate a corresponding series of specific malformations. Many common issues are more accurately definable. There are useful points of measurement and focus points for intervention. And the range of plausible interventions is increased. For instance, many children have difficulty with both case and tense, as in “She loves me” where the S in loves expresses both present tense and agreement with the singular property in She. Such children may go on saying things like “Love me” many years after most children have learnt that in a statement, both the she and the S in loves are forced in English. To help children with the common developmental issue here, it may be useful to allow them to discover the ‘sisterhood’ relation between the subject marking of she, known as ‘nominative case’ and the S in loves, known as ‘third person singular’. And to do this, it may be useful, as argued in more detail in Nunes 2023, to focus on the basis of that relation. A history of delayed or disordered speech is likely to be co-morbid with literacy problems. The characteristic multifactoriality of speech and language disorders is predictable. There are likely to be speech errors by misapplying what should be syntactic processes in the phonology. 
Many characteristics of child speech are likely to be reducible to the lack of any proper definition of phonemes, syllables, words, and so on. Children with speech and language disorders are likely to have characteristically poor metalinguistics. Many apparent disorders, even some with names in popular speech, such as ‘lisping’, fall out from the notion of decomposition, by the proposal here, by the first Lexicon step in the evolution of speech. Another group of children sometimes say monopoly as OPOLI. If such errors persist they can lead to stigma or mockery. Monopoly as OPOLI involves the non-pronunciation of the first three sounds, with only the domain of stress pronounced. The child may be confusing the stress domain and the word. The proposal here does not take account of subsidiary, though still important, psycholinguistic considerations such as auditory memory and auditory discrimination and more. It is just assumed here that the domain of speech and language is large and complex, and that the process of mastering it is likely to begin with only occasional successes and much more common failures. Advances towards adult competence are likely to be at first only very occasional – hence the recommendation here about keeping a diary. The proposal here is just motivated by the idea that speech and language therapy has everything to gain by aligning itself with advances in linguistics. These advances are not monolithic. Choosing between them is no easy task. But there is clinical value in the effort.
- A cognitive centrality. If, as proposed here, the acquisition of speech and language follows the evolution of the faculty, this has a large bearing on how the process can go wrong, as it does to a small degree in perhaps one child in ten and to progressively greater degrees in progressively smaller numbers. If the proposal here is correct, it is unlikely that there are common speech problems in the strength of the relevant musculatures. In evolutionary terms these date back perhaps 100 million years in the case of the infant’s suck response, and perhaps 500 million years in the case of maturely developed feeding. These are thus highly conserved competences. Speech is a much more recently evolved competence, at least in its modern, most highly evolved form. So speech and language represent a much less securely evolved competence than sucking and feeding, and are correspondingly much more likely to be subject to some sort of developmental failure. This does not mean that the relevant musculatures are always correctly and appropriately developed. But malformations here are much less likely than issues with the very complex and only recently evolved cognitive mechanisms for speech and language.
- The autonomy of grammar. No version of any of the steps postulated here is reducible to the needs of communication or social interaction. There could not have been any external input because by their very nature, the properties here are formal rather than interactional. A phase-based spine can only be defined on general, i.e. non-linguistic, principles. It cannot directly reference any categories which would only come into existence by virtue of the evolution. The grammar must encompass the entire apparatus which yields the linguistic categories. A category may seem to occur only very rarely or even in only one of the six or seven thousand known languages. Some categories are idiosyncratic. But if a category occurs at all, the learnability space must be configured accordingly.
- A buffer. A system by which linguistic structures of all sorts were derived in real time would seem to have favoured the secondary evolution of a buffer between the derivation and the articulation of speech. Such a buffer is both contingent on the formation of these structures and developmentally vulnerable. By the proposal of Nunes (1994), an incorrect specification of the buffer characteristically leads to stammering. Stammering occurs in all human populations, at a rate of between one and two percent in those using speech and around a tenth of this rate in those using sign. If the functionality commonly characterised as ‘Merge’ has triggered the buffer as a supporting adaptation, this pushes the evolution of at least some of the steps proposed here back in time to a point significantly before anatomically modern homo sapiens started spreading, first across Africa, and then across the rest of the world.
- One stem. All humans alive today must be descended from one African stem. The ancestry may be from more than one point on the stem, which may have migrated and introgressed (see Chris Stringer (2016), Aaron Ragsdale et al (2023) for a different point of emphasis). But at a given point of descent, modern UG was necessarily complete. Or there would be groups of humans, genetically incapable of ever learning one another’s languages. There is a contrary view, due largely to the work of David Reich and his colleagues, emphasising the significance of introgressions into the African stem from Europe or the Middle East. But this does not seem to me easily compatible with the evidence of Miyagawa and his colleagues (2025) pointing to a probable date of modern language completion around 135,000 years ago.
- Relative timescales. Following seven precursor cognitions, the specifically linguistic steps postulated here are exactly recapitulated in the language development of a modern child – except that what the child is hearing is a fully-developed, modern language, and the child is the inheritor of a corresponding genomic capacity. Modern acquisition seems likely to be of the order of a million times faster than the original evolution.
- Early complexity. The proposal here involves what is sometimes known as ‘early complexity’, on the understanding here, complexity as early as possible, but no earlier and no later. This is to say that as one evolution is built on by another, the earlier evolution cannot be amended. So evolved properties are plausible only at a given point of evolution. No property evolved in this way can be jettisoned on a ‘Use it or lose it’ basis without fatally compromising the rest of the apparatus.
- Specificity to Homo sapiens. From the study of ancient and modern DNA, it can now be computed that for most of six or so million years since the point at which human ancestors diverged from other primates (not chimpanzees by current research, but a separate species), the world population of humans was very small – in the single thousands, with breeding groups often becoming extinct. This is likely to have continued even when human ancestors were decorating the walls of caves with an artistry that impressed Picasso. The surprise is that humans survived at all. But a commonly shared FL may have contributed to the current position of humanity as the dominant species. By the results of Miyagawa and others (2025), it is inconceivable that Neanderthal humans, having separated from the Homo sapiens stem at least 500,000 years earlier, had a grammar with Phase. If the language sampled here is representative, as I believe it is, Neanderthal language would have been stuck at the level of a modern child of between two and a half and two and three quarters. This would have made it very difficult to talk about truth, causality, probability, and moral imperatives. As adults they would be able to understand simple instructions and make simple declarations and observations of fact, but crucially no more. The sticking point would have been at the level of the genome. No amount of loving help and coaching could help bridge the gap. Modern humans and Neanderthals could partner up, have babies, and the babies could grow up and have children of their own, with some inheriting the capacity for Phase, and others not. Given the obvious advantages of the Phase inheritance, it is not surprising that in the long term it was only this inheritance which endured. Denisovan and Neanderthal humans cannot have enjoyed the fully developed capacity of modern language.
When they partnered up with modern humans, their speech may have sounded like the speech of a one- or two-year-old today sounds to modern adults. Relationships would seem likely to have been highly uneven.
