What makes us human

Where there are problems in any aspect of children’s development it makes sense to investigate how that aspect may have evolved. Going back some 400 million years, this applies to human arms and legs. Each has one bone in the upper half of the limb, two bones in the lower half, special bone structures in the hands and feet, and five fingers or toes at each extremity. All of this has evolved from the fins of fish which found it useful to be able to survive on land, initially for short periods. Going back further, this applies to the nervous system: the front and back halves of each limb each have their own nerve supply, an inheritance from segmented animals, the ancestors of fish. Going back just one million or so years, it makes sense to apply the same thinking to speech and language. While this capacity clearly distinguishes humans from any other animal, and mostly develops naturally without any active intervention, this is obviously not the case for all. So how might this complex capacity have evolved?

A biological architecture

Two things drive me to the conclusion that human speech and language are the way they are as a result of four evolutionary turning points, each having helped to define the most recent identifiable speciations in the human lineage.

One consideration is originally from Noam Chomsky, to the effect that a key part of what we know about how language works we could not have learned by experience, and can therefore only know by a species-specific modification of the genome. The other consideration is the early patterns in children’s speech.

Four necessary turning points


An archaic population of modern human ancestors must have started to use sounds or gestures, not in the sense of warning or manipulation, but to pick out some individual, entity, event, or circumstance, as something unique in experience. A reference can always go beyond itself as a hint of interest, care, or love. And this can be informative or inspirational or deceptive. This is inconceivable without a sense of what Jean Piaget called ‘object permanence’ – if we lose sight of something and then see it again where we might have expected to see it, it is probably the same thing. This appears in children just before they start talking as a sudden realisation that things can disappear and then reappear in a different, but predictable, location. There is a view, particularly from Michael Tomasello, that primordial reference was entirely by manual signing. But from the way we behave when we have no shared language, such need-driven communication may have just used whatever was most convenient – either sign or vocalisation. While there are communities with a high rate of congenital deafness which are bilingual between signed and spoken language, there are no hearing communities with only signed language.


At some point ancestors started combining these signs or ‘glueing’ them together. This hugely increased the vocabulary, allowing reference to birth, novelty, absence, disappearance, or death, with meanings not easily distinguished by simple gestures or icons. Initially this gave words and other expressions, as it still does in black hole, Big Bang, and Corona Virus. Then it gave speech sounds or phonemes and sentences at opposite ends of the spectrum of complexity. But like many other skills, this capacity to glue may have been unequally distributed across the population.


It became possible to configure a glued structure to ‘fit’ to some need or situation. This made it possible to ‘do things with words’ in a more precise and nuanced way – like making commitments, adjusting the emphasis, asking questions, and more. But the grammar was complex, hard to learn, prone to mis-tuning, and was possibly even more unequally spread across the population.


The least obvious, but arguably the most important, turning point made it possible for a whole population to learn a grammar reliably, in full – for most individuals in around ten years – from the variable, uncertain, logically inadequate, and entirely accidental evidence of what we happen to hear said. This is known as ‘finite learnability’. Inheritors could share a common linguistic heritage. All of those inheriting a full copy of this functionality could reliably progress to a point which Chomsky used to characterise as ‘competence’. This cannot have been any later than the divergence of anatomically modern humans between 200,000 and 300,000 years ago somewhere in what was for much of this time a fertile area between the Rift Valley and North Africa. While all of the categories arising from the first three turning points are recognisable in modern speech and language, their status as categories was an accidental effect of the way they had evolved. I propose that the categories now crystallised into entities in their own right. All elements of all structures of any degree of complexity were now grouped into chunks or Phases, to use the term proposed by Noam Chomsky in relation to part of this process in 1998, and now widely accepted. Let us suppose that learning to talk is by a process involving several thousand possible points of decision, some more important than others. But suppose also that all of these decisions are with respect to small sets of phased events. If for one phase there are five decisions, there are just five factorial logically possible orders of decision: five times four, times three, times two, times one, 120 in all. Not nothing, but possible, rather than the obviously absurd impossibility of a thousand factorial orders of decision. This fourth turning point made us what we are today, at least almost uniformly smart when it comes to learning to talk.
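
The arithmetic here can be checked with a small sketch. This only illustrates the counting argument and is not a model of acquisition. The function name, the figure of 200 phases of five decisions each, and the idea that each phase is settled independently of the others are all assumptions made for the sake of the example.

```python
import math


def orderings(decisions: int) -> int:
    """Number of logically possible orders in which `decisions` choices can be made."""
    return math.factorial(decisions)


# One phase with five decisions, as in the text: 5 x 4 x 3 x 2 x 1.
per_phase = orderings(5)

# Without phases: a thousand decisions all open at once.
# The number itself is unwieldy, so we count its decimal digits instead.
unphased_digits = len(str(orderings(1000)))

# Hypothetical phased learner (an assumption, not a claim from the text):
# 1000 decisions split into 200 phases of five, each phase settled on its own,
# so the candidate orders add up across phases rather than multiply.
PHASES = 200
phased_total = PHASES * per_phase

print(per_phase)        # 120
print(unphased_digits)  # 2568
print(phased_total)     # 24000
```

On these assumptions, a learner working phase by phase faces tens of thousands of candidate orders, while a learner with no phases faces a number more than two and a half thousand digits long.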

Chances for inheritors

In the competition for scarce resources, each of these turning points was uncommonly adaptive, giving great advantage to fragile and vulnerable populations, making it much easier to defend against rivals, competitors and predators, to plan and execute hunts, to discuss and develop techniques, to plot, to groom, befriend, sympathise, and romance.

At a societal level, this functionality is the wherewithal of every joint venture from a hunt to a start-up. Whether the issue is how to understand a broken twig or a misleading CV or financial statement, “You might have been mistaken” is reliably understandable. In hunting or in business, life or fortune may hang on the assumption that there is no need for clarification.

If any of these turning points had been missed by some ancestral population, there would be human populations lacking one or more. And no such populations have been found. The most developed population always had a decisive advantage.

But while the entire cognitive apparatus arising from these four turning points has diffused across the whole of the modern human population, it is still vulnerable.

Three stages of ‘protolanguage’

If there was, as I contend here, a sequence of evolutionary events, this forces the conclusion that there must have been corresponding stages of ‘protolanguage’.

There is weak evidence for such stages in the typical ordering of language acquisition. First there is a long period during which the child says only single words or what are sometimes called ‘holophrases’ – expressions which sound as if they might contain more than one word, but whose parts do not occur on their own. Then words start to be put together. The child says something like “duck bath” with two elements relating to two significant entities in the child’s universe, such as, in this case, duck, and the duck’s place, in this case bath, glued together in that order, in a primitive prototype of a phrase or sentence. And the child starts seeming to understand things like “Oh, look. Your duck’s on the floor”, perhaps understanding only duck and floor and mentally discounting the rest of the structure. Then at some point, between a week and two months after saying or understanding the simple declarative, the child either asks a question like “Where duck?” or answers a fully formed corresponding question by an adult like “Where’s your duck?” by an appropriate and plausible reference to place, possibly by a single word.

But interestingly, at least by my observations, never in the opposite order. In other words, one-word answers to questions don’t come before two-word declarations, even though the one-word answer might seem to be simpler.

In this period of between one and eight or so weeks, the child’s learnability system is starting to yield an appropriate analysis of what I am calling glued structure, with words like where correctly linking to an abstract element on the right edge of the structure from which it has been copied. This seems to happen to the various ‘WH’ forms, who, where, why, and so on, one by one.

But there is nothing like a proto-language spoken by any modern adult human population. All fully developed adult languages have a significantly greater structural complexity.

Anatomically modern humans

I am postulating a sequence of four strictly ordered evolutionary turning points, the end of which was us, as we are today. It now made sense to think of shared entitlement and responsibility, in a way that would not have made sense previously. I am not saying anything about how this self-evidently cognitive event connected up with a more gracile physique, a lighter musculature, less craggy eyebrows, a more pointed chin, and a more highly domed forehead. I am just assuming that these conversational creatures were not the bone-crunching savages of the movie 2001, but indistinguishable from modern humans alive today, fully capable of becoming fully competent native speakers of a modern language, and of learning to become musicians, or chess players, or computer programmers.

There is moving evidence of this from the finding of an anatomically modern human couple who would seem to have been caught and killed by a pyroclastic event like the one which obliterated Pompeii. But they had with them a worn fragment of a child’s skull. It would seem that, like modern humans who suffer the great loss of a loved child, they wanted to preserve the memory with a cherished memento.

Modern skills in art, music, farming, jewellery, and domesticating animals all took a long time to learn, and may have been vulnerable to loss, just as the skills of hot-riveting steel ships and clinker-building wooden boats are both close to being lost today. These skills all require long, supervised apprenticeships. Even the most gifted apprentice has to be carefully told how to do whatever it is. And that requires commonly shared language. And that requires functionalities of the sort characterised here as Glue, Doing things with words, and Phase, all specified in the modern human genome.

This is, of course, very different from the absurd, straw-horse idea of humans being born knowing how to talk, that is sometimes trotted out to rubbish the idea of any sort of genomic inheritance.

Different species

At the point of genetic divergence there are inheritors and non-inheritors. The difference may not be obvious, but it is perceptible. If the innovation is adaptive, two populations may emerge with the innovation more concentrated in one than in the other. For a while the two populations may co-exist. But those inheriting the adaptation have better chances of breeding success. Non-inheritors may develop some compensating character. If the innovation is biologically costly – and modern human cognition is very costly – a compensatory adaptation might be in physical strength. Such a compensation could delay the effect of the breeding advantage. But if the advantage conferred by the innovation is sufficient, the non-inheriting population is doomed. Before this happens, however, there may be some breeding between the two populations. There is evidence of this having happened between modern humans entering Europe and an established Neanderthal population, maybe a hundred thousand years apart. So in modern European populations today there are much stronger traces of a vestigial Neanderthal inheritance than there are amongst Africans. But at the point when modern human and Neanderthal populations encountered one another in Europe the linguistic inheritance of the modern population may have been an evolutionary step more advanced than that of the Neanderthals. I hypothesise that this was indeed the case, with only modern humans having evolved the decisive functionality of Phase, allowing the intricate interplay between segmentality and metricality characteristic of modern speech and language, and making this finitely learnable across the whole population.

More evidence needed

My main evidence here is from the speech of children learning English in South West London. The asymmetries and co-morbidities could just indicate some peculiarity in our local dialect. So there is a need for evidence from the speech of children with all sorts of language backgrounds. While there is already a vast amount of data on children’s speech and language, the coverage of asymmetries and co-morbidities is, to say the least, uneven.