Before the beginning
Laying the foundations
Before speech and language could start to develop, a population of modern human ancestors must already have developed a common cognitive architecture, with elements only marginally or indirectly reflected in the modern, fully developed faculty. This architecture arose by six necessary precursor steps. The steps were not necessarily ordered with respect to one another, and they were not events: each is likely to have taken thousands of generations to become fixed across the species.
- There must have been regular occasions on which the need was felt to gaze at one another in a way that could be recognised and understood. This must have been important, because human ancestors lost the pigmentation in the whites of the eyes, making the direction of the gaze easier to read.
- Pointing to some individual is an interpretable, symbolic act. This has to have been a cognitive innovation by a human ancestor after the divergence from the ancestors of chimpanzees. It is seemingly not understandable to any non-human.
- Mimicry implicitly picks out some entity or class of entities in the universe, as known to speakers and listeners. As pointed out by Merlin Donald (1991), some group of human ancestors must have started to use mimicry with sounds or gestures to pick out individuals as individuals or as members of a group, class or set, implicitly referring to them and contributing to the process of original vocabulary formation. What we can be sure about is that humans found ways of agreeing about what referred to what, most probably in referring to individuals. While most modern humans believe they can mimic some sounds from nature, there is wide variation in this skill. For the most skilled exponents, this becomes an entertaining party trick, a circus performance, a military deception, or part of a hunter’s repertoire. Linguistic onomatopoeia supposedly involves a degree of mimicry. But the onomatopoeia of even the least skilled mimic is produced by a trained user of an evolved speech system. The imperfect onomatopoeia of moo or oink may partially recapitulate this precursor step. Mimicry is implicitly referential, but only implicitly. True reference requires a system defined on features and compositionality.
- In a hunting community it is valuable to be able to aim a throw. Modern chimpanzees can fling a stick. But this is different from an aimed throw with a defined target. The aim defines an angle in three-dimensional space. The distance between the throw and the angle can be measured. The cognition here defines a projection, relating the thrower to the target. Projection will become a cognitive building block for grammar.
- In a community with a high level of interactivity between the members – a commonality between humans, other apes, and cetaceans in particular – and in a situation with one or more known individuals, there are regularly occasions calling for some sort of reaction. Between chimpanzees these reactions are vocal and gestural. The gestures are sometimes readable by humans – to express pleasure, displeasure, approval, disapproval, and so on, to greet, flirt, command, soothe, empathise, curse, and more, with hoots or whoops, some complex. Jane Goodall demonstrates the richness of such a system in talks and videos, with one hoot translatable as “Good to see you”. For the particular pragmatic purpose in “Good to see you”, the hoot has various acoustic features and at least three semantic features – the pleasure, the encounter, and the individual who is being encountered. But there is no evidence in these gestures of the various semantic features being organised or categorised, as they are in human discourse, where such expressions can be qualified, as by “Good to see you, but I’m sure you’ll be wanting something from me”. Similarly, there is no evidence of the hoots and whoops combining with one another. The system is not compositional. There are traces of the primordial, pre-linguistic system in modern language, in single words in greetings like hello and goodbye, and in comments like yes, no, please, thank you. As in the case of thank you, many of these are derived from competent modern language. And all of them exploit the fully developed sound system of modern language, even if this is only for the sake of a single sound like Shhh. But they only enter the grammatical structure of language as marginal pairings.
- One survival skill, the making of stone tools with points or edges, involved the notion of properties in objects. This goes far beyond the notions of purpose and utility in stripping the buds off a twig, as it goes beyond any other endeavours by non-humans. Stone tools involved the abstract notions of sharpness, hardness, durability, intersecting planes, and the combination of these properties. As Asa Kerem Bayırlı (2023) points out, human cognition has the special property of being able to go beyond the world of experience. This property Kerem Bayırlı calls ‘Lambda abstraction’, defining it in a set-theoretic notation. The cognition here is a natural tool of invention and innovation. But it took at least three million years of evolution for human ancestors to develop this distinctively human cognition – allowing that a stone could have one or more flakes taken off it, to yield a useful knife or point. The earliest such tools are dated to 3.3 million years ago by the discoveries of Sonia Harmand and her colleagues (2015). But significant though this cognitive evolution doubtless was, the linguistic application of it involved a further cognitive advance, involving not just the co-ordination of elements, but an inseparable, intrinsic relation between the unrelated phenomena of sound and meaning.
On the assumption of a constant mutation rate for evolution, it is commonly accepted that the last common ancestors of modern humans and chimpanzees lived about six or seven million years ago, although see Søren Besenbacher et al. (2019) for a view by which the divergence was actually some millions of years earlier. But whatever the true age of the divergence, it marks the beginning of the process by which humans have evolved their species-specific characters, including the ability to throw lethal projectiles, and to walk or run long distances on two legs (and thus hunt prey to the point where the prey would collapse from exhaustion). This evolution involved larger brains, flatter, more vertical faces, a high-domed forehead without craggy eyebrows, a smooth top of the skull, longer legs and feet better for running, hands better shaped for delicate, precise manipulation, smaller teeth, a deeper, more pointed chin, a larger tongue, a longer vocal tract with a greater distance between the larynx and the lips, and a larger nasal cavity. Stone tool making was plainly an immensely difficult cognitive precursor, taking at least three million years from the point at which human ancestors diverged from the ancestors of chimpanzees.
This evolution happened in a population which had gravitated from a life in the trees to a riskier lifestyle, spending more time on the ground, and developing a new way of getting around – on two legs, possibly for the sake of a more varied, more protein-rich diet than could be obtained in the trees.
Ragsdale et al. (2023) find what might appear to be a number of different ancestries across Africa. Between the time when humans with our distinctive upright, flat, modern faces first appeared, around 300,000 years ago, and a point between 150,000 and 100,000 years ago, when there was the last known split in the African population, the apparatus for modern speech and language must have spread across all African populations and become fixed. The modernity was ‘reticulated’, in the language of population geneticists. It must have been like a genetic equivalent of the parcel in the children’s game, pass the parcel.
The totality of the precursor steps provides a foundation for the cognition of speech and language in the broader context of discourse and the species-specific ability to learn to talk without external help – by the proposal here, on the basis of a novel way of encoding grammatical information.