
Before the beginning of speech and language
Laying the foundations
Before speech and language could start to develop, a population of modern human ancestors must already have developed a common cognitive architecture, with elements only marginally or indirectly reflected in the modern faculty. It did so by six precursor steps, each likely to have taken thousands of generations to fixate across the species.
- There must have been regular occasions on which the need was felt to gaze at one another in a way that could be recognised and understood. This must have been important, because human ancestors lost the pigmentation in the whites of the eyes, making the direction of the gaze easier to read.
- Pointing to some individual is an interpretable, symbolic act. This has to have been a cognitive innovation by a human ancestor after the divergence from the ancestors of chimpanzees. It is seemingly not understandable to any non-human.
- Mimicry implicitly picks out some entity or class of entities in the universe, as known to speakers and listeners. As pointed out by Merlin Donald (1991), some group of human ancestors must have started to use mimicry with sounds or gestures to pick out individuals as individuals or as members of a group, class or set, implicitly referring to them and contributing to the process of original vocabulary formation. What we can be sure about is that humans found ways of agreeing about what referred to what, most probably referring to individuals. While most modern humans believe they can mimic some sounds from nature, there is wide variation in this skill. For the most skilled exponents, this becomes an entertaining party trick, a circus performance, a military deception, or part of a hunter’s repertoire. Linguistic onomatopoeia supposedly involves a degree of mimicry. But the onomatopoeia of even the least skilled mimic is produced by a trained user of an evolved speech system. The imperfect onomatopoeia of moo or oink may partially recapitulate this precursor step. Mimicry is implicitly referential, but only implicitly. True reference requires a system defined on features and compositionality.
- In a hunting community it is valuable to be able to aim a throw. Modern chimpanzees can fling a stick. But this is different from an aimed throw with a defined target. The aim defines an angle in three-dimensional space. The distance between the thrower and the target can be measured. The cognition here defines a projection, relating the thrower to the target. Projection will become a cognitive building block for grammar.
- In a community with a high level of interaction between the members (a commonality between humans, other apes, and cetaceans in particular), and in a situation with one or more known individuals, there are regularly occasions calling for some meaningful gesture, with both an external form and, separately, a meaning.
Any animals capable of distinguishing different calls for whatever purposes must be able to distinguish between these aspects of any call. Among chimpanzees these are vocal and gestural. The gestures are sometimes readable by humans: to express pleasure, displeasure, approval, disapproval, and so on, to greet, flirt, command, soothe, empathise, curse, and more, with hoots or whoops, some of them complex. Jane Goodall demonstrates the richness of such a system in talks and videos, with one hoot translatable as “Good to see you”. For the particular pragmatic purpose in “Good to see you”, the hoot has various acoustic features and at least three semantic features: the pleasure, the encounter, and the relation between the individuals. These expressions can be varied to show the degree of pleasure. And such modified expressions are appropriately understood. But there is no evidence in these gestures of the various semantic features being organised or categorised, as they are in human discourse, where such expressions can be qualified, as by “Good to see you, but I’m sure you’ll be wanting something from me”. Similarly there is no evidence of the hoots and whoops combining with one another. The system is not compositional. There are traces of the primordial, pre-linguistic system in modern language in single words in greetings like hello and goodbye, and comments like yes, no, please, thank you. As in the case of thank you, many of these are derived from competent modern language. And all of them exploit the fully developed sound system of modern language, even if this is only for the sake of a single sound like Shhh. But they only enter the grammatical structure of language as marginal pairs.
- One survival skill, the making of stone tools with points or edges, involved the notion of properties or ‘features’ in objects. Either broken stones had to be searched for and found, or they had to be made. This goes far beyond the notions of purpose and utility in stripping the buds off a twig, as it goes beyond any other endeavours by non-humans. Stone tools involved the abstract notions of sharpness, hardness, durability, intersecting planes, the combination of these properties, and the act of cutting or stabbing with the stone tool. Four sorts of thing have to be combined in the mind.
For any useful result, to achieve a particular geometry with at least one broken edge and another edge or tangent, each blow has to fall in some exact place. Yet, on the evidence of the discoveries by Sonia Harmand and her colleagues (2015), the development of stone tools took at least three million years of evolution, or something in the region of a hundred thousand generations, from the point of divergence from other apes. As is obvious from the range of tools in museums, the art of stone tool making grew slowly and gradually from the first ‘Oldowan’ tools to ‘Acheulian’ tools three million years later.
The last of these steps in particular clearly extends from before the beginning of definable language to a point clearly after it. As Asa Kerem Bayırlı (2023) points out, human cognition has the special property of being able to go beyond the world of experience. This property Bayırlı calls ‘Lambda abstraction’. The cognition here is a natural tool of invention and innovation. But significant though this cognitive evolution doubtless was, the linguistic application of lambda abstraction involved a further cognitive advance, involving not just the co-ordination of elements, but an inseparable, intrinsic relation between meaning and some perceptible expression (spoken, signed or acted out), and some recognition of this relation.
By the proposal here, the finest Acheulian tools plainly involve an apprenticeship in the way they are made. And such an apprenticeship requires modern language. Where the available flint is not quite hard enough, as in the area studied by Curtis Marean and others (2007), the flint can be hardened by annealing. This involves heating the flint up and then allowing it to cool over a day or so, as has to be done with glass. This requires exact control of the heat of a fire and/or the distance from the point of greatest heat. This is most easily done by a team. The development of this skill and the teamwork to implement it are not imaginable without modern language. Marean and his team were studying a population living on what was then, around 130,000 years ago, a South African sea shore. In other words, this was not before the beginning. The population already had modern language. But by the proposal here, this required the completion of a series of biologically and mathematically defined steps. 130,000 years ago, the final step may have been less well fixated than it is now.
On the assumption of a constant mutational rate for evolution, it is commonly accepted that the last common ancestors of modern humans and chimpanzees lived about six or seven million years ago, although see Søren Besenbacher et al (2019) for a view by which the divergence was actually some millions of years earlier. But whatever the true age of the divergence, it marks the beginning of the process by which humans have evolved their species-specific characters, being able to throw lethal projectiles, and walk or run long distances on two legs (and thus hunt prey to the point at which the prey would collapse from exhaustion). This evolution involved larger brains, flatter, more vertical faces, a highly domed forehead without craggy eyebrows, a smooth top of the skull, longer legs and feet better for running, hands better shaped for delicate, precise manipulation, smaller teeth, a deeper, more pointed chin, a larger tongue, a longer vocal tract with a greater distance between the larynx and the lips, and a larger nasal cavity. Stone tool making was plainly an immensely difficult cognitive precursor, taking at least three million years from the point at which human ancestors diverged from other primates.
This evolution happened in a population which had gravitated from a life in the trees to a riskier lifestyle, spending more time on the ground, and developing a new way of getting around – on two legs, possibly for the sake of a more varied, more protein-rich diet than could be obtained in the trees.
Ragsdale et al (2023) find what might appear to be a number of different ancestries across Africa. Between the time when humans with our distinctive upright, flat, modern faces first appeared, around 300,000 years ago, and a point between 150,000 and 100,000 years ago, when there was the last known split in the African population, the apparatus for modern speech and language must have spread across all African populations and fixated. The modernity was ‘reticulated’, in the language of population geneticists. It must have been like a genetic equivalent of the parcel in the children’s game, pass the parcel.
The totality of the precursor steps provides a foundation for the cognition of speech and language in the broader context of discourse and the species-specific ability to learn to talk without external help – by the proposal here, on the basis of a novel way of encoding grammatical information.