2. Steps towards a genome

By the proposal here, different cognitions were recruited for a series of minimally necessary evolutionary steps, referred to here as Point, Mimic, Glue, Label, Wall, Move and Capsule. These are points on the pathway to modern human thought and language. The last is like the key to a lock, ensuring that all speakers share the same understandings. By the proposal here, language did not develop without thought. The development of each helped the development of the other.

As one example of the cognitions, in the British Museum there is a flint tool from around 500,000 years ago with a scraping edge and a sharp point. This was after the time when, by most estimates, Neanderthals had diverged from modern human ancestors, and at least 200,000 years before modern humans had evolved. The tool fits comfortably into the right hand in two orientations, roughly 90 degrees apart. One way, it could scrape meat off a bone; the other way, it could split a bone in half to expose the tasty and nutritious marrow. This takes a heavy, well-aimed blow. The maker had to have a clear idea of the impress of a tightly clenched right hand, and of how to make the opposite side fit for scraping. No such tool is made today. Even for a modern carver with modern tools for wood or stone, reproducing this shape would be quite challenging.

Another example is the rather subtle skill of pitching a roof to keep out the sun or the rain. Carpenters would once say “I know how to pitch a roof”, in some cases without really knowing, like whoever pitched the roof on our house 150 years ago. It keeps out the rain, but the rain doesn’t run off it as the architect intended. At the most basic level, poles or rafters have to meet at a ridge, each at an angle to the wall on which it sits. The whole structure then has to be covered. If this is with tiles or leaves, the covering starts with the largest at the bottom and finishes at the ridge with the smallest. Without diminishing the cleverness of a nest or web, a roof is in another league. The first computations involve trigonometry: each rafter is the hypotenuse of a right triangle whose base is half the span between the walls. Then the length of that hypotenuse has to be divided by the sizes of the covering.
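
The two computations just described, finding the hypotenuse and dividing it by the covering, can be made concrete with a small worked example. This is a sketch only: the span, pitch angle and tile gauge are hypothetical figures, and it assumes every row of covering has the same exposed depth, which simplifies the graded covering described above.

```python
import math

def rafter_length(span_m: float, pitch_deg: float) -> float:
    """Length of one rafter (the hypotenuse) for a symmetric pitched roof.

    The horizontal run of each rafter is half the span between the walls.
    """
    run = span_m / 2
    return run / math.cos(math.radians(pitch_deg))

def courses_needed(rafter_m: float, gauge_m: float) -> int:
    """Rows of covering along one rafter, given the exposed depth (gauge)
    of each row. A leftover part-row still needs a whole extra row."""
    return math.ceil(rafter_m / gauge_m)

# Hypothetical figures: 6 m between walls, 40-degree pitch, 0.25 m gauge.
length = rafter_length(6.0, 40.0)
print(f"rafter length: {length:.2f} m")                # rafter length: 3.92 m
print("rows per slope:", courses_needed(length, 0.25))  # rows per slope: 16
```

Steeper pitches lengthen the rafter and so add rows; a flat run (0 degrees) reduces the rafter to half the span.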

Many children find the trigonometry of a roof difficult to draw, often putting the chimney at right angles to the slope of the roof rather than vertical, in line with the overall structure.

The same cognition is involved in building a stairwell. In each flight, all the treads and risers have to stand in the same relation to one another, equally spaced between floors and landings. Roofs and stairs both involve trigonometry and division. The carpenter has to start with a geometrical conception of the end result. The essence of the geometry is the spanning of all the previous operations, vertically in the case of a stairwell, horizontally in the case of a roof. The span encapsulates a totality.
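
The division behind a flight of stairs can be sketched in the same way. This assumes a hypothetical floor-to-floor rise, a hypothetical maximum riser height (real building codes differ), and the common convention that a straight flight has one tread fewer than risers.

```python
import math

def plan_flight(rise_mm: float, max_riser_mm: float, going_mm: float):
    """Divide the rise between two floors into equal risers.

    Returns (risers, riser height, treads, total horizontal going).
    Assumes a straight flight with one tread fewer than risers, since the
    top riser steps onto the upper floor rather than onto a tread.
    """
    risers = math.ceil(rise_mm / max_riser_mm)  # fewest risers within the limit
    riser_height = rise_mm / risers             # every riser the same height
    treads = risers - 1
    return risers, riser_height, treads, treads * going_mm

# Hypothetical figures: 2600 mm floor to floor, 220 mm maximum riser, 250 mm tread.
risers, height, treads, going = plan_flight(2600, 220, 250)
print(risers, f"{height:.1f}", treads, going)  # 12 216.7 11 2750
```

The whole flight is fixed from the total rise downwards, which is the sense in which the span comes first.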

By my proposal, various components of such cognitions were recruited one by one. Over a time-scale of tens and hundreds of thousands of years, these adapted cognitions became inheritable characters. These biological steps happened much more slowly than the process known as ‘grammaticalisation’, by which Latin ille and illa became modern French le and la. Unlike the production of a new word or construction, these biological steps were global in their effects.

Each of these evolutionary steps was quite different from all of the previous steps. Added together, they came to define the human species as it existed at the time. Each was a speciation, a point at which followers or inheritors of the step would constitute a new species. None of them involved a magic mutation. But over time, by the Baldwin effect, the human genome changed bit by bit.

Necessarily, each of these changes was small but advantageous. To compare each change with a seed, while underplaying the complexity of a seed: if it wasn’t small, it couldn’t have seeded. If it wasn’t advantageous, it couldn’t have grown.

Speciations

It is by the last step that any two learners of modern English can expect to converge on a single grammar and agree that:

• “You believe that it’s true?” and “You believe it’s true?” mean the same thing, both with one clause embedded in another, whether or not the word that is present;

• “That rabbit is ready to eat” means two things, one good for the rabbit, one not good, with different eaters in each case;

• “Mightn’t the ball that won the match that the bookie keeps talking about have been being examined by the umpire?” is meaningful, no matter how improbable the sentence. 

Against the proposal here, it might seem possible in principle that the speech and language faculty evolved separately in different cultures and societies by what is known as ‘convergent evolution’ like the separate evolutions of structures for flight in insects, fish, reptiles, and mammals. But in the case of speech and language, this is vanishingly unlikely. All modern languages show similar residues of the evolutionary steps proposed here. On the simplest assumption, the evolution happened just once.

Language contrasts with other distinctively human abilities, like aiming a throw at a target. Very possibly, this was the first ability to evolve, even before the first step towards speech and language. But even after millions of years of inheritance, accurate throwing is very unevenly distributed. The gross variation still shows in the British game of darts. Some players can land three darts in one bed. Others have difficulty landing three darts on the board. Of course, the champion thrower has invested thousands of hours of practice to reach his or her standard. But there are also great differences in native ability. To a degree, the skill of accurately throwing, kicking or hitting balls and projectiles seems to run in families. It involves high-speed, complex mental computations. There has to be a genetic component. But it is not distributed in a way that shows any sort of genetic fair play.

The ambiguity of “That rabbit is ready to eat” is instructive. Here the same words have different meanings according to the structures which are assigned to them. Some children will hear examples of both structures. But not necessarily all children. So how do all native speakers know about this particular sort of ambiguity?

By the simplest possible answer, the alternation between the structures is given by a species-specific property of the genome. By the proposal here, that property derives from these widely separated steps of speciation, the last at least 200,000 years ago.

Behavioral modernity?

By the notion that David Reich and others refer to as ‘behavioral modernity’, there is evidence from achievements in flint-tool making, jewelry, cave painting, sculpture and flute making. Most of the evidence is from within the last 100,000 years. It is thus conjectured that there must have been a key evolutionary step in that time, and that if the last step of language evolution had been any earlier, there would be evidence of corresponding cultural and scientific achievements.

But by my proposal here, this reasoning is wrong. The last major evolutionary step was at least 200,000 years ago, not 100,000.

Doubtless, modern language was the precondition for this modernity. But the expectation of rapid progress from language to modernity may underestimate the difficulty of those achievements. The paintings at Lascaux were executed with a skill which requires long training over generations. And that requires a continuous, unbroken infrastructure. The corresponding skills are easily lost, as many 19th-century skills have now been lost, like sailing a 27-ton pilot cutter single-handed while the pilot pilots the client ship. The fragility of skills and the economic cost of maintaining them are easily underestimated. I contend that the emergence of modern language cannot be dated from the first evidence of the modernity to which it gave rise. That is to confuse absence of evidence with evidence of absence.

Against the assumption of recent linguistic modernity, within the last 100,000 years, stands the fact that human biological diversity is greater across Africa than across the rest of the world. If the last linguistic speciation was significantly less than 200,000 years ago, we would expect to find some African populations that had not benefited from it. But no such population has been found.

Before the beginning

It is not assumed here that the last common ancestors of modern humans and modern chimpanzees were just ancient chimpanzees, as Darwin’s first critics mocked. It is possible that these ancient common ancestors were quite chimpanzee-like. They may have made nests in trees each night on branches too slender to support the weight of a leopard. They may have shrieked, hooted, panted and roared in various situations for various reasons, as documented by Jane Goodall. They may have differentiated between predators such as leopards, eagles and snakes, with corresponding alert and alarm calls for each one. We still use a variety of calls to flag up humour, pleasure or sorrow, fear, agony, or triumph. But apart from laughter, the communication is hit and miss. While any such differentiation is clearly a step forward from any lesser degree of differentiation, it cannot signal the fact that the signaller is thinking about that particular sort of animal, or what those thoughts might be. It does not refer.

But the development of the last common human-chimpanzee ancestor is irrelevant to the proposal here.

Point

By the first step postulated here, a population of modern human ancestors must have found regular occasion to pick out some individual, entity, event, or circumstance as something specially interesting, as a hint of interest, care, or love. This could be a friend, family member, potential prey or predator.

Initially this could be by gazing or touching. Gazing must have been important because human ancestors lost the pigmentation in the whites of the eyes, making the direction of the gaze much easier to read. But at least among chimpanzees there is great sensitivity here. Chimpanzees perceive eye contact, especially if it is prolonged, as a threat. Between modern humans ‘giving the eye’ is more ambiguous, as anything from a threat to a hint of sexual interest. Touch is also ambiguous. It can be an invasion of personal space or worse, or an expression of tenderness. And as death approaches, as all the other senses are lost, touch may be the last sense to survive.

But on a hunt where a living prey is larger and more powerful than the hunter, touching is not a safe option. And gazing does not communicate to other hunters. But pointing is a readable, symbolic act. The gesture is not as ambiguous as a gaze or a touch. But in the modern world, pointing at someone can be as offensive as prolonged eye contact is to a chimpanzee.

Pointing has to have been a cognitive innovation. Seemingly, no non-human understands it. Among humans, the understanding is robust. Pointing was adopted in the first signposts, and it survives as the standard icon for clickability on the internet.

Informal rule:

§ Project a bearing, X, in three-dimensional space aligned with the outstretched forefinger; Look at X.

If the resolution is 10 degrees, with 36 steps on each of three axes, there are 36 cubed bearings: 46,656. The direction picks out only a small proportion of the logically possible objects of a partner’s attention. This may be a class or category, some member of one of these, or some unaffiliated individual or singularity. Non-humans are occasionally involved in acts which show a clear awareness of uniquenesses in non-conspecifics: some cetacean with Jonah, a dolphin inviting a diver to help free it from a fishing line in which it had become entangled and which would otherwise have killed it, and many more. Such acts show awareness both of a species and of uniqueness within it. So the range of X is broad.
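
The figure of 46,656 can be checked directly. The sketch follows the text in dividing each of three axes into 36 steps of 10 degrees. As an aside, it also computes the smaller count that results if a bearing is treated as a direction fixed by two angles, an azimuth and an elevation, which is the usual geometric convention. Either way, the set of distinguishable bearings is finite and coarse.

```python
# The text's count: 360 / 10 = 36 positions per axis, over three axes.
steps_per_axis = 360 // 10
text_count = steps_per_axis ** 3

# Aside: a pure direction in 3-D is fixed by two angles, an azimuth
# (0 to 360 degrees) and an elevation (-90 to +90 degrees). At the same
# 10-degree resolution that gives 36 * 18 cells.
two_angle_count = (360 // 10) * (180 // 10)

print(text_count)       # 46656
print(two_angle_count)  # 648
```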

By the proposal here, although this is not yet reference, it was a step on the pathway to it, and from that to speech and language.

But Point suffers from two limitations. Even with 46 thousand bearings, it is imprecise. X may be hiding behind a tree, only giving itself away by the tip of a tail. And it only works where X is in plain sight.
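
The Point rule and both of its limitations can be illustrated with a toy scene. This is a hypothetical sketch, not a model from the literature: it simplifies bearings to a single horizontal angle, assumes a 10-degree resolution, and uses invented names for the things in view.

```python
from dataclasses import dataclass

@dataclass
class Thing:
    name: str
    bearing_deg: float  # horizontal direction from the pointer, in degrees
    visible: bool       # False if, say, hidden behind a tree

RESOLUTION_DEG = 10.0   # hypothetical angular resolution of the gesture

def angular_gap(a: float, b: float) -> float:
    """Smallest difference between two bearings, allowing for wrap-around at 360."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def point(finger_deg: float, scene: list[Thing]) -> list[str]:
    """Project a bearing from the finger and 'look' along it: return every
    visible thing within half the resolution of that bearing. More than one
    name means the gesture is imprecise; none means nothing is in plain sight."""
    return [t.name for t in scene
            if t.visible and angular_gap(finger_deg, t.bearing_deg) <= RESOLUTION_DEG / 2]

scene = [
    Thing("antelope", 42.0, True),
    Thing("second antelope", 45.0, True),
    Thing("leopard", 180.0, False),  # hidden behind a tree
]
print(point(44.0, scene))   # ['antelope', 'second antelope']
print(point(180.0, scene))  # []
```

The first call shows imprecision: two candidates fall within one bearing. The second shows the plain-sight limitation: the leopard is in the right direction but cannot be picked out.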

Mimic

Overcoming both of the main limitations of Point, some group of human ancestors may have started to use mimicry with sounds or gestures to pick out individuals as individuals or as members of a group, class or set, even when they are out of sight.

Such mimicry could be squealing like some species of primate or opening and closing the forefinger and thumb like the beak of a bird. For the sake of clarity and actual communication, the mimicry had to achieve some degree of accuracy.

Such behaviour has been observed at least once in chimpanzees – by an observant member of a television crew. An alpha male, who was limping from an injury, was leading his group in line. A younger male started copying the limp of the alpha male. Then the alpha male turned round. And the younger male promptly reverted to his normal gait.

I once saw some draymen using a rope to lower wooden barrels of beer down a ladder into a pub cellar. They had a small dog with them. At one moment, while they weren’t looking, the dog took in its mouth a coil of the rope, which had been left on the ground, and, coil by coil, dropped the rope into the cellar. The dog could not plausibly have seen the rope being lowered without a barrel. This was one small dog’s joke. The draymen, who had to go back the long way to retrieve the rope, were not pleased to see me laughing. I thought for a moment that I was going to get hit.

In a dog or a chimpanzee, this is teasing. Although at least some dogs and chimpanzees clearly have a sense of humour, such behaviours are not part of their everyday repertoire.

Teasing does not make sense without a theory of the mind of whoever is being teased. Mike Tomasello has conducted numerous experiments showing, both in the wild and, in a more controlled but less naturalistic way, in captivity, that chimpanzees do have a theory of one another’s minds. They have some awareness of what other chimpanzees know or think, just as Washoe several times showed that she was aware of the feelings of her trainers.

It is argued by Tomasello, Michael Corballis and others that the first precursors of language were all manual, and that the mimicry was all signed. But judging from the behaviour of modern humans with no shared language, such need-driven communication would seem likely to have used whatever was most convenient, either sign or vocalisation. While there are communities with a high rate of congenital deafness which are bilingual between signed and spoken language, there are no hearing communities with only signed language. And as Maggie Tallerman and others have noted, the onus is on Tomasello, Corballis and others who propose that sign language came first to show how signing was dropped so completely in favour of speech.

By the proposal here, referential mimicry must have evolved from being visual, vocal, or both, to being mainly vocal. Otherwise language would not be primarily vocal.

By the most minute and subtle observations of his three children, Jean Piaget showed how mimicry develops from imitation. One of his children was watching a cat on a wall, and then mimicked the movement with a match box. The symbol formation here is inconceivable without a sense of what Piaget called ‘object permanence’ and the sense of a valuable object, something to be proud of and worth having. By object permanence Piaget meant that if we lose sight of something and then see it again where we might have expected to see it, it is probably the same thing. This is often around the age of 18 months just before the first word combinations. 

As pointed out by Merlin Donald, mimicry is a first step towards overt expression.

Informal rule:

§ For some entity, X, take some obvious, distinguishing characteristic, x of X; Mimic x as accurately as possible; Adopt or copy x as a standard, conventional way of picking out X.
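
The three steps of the Mimic rule can be sketched as a toy procedure. Everything here is a hypothetical illustration: the catalogue of entities and characteristics is invented, "most obvious" is approximated by list order, and the act of mimicry itself is represented only as a string such as `mimic(grunt)`.

```python
from typing import Dict, List, Optional

# Hypothetical observations: characteristics of each entity that a mimic
# could reproduce, listed from most to least obvious.
OBSERVED: Dict[str, List[str]] = {
    "pig": ["grunt", "four legs", "snout"],
    "cow": ["long low call", "four legs", "horns"],
    "bird": ["beak opening and closing", "wings"],
}

def distinguishing(entity: str, catalogue: Dict[str, List[str]]) -> Optional[str]:
    """Step 1: the most obvious characteristic of `entity` that no other
    entity in the catalogue shares, or None if there is none."""
    others = {c for name, cs in catalogue.items() if name != entity for c in cs}
    for c in catalogue[entity]:
        if c not in others:
            return c
    return None

def adopt_conventions(catalogue: Dict[str, List[str]]) -> Dict[str, str]:
    """Steps 2 and 3: mimic the chosen characteristic and adopt that mimicry
    as the community's conventional way of picking out the entity."""
    lexicon: Dict[str, str] = {}
    for entity in catalogue:
        x = distinguishing(entity, catalogue)
        if x is not None:
            lexicon[f"mimic({x})"] = entity
    return lexicon

print(adopt_conventions(OBSERVED))
```

A shared characteristic such as "four legs" is never chosen, because mimicking it would not pick out one entity rather than another.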

But human ontogeny takes more than a year. Human phylogeny may have taken 100,000 years or more.

Not onomatopoeia

Because speech is such a powerful system, competent speakers tend to think they can use it playfully in what is known as ‘onomatopoeia’ – from the Greek for name and making. The renaming sometimes falls outside the ‘phonotactics’ or rules for grouping sounds in a language. This is taking a step backwards to something more like real mimicry.

In English, oink for pig accommodates the phonotactics to what is perceived as a sound of nature: there are no dictionary words with any sort of long vowel before NK or NG. In French meuh for cow, the vowel of French feu ‘fire’ is prolonged in a way that violates French phonotactics, as shown by the H.

In both moo and groin, the accommodation is complete. Both are perfectly possible words in the language in which they are used. But the adaptations are language-specific. French speakers do not relate English moo to cows. Nor do English speakers relate French groin to pigs. Neither French nor English speakers understand the other language’s onomatopoeia for pigs and cows.

In none of these cases is the onomatopoeia accurate enough to allow a non-native speaker to understand the reference. While most modern humans can mimic some sounds from nature, there is wide variation in this skill. For the most skilled exponents it becomes an entertaining party trick, a music-hall performance, a military deception, or part of a hunter’s repertoire. But mimicry for reference is lost. It would be laughable for a speaker to mimic a pig in everyday reference to the animal. The imperfect onomatopoeia of oink partially recapitulates what may have been the first step on the pathway to language. But if, as I am proposing, primordial reference was by mimicry, this was still a long way from anything like speech or what we know as onomatopoeia. To get from one to the other, the hisses, clicks, grunts, sighs and moans of mimicry had to be reorganised into the acoustic elements of sonority, resonance, harmonics, and distributions of aperiodic noise that characterise modern human speech.

A primordial taxonomy

For all the imperfections of supposedly onomatopoeic forms like English oink and moo and French groin and meuh, and their incomprehensibility other than to native speakers of English and French, they represent efforts to accommodate to mimicry. This is shown by cross-linguistic commonalities between the phonetics. When humans mimic pigs they use what phoneticians call an ‘ingressive airstream’, with the air drawn into the lungs and the back of the tongue raised to the point that the soft palate vibrates. English oink accommodates to this by the closure between the back of the tongue and the soft palate. French does something similar with the back-of-the-tongue gesture of French R. Both English and French accommodate to the mimicry by opening the passage to the nasal cavity with the N. Similarly, both languages accommodate to the mimicry of cows by an initial lip gesture with the nasal cavity open, M, followed by a long round vowel. In French meuh the length is achieved by a device falling outside French phonotactics, as shown in the written form by the H. In both English and French, the onomatopoeic representation is much shorter than by true mimicry. The natural sound of the cow is much slower.

Development, limitations, vestiges

In the mimicry of chickens, the lips are initially closed, with no airstream passing through the nose, and the vocal cords vibrate as soon as the lip closure is released: effectively a B. Thus in the mimicry of pigs, cows, and chickens, we use the lips, the tongue, and the opening to the nasal cavity, in various combinations. But in mimicry, there are just attempts to reproduce the grunts, bellows, and cackles of nature. Of course, when modern humans do this, they do so as trained users of an evolved speech system. And that may influence their efforts in various ways. But reference by mimicry alone is very limiting. Only a very small proportion of all the funny, interesting, or important things there are to refer to can be defined understandably. Tastes, feelings, preferences and surprises are just a fraction of what is excluded. Jane Goodall mimics 60 or so whoops of chimpanzees, and explains when and why each and every one is used, seemingly none involving mimicry.

For a social species carving out a new and precarious niche in dangerous territory, just starting to use mimicry as a form of reference, there are clear and obvious advantages from every expansion of the inventory of mimicked items. This could have been done by varying one aspect of the mimicry. But with a system defined only by mimicry, it is hard to track any variations unless the gestural properties are varied one by one.

The first variations may have been by any of the properties available to the human mimic: the length, the pitch, the point in the vocal tract at which closure is effected, in all probability just one at a time. One possible way in which this may have started is by the mimics noticing that their mimicries of different sorts of animals all used an action by one articulator, in some cases with the airstream passing through the nose, as by M, and in other cases not, as by B. If this is what happened, it prefigured the discovery of the four clever letters, by the hypothesis here itself the first step towards the modern analysis of the sound system of languages.

No variation will get off the ground in a communication system unless it can be reproduced across the community of system-users. As one property is identified, it can be combined with another: the length in time, the point in the vocal tract, the articulator by which a closure is effected (the tongue or the lips), and so on. This is taking a small step towards what are known as ‘features’, as described in Sounds and Bits of sounds, differentiating vowels and consonants from one another. But only a very small step. These variations are defined by what is being varied. They are essentially taxonomic.

Thus mimicry would seem to have developed by reassembling the elements of auditory perception and vocal articulation one by one into an organised and memorable acoustic system, defined not on particular animals or any other focus of reference, but on contrasts. This was taking one small step towards onomatopoeia, abandoning the defining property of mimicry for the sake of more coverage. A fair swap. But mimicry was still very limiting. A proposition was impossible. No direction could be given. Nothing could be explained. And there were no shades of meaning. A pretend limp could pick out an individual, but with no possible way of distinguishing mockery from love or respect.

In a context where mimicry is the closest approach to reference, there are innumerable points on which there is nothing to mimic. For these we still use a single click to agree or approve, two clicks to do the opposite, Shhh to try to get silence, and so on.

We still fall back on mimicry from time to time, when there is no dictionary to hand, or none exists, or there is no mobile phone for Google Translate, if communication is important enough and we choose to run the risk of sounding silly. Primordial mimicry leaves a healthy vestige.

Glue

By an entirely new sort of mechanism, two elements could be brought together, not by disassembly, but by assembly. By an idea from the 1990s, now closely associated with the Minimalist Program, structures are built by the simplest possible device of putting two representations together, by what is known as ‘Merge’. Here, for the sake of evolutionary plausibility, I break the idea down into parts. Ljiljana Progovach calls the precursor stage ‘Proto merge’. I prefer to call it ‘Glue’, to resurrect a term from Wilhelm von Humboldt, educational pioneer and the first European linguist to investigate a non-European language other than Hebrew. By Glue, the only ordering is by the assembly itself.

Informal rule:

§ Take two maximally contrasting representations, X and Y, and join them.

By Glue, “Pa see” and “See pa” meant just that there was a relation between see and Pa, without specifying who could see who or what. The meaning could only be figured out from the context. As Progovach points out, we still have a vestige of this in expressions like tell tale, cut throat and skin flint. Where people are involved, the vestiges are mostly pejorative or derogatory, not involving any sort of throat or flint or any actual cutting. In modern language a term like Pa or the more evolved Papa encodes a feeling about a parent. Glue expresses an attitude as much as a reference.
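
Glue, as described here, can be sketched as forming an unordered pair. This is a minimal illustration assuming that a Python `frozenset` is a fair stand-in for "a relation between two elements with no order"; it does not attempt the "maximally contrasting" condition of the rule, and the function name `glue` is hypothetical.

```python
def glue(x: str, y: str) -> frozenset:
    """Glue sketch: join two representations into an unordered pair.
    The result records only that x and y go together, not who does what."""
    return frozenset({x, y})

pa_see = glue("Pa", "see")
see_pa = glue("see", "Pa")
print(pa_see == see_pa)  # True
```

Because the two assemblies are identical, nothing in the structure says who sees whom; as the text notes, that has to come from context.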

Sounds, then words

By my proposal here, Glue did not just involve the assembly of words. Even before that, it must logically have involved the sounds from which the words were built. As Robbins Burling points out, there could be no words without sounds as their components, or there would have been nothing to join. And like the other steps on the pathway to language, Glue adapted a pre-existing cognition, such as the cook’s insight that some flavours are enhanced by eating two things together. But the insight that the flavour of both meat and leaves is enhanced by salt does not involve an ordered set. It was the set-theoretic formulation of the insight that made it possible to incorporate it, unconsciously, into a fundamental mechanism for speech and language.

The components of sound were no longer the attributes of actions of mimicry, but self-standing, independent elements. They could start to evolve, perhaps very slowly, towards what are known as ‘features’, as in Sounds and Bits of sounds, differentiating one vowel from another in the same series, EE with the tongue at the front of the mouth from OO with the tongue at the back, and differentiating B, with the airstream completely stopped for a brief moment, from M, with the airstream through the nose, while the lips are closed in both cases.

There were immediate advantages: any increase in the combinatorial power of primordial features allowed an exponential increase in the vocabulary. Two binary features give four combinations. Three give eight. In general, each added binary feature doubles the inventory. And new ideas with no obvious ostensive expression could be expressed by arbitrary and abstract combinations of elements, merged together in steps of increasing complexity.
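
The exponential growth can be checked by enumerating feature bundles. The three features below are hypothetical and only loosely named after contrasts mentioned in the text; each is assumed to be binary, so n features give 2 to the power n bundles.

```python
from itertools import product

# Hypothetical binary features, loosely named after contrasts in the text.
FEATURES = {
    "lips": (False, True),   # closure made with the lips, or not
    "nasal": (False, True),  # airstream through the nose (as M) or not (as B)
    "long": (False, True),   # prolonged, or short
}

def combinations(features):
    """Every distinct bundle of feature values, one dict per bundle."""
    names = list(features)
    return [dict(zip(names, values)) for values in product(*features.values())]

for n in range(1, len(FEATURES) + 1):
    subset = dict(list(FEATURES.items())[:n])
    print(n, "features ->", len(combinations(subset)), "bundles")
# 1 features -> 2 bundles
# 2 features -> 4 bundles
# 3 features -> 8 bundles
```

With features of more than two values the same enumeration applies; the count is the product of the number of values per feature.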

By my proposal here, Glue is still part of modern language. Speech sounds or phonemes are built by language-specific sequences of steps. In most languages, the contact of the tip of the tongue with the roof of the mouth just behind the teeth in T and D is built early in the building, or what is known as the ‘derivation’. But in Russian the articulation is built later, with the effect that the contact is against the back of the upper front teeth, sounding distinctly different.

Minimalist Merge does a great deal more than this. But Glue, a sort of Merge Mark 1, provides a basis for Label, effectively Merge Mark 2.

Glue allows elements to be combined with one another, a sort of lazy and haphazard marriage bureau for ideas.

Label

Some important aspects of early human culture and economy involved differentiating between one sort of tool and another, distinguishing different sorts of function in the mind. One was thinking of one sort of stone as a hammer and another as a spike or chisel. Another such distinction was in fire-making. When a hard twig is drilled into a larger piece of softer wood, eventually a red glow appears, finally becoming hot enough to light small shavings.

None of this is possible without a human ability to conceive the differences between instances of a material, differences which can be expressed as labels.

Abstracting away from different sorts of material, different roles could be read into the labels. In the case of “Pa see”, the elements within the structure could be labelled, Pa as referential and see as non-referential, defining see as the head and active element of the projection. This had the significant effect of making it possible to state or think a simple proposition, a fact which was either true or untrue.

The label defined a relation within the structure, as Pa doing the seeing, or someone else. The topmost level is said to ‘project’, defining the structure as a whole. But there is no reason for thinking that this happened other than over tens or hundreds of thousands of years. This is on a different scale from observations like those of the elderly in 2021, who may struggle to understand younger speakers who have adopted some new grammaticalisation or lexicalisation, like innit from isn’t it, as in “That might look good, innit?”

More generally, Label anchors the elements within a structure to one another. But along with the element of structure by labelling come two crucial notions of grammar:

• ‘Subjecthood’, the notion of some entity universally having a special and unique role in every sentence, as by the difference between “Rabbit eat” and “Eat rabbit”, where rabbit is the subject of the first, but not the second.

• ‘Transitivity’ as by the difference between “See Ma” and “Pa know”, where the first is transitive and the second is intransitive.

Informal rule:

§ Of X and Y as X Y, Label X and Y by their most minimal attributes; Mark X or Y as the head of X Y; Project the head of X Y as a head.
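
The Label rule can be sketched as a labelled version of Glue. This is a hypothetical illustration, not the Minimalist formalism: the category tags "N" and "V" stand in for "most minimal attributes", and the head is chosen by the simple assumption, taken from the “Pa see” example above, that the non-referential element heads the pair.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Word:
    form: str
    category: str  # minimal attribute: "N" referential, "V" not (hypothetical tags)

@dataclass(frozen=True)
class Phrase:
    head: "Node"      # the daughter marked as head
    non_head: "Node"  # the other daughter

Node = Union[Word, Phrase]

def category(node: Node) -> str:
    """A word's own category, or the category a phrase projects from its head."""
    return node.category if isinstance(node, Word) else category(node.head)

def label(x: Node, y: Node) -> Phrase:
    """Label rule sketch: mark the non-referential daughter (not tagged "N")
    as head; the pair then projects the head's category as a whole."""
    if category(x) != "N":
        return Phrase(head=x, non_head=y)
    return Phrase(head=y, non_head=x)

pa, see = Word("Pa", "N"), Word("see", "V")
pa_see = label(pa, see)
print(category(pa_see))  # V
print(pa_see.head.form)  # see
```

Unlike Glue, the result now distinguishes the two elements: see heads the structure, and the structure as a whole is verbal.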

What Label labels

Label defines and introduces the major categories. There are elements which are referential, from Point and Glue, functioning syntactically as nouns, like man, silence, cruelty. There are terms which modify, like certain, improbable and instantaneous, classified as adjectives. There are terms like in, between, pre and post, as in ‘pre 9.11’ and ‘post 9.11’, which in some languages, like English, come before what they relate to and are known as prepositions, and in other languages follow it. There are terms which do none of these things, like kill, negate, falsify, misleadingly called ‘doing words’ when many do nothing at all. And there are numerals. All of these are called ‘parts of speech’, or members of ‘major syntactic categories’, the second term being better because ‘speech’ is a misnomer here.

• Label defines Tense, as in “Dropped baby”.

The major syntactic categories are all distinct from functors, of which the primordial element may have been not, as in “Not Pa!”

There are still traces of language without Label, in everyday greetings, in expressions of surprise and apology, in questions by intonation alone, as in “Eating seaweed?” Significantly, the rising intonation here is almost universal across languages. It would thus seem possible that speakers with Glue found this way of asking questions before Label as a tool had evolved. But when it evolved, Label was hugely beneficial, taking a step towards reconciling the pragmatics and the syntax.

Label gives elements jobs, by the hierarchy of the structure, some more important than others.

Wall – and not just one wall

A child represents a house as a front wall. But this is only a partial representation. A wall on its own can topple over, as walls sometimes do. To stand up permanently and securely, a wall needs to be buttressed by others: in one tradition and conception by walls at right angles, in another by a circular wall, in effect a continuous series of tangents, with improved effect. Either way, this demands a specific cognition.

The evolutionary originality, the advantage to the species, is in the buttressing. Every new wall buttresses a previous wall by the fact that they are at angles to one another. This is recursive, on the understanding of recursion as the application of a device to its own output. So the same structure can be extended indefinitely. The advantage of this is conceptual economy. The system does more with less, by recycling the raw materials.

Once one set of walls has been built to make a house, another set can be built for a room inside, or for an extension outside. This is quite different from the builder of the nest who adds one twig or branch at a time, with no thought of any relation between one set of twigs or branches and another. The key is the recursion which can be taken to any level. With walls, at a very humble level the house is divided by a wall into two rooms, one for the people and one for the animals. Such houses were still inhabited in Britain in living memory. At the most grandiose level, there is the celebrity’s mansion with its wings for servants and for security staff and for the foibles of the owner or owners.

Speech and language

The same set-theoretic cognition can be applied to speech and language where the projection is the point of contact between structures, but the setting at different angles is no less crucial than in building the walls of a house or mansion. The relation between these applications of the same cognition is not relevant here. But there is one significant difference. In building a house or mansion, the corner between one wall and the next is a day’s work, with time to stand back, think, and check the angle. In building any sort of linguistic structure the process is incomparably faster, so much faster that it is quite beyond human introspection, only discovered in the mid 1950s, and represented in the technical literature by a triangle with a point at the top, representing the projection. It is the projection which articulates with a higher level of structure, allowing the recursion.

Pragmatics, semantics, syntax, phonology

By the realities of discourse, a proposition is stated as an event in time, with some purpose – to shift the focus of attention, to remove a doubt, to get some information, to correct an error, to reset a time scale, to make a commitment, to make some necessary assertion or flippant comment.

A proposition, which can be true, false, half-true, self-evident or misleading, is built from different sorts of element. A verb, defined as such by its label, has a particular sort of semantic relation with some entity according to its meaning. That entity is traditionally known as the ‘object’, or alternatively the ‘complement’. So it is possible to kill a flower, an animal, time or an idea, but not a house or water. These verb phrase relations are expressed by the first step of Glueing.

But there is a different, looser, less semantically-restrictive relation between the verb phrase and the agent of the killing, traditionally the ‘subject’. To use Hagit Borer’s terminology, almost anything can be ‘coerced’ into the subject relation: an animal, water or a house can all kill a flower, even time, with some rather heavy coercion.

The difference between these relations can be expressed by the angle between the projections. But at this point the connection with the physical world breaks down. In order to capture significant variations in time and closeness, this sort of structure is best conceived on different planes, like the front wall and side walls of a house, and the pitches of the roof. We need what linguists call a ‘cartography’, and the cartography proposed here cannot be drawn on a two-dimensional surface: it needs more than two dimensions.

There is a similar contrast between different aspects of the speech sound, some given by the shaping of the vocal tract, by the point of greatest narrowing, whether this is by the lips or the back of the tongue, and others given by the relative timings of different actions. Essentially, these are different sorts of thing.

Angles and planes

By the Minimalist Program, in “They loved football” there is one sort of relation between loved and football, with love as the head, and a different, higher sort of relation between they and loved football with the -ED of loved as the head. By my proposal here, this difference is captured by an angle between the planes. The plane headed by the suffix -ED heads the other plane.

These angles and headships are each as real as each of the seven estimated twists in DNA and the links between the strands. They are not theoretical metaphors.

By the Minimalist Program, in “What did they love?” there is a still higher relation between what and the rest of the structure, defining the question force, by my proposal here, at yet another angle.

By this notion of recursive hierarchies, it is possible to define and contrast different degrees of domination, comparing the least possible degree with some equal or greater degree. This is the main part of a relation known as ‘C-Command’ (for ‘Constituent Command’), which defines many aspects of syntax.

Movement, as of what in “What did they love?”, is some distance ‘up’ the structure and, for the sake of simplicity, a minimal distance. The process is repeated if the conditions are still met after the first application, as in “Who does she think the television is saying won the election?”, with who moving several times.
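The relation can be made concrete with a small tree. The sketch below is a minimal Python illustration, assuming the standard textbook definition of C-Command (A C-commands B if neither dominates the other and the first branching node above A also dominates B) and a conventional binary tree for “They loved football” with hypothetical labels TP, T' and VP. It does not encode the angles or planes proposed here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by label
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = None

    def add(self, *kids: Node) -> Node:
        """Attach daughters to this node and return it."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def dominates(a: Node, b: Node) -> bool:
    """True if a is a proper ancestor of b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """A C-commands B if neither dominates the other and the first
    branching node above A dominates B."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


# "They loved football" with conventional (hypothetical) labels.
they, ed, love, football = Node("they"), Node("-ed"), Node("love"), Node("football")
vp = Node("VP").add(love, football)
t_bar = Node("T'").add(ed, vp)
tp = Node("TP").add(they, t_bar)

print(c_commands(they, football))  # True
print(c_commands(football, they))  # False
```

The asymmetry in the two printed results is the point: the subject C-commands everything in the verb phrase, but not the other way round.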

Informal rule:

§ X Y and Z W, each with its projected head X or Z; Set Z W at an angle to X Y; Project X or Z.
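The informal rule can be read as a recursive operation whose output can be fed back in as input. The sketch below is a hypothetical Python rendering, not a formalisation from the literature: the class Phrase, the functions leaf and wall, and the integer plane, standing in for the angle between planes, are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Phrase:
    head: str                              # the projected head (X or Z in the rule)
    parts: tuple[Union[Phrase, str], ...]  # daughters: words or smaller phrases
    plane: int = 0                         # hypothetical stand-in for the angle


def leaf(word: str) -> Phrase:
    """A single word, heading itself, on the base plane."""
    return Phrase(head=word, parts=(word,))


def wall(xy: Phrase, zw: Phrase, project_first: bool = True) -> Phrase:
    """Set ZW at an angle to XY and project the head of one of them."""
    plane = max(xy.plane, zw.plane) + 1    # each application adds a new angle
    head = xy.head if project_first else zw.head
    return Phrase(head=head, parts=(xy, zw), plane=plane)


def words(p: Phrase) -> list[str]:
    """Read the words back off the structure, left to right."""
    out: list[str] = []
    for part in p.parts:
        out.extend([part] if isinstance(part, str) else words(part))
    return out


# "They loved football": love + football, then -ED, then they.
vp = wall(leaf("love"), leaf("football"))             # head: love
t = wall(leaf("-ed"), vp)                             # head: -ed
clause = wall(leaf("they"), t, project_first=False)   # head still -ed

print(clause.head, clause.plane, words(clause))
# -ed 3 ['they', '-ed', 'love', 'football']
```

Because wall accepts its own output, the same short rule builds structures of any depth; a further application such as wall(leaf("what"), clause, project_first=False) would sit on plane 4.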

Wall helps a labelled structure fit some need or situation. The difference between the planes allows the cartography to adjust the relative saliences between elements, to glue two elements indissolubly together, as for a sailing boat or aeroplane, or just lightly, as for a violin. The indissoluble glueing is what von Humboldt probably had in mind when he invented the term ‘agglutination’, as between the n’t and do in don’t. This contrasts with the much looser connection between do and you in “Do you do that?”, which can be separated (though not conventionally for the vows) by really, and so on, and with the loosest of all possible connections, where the overt pronunciation of an element is optional and the only definition is of its exact or approximate position in sequence. (The approximation applies only in languages like classical Latin and modern Russian, with much looser definitions of word order than English allows.)

English is only very mildly agglutinative. Strong glueing applies only to contracted elements like n’t and whatever they get glued to. The native languages of North America are much more strongly agglutinative. According to Barbara Mithun, who was tasked with writing a dictionary of Mohawk, one can ask a Mohawk speaker: “What was the last word you said?” And the answer is what would be a sentence in English. The difference is in the tight way that Mohawk glues words together.

What Wall does

What and not are both archetypal functors, outside the system of syntactic categories. Both exploit the possibilities afforded by Wall.

Wall exploits the salience of particular structural positions. The most salient of these positions is the left edge or what is known as the ‘left periphery’. I say the left edge because words, phrases, and sentences all have left edges.

• The most salient of left edges is the left edge of the entire structure. This allows requests for some specific information by one of a special set of words or phrases, in English who, what, which, when, where, why, how, how many, how much, how often and so on, which in English, as in many languages, mostly appear on the leftmost periphery, as in “What did you say?”

• In almost all modern languages, the marking of edges helps to distinguish between positive and negative statements of fact, denials or affirmations of truth, commands and questions, by forms such as not. More simply than modern English, a now archaic English just had not after the verb, as in “I think not” and “I know not”. By contrast, in modern English not is sequenced on the end of an auxiliary element such as can. Its contracted form, without the vowel, as n’t, always appears indissolubly glued to the right edge of the auxiliary, exploiting the difference between the planes, as in “I can’t see you” from “I can not see you.”

• The functor that, which alternates between being pronounced and not being pronounced in “I say that he died” and “I say he died”, again on the left edge.

• The marking of edges also helps to define the statement of possibility, permission, compulsion, relevance to the present, what is known as ‘evidentiality’. All of these are expressed in English, as in almost all languages, on elements of the verb. But English does this in an uncommonly complex way, up to the limit represented by “Couldn’t Father have been being honoured?” Evidentiality defines the contrast between fact and uncertainty. Many languages, such as French and Italian, use a form of the verb known as the ‘subjunctive’ to allow this to be encoded. For this, an abstract form has to be entered on the left periphery. The subjunctive is becoming archaic in English, but it is still interpretable by most speakers in “I demand that she be admitted”. Here be is a relic of a subjunctive once much more widely used. In such a case, the subjunctive encodes a degree of doubt or uncertainty about whether she will be admitted or not. English happens to be poorly stocked with ways of expressing evidentiality, and makes do with words like allegedly and expressions like “It is alleged that ….” I once heard one of my children say, “I think I might have misunderstoodended that.” This is what is known as ‘multiple marking’, in this case by the four elements stood, -en, -D and -ED. Misunderstoodended may have been an attempt to encode evidentiality more definitively than English allows.

• Respect or deference, sometimes known as ‘register’, is expressed in most languages by one or more special terms for English you. Familiarity is shown by tu in French, du in German, etc., equivalent to thee and thou. Most modern English dialects don’t have a word for marking respect. There is a special vocabulary for marking disrespect in words like lurk, amble, shamble, ramble, toddle, witter, babble, but this does not fill the gap left by the disappearance of thee and thou. Normally, children only start learning this aspect of language between three and four. On the basis of an insightful observation by my youngest son at the age of nine, respect is expressed indirectly in English, by not referring to a third party by their relationship to the addressee. So to say “I just saw your colleague / wife” implies equality or superiority. How this is learnt is not obvious.

• The forms generally known as ‘personal pronouns’, I, you, he, she, etc., refer according to the relations between the participants in the conversation. Mostly a shorthand for fuller reference, pronouns can be dropped, by what is known as ‘Pro-drop’, in Italian-type languages, in which “I love you” is said as the equivalent of “Love you”. In English “Mind if I come in?”, both the pronoun you and the verb do go unpronounced. In English, as in almost all languages, commands are mostly issued with a bare ‘verb phrase’ like “Come in”, “Go to sleep”, “Put your coat on”.

• The two commonest words in English, a and the, relate a referent to the history of the conversation or the immediate world of the speaker and listener or listeners. In “A woman laughed” we may know nothing about the woman. But in “The woman laughed” we know who she is.

• The perception of some entity as animate or of one sex, as something unique like a child, partner, lover, parent or grandparent, or as one of some large number like the fruit on a tree, in every case restricting the scope of reference. A category known as ‘number’ expresses the difference between single and plural entities, in modern English regularly written as -S or -ES.

By my proposal here, in a novel way, Wall is no less powerful in relation to the sound system or phonology.

• On time scales over a hundred years or more, vowels can be changed. Take name and tide, pronounced in the time of Chaucer with an AH vowel in name, and an EE vowel in tide, and a short AY in the second syllable. By the effects of what Otto Jespersen called the ‘Great English vowel shift’, now often called the GVS, both AH and EE vowels became diphthongs with the tongue moving up towards the front of the mouth in the course of articulating them, in name from a mid point in the mouth and in tide from down low. This change happened over a number of generations. But speakers who were making the fractional and mostly imperceptible changes in their speech were, by my proposal here, exploiting the functionality of Fit.

• Syllables could be doubled with some contrast in length or volume as by modern Mama and Papa in English and French, or combined with one another, as in Moonie and Barney. Vowels could become more internally complex towards something like the near limit case represented by English. Phonemes could merge the properties of vowels and consonants, as in W, Y, L and R. Diphthongs could be built from single vowels for greater contrast. Clusters could be formed by combinations of consonants towards the limit cases represented by some of the languages of Georgia. Since there is no limit to this recursion, there is no maximum limit on the length of words or sentences. But very long words come at an obvious price. Languages tend to keep the most important, most commonly used words, short and simple.

• By what is known as ‘lenition’, the T in little, and the T between the N and the M in huntsman, is replaced by a glottal gesture, effecting an increase in the acoustic contrast with the two closest or most similar sounds, P and K, as in supple and nickel.

• By the seemingly opposite devices known as ‘assimilation’, as in phrases like “good morning” as GOOB MORNING and “ten girls” as TENG GIRLS, and ‘dissimilation’, as in little as LIKU (where these processes have the effect of changing what is already there), Fit adjusts levels of contrast. In the much commoner assimilatory cases, the tongue tip articulation of the D in good is lost to the labial articulation of the M in morning, becoming a B. And the N in ten is lost to the back-of-the-tongue articulation of the G in girls. Here Fit diminishes contrast. In the case of dissimilation, rare across the world’s languages but available from the toolbox of devices, Fit increases contrast. This occurs mainly in child speech. It is, in my view, hugely underestimated in studies of child speech, and often not even recognised or listed. Assuming that the small child correctly identifies the final L segment in little as being made by a tongue tip articulation, the childish pronunciation as LIKU involves a sharpening of the contrast between the T and the L. Although the LE syllable is a full part of the root in modern English little, this was plainly not always the case, as shown by settle, a place to sit for a period; fettle, to fit the handle onto a cup or jug; ladle, a deep spoon for transferring soup or stew from one container to another; and handle, part of an object doing the job of a hand. In the case of little, it seems to have originated in Proto-Indo-European as leud, meaning small, where the LE may originally have emphasised smallness or perhaps dearness. And this may have been enhanced by the dissimilatory change of articulator if LIKU once persisted longer than it does in modern acquisition.

• A set of words including that in English is used to define what is known as a ‘subordinate clause’, as in “I know that I am right” or “I know that you think that I am wrong.” The functor that marks the embedding or subordination of one clause within another, exemplifying recursion. Its role is emphasised by the fact that it is alternately pronounced and left unpronounced.

• By what is known as ‘reduplication’ or doubling the whole structure, as in chop chop, Fit can either mark the structure as non-literal or emphasise it, as in very, very good, or for some speakers, to denote an extreme example, as in “I went to a school school. If you looked out of the window you got tied to the chair.”
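The assimilatory cases above can be sketched as a single place-of-articulation rule. The Python below is a rough illustration only: it assumes ordinary spelling can stand in for sounds, handles only a word-final D, T or N before a word-initial labial (P, B, M) or velar (K, G), and does not model dissimilation such as LIKU. The names PLACE_SHIFT and assimilate are hypothetical.

```python
LABIALS = {"p", "b", "m"}
VELARS = {"k", "g"}
# Word-final tongue-tip consonant -> (labial version, velar version).
PLACE_SHIFT = {"d": ("b", "g"), "t": ("p", "k"), "n": ("m", "ng")}


def assimilate(phrase: str) -> str:
    """Let a word-final D, T or N take the place of articulation of the
    first consonant of the next word (spelling stands in for sound)."""
    words = phrase.lower().split()
    out = []
    for word, nxt in zip(words, words[1:] + [""]):
        first = nxt[:1]
        if word[-1] in PLACE_SHIFT and first in LABIALS | VELARS:
            labial, velar = PLACE_SHIFT[word[-1]]
            word = word[:-1] + (labial if first in LABIALS else velar)
        out.append(word)
    return " ".join(out)


print(assimilate("good morning"))  # goob morning
print(assimilate("ten girls"))     # teng girls
```

The rule only ever removes a contrast (the tongue-tip place merges with its neighbour), which matches the description of assimilation as Fit diminishing contrast.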

By my proposal here, again in a novel way, by the different angles, it is possible to define different sorts of entity, such as contrasting relations of time, and long distance immediacies:

• The presence or absence of a pause in what are known as ‘restricted’ and ‘unrestricted’ relative clauses, as by the different readings of “I don’t like butch men who abuse women”. With a pause between men and who, the inference is that all butch men abuse women. Without the pause, the inference is that only some do.

• The boundary between syllables and words, which is sometimes reduced within the phrase, as in “Good morning” in English, almost lost, as in a pinta milk and between functors as in don’t, or increased, as by the R in “Australia R or England” in English English but not in Scottish English;

• Length variations between long vowels like the EE in me, where the tongue is squeezed and tensed forwards and upwards in the mouth, and shorter vowels, less far into the corner and without tensing, as in him; length in the vowel used to mark the voicing of the B in rib; and diphthongs like OY in boy, where the tongue moves forwards and upwards throughout the articulation;

• The contrasting edges of the affricates at the beginnings of Chay and Joe, and the on-glides and off-glides in the AY and OE vowels or nuclei;

• The variable delay between the release of the stop and the onset of voicing in tie and die;

• The doubling or ‘gemination’ of N in words like unknown and unnerving;

• The finest steps of a derivation, unfolding too fast for conscious human perception.

Wall makes structures stand up, facing the right way, helping to ‘do things with words’ precisely, with the detail and nuance largely from the angling of the planes.

Move

By what Chomsky and most other interpreters of the Minimalist Program now call ‘Internal Merge’, a functor like who, what or where is not MERGED on the left edge; it is MOVED there. By Move, I am resurrecting Chomsky’s terminology from the 1950s. By my proposal here, Move represents the benefit of a different, more evolved cognition, seeking to increase the pragmatic salience of a particular position in the structure by moving an element to it, rather than just finding it there.

The value of separating the function by Move from the positioning by what the Minimalist Program calls ‘External Merge’ is recognised by the syntactician Cedric Boeckx and a number of biologists. It gives a better account of evolution.

On some current interpretations of the Minimalist Program, Move either copies an element X leaving X in its original position, or just moves X leaving a trace which is then linked to X in its new position, thus defining the history of the movement.

Informal rule:

§ Take a functor X; Move X to a position which C-commands its original position, leaving a trace of X, Xi; Link X and Xi.
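The informal rule can be mimicked on a flat string of words. In the Python sketch below, which is hypothetical throughout, the left edge of the list stands in for a C-commanding position, the suffix _1 marks the index, t_1 marks the trace, and a dictionary records the link between the moved element and its trace. Do support and the hierarchical structure are not modelled.

```python
from __future__ import annotations


def move_to_left_edge(tokens: list[str], functor: str, index: int = 1):
    """Move a functor to the left edge, leave an indexed trace in its
    original position, and record the link between the two."""
    pos = tokens.index(functor)           # original position of X
    moved = f"{functor}_{index}"          # X at its new position
    trace = f"t_{index}"                  # the trace Xi
    result = [moved] + tokens[:pos] + [trace] + tokens[pos + 1:]
    return result, {moved: trace}         # the link defines the history


words, links = move_to_left_edge(["the", "man", "hit", "what"], "what")
print(" ".join(words))  # what_1 the man hit t_1
print(links)            # {'what_1': 't_1'}
```

Starting from the echo-question order “The man hit what?” is deliberate: the sketch derives the question order from it, which is the relation that a grammar without copying or moving cannot capture.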

• The left periphery becomes a container for the pragmatic aspects of the sentence, as in “What did the man hit?” There is evidence of the movement in the fact that it also makes sense to ask “The man hit what?”, where the force, to use a term from Austin, is one of surprise rather than a request for information. But in English, as in most languages which have this movement, it only happens once: only one such word moves. Hence a doctor might ask about a child showing symptoms of poisoning “What did the child eat when?” or “When did the child eat what?” As shown by Tom Roeper and colleagues, normally developing children mostly understand such multiple WH questions correctly despite their rarity. But this is not so for many children with language issues. Not all languages move the equivalent of WH words. Some, like Japanese and at least most Chinese languages, don’t move them at all. Others, like Hindi and Kurmanji, only move some of them.

• By a mechanism nowadays mostly called ‘agreement’, properties like singularity, plurality, person and gender can be moved or copied from a noun element to a verb element. So in “There are two cups on the table”, the plural form are copies the plurality of cups, just as the singular form is copies the singularity of cup in “There is a cup on the table.” And in “I am talking” and “We are talking”, am and are copy the singularity and plurality of I and we in the subject. English makes relatively little use of agreement, much less than most Western European languages. Many languages have more than one form of this. But for the purposes of morpho-phonology, English has three forms of the S sound for singularity in verbs and plurality in nouns: a plain S in pats, a Z in pads, and a syllable in patches, wedges and messes.

• A tensed element associated with the verb is copied to its left by what is known as ‘Do support’, to point up a question, as in “Did the man hit the ball?”

• In spoken modern English, the negative form not, or its shortened form, is moved to the right edge of whichever element of the verb bears tense, often do, does or did, as in “I do not drink”, “I don’t drink” or “I didn’t drink”; that element is always on the left edge of the overall structure of the verb.

• The irreducibly necessary sub-structure traditionally referred to as the ‘subject’ is replaced by an element Moved from a position elsewhere, as in “The question was batted aside by the politician”, reflecting an interest in what happened to the question rather than the politician’s words. English happens to make extensive use of this device, known as ‘passive’. Not all languages have it.

• By another device an element which would traditionally have been characterised as the ‘object’ is moved left to emphasise its significance as a topic or point of focus, as in “Questions from that journalist I don’t like at all”. This is rather less used in English than in other languages, in English mainly to emphasise some significant judgement.
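The three forms of the S sound described above can be predicted from the last sound of the stem. The sketch below assumes rough phonemic labels for final sounds (for example ch for the end of patch and j for the end of wedge) rather than spellings; the label sets and the name s_form are illustrative and not exhaustive.

```python
# Rough phonemic labels for the last sound of a stem (illustrative sets).
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}
VOICELESS = {"p", "t", "k", "f", "th"}  # voiceless, not sibilant


def s_form(final_sound: str) -> str:
    """Choose the form of the S ending from the stem's final sound."""
    if final_sound in SIBILANTS:
        return "syllable"   # patches, wedges, messes
    if final_sound in VOICELESS:
        return "S"          # pats
    return "Z"              # pads, and after vowels and other voiced sounds


FINAL_SOUNDS = {"pat": "t", "pad": "d", "patch": "ch", "wedge": "j", "mess": "s"}
for stem, sound in FINAL_SOUNDS.items():
    print(stem, s_form(sound))
```

The order of the tests matters: sibilants are checked first, since S after a sibilant needs a whole syllable whether the sibilant is voiced or voiceless.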

The same Move functionality is involved in the formation of a class of expressions moving elements within the structure of existing words:

• By extracting from some word its stressed vowel and everything to the right of it, what is known as the ‘stress domain’, doubling the stress domain on the left, and adding an H as the initial consonant, as in hodge podge, hurdy gurdy, helter skelter, higgledy piggledy

• By doing the opposite and doubling two forms with a match only between the initial consonants as in trick or treat

• By stringing together items differing only in the stressed vowel, where the first is always short I and the second is either a short A or short O, as in chit chat, knick knack, riff raff, pitter patter, flip flop, tip top, hip hop.
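Both the H pattern and the short-I pattern can be approximated on spelling. In the hypothetical Python sketch below, the first vowel letter stands in for the stressed vowel, and the second member of each pair is assumed to be the base; so the functions only work for simple bases with a single short vowel letter. Trick or treat, which matches only initial consonants, is not modelled.

```python
VOWELS = set("aeiou")


def first_vowel(word: str) -> int:
    """Index of the first vowel letter (a rough stand-in for the stressed vowel)."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return i
    raise ValueError(f"no vowel letter in {word!r}")


def h_rhyme(base: str) -> str:
    """Copy the stress domain to the left and give it an initial H."""
    return "h" + base[first_vowel(base):] + " " + base


def ablaut(base: str) -> str:
    """Copy the base to the left with its vowel replaced by short I."""
    i = first_vowel(base)
    return base[:i] + "i" + base[i + 1:] + " " + base


print([h_rhyme(w) for w in ("podge", "gurdy", "skelter", "piggledy")])
print([ablaut(w) for w in ("chat", "knack", "raff", "flop", "top", "hop")])
```

Both functions move material within an existing word and copy it leftwards, which is the sense in which these forms draw on the same Move functionality.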

It is possible to devise a grammar which generates words like where on the left periphery, i.e. without copying or moving anything anywhere. But it is only by copying or moving that it is possible to capture the relation between “Where are you going?” as a simple question and “You’re going where?” as an expression of surprise.

Move gives particular functors particular roles, sharpening clarity, strengthening contrasts. The possibility of copying and moving labelled elements made language more reliably understandable, but at the expense of learnability. The mechanisms of Glue, Label, Wall and Move may have been complex and hard to learn, with the effect that the learner’s best hope of converging on the grammar was by guesswork and hoping that the guesses were mostly correct. There was no guarantee of all learners progressing to a point of competence. There was just a wide spectrum of approximation, from good talkers to less good, in a way similar to Basil Bernstein’s theory of restricted and elaborated codes. Competence in the full apparatus of grammar may have been the privilege of only a small élite.

Capsule

The ability to summarise a seemingly complex argument in a few words is rightly admired and respected. It’s like taking a complex operation and reducing it to bare essentials without leaving anything out – what the Bauhaus movement sought to do in architecture. By my proposal, something like this grew within a community, not in the sense of a story or argument, but in pithiness and brevity, by an optional device, expressed in various constructions.

By a series of recent grammatical insights by Noam Chomsky, Hagit Borer, Klaus Abels, Cedric Boeckx, and others, key aspects of syntax are squeezed into the smallest possible span of derivational structure. The effect is to encapsulate the total apparatus by Glue, Label, Wall and Move in the smallest possible derivational span. Generalising across these proposals, I am calling this encapsulation Capsule.

Chomsky describes this process in terms of ‘spelling out’ the derivation in ‘Phases’. By Spell-out, the derivation is sent to the phonetic interface to be pronounced and to the semantic interface to be interpreted or understood. Obviously, as soon as this has happened nothing else can happen: the derivational content becomes ‘inaccessible’, in Chomsky’s terminology. Chomsky suggests that there are just two Phases, one including verbs like make and let, as in “The detective made her plead guilty”, and the other when the structure is complete to the highest level of complexity, as in “What did I tell you the journalist said he thought the detective made her do?”, with what appearing five clauses up from where it is understood, as the effect of the detective’s pressure. Here there is no limit to the complexity which might be encapsulated at the highest level.

Borer proposes what, with characteristic wit, she calls the ‘Exo-skeleton’, which is essentially a small chunk of functional structure which defines the totality of possible variation in human language. By what has become known, from her PhD work under Chomsky’s supervision, as the Borer-Chomsky conjecture, all linguistic variation is by the properties of functors, the very elements which small children characteristically leave out. So if the conjecture is correct, as I believe it is, in order to progress to a final state of linguistic competence, children have to be focusing on the very things which are least apparent in their actual speech. The theory of the Exo-skeleton takes this a step further, narrowing down the possible variation by eliminating the very notion of nouns and verbs hard-coded as such in what she calls the ‘encyclopedia’, and requiring that what appear as nouns and verbs acquire these roles by the way the functional structure is expressed in the Exo-skeleton.

Klaus Abels proposes more Phases. Cedric Boeckx proposes a tighter definition of what it is for derivations to proceed locally. Chomsky, Abels and Boeckx all follow the same sort of derivational approach.

The fundamental common insight is to encapsulate the possible variation, what the child learner has to learn, into a relatively narrow space. For the purpose of evolutionary adequacy, the task is to express this in terms which can be read biologically within the genome. Since I have no idea how to do this, I am not going to try here. This is the first task for my current research program.

The scope of the inquiry

In the study of speech and language there is an obvious tension between the externality of what is heard and the inner structure of what can be given a meaning, like the ordering of the two words in “Something good” and the almost complete uninterpretability of “Good something”. On one approach, the external expression and the inner structure are quite separate. On another they are part of one system.

In 1968 Chomsky and his then colleague, the late Morris Halle, published The Sound Pattern of English, arguably the most talked about book in the history of linguistics. It is one of the few books to have been republished in paperback after both authors had discarded all of its main theses. By one of these theses, both speech and language are derived by a process which, to the greatest extent possible, uses a common apparatus in both areas and, because it uses the same apparatus cyclically, does this as economically as possible. Both Chomsky and Halle, for different reasons, changed their minds on the desirability of a common apparatus. The proposal here upholds the common apparatus view from 1968. This is partly motivated by the typical multifactoriality of speech and language disorders, partly by the strange and anomalous distribution of English children’s speech errors (limited evidence suggests similar distributions for children learning other languages) and partly by the conceptual necessities of evolution.

But going several steps further, by the proposal here, the whole process of speech and language evolution has been from particular cognitions which were recruited and adapted for a special purpose, to run at very high speed, and possibly to have parallel effects in human thought generally, with great consequences for the process of acquisition and thus for child speech. The general principle is to do as much as possible with as little apparatus as possible (stretching the derivation, increasing the scope and range of contrasts, minimising the phoneme inventory). If the learnability space is broken down in this way, and if the parameters thus have to be set in small groups, the combinatorics of parameters are reduced to a point of manageability.

By a speculation here, the only true speculation here, the effect of Capsule may be by any one or more of a variety of optional processes, all helping the immediate pragmatics of everyday discourse, reflected to different degrees in different languages and cultures, but some reflected in all languages and cultures. One example is what is known as ‘ellipsis’, leaving an element unpronounced to emphasise the rest, as in “She volunteered, but he didn’t” (not pronouncing volunteer). When somebody volunteers, “I am ready to go,” one possible response would be, “Do you really want to?”, not pronouncing the word go. But the ellipsis leaves what is commonly known as a ‘trace’. Without the ellipsis, the response might be “D’ya really wanna go?” With the ellipsis, the word to has to be pronounced and can’t be reduced, as in “I really want to”. The reduction is blocked next to the trace.

By a variety of other constructions, either the topic or the focus of a sentence is moved to the left edge, as in “The chocolate, we stole because we were hungry”, or, preserving the order of the words in the clauses and adding a clause, by a process known as ‘clefting’, as in “It was because we were hungry that we stole the chocolate”, or by a different sort of cleft, “What we stole was the chocolate, because we were hungry.”

These processes have been known and described for many years. What is not so simple or obvious is how they should be categorised: singly, or in groups? And if in groups, which groups? But however Capsule applied, it must have been visible, such that it could be copied, and beneficial and worth copying. As it was copied, by virtue of the encapsulation, the combinatorics were reduced to the point that the parameters of speech and language became finitely learnable. This was an immediate and enormous benefit to the human species because all members would now understand things the same way. This had to be the main point of the learner’s attention. At that point speech and language became finitely learnable.

Successive re-applications move a functor to the left periphery of the sentence. Or it gets glued to the right edge of the leftmost word, as in the case of n’t in “Haven’t you finished the washing up?”, by the special functionality of the Z plane.
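The effect of setting parameters in small groups can be shown with simple arithmetic. The numbers in the Python sketch below are hypothetical (30 binary parameters, set in groups of 5), and it assumes each group can be set independently of the others. It counts the candidate settings a learner would have to consider, not the number of possible grammars, which stays the same.

```python
def joint_search(n_params: int, values: int = 2) -> int:
    """Settings to consider if all parameters must be set together."""
    return values ** n_params


def grouped_search(n_params: int, group_size: int, values: int = 2) -> int:
    """Settings to consider if each small group is set on its own."""
    n_groups, remainder = divmod(n_params, group_size)
    total = n_groups * values ** group_size
    if remainder:
        total += values ** remainder
    return total


print(joint_search(30))        # 1073741824
print(grouped_search(30, 5))   # 192
```

Grouping turns a product of choices into a sum of much smaller products, which is one way of reading the claim that the combinatorics are reduced to a point of manageability.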
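The blocking of the reduction next to the trace can be sketched as a rule over words in which the unpronounced element is marked explicitly. In the Python below, the marker __ for a trace or ellipsis site and the function name reduce_want_to are hypothetical, and only want to is handled.

```python
from __future__ import annotations

GAP = "__"  # hypothetical marker for an unpronounced element (trace or ellipsis)


def reduce_want_to(tokens: list[str]) -> list[str]:
    """Reduce 'want to' to 'wanna' unless 'to' stands next to a gap,
    then drop the gap marker, since it is not pronounced."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        next_is_word = i + 2 < len(tokens) and tokens[i + 2] != GAP
        if tokens[i] == "want" and tokens[i + 1:i + 2] == ["to"] and next_is_word:
            out.append("wanna")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return [t for t in out if t != GAP]


print(reduce_want_to(["do", "you", "really", "want", "to", "go"]))
# ['do', 'you', 'really', 'wanna', 'go']
print(reduce_want_to(["do", "you", "really", "want", "to", GAP]))
# ['do', 'you', 'really', 'want', 'to']
```

The gap marker is removed only after the reduction rule has looked at it, so an element that is never pronounced can still block the reduction.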

There are two major proposals about how this happens, the first from Noam Chomsky, the second from his one-time student, Hagit Borer.

Capsule and speech

In the case of T, the least marked consonant in English and most languages other than those with very small inventories, the particular pronunciations are given by separate steps, defining the articulation by the tip of the tongue, the short period of closure and complete release, the complete closure of the nose, the relation in time between this action and an adjacent vowel. This process of ‘derivation’ builds the phoneme as pronounced. But at a given point in the process of sound formation, each phoneme is defined. This goes on all the time in productive speech and in reverse in processing speech, but so fast that speakers are unaware of any such process. 

Take the affricates at the beginnings and ends of the words “church” and “judge”. Languages are likely to differ in how such segments are defined. Phonological specificities surface in:

• The differences between English R, the Russian and Scottish burred R, French and German back-of-the-mouth R, and Spanish R with what is known as a ‘tap’;

• The difference between Russian T and D and T and D in the languages of Western Europe;

• The differences in the timing of ‘unvoiced’ or ‘voiceless’ stops in the North of England and Ireland, London, Paris, and Southern France.

There are only so many logically possible sequences in the building of the features. Some are easily heard and articulated with only minor variations, with the effect that neighbouring languages like English and French can, and typically do, have very similar settings for phonemes like T, though not exactly the same, while Russian T is more different (articulated with the tongue tip against the teeth). Following the work of the Russian linguist Nikolai Trubetskoy, these combinations of settings are often characterised as ‘unmarked’.

An angle on stammering

There is negative evidence of Roof from the disorder of stammering. In 1994, in the opening paper at the International Child Language Seminar, I proposed a way of reconciling the psychological and physical aspects of stammering by attributing it to a defect in a buffer, as already proposed by other authors. The notion of a defective buffer helps to explain three things. First, stammering is never attested on the first words, but only when language is already in full development, most often between the ages of three and six, but sometimes later. Second, the speech of normal speakers is massively disrupted when they hear themselves speak after a short delay. In the terminology of those who treat stammering, they block violently. For most speakers, the greatest effect comes from a delay of around a third of a second. This is known as Delayed Auditory Feedback, or DAF. But for most stammerers the effect is reversed: they become more fluent. Third, there is the equivalent of stammering amongst native users of American Sign Language, albeit with only one tenth of the prevalence of stammering in those using voice. (America is the only country with enough native signers to get reliable statistics about a phenomenon which occurs at a rate of only one or two per thousand signers.)

The greatest utility of such a buffer is in relation to the left periphery, as evidenced in language after language: it allows Wh words to be correctly understood in sentences like “Where do you think they said we might have put the car keys?” The buffer stores a left-shifted element until it can be interpreted.

The timing of the buffer is hard-wired, a special adaptation for language with the power of moving elements leftwards. It does not vary across languages. For stammerers, DAF restores the normal effect of the buffer. But the buffer itself, as an expression of Phase, is universal.

Making us human

By the proposal here, what I am calling Capsule made humans what we are today, at least in being almost uniformly smart when it comes to learning to talk. At this point in human development it made sense to think of shared entitlement and responsibility, in a way which would not have made sense previously. But the functionality which makes speech and language finitely learnable does not resolve all of the issues. It does not work perfectly for everyone.