I hypothesise that within a population with Glued structures involving words and the features of phonemes, there was a later equally decisive innovation by which elements within a glued structure could be fitted to social or situational needs. Let us call it Fit. First elements are glued. Then they are fitted. Fit helps to reconcile what is known as ‘pragmatics’ or language as a tool and ‘syntax’ as the mechanism of word assembly by which meanings are defined. But by my hypothesis, Fit, like Glue, applies both to ‘syntax’ or grammar and to the sound structure or ‘phonology’.
Making things fit
My motivation here is to explain some seemingly contradictory phenomena in the speech of both normally developing children and many of those with speech disorders, phenomena which are otherwise very hard to explain.
In phonology, one key aspect of Fit is to adjust contrast, either increasing or decreasing it, and to reset time-scales, from the imperceptible differences between the speech of different generations to much coarser differences. One of the coarsest time-scales is in diphthongs in words like die, between the on-glide with the tongue low in the mouth and the off-glide with the tongue high and at the front of the mouth. There is a much finer time-scale difference between the ways that T, P, and K in tea, pea, and key are pronounced in the North and South of England. The difference is in the delay between the release of the closure by the tongue or the lips and the measured opening of the vocal cords, allowing them to vibrate spontaneously like two flags in the wind, with the effect known as ‘voicing’. In the North this delay is longer than in the South.
This, like at least one much-finer time-scale difference, is crucial to the operation of Phase, which I propose was the last turning point in the evolution of modern speech and language.
Within the human lineage, Fit came after Glue. But like Glue, Fit was hugely beneficial.
Fitting sound to structure in phonology
There are several ways in which languages use the Fit apparatus in the phonology.
One use of Fit in phonology is what is known as ‘lenition’ when one of a group of sounds or ‘phonemes’ is articulated, not by the tongue tip, but in the voice box by what is known as a ‘glottal stop’. This happens in the T in little in Cockney and many other varieties of British English and in the T in huntsman in possibly all British varieties. The effect is an increase in the acoustic contrast between T in little and subtle and the two closest or most similar sounds, namely P and K in words such as supple and nickel. And in huntsman, the glottal articulation of the T increases the contrast with the tongue tip articulations of N and S and the lip articulation of M.
Another such fitting is by two seemingly opposite devices, one decreasing contrast by what is known as ‘assimilation’ or ‘harmony’, the other increasing it by ‘dissimilation’ or ‘disharmony’, the former much commoner than the latter. The commonest instance of assimilation in English is in phrases like good morning and ten girls. The pronunciations are effectively GOOB MORNING and TENG GIRLS. The tongue tip articulation of the D in good assimilates to the labial articulation of the M in morning. And the similarly tongue tip articulation of the N in ten assimilates to the back-of-the-tongue articulation of the G in girls.
By a more puzzling aspect of this fitting, sounds can seemingly exchange positions. This is known as metathesis. It happens in the history of languages, as when Viking Norse Brevik became English Berwick. Here the R metathesises with the stressed vowel to become the onset of the final syllable, leaving the stressed syllable without an initial cluster.
By timescale adjustment, lenition, assimilation, and metathesis, the acoustic contrasts and clusterings are fitted and refitted to the rest of the sound structure, or the metricality.
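The assimilation in good morning and ten girls can be sketched as a small lookup, offered only as an illustration. The letter-based representation, the two tables, and the function name assimilate are assumptions made for this sketch, with "ng" standing in for the back-of-the-tongue nasal. It is not a model of real phonological processing.

```python
# Toy sketch of place assimilation across a word boundary.
# Letters stand in for phonemes; both tables are illustrative assumptions.

# Articulator of the first sound of the following word.
PLACE_OF_ONSET = {
    "p": "labial", "b": "labial", "m": "labial",
    "k": "velar", "g": "velar",
}

# What a word-final tongue-tip sound becomes when fitted to that articulator.
ASSIMILATED = {
    ("d", "labial"): "b",
    ("t", "labial"): "p",
    ("n", "labial"): "m",
    ("d", "velar"): "g",
    ("t", "velar"): "k",
    ("n", "velar"): "ng",
}


def assimilate(first: str, second: str) -> str:
    """Fit the final tongue-tip sound of `first` to the onset of `second`."""
    final = first[-1].lower()
    place = PLACE_OF_ONSET.get(second[0].lower())
    replacement = ASSIMILATED.get((final, place))
    if replacement is None:
        # No rule applies, so the phrase is left unchanged.
        return f"{first} {second}"
    return f"{first[:-1]}{replacement} {second}"


print(assimilate("good", "morning"))
print(assimilate("ten", "girls"))
print(assimilate("ten", "days"))
```

Only tongue-tip finals appear in the second table, reflecting the point that it is the tongue-tip articulation which gives way to the lips or the back of the tongue.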
Universality and vulnerability
Here I shall assume the strongest possible version of what is known as ‘underspecification’ by which to the greatest possible extent phonological structure is built ‘online’, keeping the mental storage of phonemes down to the absolute minimum. The effect of this is that when speech is entirely unconscious as in dreams, there is no online, and the process of production is many times faster. The original insight here is from the 1984 work of Diana Archangeli. She has since rejected the idea. But it can be rescued, I believe, by combining it with the 1995 insights of Carole Paradis and Jean-François Prunet. By this combination of ideas, the tongue tip is the default articulator in English, as the default vowel in derived forms in English is the I vowel in the second syllable of washes, catches, and edges. For obvious logical reasons, the default setting is made last. At this point the formation of the phoneme is complete. But in speech that is less than fully competent, this completion is not always well-defined. The final step of phoneme completion can be delayed or brought forward. So there are various vulnerabilities with respect to the articulator or the position of a closure in the vocal tract. Generalising across the speech of different children, the contrasts between the default tongue tip articulator and the other articulators, the lips and the back of the tongue, are vulnerable in different ways in different words at different stages in the process of acquisition.
In hospital, commonly mispronounced by children as HOSTIPU, and spaghetti, as BASKETI or PSKETI, elements of the phonemic structure are copied incorrectly. In hospital competently pronounced, the tongue tip T onset of the final syllable contrasts with whatever is left of the syllabic L, mainly a lip rounding gesture very similar to the vowel in pull, put or book. But the native speaker knows that the origin of this is a tongue tip L, as evidenced in hospitalise and the less common hospitaller. In both of these cases the L initiates a syllable. In children’s speech the lip-rounding of the final syllable in hospital is increased. When the word is said as HOSTIPU, the lip action of the P is copied rightwards to the onset of the final syllable. What is left behind at the start of the second syllable is a stop without a defined articulator. This is then said as a T. The effect is that of metathesis.
In spaghetti, the child’s system has reason to reject the SP cluster at the beginning of an unstressed syllable before the stressed syllable. It may be the only such word in the child’s vocabulary, and easily rejected as a possible word by a natural childhood experience of words like spy, spare, spit. So the S decamps to a more familiar sort of position in the beginning of the stressed syllable, and the G loses what is known as its ‘voicing’ to match that of its new neighbour and becomes K. The structure is now more familiar except that the P has been left behind. And it usually becomes voiced as a B. But in some children’s speech, the conversion is more ruthless. Only the P is fitted to the stressed syllable. The initial unstressed vowel is left out. And an initial cluster of PSK is formed in a pronunciation as PSKETI. Nobody would call this a natural way of making the word easy to say. But it has an easy derivation in terms of fitting just the S into the onset of the stressed syllable.
On an alternative ‘process account’ BASKETI is commonly described in terms of ‘migration’. But such an account assumes a ‘process’ which has only one common exemplar. It is more parsimonious to postulate a general Fit functionality which is justified independently in competent speech and language. On such reasoning, a process account is rejected here.
Many normally developing five-year-olds mispronounce magnet as MAGNIK. Here the back of the tongue articulation of the G in what is known as the ‘coda’ of the stressed syllable is copied into the tongue tip T coda of the final unstressed syllable, without being lost at the point of origin. The two codas contrast in their ‘voicing’, or the time relation between the release of the closure by the tongue and the opening of the vocal cords. The effect is one of harmony or assimilation, as though the G / T contrast were too great for the child’s system to handle.
This is similar to the two-year-old saying doggy as GOGI, except that in magnet as MAGNIK the context is much more narrowly defined.
There is in competent speech an opposite process known as disharmony or dissimilation. This is uncommon in the history of phonologies. But it is commonly part of the speech of normally developing children, particularly between the ages of two and four, when words like little and middle are often said as LIKU and MIGU. Even though there may be no overt phonetic trace of the tongue tip L in the child’s pronunciation, there is nothing in the child’s experience of spoken English to suggest that there could be a word ending with the vowel in full or pull. It would not be a possible word. The presence of the L is signalled in forms like fully and pulling in which it is the onset of a second syllable. And the child’s system retraces the history of the final U sound back to its origin as L, baulks at the fact that it is right next to another tongue tip sound, T or D, and increases the contrast by moving the tongue articulation back to K or G.
Something similar happens in the speech of children of seven or eight, who mispronounce monopoly as MONOKOLI. Here the environment is very narrowly defined with a lip action M before a tongue tip N, a lip action P after a stressed vowel with lip-rounding, and an L in the final syllable, capable of becoming a rounded vowel in other circumstances. Here the dissimilation from P to K has the same effect of increasing the contrast.
In the most puzzling of these cases, in the speech of children up to the age of eight, in a very narrowly defined set of environments, we find what might seem like a process which has not been attested in the speech of younger children, namely assimilation in favour of tongue-tip articulations rather than the reverse. This happens in the following cases (and in all probability in others which I have not so far discovered).
• calculator as KALTALATOR
• cardigan as KARDIDAN
• hippopotamus as HITOPOTAMUS
• archeopteryx as ARTIOPTERIKS
This might seem like tongue-tip harmony or assimilation. In every case there is more than one instance of the tongue-tip articulation, in calculator in both cases of L and the T, in cardigan in the D and the N, in hippopotamus in the T and the S, in archeopteryx in the T, the R and the final S phoneme. But in every case too, there is another instance of the vulnerable articulator, in cardigan in the initial K, in hippopotamus in the second P and the M, in archeopteryx in the K phoneme in the final cluster. So this is disharmony or dissimilation as well as harmony or assimilation. The contrasts between articulators are just marginally adjusted, but only where the vulnerable element has a near match with one other onset, i.e. no other contrast, no difference in voicing, and with matching or near matching metricality.
For this loss of a non-tongue-tip articulator to happen, at this very late point in the derivation, the word must be almost ready to say when the definition of the articulator is suspended, and the default tongue-tip articulation kicks in.
In all of these common phenomena in child speech over the whole range of acquisition, the tongue-tip articulator is involved. In early speech, between two and three, the tongue tip articulation is defined outside the glued structure of the word. Towards the end of normal speech acquisition, this applies in only a small set of special cases where the child’s almost complete system is overwhelmed by particular alternations of contrasts.
The study of the apparatus here began from what seemed like two opposite perspectives. In 1955 J. L. Austin delivered the lectures later published as How to Do Things with Words, laying the groundwork for what has become pragmatics, or the study of how language is used to reach particular objectives. In 1957 Noam Chomsky proposed two components within the grammar. The second of these, the ‘Transformational component’, went step-wise through a set of constructions, from a simple sentence, to negation, to questions, and more. Initially it seemed to most readers that Austin’s and Chomsky’s perspectives had no natural meeting point.
Chomsky’s 1957 analysis referred to a category which he characterised as the auxiliary, slightly extending traditional terminology. The auxiliary encoded possibility, permission, compulsion, relevance to whatever is the current present, tense, relative time, and what is known as ‘evidentiality’ in one understanding of the word might.
By combining two sets of rules, Chomsky showed how the first of these sentences could be reworked in steps.
• The BBC interviewed you
• Did the BBC interview you?
• Where did the BBC interview you?
• The BBC did not interview you
• Did the BBC not interview you?
• You were not interviewed by the BBC
• You might not have been being interviewed by the BBC
• Why do you think she said you should not have been interviewed by the BBC?
One novelty of Chomsky’s 1957 analysis was to specify not only what the grammar allowed, but what it disallowed. Another novelty was the explicitness of the rules. Another novelty was the step-by-step progression. Yet another was the treatment of the category known as ‘tense’, expressing a time line from the briefest perceptible moment to a historical era, as a category in its own right, separate from the form in which it is contained, including did and were, the auxiliary forms might and should, and the suffix -ED. Only where the main verb, in this case interview, stands on its own does the -ED suffix attach itself directly. Otherwise there is a sequence of forms in a fixed order, including the auxiliaries, each expressing a distinctive element of meaning, with tense attached to the leftmost. Simple questions are marked by the leftmost of these forms ‘hopping’ over the subject, as in “Did the BBC interview you?” Simple negation is marked by the negative form not before the next element in the structure. The forms do, does and did are thus forced by the rules of negation and question formation, rather than as stand-alone elements.
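The way tense attaches to the main verb when it stands alone, and is otherwise carried by a forced did, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a rendering of the 1957 rule system. The tiny PAST lexicon and the function name build are hypothetical, and only the past tense with no other auxiliary is handled.

```python
# Toy sketch: tense as a category separate from the form that carries it.
# PAST is a tiny assumed lexicon, enough for the example sentences.
PAST = {"do": "did", "interview": "interviewed"}


def build(subject, verb, obj, question=False, negative=False):
    """Assemble a past-tense sentence with optional question and negation."""
    if question or negative:
        # The main verb no longer stands alone, so 'do' is forced in
        # and carries the tense; the main verb stays in its bare form.
        tensed, main = PAST["do"], verb
    else:
        # The main verb stands alone, so tense attaches to it directly.
        tensed, main = None, PAST[verb]

    neg = ["not"] if negative else []
    if question:
        # The tensed form 'hops' over the subject.
        words = [tensed, subject, *neg, main, obj]
    elif tensed:
        words = [subject, tensed, *neg, main, obj]
    else:
        words = [subject, main, obj]

    sentence = " ".join(words)
    sentence = sentence[0].upper() + sentence[1:]
    return sentence + ("?" if question else "")


print(build("the BBC", "interview", "you"))
print(build("the BBC", "interview", "you", question=True))
print(build("the BBC", "interview", "you", negative=True))
print(build("the BBC", "interview", "you", question=True, negative=True))
```

The point of the sketch is that did never has to be chosen as a word in its own right. It appears only because the question and negation steps leave tense with nothing else to attach to.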
Questions flagged up by forms mostly written with an initial WH, including what, which, where, and why, involve a link across the whole of the structure. Thus “Where did the BBC interview you?” invites a response like “At home” or “They interviewed me at home” with where linking to an element at the opposite end of the sentence. Such forms are thus spoken in a position far from the position at which they are understood.
By Chomsky’s 1957 analysis, the question element of WH forms moved from a position in the immediate structure of the main verb to a position at the very beginning of the sentence. In a way that quickly became apparent, this could be any number of clauses away. It is three clauses away in “Where do you think she said the BBC interviewed you?”
But looking at the sequence above in the light of how this approach has developed in work following Chomsky’s lead, it becomes apparent that there is a loss of innocence as the structure is deployed. “Why do you think she said you should not have been being interviewed by the BBC?” is tantamount to an accusation. The step-wise shift is effectively pragmatic, by a modern understanding of Austin’s 1955 proposal. Chomsky’s and Austin’s projects were not as orthogonal to one another as they first appeared.
Although most aspects of Chomsky’s 1957 analysis have been superseded by reanalyses by Chomsky himself and others, the notion of question elements travelling to a distant destination position has been widely retained. In order to provide a principled account of the fact that such paths from one end of the sentence to the other often seem to result in something being left in the original position, on one analysis of “Where did the BBC interview you?” the structure is derived from “The BBC interviewed you where?” But the fact that this is a question forces a copy to be made of where at the beginning of the sentence. And it is not pronounced where it started off from.
In a set of special cases, as in “The BBC interviewed you where?”, where stays put. Such sentences are more expressions of surprise than requests for information.
In 1997 Luigi Rizzi took a step towards reconciling Chomsky’s and Austin’s perspectives. He proposed that the destination position of words like where should be decomposed into a complex element in what is known as the ‘left periphery’. Here some of the main, effectively pragmatic, aspects of the grammar are packed into a single container.
WH questions are now commonly analysed by combining a pair-wise theory of word-assembly with Copy as an additional functionality.
From a modern perspective, taking account of more data from more languages, the set of forms to which elements can attach needs to be expanded to include what are known as ‘subjunctive’ forms. The subjunctive is becoming archaic in English, but it is still interpretable in “I demand that she be admitted”. Here be is a relic of a subjunctive once much more widely used, as it still is in French, German and Italian. In all of these cases, the subjunctive encodes a degree of doubt or uncertainty about whether she will be admitted or not.
Fitting and anchoring
On the hypothesis here, what I am calling Fit is the effect of a specific evolutionary event, fulfilling various, largely pragmatic, functions. Its main mechanism is to assign one or more of a given number of functors to all the major categories, including nouns and verbs. These functors, like a and the and regular expressions of tense, have huge effects. Among these effects, Fit anchors the elements of a sentence to the situation and to the perceptions of listener and speaker and to previous conversation. Fit sharpens the clarity of the message. It adjusts levels of contrast in both directions – just as the anatomy of the middle ear combines the functions of a servo and a shock absorber, dampening or increasing amplitudes, making faint sounds detectable, and protecting against damage by loud sounds. And most importantly, Fit allows any number of meanings within meanings.
In English, as in many other, perhaps all, languages, Fit is often repurposed. English might was once the past tense of may. But it now encodes evidentiality, as in “The BBC might interview you, but they probably won’t.” Some languages do not express parts of the Fit apparatus. But most languages have ways of expressing most of it. And no language lacks it completely.
• Consider the sentence “Dad thinks Jack said Jill died”, which can also be said as “Dad thinks that Jack said that Jill died”, with a main clause Dad thinks and two subordinate clauses, Jack said and Jill died. Switch around any of the three individuals, and something quite different is being said. The word that marks the subordination of one clause to another. It is a functor. This marking is optional in English. Each of the steps here is by what is known as ‘recursion’. Since there is no limit to this, there is no upper limit on sentence length.
There is a well-known and much-discussed claim by Daniel Everett that there is no recursion in one language, known as Pirahã, spoken by one isolated Brazilian tribe of fewer than 500 people. If Everett is correct, “Daniel says that there is no recursion in Pirahã” is untranslatable. No discussion of doubt, error, suspicion, or correct report is possible in the language. I find this unimaginable.
There is a simple possible claim to the effect that in all languages the process of subordinating one clause to another triggers an automatic functional marker which may, as in English, go unexpressed. This is simpler than Everett’s implicit claim to the effect that the process of subordinating one sentence to another is itself a variable across languages.
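The recursive subordination in “Dad thinks Jack said Jill died”, with its optionally expressed marker, can be sketched as a function that calls itself once per clause. The function name embed and the representation of clauses as plain strings are assumptions made for this illustration only.

```python
# Toy sketch of clause subordination by recursion.
# Each clause is subordinated to the one before it; the marker 'that'
# is always available but may go unexpressed.

def embed(clauses, use_that=False):
    """Nest each clause inside the preceding one, optionally marked by 'that'."""
    head, *rest = clauses
    if not rest:
        # The innermost clause has nothing subordinated to it.
        return head
    marker = " that " if use_that else " "
    # The same step applies again to whatever remains: this is the recursion.
    return head + marker + embed(rest, use_that)


clauses = ["Dad thinks", "Jack said", "Jill died"]
print(embed(clauses))
print(embed(clauses, use_that=True))
```

Because the function places no limit on the length of the list, the sketch also reflects the absence of any upper limit on sentence length.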
• One instance of fitting is by what is known as ‘reduplication’ or doubling. A word can be doubled for greater emphasis, as in very, very good, or for some speakers, to denote an extreme example of something, as in “I went to a school school. If you looked out of the window you were thrashed.”
• Or a sentence can be restructured to make it clear what sort of response is expected, as questions, expressions of surprise, commands, with Wh questions by fitting a double of the Wh form in what is known as the ‘left periphery’, as in “What is she doing?” or “Why are they here?”
• Also taking a position in the left periphery, special words, including that in English, are used to clearly and unambiguously define what is known as a ‘subordinate clause’, as in “I know that I am right” or “I know that you think that I am wrong.”
• By the same device, but without any words equivalent to where or that, a chunk of the sentence is shifted ‘leftwards’ to emphasise its significance as a topic or point of focus, as in “Bull mastiffs I don’t like at all”. Or the focus can be shifted by the ‘passive’ as in “Bull mastiffs are feared by some people.”
• Almost all languages have what are often called ‘personal pronouns’, words equivalent to I, you, he, she, etc., which may all refer to the same person, depending on who is doing the talking and who is being directly addressed. These expressions are known as ‘deictic’ or ‘indexical’. Here the fit is by reflecting the situation directly into the syntactic structure. Indexicals are quite different from personal names which pick out individuals in a way that does not vary from one context or conversation to the next. The indexicality of pronouns extends to here and there, come and go, this and that.
• The two commonest words in English, a and the, copy the bald fact that a referent is what it is, while relating it to the history of the conversation or the immediate world of the speaker and listeners. In “A woman laughed” we know nothing about the woman. But in “The woman laughed” we know who she is.
• Pronouns like I, you, she, can also be a shorthand for a fuller reference. But not always. In “It is rumoured that…” or “I want it to be noted that…” it plainly does not refer to anything. It just reflects the topic of the note or the rumour, where the doing in Austin’s sense is just the speaker’s reluctance or refusal to commit him or herself to whatever is being rumoured or noted. In “I think it odd that…” it reflects the content of the embedded clause.
• Respect or deference, sometimes known as ‘register’, is expressed in most languages by one or more special terms for English you. Familiarity is shown by tu in French, du in German, etc., equivalent to thee and thou in prayers. Respect can be shown by an avoidance of any direct reference. English is generally thought to lack this entirely. But on the basis of an insightful observation by a seven-year-old, respect is expressed indirectly in English, by not referring to a third party by their relationship to the addressee. So to say “I just saw your colleague” implies equality or superiority. The terminology reflects the respect. Normally, children only start learning this aspect of language at around three.
• In all languages, negation provides the wherewithal to deny the truth of a proposition, as something that can be done in Austin’s sense. In English negation is mostly marked by not or its shortened form in don’t, as in “I do not drink” or “I don’t drink.”
• In English, as in most languages, there are properties which restrict the scope of reference, including animacy, humanity, gender, singularity or plurality. By a mechanism nowadays generally called ‘agreement’ (or previously ‘concord’), these properties can be copied or doubled. So in “There are two cups on the table”, the plural form of are copies the plurality of the complement cups, just as the singular form of is copies the singularity of a cup in “There is a cup on the table.” And in “I am talking” and “We are talking”, the am and the are copy the singularity and plurality of I and we in the subject. The singularity or plurality is restated as an addition to the verb. This reduces the likelihood of mishearing or misunderstanding. Agreement in English is marked by one of three forms of the S sound, as a plain S in pats, as a Z in pads, and as a syllable in patches. English makes much less use of agreement than most of the languages of Western Europe, and is sometimes mistakenly regarded as an easy language to learn on this account. Many languages have more than one form of agreement in the same sentence.
• English is uncommonly poorly stocked with ways of distinguishing fact and certainty on the one hand and doubt and uncertainty on the other. I once heard a child say, “I think I might have misunderstoodended that.” This is sometimes known as ‘multiple marking’. In this case there are four markings, in the stood, the en, and in the two forms with D. Clearly, misunderstoodended encodes a degree of doubt or uncertainty. Most of the other languages of Western Europe encode doubt and uncertainty by what are known as ‘subjunctive’ forms of the verb. So misunderstoodended may have been an attempt to encode doubt or uncertainty more definitively than English allows.
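The three-way choice among the forms of the S sound mentioned above for pats, pads, and patches can be sketched as a simple lookup on how the stem ends. This is a rough illustration only. It uses spelling as a stand-in for sound, so it gives the wrong answer for stems such as edge whose final letter does not reflect the final sound. The two tuples of endings and the function name agreement_form are assumptions.

```python
# Toy sketch: choosing among the three forms of the S marker.
# Spelling stands in for sound here, which is only an approximation.

# Endings after which the marker is said as a separate syllable (assumed list).
SIBILANT_ENDINGS = ("s", "z", "x", "sh", "ch")
# Endings without voicing, after which the marker is a plain S (assumed list).
VOICELESS_ENDINGS = ("p", "t", "k", "f")


def agreement_form(stem: str) -> str:
    """Return 'syllable', 'S', or 'Z' for the marker added to `stem`."""
    stem = stem.lower()
    if stem.endswith(SIBILANT_ENDINGS):
        return "syllable"
    if stem.endswith(VOICELESS_ENDINGS):
        return "S"
    # Everything else, including vowels, takes the voiced form.
    return "Z"


for stem in ("pat", "pad", "patch"):
    print(stem, agreement_form(stem))
```

The order of the checks matters: the syllabic form is tested first, so that a stem like patch is not misread as ending in a plain voiceless sound.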
Fit has greatly developed since its first evolution. It is unrecognisably more complex than by the schematic description here.
The uses which languages make of Fit vary from one language to another. Most languages copy words like where at the beginning of the sentence. Some languages like Japanese and at least most Chinese languages do not copy such words at all. Other languages, like Hindi, and the closely related Kurmanji, spoken by the Kurds in Northern Syria and Southern Turkey, copy words like where in a much more restricted way.
The apparatus provided by the Fit adaptation takes the normally developing child most of ten years to learn.
Some words are staunchly resistant to change. They group together: the personal pronouns, me and my, thee and thy; the numbers from one to ten; members of the blood family, father, mother, brother, sister; the limits on human existence and its guarantees, slay, die, sea, house. Others have to be conjured into existence to express new ideas, inventions, adaptations, imports. Sometimes the importer retains a name from wherever the import is from, like coffee, tea, sugar, potato, tomato, avocado, tobacco. Sometimes a name is conjured up by putting two familiar names together, like pineapple, stir fry, flat pack.
But another way is by manipulating the structure of existing words. This can be done in a number of ways. One is by extracting from some form its stressed vowel and everything else on the right of that, what is known as the ‘stress domain’, and doubling that on the left and adding an H as the initial consonant or what is known as the onset, as in hodge podge, hurdy gurdy, helter skelter, higgledy piggledy. Another way is by doing the opposite and doubling two forms with only a match between the initial consonants, as in trick or treat. A third way is by stringing together items differing only in the stressed vowel, where the first is always short I and the second is either short A or short O, as in chit chat, knick knack, riff raff, pitter patter, flip flop, tip top, hip hop. In a few cases the whole structure is just doubled, as in chop chop.
In all of these cases, there is a sense of unseriousness. Such terms would not fit an innovation of high prestige.
There is a special vocabulary for this in words like lurk, amble, shamble, ramble, toddle, witter, babble, which denote a denigrating or contemptuous attitude towards whoever is lurking or wittering. Within this very narrow range, a word can be fitted to an attitude.
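Two of the doubling patterns described above, the H-initial copy of the stress domain and the short-I copy, can be sketched on spelled forms. This is an illustration under simplifying assumptions. It works on letters rather than sounds, it treats everything from the first vowel letter onwards as the stress domain (which only holds for words stressed on the first syllable), and the function names are hypothetical.

```python
import re

# Toy sketch of two word-doubling patterns, working on spelling only.


def split_onset(word):
    """Split a spelled word into its initial consonant letters and the rest."""
    match = re.match(r"([^aeiou]*)(.*)", word)
    return match.group(1), match.group(2)


def h_copy(word):
    """Double the stress domain on the left with H as the onset."""
    _onset, rest = split_onset(word)
    return f"h{rest} {word}"


def i_copy(word):
    """Double the word on the left with its first vowel letter replaced by I."""
    onset, rest = split_onset(word)
    return f"{onset}i{rest[1:]} {word}"


for base in ("podge", "skelter", "piggledy"):
    print(h_copy(base))

for base in ("chat", "knack", "flop", "patter"):
    print(i_copy(base))
```

In both functions the base word is kept intact on the right and only the copy on the left is altered, which matches the order of the two halves in the examples given.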
The advantage by Fit
Glue assembles a set of elements. Fit makes the structure work for a purpose. There are still traces in modern language of language without Fit, in questions by intonation alone, in “People eat seaweed?” Significantly, the rising intonation contour here is almost universal across languages. It would thus seem possible that speakers with Glue had found this way of asking questions before Fit evolved to provide a better way. But the old way was not abandoned.