Possible structures

Taken together, speech and language constitute the most complex system known to science. And yet they are effortlessly acquired by the overwhelming majority of children with no necessary help in about ten years. There is effectively a normal destiny. The study of this as a totality is instructive for those children who encounter problems along this road.

Sounds, syllables, feet, words

Speech and language are built as structures. Most of the terminology for describing this comes from studies in Ancient Greece and Rome. Parts of the syllable are built from consonants and vowels. Consonants and vowels are built from features. Parts of syllables combine with each other. Syllables combine with each other to form structures known as ‘feet’. The ‘head’ or ‘strong branch’ is on one side in canoe and on the opposite side in canon. (The metaphors are jumbled, but that may be the natural fate of ideas which have been churned around for over two thousand years.) This structure is often known as the ‘prosody’ or ‘metricality’ of a language – using the rhythm of stress or pitch or ‘tone’ to make one word different from another. Feet and parts of feet combine to form words. Words combine to form phrases. Phrases combine to form sentences. All languages seem to define their grammar on phrases rather than individual words. So in English we say “the rightful head of the commonwealth’s responsibilities” with the ’S at the end of the phrase the rightful head of the commonwealth rather than after the word head. But the order of the words varies within phrases. So the position of words equivalent to rightful varies across languages, not in all logically possible ways, but widely. And although all languages have deeply complex structure, the complexities vary from one language to another. These variations are with respect to words and phrases, the pronunciations of speech sounds, and more. They all have to be learnt. They lead to standards which are important to our sense of cultural identity, as shown by the story in the Bible about being able to say shibboleth – or not – with fatal consequences for those unable to pass the test in that particular case. So we are intensely aware of these standards, and highly sensitive to any departures from what we perceive as ‘our’ ways of speech and language – what is sometimes said to be ‘correct’ or ‘normal’ or ‘standard’.

English just happens to have an uncommon degree of complexity in the metricality, the syllable structure, and the inventory of phonemes. English has relatively large numbers of consonants and vowels, and relatively free combinations of both vowels and of consonants before and after the vowel. So there are many more possible syllables (and thus more to learn on this point) than there are in most languages.

The syllable

Many languages are much more restrictive than English with respect to what sorts of syllables are allowed. The following restrictions are common across languages:

• All syllables begin with a consonant, and so words like arm are disallowed because of not having a consonant before the vowel;

• No consonants are allowed after the vowel, so words like arm are disallowed because of the final M;

• All syllables consist of just one consonant followed by one vowel, as in tea, bee, knee, Betty, canopy, what are often known as CV syllables.
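The three restrictions above can be illustrated with a minimal sketch. It assumes that each syllable is written as a skeleton of C (consonant) and V (vowel) symbols. The function names and the skeletons given for the example words are illustrative assumptions, not a phonological analysis.

```python
# Sketch only: syllables are written as C/V skeletons, e.g. "CV" for tea.
# The function names and the skeletons below are assumptions for illustration.

def has_onset(syllable: str) -> bool:
    """Restriction 1: the syllable must begin with a consonant."""
    return syllable.startswith("C")


def has_no_coda(syllable: str) -> bool:
    """Restriction 2: no consonant may follow the vowel."""
    return syllable.endswith("V")


def is_cv(syllable: str) -> bool:
    """Restriction 3: exactly one consonant followed by one vowel."""
    return syllable == "CV"


def word_allowed(syllables: list[str], restriction) -> bool:
    """A word passes only if every one of its syllables passes."""
    return all(restriction(s) for s in syllables)


# Assumed skeletons: arm as a vowel plus M, the others as plain CV sequences.
WORDS = {
    "arm": ["VC"],
    "tea": ["CV"],
    "Betty": ["CV", "CV"],
    "canopy": ["CV", "CV", "CV"],
}

if __name__ == "__main__":
    for name, syllables in WORDS.items():
        results = [word_allowed(syllables, r) for r in (has_onset, has_no_coda, is_cv)]
        print(name, results)
```

On these assumed skeletons, arm fails all three restrictions, while tea, Betty and canopy pass all three.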

Despite what might seem to be extensive evidence on this point, many, perhaps most, children have difficulty here, at least at first. By the proposal here concerning the action of what I am referring to as Glue, syllables with just one consonant and just one vowel are what they are not because of that formal simplicity, but because this is the simplest way of achieving the greatest contrast with respect to resonance or ‘sonority’.

Speech sounds or phonemes

All languages have sounds or ‘phonemes’. In English there are often said to be 44 of these, although the exact number is a matter of argument. But across the world’s languages, the number ranges from 11 to almost 200. Although no child presumably attempts to count the phonemes, children become acutely aware of what their target language includes or doesn’t include, in its ‘phonemic inventory’. Some languages – not English – have only a simple inventory of simple sounds.

Sound combinations or phonotactics

Apart from the inventory, there is significant variation with respect to which phonemes can be ‘clustered’ together in what is known as the ‘phonotactics’. Some languages, particularly the Slavonic languages, allow complex combinations of consonants one after the other, known as ‘clusters’. Other languages have complex sounds like both G and B together and at the same time, as in the name of the West African language Igbo. In English by contrast the G and B in egg box and dog basket are at the end of one word and at the beginning of the next. The human system which makes it possible to learn speech and language in ten years or so can reliably distinguish between the two cases, similar-sounding though they may be.

English allows some fairly complex clusters, though these are less complex than some of those allowed by some Slavonic languages.

The words strength, strange, and change would not count as possible words in many languages. Such languages would disallow the STR and NG combinations and the final TH in strength; the way the GE is preceded by an N, and the fact that the vowel begins and ends with the tongue in different positions in the mouth, in strange; and the CH and GE, which use different airstreams at the beginning and the end of the sound, in change.

The complexities of English phonotactics are obviously something which children learning English have to learn.


English has just one sort of complex phoneme, what are known as affricates, at the beginnings and ends of church and George. Affricates begin with a complete closure of the air-stream, and end with an almost complete closure. At the beginning of the syllable, affricates only occur on their own before the vowel.

Some children find affricates hard to hear or say. But the issue for such children may be one of misanalysis. They may be hearing these phonemes as two phonemes one after the other, in the case of chair, for instance, as a T followed by a SH.

Syllables without built-in vowels

English also allows an unstressed syllable after a stressed syllable to have no vowel, but just L or N, as in little, middle, wiggle, bottle and button, which children hear early and often. The L is problematic for many, perhaps most, children, who go through a stage of saying little and middle with the L as something like OO and the T of little as K and the D of middle as G. Somehow the L sound at the end disrupts the previous sound, no matter whether it is T or D. But is this by mishearing the words? Or by getting the tongue in the wrong position for one or more of the sounds? Various analyses have been proposed.

Such syllables without a ‘built in’ vowel are quite unusual across the world’s languages. In most languages, all syllables have a built in vowel. The fact that this is not so for English, that there are what are known as ‘syllabic consonants’, is plainly something which children learning English have to learn. And for most children learning English, this point seems to be quite hard. So many, perhaps most, children learning English go through a stage, often for two years or more, of saying little as LICKOO, middle as MIGGOO, and so on, but revealingly, not tickle as TITTOO or toggle as TODDOO, with the opposite relations between T and K and between D and G. This is an example of asymmetry in children’s errors.

Variations of length

In many languages, phonemes can be doubled in length, with the difference in length alone enough to change one word into another. This is the case in Arabic, Finnish, Cypriot Greek, Classical Latin and Greek, and many other languages. But not in English.

In a way which might seem contradictory, there are instances where a phoneme is repeated at the end of one word or bit of a word and at the beginning of the next, as in non-native and soulless. But these are not single words.

But there are more potentially confusing miscues for the learner of English. Length can be deceptive. Most of the difference between hit and hid is signalled by the length of the vowel. The nominally short vowel in hid can be almost as long as the long vowel in heed. The identity of the consonant depends on the length of the vowel before it. Different tokens of one sound, which may vary enough from one context to the next to count as different sounds in one language, may need to go into the same ‘box’ in another language. The sorting into boxes has to take place in the mind of the learner.

At least in modern English, there are no long consonants, although there may have been once upon a time, as suggested by the spelling of hammer and rudder. And some children wrongly conclude that in English there are long consonants, with the effect that they say finger with a NN in the middle. In all the cases that I have recorded, the F is also replaced by a T or a D, with the effect that the word sounds like TINNA or DINNA, where the N is perceptibly doubled.

Feet and metricality

English metricality (and that of most of the other languages of Western Europe and perhaps half of the world’s languages) is entirely by word stress. This uses a combination of pitch, length and loudness which gives a word a rhythm. In English, as in most European languages other than Scottish and Irish Gaelic, stress is worked out from right to left. This working out is known as ‘scansion’. In French the scansion is simple: stress the rightmost syllable, disregarding any final vowel with no specific character, what is known as ‘schwa’, always spelt with an E, as in the name of the country La France, and in the name of the anthem, La Marseillaise. In English, the system is much more complex and was only worked out in detail by Noam Chomsky and Morris Halle in 1968.
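The French rule just described can be put as a minimal sketch. It assumes that a word arrives already divided into syllables, each flagged for whether its vowel is a schwa. The function name, the data layout and the syllable divisions shown for France and Marseillaise are assumptions made for illustration only.

```python
# Sketch only: a word is a list of (syllable, is_schwa) pairs, left to right.
# The syllable divisions below are assumed for illustration.

def french_stress_index(syllables: list[tuple[str, bool]]) -> int:
    """Return the position of the stressed syllable, counting from 0.

    Working from the right, skip a final schwa syllable if there is one,
    then stress the rightmost syllable that remains.
    """
    if not syllables:
        raise ValueError("a word needs at least one syllable")
    last = len(syllables) - 1
    # Disregard a final schwa, unless it is the only syllable in the word.
    if last > 0 and syllables[last][1]:
        last -= 1
    return last


FRANCE = [("Fran", False), ("ce", True)]
MARSEILLAISE = [("Mar", False), ("sei", False), ("llai", False), ("se", True)]

if __name__ == "__main__":
    for word in (FRANCE, MARSEILLAISE):
        stressed = word[french_stress_index(word)][0]
        print("".join(s for s, _ in word), "stressed on:", stressed)
```

The point of the sketch is only that the rule needs one pass from the right edge of the word and one exception, which is what makes it simple by comparison with English.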

This is in contrast to the metricality in more than half of the world’s languages, in the Americas, Africa, and East and South East Asia, including China, where the tone of a word is crucial for its meaning.

So the task for the learner of English is first to determine that the metricality is by stress, and that the very obvious variations of pitch and tone do not differentiate words, but mark all sorts of ‘doing’ effects, points of emphasis, the difference between questions and statements, and a whole lot more. And second, English learners have to work out the exact mechanism, which has been a major topic of scholarly debate ever since the 1968 work of Chomsky and Halle.

Children get a lot of guidance about being polite. And a lot of attention is paid to how they articulate the phonemes. But there is little consideration of children’s metricality.

There is strong evidence in English of a powerful principle at work in the way many other foreign words are treated. The Russian place-names, Vladivostok and Borodino, the second the site of the famous battle commemorated in Tchaikovsky’s 1812 Overture, both have final stress in Russian. When these Russian names are pronounced by English speakers, the stress is almost always shifted one syllable to the left and the first syllable gets a secondary stress.

By this right to left scansion, words like Austria, photograph, photographer, and Australia have the primary stress on the third syllable from the right, and have stressed and unstressed syllables alternating with one another, discounting the final syllable, not quite the same way as in French, but counting the same elements in the same direction.
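For the four words just named, the placing of the primary stress can be sketched as a count of three syllables from the right. This is only an illustration of counting in that direction, not the full English system, which as noted is far more complex. The function name and the syllable divisions are assumptions.

```python
# Sketch only: covers just the pattern in the four example words above.
# The syllable divisions are assumed for illustration.

def third_from_right(syllables: list[str]) -> int:
    """Return the position (from 0) of the third syllable from the right.

    Words with fewer than three syllables fall back to the first syllable.
    """
    if not syllables:
        raise ValueError("a word needs at least one syllable")
    return max(len(syllables) - 3, 0)


EXAMPLES = {
    "Austria": ["Aus", "tri", "a"],
    "photograph": ["pho", "to", "graph"],
    "photographer": ["pho", "to", "gra", "pher"],
    "Australia": ["Aus", "tra", "li", "a"],
}

if __name__ == "__main__":
    for word, syllables in EXAMPLES.items():
        print(word, "primary stress on:", syllables[third_from_right(syllables)])
```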

Possible words

But taking account of all the relevant factors, there are still many more possible words than there are actual words.

There is a crucial idea from some 1958 work by Jean Berko Gleason which showed that children normally start operating on the basis of a notion of possible words between the ages of four and seven. But one of the things which emerged from my PhD research in 2002 was that there is a huge difference in the awareness of the relations between real words and possible words in children with speech problems. This is a significant co-morbidity.

Despite the large number of possible syllables in English, many either occur only in one syllable words, or are rare and unlikely to be encountered by children other than in stressed syllables or syllables immediately following the stressed syllable. 

English speech sounds are distributed quite unevenly. Most unstressed syllables are simple. So spaghetti may be the only word likely to be heard and known by children with a syllable beginning with SP before the stressed syllable. This uneven distribution is easily misconstrued by children. They may, for instance, assume that if a particular sort of structure sounds wrong, it is wrong, when actually it isn’t wrong at all, but just unusual. Some errors in child speech may be the result of this sort of misanalysis.

Many children say spaghetti as BASKETI. But a few say it as PSKETI. They may be assuming either that English allows a PSK cluster at the beginning of the word or that this is a complex phoneme. Of course, such mistaken assumptions are not conscious, but strictly unconscious. Either of them, unfortunately, leads to a mispronunciation, in the case of PSKETI, not a possible word in English.

Analysis, construal, decision

On all of these points, there is the possibility of the child learner either misconstruing or misanalysing the evidence of what he or she hears, or deciding that the evidence is not clear enough for a firm decision. The particular combination of complexities in English metricality, syllable structure, and the phonemic inventory, is especially problematic for some children. Some of these complexities are beyond the scope of any plausible instructions at the time children are normally working on them in their minds. The speech won’t come out right unless the analysis on all of these points is correct.

Knick knack, paddy whack

In various cultures, traditional songs often have repeated refrains, sometimes consisting largely or entirely of nonsense words, not making sense from one phrase to the next, for example: “Knick, knack, paddy whack, give a dog a bone, this old man comes rolling home” or “La di da da” or the Beatles’ “Yeah, yeah, yeah.”

Refrains can be very helpful for child-learners of a language, epitomising in one or more ways its fundamental structures. For instance, knick knack exemplifies one common pattern of doubling or ‘reduplication’ in English where the only variation is with respect to the vowel going from high to low in terms of where in the mouth it is articulated. Paddy whack recapitulates the low vowel, but contrasts the simplest metrical foot in paddy with the plain syllable in whack, tellingly closed with a consonant. And in Give a dog a bone, this old man comes rolling home, the same diphthong is repeated four times.

With some musical invention, these principles can be exploited in work with children who have problems with the sound structures of English.