

Syllables, clusters, long vowels, diphthongs

Phonotactics (from Ancient Greek) deals with what a spoken language allows in the structuring of consonants, vowels, and syllables. This is highly language-specific and thus necessarily within the learnability space. Languages fall along a spectrum in what they permit, with English close to the permissive end. For any given language, its phonotactics defines which sounds can appear in which positions and combinations, in which syllables, in which sorts of words, and in which sorts of use. It defines whether consonants can combine in what are known as ‘clusters’ before and after vowels, as in cry and elm and crisp; whether vowels are allowed to combine with one another in what are known as diphthongs, as in oh and why; and whether consonants or vowels or both occur only as singleton elements or alternate with long or doubled variants. Phonotactics is thus a branch of phonology.

What English does with its quite complex phonotactics, stringing sounds together, some West African languages do inside the sounds, allowing the same or similar actions in different parts of the vocal tract at once. An example is in the name of the African language Igbo, with simultaneous closures by the lips and the back of the tongue. In English, by contrast, the G and B are sequenced in egg box, dog basket, and between syllables in Digby, Rugby, but crucially not simultaneously as in Igbo. English has many such sequences which could easily be misheard as single sounds. So in children’s occasional pronunciations of spaghetti as PSKETI, the sequence PSK may be misconstrued as a complex sound.

Languages vary widely in all of these respects – in:

  • Whether all syllables begin with a consonant, as in pie, lie, high, cow, or whether they allow syllables to begin with a vowel, as in eye and owe;
  • Whether no syllables have a consonant after the vowel or whether this is allowed, as in eight, ape and arm;
  • Whether consonants cluster, as in play, clay, pray, crew, spray, straw, screw;
  • Whether vowels diphthongalise, as in my, may, mow, bow, boy, with the tongue rising in the mouth as the articulation proceeds;
  • Whether and when consonants double or ‘geminate’, as between the adjacent N sounds in non-native and unknowing, where native and knowing are fine on their own unless they are turned into their opposites by non and un, neither of which counts as a word on its own;
  • Whether vowels lengthen contrastively, or whether, as in English knee, true, yaw, this lengthening is also marked by an increased ‘tension’ in the tongue, or whether these sounds would be better considered as ‘double vowels’ or ‘geminates’;
  • Whether consonants and parts of consonants combine as in the first and last sounds of church and judge, beginning with a complete closure and ending in an only partial closure in what are known as ‘affricates’;
  • Whether and how far sounds like the first and last sounds of church and judge are allowed to combine with other sounds, as in squelch and hinge;
  • Whether the R in the spelling of her, fir, fur, should be considered as part of the vowel in varieties like Home Counties English where it goes unpronounced;
  • Whether there are syllables like the L in little and middle or the TION in station which are always stressed in the same way – unstressed in these cases – and pronounced with the same speech sounds or ‘phonemes’;
  • Which sounds are allowed to form the nuclei of syllables: in English, all vowels and N, L, M, and in some varieties of English R, as in button, little, prism and butter.

English is uncommonly permissive in most, but not all, of these respects. It allows consonants to stack up before or after the vowel or ‘nucleus’, with consonants like L, R closer to the nucleus than consonants like T, P, and K. But it allows only one consonant, S, before two other consonants, TR, PR, PL, and KR, as in stray, spray, splay, and screw, and up to three elements after the vowel, as in glimpse, next and length.
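
The three-consonant pattern at the start of a syllable can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: sounds are written with the capital-letter labels used here (so K stands for the sound spelled C in screw), only the four clusters named above are listed, and the names CLUSTERS_AFTER_S and is_three_consonant_onset are hypothetical, not part of any standard analysis.

```python
# Two-consonant sequences named in the text as able to follow S.
# This is only the sample given above, not a full inventory of English.
CLUSTERS_AFTER_S = {"TR", "PR", "PL", "KR"}


def is_three_consonant_onset(onset: str) -> bool:
    """Return True if `onset` is S followed by one of the listed clusters."""
    onset = onset.upper()
    return (
        len(onset) == 3
        and onset[0] == "S"
        and onset[1:] in CLUSTERS_AFTER_S
    )


# STR, SPR, SPL, SKR are the onsets of stray, spray, splay, screw;
# TSR and FTR do not begin with S and so are rejected by this sketch.
for candidate in ["STR", "SPR", "SPL", "SKR", "TSR", "FTR"]:
    print(candidate, is_three_consonant_onset(candidate))
```

The sketch states only what is allowed: anything outside the listed set is rejected by default, without a separate rule for each excluded combination.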

But on many of these points, the evidence of the spoken language may not be clear and obvious to the first language learner. English contrasts tip, ship and chip. So chip could have an initial T SH cluster. If that were the correct analysis, TSIP, beginning like tsunami pronounced with a T and S, would also be predicted as a possible word, complicating the phonotactics.

What the learnability space has to allow

The words strength, strange, scrounge and change would not count as possible words in many languages. Such languages would disallow the STR and SCR and NG combinations and the final TH in strength; the way the GE is preceded by an N, and the fact that the vowel begins and ends with the tongue in different positions in the mouth, in strange and scrounge; and the CH and GE in change, which use different airstreams at the beginning and the end of the sound.

This complexity is entirely limited to the ‘content’, or ‘encyclopedic’, or ‘lexical’ words, the nouns, verbs, adjectives, prepositions, and adverbs, like death, die, dead, in and sadly. ‘Functors’, like the, a, and the S in hits and the T in slept, are all built more simply, as these examples show. One variation, between the pronunciations of TH in functional this and the, and in lexical think and thought, is entirely defined on the contrasting categorisations, or places within the ‘spine’, by the proposal here.

The importance of definitions

The terms and the ordering of the definitions are not obvious and are thus a significant issue for learners. All languages display a pulsing in speech. But languages vary in what the pulses consist in and whether the pulsing is just in speech or in both speech and the way entries are stored in the lexicon. Because of this wide variation, by one widely accepted model the core of the pulsing is by what are known as the ‘nuclei’ of syllables. In most languages, the nucleus of a syllable is a vowel. But in languages like English, the syllabic core, the nucleus, can also be what is known as a ‘sonorant’, a consonant with a high level of resonance, L, R, N or M, as in little, letter, button or bottom. The ancient languages of North Africa are even more permissive, allowing any phoneme to be represented in the lexicon and spoken as a syllabic nucleus. The Semitic languages spoken in the area around the Eastern Mediterranean, including Arabic, Hebrew, and Amharic, allow entries to be stored in the lexicon by consonants alone, with the vowels inserted in the course of speaking. The languages of the Caucasus such as Georgian allow very long strings of consonants at the beginnings of syllables, like MTSVANE in the name of a well known wine. It may be that the M and the TS are better considered as syllabic nuclei without standalone vowels. Russian, in long term contact with the languages of the Caucasus, allows some of these things in particular, isolated words. It may also be the case that some apparent complexities can be reduced by limiting the building of words to quite narrowly defined derivational steps. It is an open question whether the system here is defined on timing or on the phonemic content. The human learnability system has to be such that all of these systems are learnable as part of the core of any given language.

The child learning English has to resolve the issues here on the basis of evidence which is not uniformly clear.

Increasing or decreasing the complexity

On some accounts, in cases such as twelfths, there are plainly four consonants after the vowel. But the final S is a plural and the TH is arguably a derivation from a root form such as twelve.

The complexity is reduced if syllables are built, or derived, in stages, with the S in string and the T in next both built after the rest of the structure, and both differing significantly with respect to what the Sound Pattern of English characterises as ‘Continuance’. Modern English disallows GN and KN at the beginning of a syllable, but these were previously allowed, as shown by the spellings of knot and gnome, and they are still allowed in German and Dutch.

The case of what the child does not hear – negative evidence

By what I am categorising here as ‘taxonomic’ theories, there are constraints which disallow consonant doublings and all the phenomena listed here which are never exemplified in the learner’s experience. In all probability, doubling was allowed in historic forms of English, as suggested by the spellings of hammer and rudder.

But to some, including me, constraint systems are unlearnable in principle (despite the vigorous protestations of constraint theorists), and learnability can only be stated with respect to what is allowed, not what is disallowed. On such reasoning, it does not need to be specified in the grammar of Southern British English pronunciations that there are no root forms beginning with the sound at the end of ring and song. Rather the learner is forced to conclude, by the complete absence of any root forms beginning with the NG sound, that where these forms do occur they have to be derived from two elements, one defining the airflow through the nose, the second defining the action of the back of the tongue and the soft palate, with a single sound in the final pronunciation.
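
The idea of stating only what is allowed can be pictured with a minimal Python sketch. It assumes a toy set of roots written as tuples of sound labels, with NG standing for the sound at the end of ring and song. The function names are hypothetical, the transcriptions are simplified, and the sketch does not model the further step of deriving NG from two elements.

```python
def learn_initial_sounds(roots):
    """Collect the set of sounds attested at the start of the roots heard."""
    return {root[0] for root in roots if root}


def is_licensed_initially(sound, attested_initials):
    # No constraint is stored: a sound is licensed only if it was attested.
    return sound in attested_initials


# Toy input standing for ring, song, knee and go (simplified labels).
heard = [
    ("R", "I", "NG"),
    ("S", "O", "NG"),
    ("N", "EE"),
    ("G", "O"),
]

attested = learn_initial_sounds(heard)

print(is_licensed_initially("N", attested))
print(is_licensed_initially("NG", attested))
```

Nothing in the learned set says that NG is banned at the start of a root. It is simply absent, because no root in the input begins with it.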

By contrast, in Northern British English varieties, there are many different configurations, with the G pronounced in some varieties and not pronounced in others.

By the proposal here, the simplest way of representing this complexity is by complete abstraction, as by the universal spine, as postulated by Martina Wiltschko (2014).