What makes a speech sound or 'phoneme'

The speech sounds or ‘phonemes’ of an accent, dialect, or variety are commonly thought to constitute what is known as the ‘phonemic inventory’. Phonemes are the smallest units that keep words apart, as shown, for example, by may and say, kale and whale, or may and my, say and saw, coal and code, ward and wad. These phonemes come down through the history of English, as it has developed from Proto-Indo-European, spoken around 6,000 years ago somewhere near the Black Sea, in modern Turkey or Ukraine. The letters are what English inherits from Latin, as used by the Romans 2,000 years ago and a thousand years later by the bureaucrats brought over from Normandy by William the Conqueror to run his new state apparatus.
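The idea of phonemes keeping words apart can be sketched in a few lines of code. This is only an illustration: the transcriptions use informal labels invented here (not the International Phonetic Alphabet), and the function name is hypothetical. Two words count as a ‘minimal pair’ if their phoneme sequences are the same length and differ in exactly one position.

```python
# Informal, made-up labels for the phonemes in a few of the example words.
# These are assumptions for illustration, not a standard transcription.
LEXICON = {
    "may": ("m", "ei"),
    "say": ("s", "ei"),
    "my": ("m", "ai"),
    "saw": ("s", "aw"),
    "coal": ("k", "ou", "l"),
    "code": ("k", "ou", "d"),
}


def is_minimal_pair(a, b):
    """Return True if two phoneme sequences differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


if __name__ == "__main__":
    for first, second in [("may", "say"), ("may", "my"), ("coal", "code"), ("say", "my")]:
        result = is_minimal_pair(LEXICON[first], LEXICON[second])
        print(first, second, result)
```

On this sketch, may and say are kept apart by one consonant, may and my by one vowel, and coal and code by the final consonant, while say and my differ in two places and so are not a minimal pair.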

But in a way very relevant both to child speech and to changes in speech over time, there are, in the framework here, smaller units of difference, known as ‘features’.
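One way to picture features as units smaller than the phoneme is to treat each phoneme as a small bundle of properties. The sketch below is a deliberately simplified assumption: the three properties (voicing, place, manner) and their values are textbook-style labels chosen for illustration, not the feature set of the framework here or of any particular theory, and the function name is hypothetical.

```python
# A toy feature table for a handful of English consonants.
# The choice of three features is an illustrative assumption.
FEATURES = {
    "p": {"voiced": False, "place": "labial", "manner": "stop"},
    "b": {"voiced": True, "place": "labial", "manner": "stop"},
    "t": {"voiced": False, "place": "alveolar", "manner": "stop"},
    "d": {"voiced": True, "place": "alveolar", "manner": "stop"},
    "s": {"voiced": False, "place": "alveolar", "manner": "fricative"},
    "m": {"voiced": True, "place": "labial", "manner": "nasal"},
}


def differing_features(a, b):
    """List, in alphabetical order, the features on which two phonemes differ."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sorted(name for name in fa if fa[name] != fb[name])


if __name__ == "__main__":
    print(differing_features("t", "d"))  # differ only in voicing
    print(differing_features("p", "d"))  # differ in place and voicing
    print(differing_features("m", "s"))  # differ in all three
```

On this picture, T and D are one feature apart, while M and S are three features apart, so the difference between two phonemes is itself made up of smaller differences.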

In English there are often said to be 44 phonemes. Counting them is actually harder than it might seem. I am one of those speakers for whom due and do don’t rhyme. There are speakers like me in Britain and North America. On the simplest analysis, we have an extra vowel, similar to what Russians hear at the beginning of the name Yury. But 44 is a reasonable count for most varieties, as listed in the inventory here.

Most linguists agree that it is useful to list the phonemes of a language. Children often seem to be missing one or more phonemes from the inventory. The number of phonemes matters because it’s clearly a key point on the learner’s agenda, in the learnability space.

Across the world’s languages, the number of phonemes ranges from 11 to almost 200. Languages vary not just in the count of the phonemes, but in whether they cluster together or how far they combine features in their internal structure. The question is hugely significant for learners. English clusters its phonemes, as in strength. Other languages add complexity to the internal structure of their phonemes, as in the name of the West African language, Igbo, where the GB is a single phoneme, rather than two.

Learners have to work out which way their target language goes. In the case of a language with the complex clustering of English, this is a significant issue. In the case of try, for example, with the T influencing the R and the R influencing the T by what is known as ‘coalescence’, it is easy for learners to misconstrue TR as a single phoneme.

Many developmental speech problems involve either missing one or more consonants from the inventory or saying one or more in some non-standard way. But some problems are more severe, or involve the vowels as well.

Phonemes by features and derivation

In 1968 Chomsky and Halle published The Sound Pattern of English, giving prominence to a featural analysis of the sound structure of the language, with key aspects of this analysis by derivation.

In 1669, 300 years earlier, William Holder proposed a first version of this featural and derivational approach, motivated mainly by speech pathology.

By the hypothesis here, all aspects of phonemic structure are by derivation from primitive features.


As noted by Alexander Melville Bell, in everyday spoken English “You should have done that” is often pronounced with two of the vowels unpronounced as “You SH D V done that.” There are words like wished, watched, and bridged, but these have a root in wish, watch, and bridge, and a form related to ED defining a time scale in the past. But no word in English contains the sequence SH D V.

The vowels in should and have can be left unpronounced because of their special status as ‘auxiliary verbs’ in traditional theories of grammar, supplanted by the framework here (largely due to the work of Chomsky).

The learner of English has to set aside the potentially misleading evidence of “You SH D V done that.”


Apart from the inventory, there is significant variation with respect to which phonemes can be ‘clustered’ together, as in strength, strange, and so on, by what is known as ‘phonotactics’. The learner has to learn that clusters are clusters and not complex phonemes.


English has just one sort of complex phoneme, the affricates, at the beginnings and ends of church and George. Affricates begin with a complete closure of the air-stream, and end with a mere obstruction. At the beginning of the syllable, affricates only occur on their own before the vowel.

Some children find affricates hard to hear or say. But the issue for such children may be one of misanalysis. They may be hearing these phonemes as two phonemes one after the other, in the case of chair, for instance, as a T followed by an SH.
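The difference between the two analyses of chair can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the sound labels are informal, the function name is hypothetical, and real perception is of course not a table lookup. The point is only that the same stream of sounds comes out as a different number of phonemes depending on whether the learner treats T followed by SH as one complex unit or as two.

```python
def segment(sounds, complex_units):
    """Group a flat list of sounds into phonemes.

    Any adjacent pair of sounds listed in complex_units is merged into
    a single phoneme; every other sound stands as a phoneme on its own.
    """
    phonemes = []
    i = 0
    while i < len(sounds):
        pair = tuple(sounds[i:i + 2])
        if len(pair) == 2 and pair in complex_units:
            phonemes.append("".join(pair))
            i += 2
        else:
            phonemes.append(sounds[i])
            i += 1
    return phonemes


# Informal labels for the sounds in chair (an illustrative assumption).
CHAIR = ["t", "sh", "air"]

# Analysis with the affricate as one complex phoneme.
with_affricate = segment(CHAIR, {("t", "sh")})

# Misanalysis with no complex phonemes: T and SH are heard as two.
without_affricate = segment(CHAIR, set())

if __name__ == "__main__":
    print(with_affricate)
    print(without_affricate)
```

With the affricate in the inventory, chair has two phonemes. Without it, the same word has three, beginning with a cluster that English does not otherwise allow at the start of a syllable.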

Many languages have phonemes more complex than those of English. The learner has to learn that uncommon sequences and the effect of unpronounced vowels are not instances of complexity.


In many languages, phonemes can be doubled in length, with the difference in length alone enough to change one word into another. This is the case in Arabic, Finnish, Cypriot Greek, and many other languages. But not in English. At least in modern English, there are no long consonants, although there may have been once upon a time, as suggested by the spelling of hammer and rudder.

There are instances in English where a phoneme is repeated at the end of one word or bit of a word and at the beginning of the next, as in non-native and soulless, with both native and soul existing independently as separate words. So long consonants can be pronounced, but not as part of a single word.

There are more potentially confusing miscues for the learner. Length can be deceptive. Most of the difference between hit and hid is signalled by the length of the vowel. The nominally short vowel in hid can be almost as long as the long vowel in heed. The identity of the consonant depends on the length of the vowel before it. Different tokens of one sound may vary enough from one context to another to count as different phonemes in one language and the same phoneme in another language.

Some children wrongly conclude that in English there are long consonants. They say finger with an NN in the middle and the F replaced by a T or a D, so that the word sounds like TINNA or DINNA, where the N is perceptibly doubled.

The differentiation here between long and short consonants has to take place in the mind of the learner.