

Fundamental elements of contrast

By the proposal here, the fundamental category underlying human language is the contrast between one linguistic element and another.  

Features distinguish linguistic elements from one another: nouns from verbs, consonants from vowels, and so on.

There was a precursor of a feature in the making of a stone tool. A suitable sort of stone had to be chosen and chipped with another stone to form a sharp edge, with facets at angles to one another. Sharpness was a feature in the mind of the first human ancestor to see the utility of such a thing.

The first step towards speech and language was to apply this human-specific cognition to the relation between physical acoustics and meaning in at least three areas: in the sound system, in the organisation of the vocabulary, and in the ways words are put together into sentences.

In the syntax of putting words together

One sort of feature is obvious, for example features involving:

  • Singularity and plurality, differentiating house from houses;
  • Animateness as opposed to inanimateness, allowing entities to be referred to as he or she rather than it;
  • Gender or sex, differentiating he from she;
  • Person, differentiating I, you, he and she.

Another, much less obvious, sort of feature, assumed in the framework here but disputed by the proponents of other frameworks, involves the notion traditionally described in terms of subjecthood, as in the ‘subject’ of a sentence, such as he in “He likes her.” But despite the central importance of such features in current linguistic theory, they seem marginal to the purposes here.

In the vocabulary

There is a feature of disrespect in any reference to a person kipping, ambling, whittering, lurking, etc. Much more obviously, many languages, though not English, have two or more ways of saying you, according to whether respect for the addressee is implied.

In the sound system

‘Inside’ the sounds of speech, or ‘phonemes’, are the properties known as ‘distinctive features’. These distinguish toe from doe, for example, by a variation in the relative timing of two events: the release of a closure in the mouth, and a narrowing of the gap between the vocal folds that allows them to vibrate spontaneously against one another. This variation involves one feature known as ‘voicing’ and another known as ‘aspiration’. In voicing, as in D, G and B, the vocal folds are brought together to vibrate against one another very soon after the release of the closure. A significantly greater delay is known as voicelessness, as in T, K and P. Aspiration is a further increase in the delay where the phoneme occurs on its own at the beginning of a syllable, as in tea, key and pea. All the other contrasts between phonemes can be defined in terms of a small number of other features.

Phonemes differ by where they are articulated in the mouth, how they are articulated and resonated, what else is happening in the vocal tract, and when, and how they fit into syllables.


There are two mutually irreconcilable theories of what is going on here. By one, essentially taxonomic, theory, proposed by John Wallis (1653), phonemes can be CLASSIFIED by their properties. By the other, proposed by William Holder (1669), phonemes are DERIVED from their properties, in other words traced back to their origins.

More than 300 years on, the issue still haunts research in both linguistics and speech and language pathology. Now, however, from a speech pathology perspective, the issue is mainly cast in terms of the clinical utility and relevance of the generative approach to linguistics associated with Noam Chomsky. Proponents of the generative approach (like me) mostly assume a derivational model, with the phonemes derived from features. Those opposed to this approach tend to insist on the centrality of the phoneme, seeing features as merely the properties needed for classification. The rather subtle difference here is no minor quibble.


It is obvious, uncontroversial and incontestable that speech has to be articulated and perceived as distinct, and that the signs of a signed language have to be formed, in what generative linguists call ‘externalisation’, an irreducibly necessary part of the system.

For speech, this involves gestures at various points in the vocal tract at which it can be narrowed or widened. So the vocal tract is like a wind instrument with the interesting property that the bore can be varied along its length, not infinitely, but more than in any instrument of the orchestra.


For our purposes here, the following features (broadly, but not entirely) from Chomsky and Halle (1968) are enough to keep the phonemes of English apart from one another:

For vowels:

  • Where the tongue is in the mouth – up at the top, down at the bottom, or in the middle, towards the front of the mouth or at the back;
  • Whether the tongue is tensed and as close as possible to the edge of this space;
  • Whether the vowel is long, usually, but not always, as a concomitant of tenseness;
  • Whether the tongue moves from one position to another, as in the case of the diphthongs in high and how;
  • Whether the lips are rounded as in rue and raw, or not as in hay and high;
  • In a way that Chomsky and Halle could not reduce to first principles because their framework did not refer to the syllable, whether the segment is classified as a vowel, and thus potentially a syllabic nucleus.

For consonants:

  • Whether the segment is consonantal, with no intrinsic syllabic role;
  • Whether, most fundamentally, the role of the sound in the syllable is as its ‘nucleus’, typically the vowel, or whether it is a consonant;
  • Whether the sound is characteristically part of the left edge of a vowel like Y and W in you, tune, why and twice, known as glides or semi-vowels;
  • Whether the sound is invariably next to the vowel in clusters like the L and R in splash and spray, traditionally known as liquids;
  • The continuance of the airstream (distinguishing T from S): whether the airstream is continuous or not, where sounds like T are generally characterised as ‘stops’ because the closure is complete, and sounds like S are generally characterised as ‘fricatives’ because of the air friction from a partial closure;
  • The place of any constriction – whether the airstream is ‘stopped’ or ‘bottle-necked’ at the lips, or with the tip of the tongue, or the back of the tongue (distinguishing T from P and K);
  • Whether the airstream is initially stopped and then just partially released as in the cases of the initial sounds in chore and jaw;
  • The relative timing of any involvement of the vocal folds (distinguishing ‘voiceless’ or ‘unvoiced’ T from ‘voiced’ D, P from B, S from Z, CH from J);
  • In the cases of voiceless stops, whether the delay in the voicing is increased by what is known as ‘aspiration’, as in pie, tie and cow in English, but not where the stop follows S, as in spy, sty, and scow;
  • Whether the airstream passes through the nose (distinguishing N from D);
  • Whether the main effect is to constrict the airway or to resonate, with this resonance, known as ‘sonority’, characteristic of L, R, N, M, W and Y;
  • In the case of fricatives, whether the ‘noise’ falls below a given frequency, as it does with TH (distinguishing TH from S, F, and SH);
  • In the cases of S and SH (both articulated with the tongue completely inside the mouth), whether the constriction is made with the tip or apex of the tongue, as with S but not SH.
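To make the idea of features keeping phonemes apart a little more concrete, here is a minimal Python sketch. It assumes a small, hypothetical subset of binary features named after those of Chomsky and Halle (voice, continuant, nasal, anterior, coronal), with simplified values for six English consonants; it leaves out features such as stridency, so it is an illustration rather than a full specification. The function compares two feature bundles and reports the features on which they differ.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified feature set: names follow Chomsky and Halle (1968),
# but this is only a subset and the values are illustrative.
FEATURE_NAMES: Tuple[str, ...] = ("voice", "continuant", "nasal", "anterior", "coronal")

# Each phoneme is a tuple of binary values (True = '+', False = '-'),
# listed in the same order as FEATURE_NAMES.
PHONEMES: Dict[str, Tuple[bool, ...]] = {
    "P": (False, False, False, True, False),
    "T": (False, False, False, True, True),
    "K": (False, False, False, False, False),
    "D": (True, False, False, True, True),
    "S": (False, True, False, True, True),
    "N": (True, False, True, True, True),
}


def contrast(a: str, b: str) -> List[str]:
    """Return the names of the features whose values differ between phonemes a and b."""
    return [
        name
        for name, x, y in zip(FEATURE_NAMES, PHONEMES[a], PHONEMES[b])
        if x != y
    ]


# Compare some of the pairs mentioned in the list above.
for a, b in [("T", "D"), ("T", "S"), ("N", "D"), ("T", "P"), ("T", "K")]:
    print(a, b, contrast(a, b))
```

With these assumed values, T and D differ only in voice, T and S only in continuant, N and D only in nasal, T and P only in coronal, and T and K in both anterior and coronal.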

But the learner has no privileged information that the phonemes of his or her target language are exclusively defined in this way.

From Holder’s time until the present, it has been assumed that the system should apply to all human languages. Although most phonemes in most languages fit the Holder, Chomsky and Halle schema, a more complete schema remains an ongoing challenge. Halle (1995) revised the feature set with this consideration in mind. But one of the greatest challenges is posed by phonemes, characteristic of many African languages and some Asian languages, in which the airstream is simultaneously stopped at two points. Although this does not happen in English, such an articulation is sometimes heard from children with no obvious contact with any relevant language when they try to say the word monopoly with a double articulation instead of the P, as MONOKPOLI. Various analyses are possible. But the mere fact that such forms are heard is robust evidence for a universal analysis of the feature system and for applying such a system in speech pathology. The fact that no such system currently exists should not, in my view, be an impediment.

Resonance and openings

In the case of words like me and no, beginning with what are known as ‘nasal stops’, the first articulation is to open a flap known as the ‘velopharyngeal sphincter’ at the back of the mouth and to close the mouth (with the lips for M and with the tongue tip for N), allowing the airstream to resonate in the open chamber of the nose and the closed chamber of the mouth as the tongue and the vocal folds are positioned for the vowel.

The sphincter can be felt by running the thumb backwards along the roof of the mouth until it becomes soft and squidgy. The sphincter is effectively a valve.

As an English vowel is articulated, the sphincter is typically closed. In words like me and no, the sphincter is not yet completely closed when the vowel articulation starts. In words like own and aim, this sequence is reversed, with the sphincter starting to open while the vowel is still being articulated. In words like name and main, and gnome and moan, the sphincter barely closes between the articulation of the initial and final consonants. The vowels are heavily nasalised as a result, though English speakers hardly notice this. In none of these cases is there a clear point at which the consonant ends and the vowel begins, or the other way round.

In the cases of buy and die, there are the same openings and closures within the mouth, at the lips for B and with the tongue tip for D, but without any opening of the sphincter. So B and D are known as ‘oral stops’. In pie and tie, the closure is released momentarily before the vocal folds are brought together to vibrate against one another. There is a slight puff of air in P and T which does not happen with B and D. This difference can easily be detected with a match or a candle. On account of this difference, B and D are known as voiced, and P and T as voiceless.


Thus the gestures of speech involve both the ‘area function’ of the vocal tract and the timing of the gestures in relation to one another. Both are relative, within limits. Across Europe there are consistent differences between voicelessness and voicing, but the split varies from north to south. In Ireland and north-west England there is a longer pause before the beginning of voicing than in southern England; in southern England the pause is longer than in northern France; and in northern France it is longer than in southern France. For each expression of voicelessness there is a corresponding expression of voicing. The differences are such that they can be misconceived as categorial: the P in Parisian Paris is almost like the B in London Barry.
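The point that voicing is relative can be sketched as a category boundary that belongs to the accent rather than to the sound itself. In the minimal Python sketch below, the boundary values (in milliseconds of delay between the release of the closure and the onset of voicing) are invented purely to reproduce the north-to-south ordering just described; they are not measurements, and reducing the contrast to a single threshold per accent is a simplifying assumption.

```python
from typing import Dict

# HYPOTHETICAL boundaries, in ms of delay between the release of the closure
# and the onset of voicing. Invented to mirror the ordering in the text; NOT measured.
BOUNDARY_MS: Dict[str, float] = {
    "Ireland and north-west England": 35.0,
    "southern England": 25.0,
    "northern France": 15.0,
    "southern France": 5.0,
}


def classify_labial_stop(delay_ms: float, accent: str) -> str:
    """Return 'P' if the delay exceeds the accent's (assumed) boundary, otherwise 'B'."""
    return "P" if delay_ms > BOUNDARY_MS[accent] else "B"


# One and the same physical delay, judged against four different boundaries.
same_delay = 20.0
for accent in BOUNDARY_MS:
    print(f"{accent}: {classify_labial_stop(same_delay, accent)}")
```

With these made-up values, a 20 ms delay counts as P in northern France but as B in southern England, which is the Paris and Barry observation in miniature.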

There is a different sort of variation with respect to the action of the tongue tip. Russian contrasts phonemes equivalent to T and D, as in Tomsk and Don, but with the tongue tip making contact significantly further forward than in English and some other Western European languages.

On a narrow featural account of the inventory, such specificities have to be represented as one aspect of what Marlys Macken (1995) called the ‘learnability space’.

A problem

The original evidence for features came from changes in pronunciation. It was first pointed out by William Holder in 1669, mainly with reference to the speech of one child, then by the Danish linguist Rasmus Rask, with reference to changes in the pronunciation of European languages over hundreds and thousands of years, and was then developed and popularised by the German Jakob Grimm, of fairy-tale fame.

Thus it was noted that the TH in English father was originally a T, as in German Vater, Latin pater and Greek pateras. The R in Portuguese obrigado (thank you) and branco (white) was originally an L, as in English obliged and blank. Each of these changes involved a single feature.

But how does this happen? These seemingly categorial changes from one phoneme to another, each changing the value of one feature, would surely at the very least be commented on in a speech community, if they did not lead to outright misunderstandings. The problem is that there is no evidence of this happening, either in the historical records or in careful and detailed observation of such changes where they are demonstrably happening in the modern world.

William Labov (1994, 2001) describes some comparable vowel changes happening right now in the USA, using tape-recording and very large amounts of data which could only be processed computationally. As Labov shows, the variation is by less than a whole feature. The change from one vowel to another takes place over four generations with nobody noticing the subtle changes in the speech spreading through the population.

There are various theories about how this might happen. By one, the phonetics is scalar and non-categorial. But this entails that the child learner has to be listening out for two different sorts of things, one scalar, one categorial.

The solution, I believe, lies in an extended notion of what Chomsky and others call ‘Merge’, applied to the features, so that categories can be ‘built’ from features, but in language-specific, or perhaps more accurately dialect-specific or even idiolect-specific, ways. I sketch this out in my proposal here.


By the alternative, taxonomic model, the consonants are grouped into three ‘systems’, involving:

  • Place of articulation, with at least these possibilities exploited in English: the two lips; the lower lip and the upper teeth; the tongue between the teeth; the tongue against the fleshy ridge behind the upper teeth; the tongue against a broader area slightly further back; and the back of the tongue against the soft palate, known as the ‘velum’;
  • Manner of articulation, differentiating stops, with a complete blockage of the airstream, from ‘fricatives’, with an almost complete blockage; differentiating both of these from ‘affricates’, as in church and judge, which start with a complete blockage and end in a partial blockage at the same point in the tract; and differentiating ‘nasals’, with the airstream passing through the nose (M, N and the sound at the end of sing and ring), ‘lateral’ L, with the airstream passing around the sides of the tongue, R, with the tongue curled or grooved, and the glides or semivowels Y and W, which in English always occur just before a vowel (despite the spelling of how and toy, residues of an earlier English);
  • Voicing, differentiating stops and fricatives according to whether the vocal folds are allowed to vibrate during the blockage or very soon after it, or not.
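For comparison, the taxonomic model can be sketched in Python as a lookup table in which each consonant is filed under one cell of a place, manner and voicing grid. The labels are conventional phonetic terms chosen for illustration, and only a handful of English consonants are included; this is a hypothetical encoding, not a standard one.

```python
from typing import Dict, List, NamedTuple


class Cell(NamedTuple):
    """One cell of the taxonomic grid: place, manner and voicing."""
    place: str
    manner: str
    voiced: bool


# A handful of English consonants filed by the three 'systems'.
# Labels are conventional phonetic terms used here for illustration only.
CONSONANTS: Dict[str, Cell] = {
    "P": Cell("bilabial", "stop", False),
    "B": Cell("bilabial", "stop", True),
    "M": Cell("bilabial", "nasal", True),
    "F": Cell("labiodental", "fricative", False),
    "TH": Cell("dental", "fricative", False),
    "T": Cell("alveolar", "stop", False),
    "S": Cell("alveolar", "fricative", False),
    "CH": Cell("postalveolar", "affricate", False),
    "K": Cell("velar", "stop", False),
}


def lookup(place: str, manner: str, voiced: bool) -> List[str]:
    """Return the consonants filed under the given cell (empty list if none)."""
    target = Cell(place, manner, voiced)
    return [symbol for symbol, cell in CONSONANTS.items() if cell == target]


print(lookup("alveolar", "stop", False))  # ['T']
print(lookup("bilabial", "stop", True))   # ['B']
```

Here place is a single many-valued attribute: nothing in the data structure limits how many places a language may contrast, which is exactly the property the second of the objections below takes issue with.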

There are, I believe, many reasons for rejecting this model. Three of the strongest are as follows:

  • Models with at most two values for every feature correspond to the basic mechanism of the nervous system, which allows only activation or non-activation;
  • The taxonomic model does not illuminate the cross-linguistically typical situation where just three or four cases contrast with one another, as in English, and falsely predicts systems contrasting any number of places of articulation;
  • The taxonomic model makes it hard to explain what is going on when a phoneme shifts from one category or part of a system to another, as in the case of R which seems to be shifting to a glide in those varieties of English which do not allow it to occur after the vowel at the end of a syllable.

Accurate repetition

Whatever the theoretical account of changes in speech over single lifetimes, the normal acquisition of a particular accent at a particular point in time is extremely accurate, with only the slightest deviations noticed and remarked on, usually negatively.
