With different sorts of beat

English (like perhaps half of the world’s languages) uses a combination of pitch, length and loudness to create rhythms inside what we think of as words, as well as between words. In music, there is both melody and rhythm. In speech, there are speech sounds, known as phonemes or segments, and there are rhythms. But the rhythms of speech and music are quite different from one another.

The contrast in speech is often characterised in terms of ‘metricality’, because it is what is measured in the construction of poetic metre, as opposed to the ‘segmentality’ of the sequence of sounds or segments. Metricality is part of every word in English. The beats in speech are largely defined in terms of what are known as ‘feet’, a term that goes back to Ancient Greece, recognising the way the foot bends at a right angle to the rest of the leg.

This rhythmic structure seems to help keep words apart in what is, acoustically, a continuous string of sound.


Inside English words there is a rhythm of mostly alternating levels of stress, with a foot comprising a stressed syllable and an unstressed one, as in canon, rather than the other way round, as in canoe (from a native Caribbean term for a small boat). In a way that some find hard to believe, in English, as in classical Latin and many other languages spoken today, word stress is computed from right to left. So in banana, scanning from the right, a foot is found in the long AH vowel and the final syllable, with the stress falling on the long vowel. The initial vowel gets no stress because there is no foot for it to align with.

In photograph, photography and photographic, the stress shifts according to the structure on the right: it falls on the first syllable in PHOtograph, on the TO in phoTOgraphy, where the single vowel at the right edge is discounted, and on the penultimate GRAPH syllable in photoGRAPHic, because of a special property of words ending in -ic, which marks the word as an adjective. In many words of three syllables or more, the final vowel is discounted, and the vowel that is then penultimate is stressed, so that the stress lands on the third-last syllable. Proceeding leftwards, pairs of syllables are stressed the same way, but with the pair on the right getting the highest level of stress, the primary stress. This gives the patterns in hippopotamus and in Austria, Australia and Amazonia. Trying to compute English stress from left to right would be absurdly more complex.
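The right-to-left procedure can be sketched as a small program. This is a simplified illustration under stated assumptions, not a full account of English stress: each word is supplied already divided into syllables, and each syllable is marked by hand as long (a long vowel or a closed syllable) or not, since working that out from spelling is a separate problem. The rule used here, stressing a long penultimate syllable and otherwise discounting the final syllable and stressing the third-last, is essentially the classical Latin rule. The `ends_in_ic` flag is a hypothetical stand-in for the special behaviour of -ic adjectives, the function names are illustrative, and two-syllable words with final stress, like canoe, fall outside the sketch.

```python
from typing import List

PRIMARY, SECONDARY, UNSTRESSED = 2, 1, 0


def stress_levels(long_syll: List[bool], ends_in_ic: bool = False) -> List[int]:
    """Assign stress levels by scanning from the right edge of the word.

    long_syll[i] is True if syllable i counts as long (long vowel or closed
    syllable). This marking is supplied by hand: an assumption of the sketch.
    """
    n = len(long_syll)
    levels = [UNSTRESSED] * n
    if n == 0:
        return levels
    if n <= 2:
        head = 0  # stressed + unstressed foot, as in canon (canoe is not covered)
    elif ends_in_ic or long_syll[-2]:
        head = n - 2  # foot = penultimate + final syllable, as in baNAna
    else:
        head = n - 3  # final syllable discounted: stress on the third-last
    levels[head] = PRIMARY
    # Proceed leftwards in pairs; each complete pair is stressed on its left
    # member at a lower (secondary) level. A lone leftover syllable at the
    # left edge has no foot to align with and stays unstressed.
    start = head - 2
    while start >= 0:
        levels[start] = SECONDARY
        start -= 2
    return levels


def render(syllables: List[str], levels: List[int]) -> str:
    """Primary stress in CAPITALS; secondary stress marked with the IPA sign ˌ."""
    parts = []
    for syl, level in zip(syllables, levels):
        if level == PRIMARY:
            parts.append(syl.upper())
        elif level == SECONDARY:
            parts.append("ˌ" + syl.lower())
        else:
            parts.append(syl.lower())
    return "-".join(parts)


# Hand-syllabified examples: (syllables, long-syllable flags, ends in -ic?)
EXAMPLES = {
    "banana": (["ba", "na", "na"], [False, True, False], False),
    "hippopotamus": (["hip", "po", "pot", "a", "mus"], [True, False, True, False, True], False),
    "photograph": (["pho", "to", "graph"], [False, False, True], False),
    "photography": (["pho", "to", "gra", "phy"], [False, False, False, False], False),
    "photographic": (["pho", "to", "gra", "phic"], [False, False, False, True], True),
    "Amazonia": (["a", "ma", "zo", "ni", "a"], [False, False, True, False, False], False),
}

if __name__ == "__main__":
    for word, (sylls, flags, ic) in EXAMPLES.items():
        print(word, "->", render(sylls, stress_levels(flags, ic)))
```

Run as a script, this prints ba-NA-na, ˌhip-po-POT-a-mus, PHO-to-graph, pho-TO-gra-phy, ˌpho-to-GRA-phic and ˌa-ma-ZO-ni-a. Marking the syllable do in diplodocus as short or long gives di-PLO-do-cus or ˌdi-plo-DO-cus, and marking vos as long in Vladivostok gives ˌvla-di-VOS-tok, with the stress one syllable to the left of the Russian position and a secondary stress on the first syllable.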

The structure here is known variously as ‘prosodic’, ‘metrical’ or ‘suprasegmental’, depending on the theory. Feet and parts of feet combine to form what we know as words. Words combine to form phrases. Phrases combine to form sentences.

So the child learning English has to listen not just to the order of the words and sounds, but also to the rhythm. In the word diplodocus, for example, the main stress can fall on either of the O vowels: either on the first, said like the O in cod, or on the last, said like the OH in focus.

The principle governing English word stress affects a number of word sets that children are starting to hear in conversations that are not necessarily addressed to them but take place in their presence, about matters of family concern, like Italy and Italian, photograph and photography, music and musician, where the relation between the meanings is obvious but the stress varies.

The mechanism here has been a major topic of scholarly debate ever since Chomsky and Halle’s pioneering The Sound Pattern of English (SPE) in 1968, which showed that stress is not an accidental property of individual words but is assigned by general principles that have to be learnt. By these principles, the stress in diplodocus can go two ways, but never on the first or the last syllable. A single word can also carry more than one stress: hippopotamus has two, one on the HIP and a stronger one on the POT.

The evidence of foreign words – ‘loans’

The way foreign words are treated provides strong evidence that stress is not an accidental property. The Russian place-names Vladivostok and Borodino (the second the site of the famous battle commemorated in Tchaikovsky’s 1812 Overture) both have final stress in Russian. When English speakers pronounce these names, the stress is almost always shifted one syllable to the left, and the first syllable gets a secondary stress. Exceptions are made only for revered foreign celebrities, who are allowed to have their names pronounced as they pronounce them themselves, if they are revered enough.

If stress did not work this way, English speakers might be confused about how to say Austria, America and Amazonia. In every one of these cases, the stress falls on the third-last syllable. Working this out is known as ‘scansion’. In French the scansion is simple: stress the rightmost syllable, disregarding any final vowel with no specific character, known as ‘schwa’ and always spelt with an E, as in the name of the country, La France, and in the name of the anthem, La Marseillaise. The modern English system happens to be uncommonly complex.
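The French rule, as stated here, is simple enough to write in a few lines, which makes the contrast with English concrete. The sketch below assumes words are already split into syllables by hand and uses a rough, hypothetical spelling test for schwa: a final open syllable whose only vowel letter is a plain, unaccented e (optionally followed by a plural s). Real French spelling has cases this test does not handle; the function names are illustrative, and the stressed syllable is shown in capitals.

```python
from typing import List

VOWEL_LETTERS = set("aeiouyàâäéèêëîïôöùûüÿœæ")


def is_final_schwa(syllable: str) -> bool:
    """Heuristic (an assumption): an open syllable whose only vowel letter is a
    plain 'e', optionally followed by plural -s, as in the -ce of France."""
    s = syllable.lower()
    core = s[:-1] if s.endswith("s") else s
    vowels = [ch for ch in s if ch in VOWEL_LETTERS]
    return core.endswith("e") and vowels == ["e"]


def french_stress(syllables: List[str]) -> int:
    """Index of the stressed syllable: the rightmost one, skipping a final schwa."""
    i = len(syllables) - 1
    if i > 0 and is_final_schwa(syllables[i]):
        i -= 1
    return i


def render(syllables: List[str]) -> str:
    """Show the stressed syllable in capitals."""
    i = french_stress(syllables)
    return "-".join(s.upper() if k == i else s.lower() for k, s in enumerate(syllables))


if __name__ == "__main__":
    for word in (["fran", "ce"], ["mar", "seil", "lai", "se"], ["pa", "ris"]):
        print(render(word))
```

This prints FRAN-ce, mar-seil-LAI-se and pa-RIS. Unlike the English case, nothing about the length of the other syllables needs to be known.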

Where does English get its rhythm from?

Across the languages of Europe, the English system is close to the one in classical Latin. How might this have happened? A coincidence? I offer a possible, but speculative, account of how it happened in The tragedy of English spelling.

Between words, in the most famous quotation in the language, “To be or not to be, that is the question”, both instances of to, as well as is and the, may all be unstressed, creating a simple rhythm of mostly alternating stressed and unstressed words. The system at play here has been exploited by poets writing in metre from the times of Chaucer and Shakespeare to the present day, and now even by ChatGPT.

Tone and intonation

Separately from stress, there are tone and intonation. Tone is used in more than half of the world’s languages, in the Americas, Africa, and East and South East Asia, including China, where the tone of a word or part of a word is crucial for meaning.

Stress in children’s speech

Children hear words and expressions like animal, excavator, helicopter, kindergarten, hippopotamus and Tottenham Hotspur, and they want the words for whatever interests them. But, as Paula Fikkert (1994) discovered, children learning a language like English normally start learning the stress pattern inside words at around the age of two and a quarter.

The child who calls a banana a NANA or a BANA is scanning the word correctly from right to left and assigning the stress correctly, but is effectively treating the domain of the stress, the foot, as if it were the whole word, and is missing the vowel nucleus of the unstressed syllable on the left.
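The NANA and BANA forms can be expressed as a small function, as a sketch of the account rather than a model of acquisition: the word is cut down to the foot carrying the main stress, the unfooted syllable on the left is dropped, and, optionally, its consonant survives without its vowel. Representing syllables as (onset, rhyme) pairs, supplying the stressed syllable's position by hand, and assuming that the surviving consonant takes the onset slot of the stressed syllable are all illustrative choices; `truncate_to_foot` is a hypothetical name.

```python
from typing import List, Tuple

# A syllable as (onset, rhyme), e.g. ("b", "a"); the split is supplied by hand.
Syllable = Tuple[str, str]


def truncate_to_foot(syllables: List[Syllable], head: int,
                     keep_left_onset: bool = False) -> str:
    """Keep only the foot: the stressed syllable plus the one after it.

    head is the index of the stressed syllable, assumed to come from the
    right-to-left scansion. With keep_left_onset=True, the consonant of the
    dropped syllable to the left survives and replaces the onset of the
    stressed syllable (a modelling assumption for forms like BANA).
    """
    foot = list(syllables[head:head + 2])
    if keep_left_onset and head > 0 and syllables[head - 1][0]:
        foot[0] = (syllables[head - 1][0], foot[0][1])
    return "".join(onset + rhyme for onset, rhyme in foot)


banana = [("b", "a"), ("n", "a"), ("n", "a")]
print(truncate_to_foot(banana, head=1))                        # nana
print(truncate_to_foot(banana, head=1, keep_left_onset=True))  # bana
```

In both outputs the stressed syllable and the syllable after it survive intact, which is the sense in which the child's scansion is correct even though the word is incomplete.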

In brief

The rhythm of words is as fundamental as the sounds within them.

