
The human instrument

And adapting it for speech

The apparatus we use for speech evolved for the quite different purposes of breathing and feeding and for doing both at the same time without getting food into the lungs. All living humans have the same apparatus. It’s what there is. For speech, we use the apparatus a bit like a wind instrument with one resonator in the mouth and another in the nose. But there are hugely important differences.

The adaptation for speech uses the apparatus about 100 times faster than using it to feed. For speech at a normal tempo of between two and three syllables per second, the actions of the 60 or so muscles in the vocal tract are at the speed of virtuosic music making. Whereas in feeding both sides of the mouth are used, in speech the airstream is symmetrical and in one or two straight lines, mostly one. These different sorts of use are not just matters of tempo and symmetry. They are different uses.

The apparatus can be manipulated in ways which we start learning even before birth. These manipulations vary across languages. The engineering and the way humans learn to use it are extraordinary. But it is easy to exaggerate the role of the two most visible elements – the tongue and the lips.

In all cases, the entire apparatus here has to interface with a cognitive system in order for communication to occur.

There is a useful comparison with the closest man-made equivalent, a wind instrument, where one note differs from the next by variation in:

  • The airflow;
  • The length of the column of vibrating air;
  • The contact with the strings in a string instrument, or the ‘embouchure’, the action of the lips, in a wind instrument.

Musical notes are almost universally organised in scales of unequal values, mostly counted as eight in the West or, by a different way of counting, as five in much of the East. And in music there are values of timing in the rhythm.

But in speech the pitches and rhythms that we perceive are (mostly) relative to one another, not absolute. The absolutes that we perceive are the sounds of speech, the phonemes. Speech adjusts the rhythm and the pitch, but a whole lot more as well:

  • The pitch by raising or lowering the structure containing the vocal cords, known as the ‘larynx’, and by adjusting the tension of the cords;
  • The shape of the column by squeezing it slightly or completely closing it at any of a number of points;
  • The size of the resonator, from the space in the mouth alone to the mouth together with the internal space of the nose, by squeezing or relaxing a ring of muscle around the soft palate;
  • The timing of any vocal cord action by bringing the cords to almost touch to vibrate against one another like the fluttering of a flag.

We think of vowels and consonants. But we adjust the muscles according to the sounds we want – as though there were keys for each sound.

For both singing and speech by untrained voices, the system has a range of around an octave, with the vocal cords vibrating twice as fast at the top as at the bottom of the range.
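
The doubling can be put in numbers. What follows is a minimal sketch in Python, not part of the original account: the function names and the bottom frequency of 110 Hz are assumptions chosen only for illustration. It shows that a range of one octave means the top frequency is twice the bottom one, which corresponds to twelve equal-tempered semitones.

```python
import math


def top_of_range(bottom_hz: float, octaves: float = 1.0) -> float:
    """Frequency reached by going up the given number of octaves."""
    # Each octave doubles the rate of vibration.
    return bottom_hz * 2 ** octaves


def semitones_between(low_hz: float, high_hz: float) -> float:
    """Size of the interval between two frequencies in equal-tempered semitones."""
    return 12 * math.log2(high_hz / low_hz)


# Hypothetical bottom of an untrained voice's range (illustrative value only).
bottom = 110.0
top = top_of_range(bottom)

print(top)                             # 220.0
print(semitones_between(bottom, top))  # 12.0
```

Any other bottom frequency gives the same ratio of two to one, which is why the range can be described as an octave without reference to a particular voice.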

The working of the apparatus has to be co-ordinated with the action of the lungs, and adjusted, taking account of feedback from what is heard, what can be felt, and what can be sensed of the positionings. There are three sorts of feedback:

  • Hearing;
  • The sense of where the tongue and the lips are in relation to the vocal tract, by what is known as ‘proprioception’;
  • The sense of what is touching what.

This feedback allows fractional adjustments to be made.

The musculature

A muscle is activated from the brain by a signal passing along a nerve. The greater the distance, the longer this takes. So signals along long nerves have to be initiated before signals along short ones. Almost all speech is on breathing out. The main muscles of breathing are the diaphragm and the muscles between the ribs, known as ‘intercostals’. So the messages to start breathing out have to be sent out before the first message for the first full-of-breath word. All of the different parts of this musculature have to be co-ordinated with one another, irrespective of the different lengths of the paths, in order to achieve a particular result.
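
The timing problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the path lengths and the single conduction speed below are invented round numbers, not measurements, and the sketch assumes, for simplicity, that all the signals should arrive at the same moment. The point is just that the signal with the longest path has to be sent first.

```python
def send_times(path_lengths_m, speed_m_per_s, arrival_s=0.0):
    """Return when each signal must leave the brain in order to arrive at arrival_s."""
    return {
        muscle: arrival_s - length / speed_m_per_s
        for muscle, length in path_lengths_m.items()
    }


# Hypothetical path lengths in metres (illustrative only, not anatomical data).
paths = {"tongue": 0.1, "intercostals": 0.4, "diaphragm": 0.5}

# Hypothetical single conduction speed in metres per second.
times = send_times(paths, speed_m_per_s=50.0)

# Muscles ordered by when their signal has to be sent, earliest first.
order = sorted(times, key=times.get)
print(order)  # ['diaphragm', 'intercostals', 'tongue']
```

With these made-up figures the signal to the diaphragm has to leave several thousandths of a second before the signal to the tongue, although both are wanted at the same instant.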

The co-ordination of instructions and feedback is intricate. Disruption of this feedback is evident after a pain-killing injection in the mouth, or when trying to talk while the sound of one’s own speech is artificially delayed, as sometimes happens in a studio or on a mobile phone.

In some medically well-defined conditions like cerebral palsy, the chain of instructions is disrupted.

Some children’s issues seem to be exclusively motoric without any medical diagnosis, as, for example, where the activity of a muscle in the tongue triggers a corresponding activity in a muscle in the face. But in the absence of any known medical factor, such exclusively motoric issues seem to me to be much rarer than commonly thought.

The vocal tract

The instrument here was first identified as what it is, the vocal tract, by William Holder (1669), who correctly identified how the lungs, the airway, the larynx or voice box, the tongue, the nose and the palate (rigid at the front and flexible at the back) comprise a single system as far as speech is concerned.

But the mechanism here evolved for the entirely different purposes of eating and breathing, with the larynx protecting the lungs from food and drink. It fulfilled these functions for hundreds of millions of years before human ancestors evolved in their own direction over some significant period.

In Holder’s day there was no idea of evolutionary pathways or of the time scales involved.

It now appears that the larynx has been exapted from its original function. And there seem to have been several adaptations for the special purpose of speech, including the lowering of the larynx and the pointing of the chin, both of which increase the length of the effective part of the tract in a way critical for vowels like EE, which are almost universal across languages.

The contrasting case of sign language

Every sensory system (we have a lot more than five senses) has its own degree of resolution. Humans belong to the order of primates which first evolved as small, nocturnal, tree-living grub catchers, using good hand-eye co-ordination to catch their prey. The sounds of the grubs scratching might betray their presence. But they had to be seen to be caught. Primate hearing and eyesight evolved to suit this way of life. On this time-scale, the evolution of speech and language is a brief after-thought.

So we can see more things happening at once than we can hear. Although sign language is called what it is, the action of the hands is only one component of the message. There is also the direction of the look, left or right, up or down, the action of the tongue and lips, the orientation of the head. The signing resolves into hand configurations in various positions and orientations separately or together. Spoken language cannot match sign language in these respects.
