The Formants of Monophthong Vowels in Standard Southern British English Pronunciation

David Deterding, National Institute of Education, Singapore

Journal of the International Phonetic Association (1997) 27: 47-55

This file contains some details of the measurements of the vowels, as reported in the JIPA paper shown above. Please feel free to use these measurements in any way you find useful.

This directory contains 10 files in XL (Excel) format. Each file contains the measurements of the first 3 formants of the 11 monophthong vowels of one speaker from the MARSEC database. In each file, the measurements of each monophthong are held in a separate XL sheet (so there are 11 sheets in each file).

Speakers

The measurements are from 5 male and 5 female BBC broadcasters. The speakers are those whose voices are heard at the start of the first file in each of the following MARSEC directories:

    ASIG     female
    BSIG     male
    CSIG     male
    DSIG     female
    ESIG     female
    FSIG     female
    GSIG     female
    HSIG     male
    JSIG     male
    KSIG     male

The speaker from directory ASIG is referred to as A, from BSIG as B, etc.
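
For anyone processing the files by script, the speaker table above can be written as a small lookup. This is only a sketch: the names SPEAKER_SEX, marsec_directory and speakers_by_sex are invented here for illustration and are not part of the data files or of MARSEC.

```python
# Sex of each speaker, keyed by the letter used in this file (A = ASIG, etc.).
# The names below are hypothetical helpers, not part of the distributed data.
SPEAKER_SEX = {
    "A": "female", "B": "male", "C": "male", "D": "female", "E": "female",
    "F": "female", "G": "female", "H": "male", "J": "male", "K": "male",
}


def marsec_directory(speaker):
    """Return the MARSEC directory name for a speaker letter, e.g. 'A' -> 'ASIG'."""
    if speaker not in SPEAKER_SEX:
        raise KeyError(f"unknown speaker: {speaker}")
    return speaker + "SIG"


def speakers_by_sex(sex):
    """Return the speaker letters of the given sex in alphabetical order."""
    return sorted(s for s, value in SPEAKER_SEX.items() if value == sex)


print(speakers_by_sex("female"))
print(speakers_by_sex("male"))
```

Note that there is no speaker I: the directories run from ASIG to KSIG with ISIG omitted.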

Selection of words

In most cases, there are many instances of each vowel that could be selected for measurement. Wherever possible, vowels following /j/, /w/, and /r/ or preceding /l/ are avoided, to minimize the effects of coarticulation. For some vowels, particularly /u:/ and /U/, it is not always possible to avoid such environments.

At least 5 measurements are made for each vowel of each speaker, with the exception of the /U/ of two speakers: for speakers A and E, only 2 clear instances of this vowel could be found. (Maybe the BBC should be encouraged to broadcast more programmes on 'Good Books on Cooking'!)

Methods of measurement

The measurements were made from digital spectrograms with overlaid LPC formant tracks, using the CSL software (Version 5) from Kay Elemetrics Corp. A pre-emphasis coefficient of 0.9 was used, and a 16th order filter for the linear prediction.

The Application Notes of the CSL documentation (page 384) recommend 2 LPC coefficients for each expected formant, with an extra 2 coefficients for the DC component. With the default sampling rate of 10 kHz, where one might expect 5 formants up to the Nyquist frequency of 5 kHz, they therefore recommend an LPC order of 12 (but maybe less for female speech).

The MARSEC data is sampled at 16 kHz, and one might expect up to 8 formants below the Nyquist frequency of 8 kHz. The default LPC order of 12 is therefore clearly insufficient, which is why 16th order was used for these measurements. In fact, for some speakers, 18th or even 20th order might be tried, particularly when measurement of the first formant is problematical for open vowels such as /ae/. However, 16th order was used for all these data, to ensure consistency.
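
The rule of thumb described above (2 coefficients per expected formant, plus 2) can be worked through as a short calculation. This is a sketch only: the function name is invented here, and the assumption of roughly one formant per 1000 Hz below the Nyquist frequency is simply read off the figures quoted above (5 formants up to 5 kHz, 8 up to 8 kHz).

```python
def rule_of_thumb_lpc_order(sampling_rate_hz, formant_spacing_hz=1000):
    """LPC order from the rule of thumb: 2 coefficients per expected formant, plus 2."""
    nyquist_hz = sampling_rate_hz / 2
    # Assumption: about one formant per formant_spacing_hz below the Nyquist frequency.
    expected_formants = int(nyquist_hz // formant_spacing_hz)
    return 2 * expected_formants + 2


for rate in (10000, 16000):
    print(rate, rule_of_thumb_lpc_order(rate))
```

Applied literally, the rule gives 12 for 10 kHz data and 18 for 16 kHz data. The measurements reported here nevertheless used 16th order throughout, as stated above.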

It should be emphasized that it is occasionally not possible to measure a vowel token. When there are clear problems, for instance when there is no formant track anywhere near the expected frequency or when the measured value is clearly spurious, such tokens are ignored and others are found. It is also questionable whether each and every vowel measurement is indicative of the quality of that token. Attempts have been made to provide 10 reasonably consistent measurements of most vowels for each speaker, in the hope that the average values do represent a reliable measure for the speaker.
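
The averaging referred to above is a plain mean over the tokens of one vowel for one speaker. A minimal sketch follows; the function name is invented here, and the three tokens are placeholder numbers for illustration only, NOT values taken from the data files.

```python
from statistics import mean


def average_formants(tokens):
    """Return the mean (F1, F2, F3) in Hz over a list of (F1, F2, F3) tuples."""
    if not tokens:
        raise ValueError("no tokens to average")
    f1, f2, f3 = zip(*tokens)
    return (mean(f1), mean(f2), mean(f3))


# Placeholder values for illustration only; NOT taken from the data files.
example_tokens = [(300, 2300, 2900), (320, 2250, 3000), (280, 2350, 3100)]
print(average_formants(example_tokens))
```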

Analysing the MARSEC data using CSL

Some minor adjustments are required to allow CSL to analyse the MARSEC data. As the speech is not in the format expected by CSL, the command LDDATA should be used to load each speech file; the default parameters are fine except for the data rate, which must be changed from 10000 to 16000.

In a few cases, loading the speech file in this way causes the machine to hang, necessitating a reboot. It is not clear why this happens (and it is possible that it is a quirk of my machine). One way to overcome this is to specify an arbitrary header size, such as 20, to allow the software to skip over whatever is causing the problem.

When the data file is loaded using LDDATA, it is possible to play the whole file but not do much else with it. The best solution to this is first to save the file as a .NSP file on the host hard disk, in which case CSL attaches its own header. When this file is then loaded in the ordinary way (using LDSPL), CSL can deal with it like any other .NSP file.
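
For readers without CSL, the same idea (read the raw samples at 16000 Hz, optionally skip a few header bytes, then save with a proper header) can be sketched in Python. This is not the CSL procedure and it writes a WAV file, not a .NSP file. The function name and parameters are invented here, and the sample format (16-bit, little-endian, mono) is an assumption that this file does not confirm; check the MARSEC documentation before relying on it.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=16000, header_bytes=0,
               sample_width=2, channels=1):
    """Copy headerless PCM samples into a WAV file, skipping header_bytes first.

    Assumption: samples are little-endian PCM of sample_width bytes each.
    Returns the number of frames written.
    """
    with open(raw_path, "rb") as raw_file:
        raw_file.seek(header_bytes)
        samples = raw_file.read()
    # Drop any trailing partial frame so the WAV data stays frame-aligned.
    frame_size = sample_width * channels
    samples = samples[: len(samples) - len(samples) % frame_size]
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(samples)
    return len(samples) // frame_size
```

A call such as raw_to_wav("speech.raw", "speech.wav", header_bytes=20) would mirror the arbitrary header size of 20 mentioned above; the file names are placeholders.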

All these measurements were made by:

David Deterding,
    National Institute of Education
    Nanyang Technological University
    1 Nanyang Walk
    Singapore 637616

They are published in the Journal of the International Phonetic Association (1997) 27: 47-55.

For any comments, suggestions, or criticisms, please contact me at

dhdeter@nie.edu.sg

David Deterding