The Formants of Monophthong Vowels in Standard Southern British English Pronunciation
David Deterding, National Institute of Education, Singapore
Journal of the International Phonetic Association (1997) 27: 47-55
This file contains some details of the measurements of the vowels, as reported in the JIPA paper shown above. Please feel free to use these measurements in any way you find useful.
This directory contains 10 files in Excel (XL) format. Each file contains the
measurements of the first 3 formants of the 11 monophthong vowels of one speaker
from the MARSEC database. In each file, the measurements of each monophthong are
held in a separate sheet (so there are 11 sheets in each file).
Speakers
The measurements are from 5 male and 5 female BBC broadcasters. Each speaker
is the one whose voice is heard at the start of the first file in one of the
following MARSEC directories:
ASIG female
BSIG male
CSIG male
DSIG female
ESIG female
FSIG female
GSIG female
HSIG male
JSIG male
KSIG male
The speaker from directory ASIG is referred to as A, from BSIG as B, etc.
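For anyone processing the files by script, the list above can be held in a small lookup table. The following Python sketch is only an illustration: the names SPEAKER_SEX and speaker_label are hypothetical and are not part of the MARSEC distribution or of these files.

```python
# Sex of the first speaker in each MARSEC directory, copied from the list above.
# Note that the list has no ISIG entry.
SPEAKER_SEX = {
    "ASIG": "female",
    "BSIG": "male",
    "CSIG": "male",
    "DSIG": "female",
    "ESIG": "female",
    "FSIG": "female",
    "GSIG": "female",
    "HSIG": "male",
    "JSIG": "male",
    "KSIG": "male",
}


def speaker_label(directory):
    """Return the one-letter speaker label (ASIG -> A, BSIG -> B, etc.)."""
    return directory[0]


males = sorted(speaker_label(d) for d, sex in SPEAKER_SEX.items() if sex == "male")
females = sorted(speaker_label(d) for d, sex in SPEAKER_SEX.items() if sex == "female")
print("male speakers:", males)
print("female speakers:", females)
```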
Selection of words
In most cases, there are many instances of each vowel that could be selected for
measurement. Wherever possible, vowels following /j/, /w/, and /r/ or preceding
/l/ are avoided, to minimize the effects of coarticulation. For some vowels,
particularly /u:/ and /U/, it is not always possible to avoid such environments.
At least 5 measurements are made for each vowel of each speaker, with the
exception of the /U/ of two speakers: for speakers A and E, only 2 clear
instances of this vowel could be found. (Maybe the BBC should be encouraged to
broadcast more programmes on 'Good Books on Cooking'!)
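The selection criterion above can be expressed as a simple check on the sounds either side of the vowel. The words for these measurements were chosen by hand, so the Python sketch below is only one way of stating the rule; the function name and the plain-letter phoneme symbols are assumptions made for illustration.

```python
# Environments avoided wherever possible, as described above.
AVOID_BEFORE_VOWEL = {"j", "w", "r"}  # the vowel follows one of these sounds
AVOID_AFTER_VOWEL = {"l"}             # the vowel precedes this sound


def is_preferred_context(previous_sound, next_sound):
    """True if the vowel is neither after /j/, /w/, /r/ nor before /l/.

    Use None for a missing neighbour (vowel at the edge of an utterance).
    """
    return (previous_sound not in AVOID_BEFORE_VOWEL
            and next_sound not in AVOID_AFTER_VOWEL)


print(is_preferred_context("b", "k"))  # as in 'book': acceptable
print(is_preferred_context("w", "d"))  # as in 'wood': follows /w/
print(is_preferred_context("p", "l"))  # as in 'pull': precedes /l/
```

For vowels such as /u:/ and /U/, a token that fails this check may still have to be used when nothing better is available.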
Methods of measurement
The measurements were made from digital spectrograms with overlaid LPC
formant tracks, using the CSL software (Version 5) from Kay Elemetrics Corp. A
pre-emphasis coefficient of 0.9 was used, and a 16th order filter for the linear
prediction.
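Pre-emphasis is commonly implemented as a first-order difference, y[n] = x[n] - a * x[n-1]. The Python sketch below applies this with a = 0.9. It illustrates the general technique only; it is an assumption that the filter inside CSL takes exactly this form, and the function name is hypothetical.

```python
def pre_emphasize(samples, coefficient=0.9):
    """Apply y[n] = x[n] - coefficient * x[n-1]; the first sample is copied."""
    if not samples:
        return []
    output = [float(samples[0])]
    for previous, current in zip(samples, samples[1:]):
        output.append(current - coefficient * previous)
    return output


# A constant signal is strongly attenuated after the first sample,
# while a rapidly alternating signal is boosted.
steady = pre_emphasize([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasize([1.0, -1.0, 1.0, -1.0])
print(steady)
print(alternating)
```

The effect is to raise the level of the higher frequencies relative to the lower ones before the linear prediction is carried out.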
The Application Notes of the CSL documentation (page 384) recommend 2 LPC
coefficients for each expected formant, with an extra 2 coefficients for the DC
component. With the default sampling rate of 10 kHz, where one might expect 5
formants up to the Nyquist frequency of 5 kHz, they therefore recommend an LPC
order of 12 (but maybe less for female speech).
The MARSEC data is sampled at 16 kHz, and one might expect up to 8 formants
below the Nyquist frequency of 8 kHz. The default LPC order of 12 is therefore
clearly insufficient, which is why 16th order was used for these measurements.
In fact, for some speakers, 18th or even 20th order might be tried, particularly
when measurement of the first formant is problematical for open vowels such as /ae/.
However, 16th order was used for all these data, to ensure consistency.
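The rule of thumb described above can be written out as a short calculation. This Python sketch assumes about one formant per 1000 Hz below the Nyquist frequency, which is the expectation used in the text; the function name and its parameters are hypothetical.

```python
def rule_of_thumb_lpc_order(sample_rate_hz, hz_per_formant=1000, extra=2):
    """Two coefficients per expected formant, plus two extra coefficients."""
    nyquist_hz = sample_rate_hz // 2
    expected_formants = nyquist_hz // hz_per_formant
    return 2 * expected_formants + extra


print(rule_of_thumb_lpc_order(10000))  # 5 expected formants
print(rule_of_thumb_lpc_order(16000))  # 8 expected formants
```

For the 10 kHz default the rule gives 12. For 16 kHz data it gives 18, one of the higher orders mentioned above as worth trying; the measurements in these files nevertheless all use order 16.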
It should be emphasized that not every vowel token can be measured. When there
are clear problems, for instance when there is no formant track anywhere near
the expected frequency, or when the measured value is clearly spurious, such
tokens are ignored, and others are found. It is also questionable whether each
and every vowel measurement is indicative of the quality of that token. Attempts
have been made to provide 10 reasonably consistent measurements of most vowels
for each speaker, in the hope that the average values do represent a reliable
measure for the speaker.
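Averaging the tokens of one vowel for one speaker can be sketched as follows. This is a minimal Python illustration, assuming each token is held as an (F1, F2, F3) triple in Hz; the function name is hypothetical, and the numbers are placeholders invented for the example, not values taken from the data files.

```python
from statistics import mean


def average_formants(tokens):
    """Average a list of (F1, F2, F3) measurements in Hz, formant by formant."""
    if not tokens:
        raise ValueError("no tokens to average")
    f1_values, f2_values, f3_values = zip(*tokens)
    return mean(f1_values), mean(f2_values), mean(f3_values)


# Placeholder numbers invented for illustration; NOT values from the data files.
example_tokens = [(300, 2300, 3000), (320, 2200, 2900), (310, 2250, 2950)]
average = average_formants(example_tokens)
print(average)
```

Tokens judged to be spurious would be left out of the list before averaging, in line with the procedure described above.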
Analysing the MARSEC data using CSL
Some minor adjustments are required to allow CSL to analyse the MARSEC data.
As the speech is not in the format expected by CSL, the command LDDATA should be
used to load each speech file; the default
parameters are fine except for the data rate, which must be changed from 10000
to 16000.
In a few cases, loading the speech file in this way causes the machine to hang,
necessitating a reboot. It is not clear why this happens (and it is possible
that it is a quirk of my machine). One way to overcome this is to specify an
arbitrary header size, such as 20, to allow the software to skip over whatever
is causing the problem.
When the data file is loaded using LDDATA, it is possible to play the whole file
but not do much else with it. The best solution to this is first to save the
file as a .NSP file on the host hard disk, in which case CSL
attaches its own header. When this file is then loaded in the ordinary way
(using LDSPL), CSL can deal with it like any other .NSP file.
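Readers without CSL may want a similar route into other software. The Python sketch below wraps raw samples in a WAV header at 16000 Hz, optionally skipping a number of header bytes in the same spirit as the arbitrary header size mentioned above. It rests on an assumption that this file does not confirm: that the samples are 16-bit signed little-endian mono. The function name is hypothetical, and the input used here is synthetic rather than a MARSEC file.

```python
import io
import wave


def raw_pcm_to_wav_bytes(raw_bytes, sample_rate_hz=16000, header_size=0):
    """Return WAV-formatted bytes for raw PCM data, skipping header_size bytes."""
    # ASSUMPTION: 16-bit signed little-endian mono samples (not stated in the source).
    sample_width = 2
    payload = raw_bytes[header_size:]
    # Drop a trailing odd byte so that only whole samples are written.
    payload = payload[: len(payload) - len(payload) % sample_width]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(payload)
    return buffer.getvalue()


# Synthetic input: a 20-byte dummy header followed by 4 silent samples.
fake_file = bytes(20) + bytes(8)
wav_bytes = raw_pcm_to_wav_bytes(fake_file, header_size=20)
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    print(check.getframerate(), check.getnframes())
```

If the resulting audio sounds like noise, the assumed sample format or header size is probably wrong for the file in question.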
All these measurements were made by:
David Deterding,
National Institute of Education
Nanyang Technological University
1 Nanyang Walk
Singapore 637616
They are published in the Journal of the International Phonetic Association
(1997) 27: 47-55.
For any comments, suggestions, or criticisms, please contact me at
dhdeter@nie.edu.sg