Overview of the Database
This database contains acoustic and kinematic data (collected at Harvard University and Haskins Laboratories) from speakers who were uttering sequences of alternating syllables in an evenly timed fashion, "like a metronome". Since the syllables were perceptually isochronous, the database can be used to study physical cues that underlie the perception of temporal intervals in speech. A paper containing the details of one such study is included on the download page. The paper also contains details of data collection for all materials on this website, as well as a method for analyzing these data for proposed timing cues.
The database contains acoustic data from 6 speakers (3 male, 3 female), and kinematic data from 3 speakers (1 male, 2 female). All utterances in this database have 11 syllables, and consist of the syllable /ba/ alternating with another syllable (the "target syllable"). The target syllable is one of the following:
/ba/ /cha/ /dela/ /ha/ /la/ /lad/ /li/ /ma/ /pa/ /sa/ /spa/ /ta/ /ya/
Each speaker produced four utterances for each target syllable.
Some details on acoustic and kinematic data
The acoustic data were collected in quiet rooms at Harvard and Haskins, and are in 16-bit .wav format (sample rate = 10 kHz, anti-alias filtered at 4 kHz).
NOTE: Some computer sound cards do not have 10 kHz as an available sample rate for playback, and may play the sound files at a different sample rate without notifying the user. This will speed up or slow down the sound files. To hear them at the correct speed, make sure your sound card supports a 10 kHz sample rate.
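Before relying on playback, it can help to confirm the sample rate a file declares in its header. The following is a minimal sketch using Python's standard `wave` module, assuming the files are ordinary PCM .wav files; the function names `wav_header` and `duration_seconds` are illustrative and not part of the database.

```python
import wave


def wav_header(path):
    """Return (sample_rate_hz, bits_per_sample, n_frames, n_channels)."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate(),
                8 * wav.getsampwidth(),   # bytes per sample -> bits
                wav.getnframes(),
                wav.getnchannels())


def duration_seconds(path):
    """Length of the recording at its declared sample rate."""
    rate, _bits, n_frames, _channels = wav_header(path)
    return n_frames / rate
```

For an acoustic file from this database, `wav_header` should report a sample rate of 10000 and 16 bits per sample. If playback lasts noticeably longer or shorter than `duration_seconds` reports, the sound card is probably using a different rate.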
Kinematic data were collected using an electromagnetic midsagittal articulometer (EMMA) system at Haskins Laboratories, and are also in 16-bit .wav format (sample rate = 625 Hz, anti-alias filtered at 200 Hz during acquisition, low-pass filtered at 17 Hz after voltage-distance conversion). Note that these files are in .wav file format for portability between systems: they are not acoustic files!
The EMMA system measures horizontal and vertical position for selected articulators in a coordinate system centered at the upper incisors. Movements were measured from the upper lip, lower lip, jaw, tongue tip, tongue blade, tongue body, and tongue rear. (Note that the "tongue tip" transducer was not truly on the tip of the tongue, as this would interfere with articulation, but roughly 1/2 cm behind the tip.)
In this database, acoustic files have the following filename structure:
{subject initials} {target syllable} {utterance number}. For example: jbsa1.wav
The kinematic files follow the same filename conventions as the acoustic data, except that kinematic data files have a "k" at the end of the file prefix. For example, the kinematic data corresponding to jbsa1.wav is jbsa1k.wav.
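The filename convention can be decoded programmatically. The sketch below is one possible parser in Python, not a tool supplied with the database. It assumes that subject initials are always exactly two letters (as in "jb") and that filenames are matched case-insensitively; `parse_filename` is a hypothetical helper name.

```python
import re

# Target syllables from the overview, longest first so that "lad" is
# tried before "la" and "spa" before "sa".
TARGETS = ["dela", "cha", "lad", "spa", "ba", "ha", "la", "li",
           "ma", "pa", "sa", "ta", "ya"]

# Assumption: subject initials are exactly two letters.
_PATTERN = re.compile(
    r"(?P<subject>[a-z]{2})(?P<target>%s)(?P<utterance>\d+)(?P<kin>k?)\.wav"
    % "|".join(TARGETS))


def parse_filename(name):
    """Split a database filename into its documented components."""
    match = _PATTERN.fullmatch(name.lower())
    if match is None:
        raise ValueError("unrecognized filename: %r" % name)
    return {
        "subject": match.group("subject"),
        "target": match.group("target"),
        "utterance": int(match.group("utterance")),
        "kinematic": match.group("kin") == "k",  # trailing "k" marks kinematic data
    }
```

For example, `parse_filename("jbsa1k.wav")` gives subject "jb", target "sa", utterance 1, and marks the file as kinematic.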
Working with kinematic files
Each kinematic .wav file contains a single time series that holds the data for 7 articulators laid out end-to-end (no gaps), each with x and y directions. Thus, once a kinematic file is read into a computer, dividing it into 14 equal parts yields 14 vectors, one for each articulator and direction.
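The division into 14 equal parts can be sketched in Python as follows. This is only an illustration of the splitting step: it works on any sequence of samples that has already been read into memory, and it makes no claim about which segment belongs to which articulator. The helper name `split_channels` is hypothetical.

```python
def split_channels(samples, n_channels=14):
    """Split an end-to-end series into n_channels equal, consecutive segments."""
    n_total = len(samples)
    if n_total % n_channels != 0:
        raise ValueError(
            "%d samples cannot be divided into %d equal parts"
            % (n_total, n_channels))
    seg_len = n_total // n_channels
    # Segment i covers samples [i * seg_len, (i + 1) * seg_len).
    return [samples[i * seg_len:(i + 1) * seg_len]
            for i in range(n_channels)]
```

Raising an error when the length is not a multiple of 14 guards against accidentally splitting an acoustic file, or a truncated download, as if it were kinematic data.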
Physical units of the data
The acoustic files are in units of volts, and the kinematic files are in units of centimeters.
As a check that you have the correct physical units, here are the max and min values in the acoustic utterance jbsa1.wav:
max 6.3281 volts
min -5.1123 volts
Here are the lower lip x and y maximum and minimum values for jbsa1k.wav:
       lower lip x    lower lip y
max    -0.6006        -1.4014        centimeters
min    -1.1865        -3.6084        centimeters
Notes for MATLAB users
The following MATLAB code shows an example of how to read the acoustic data into a vector, and the kinematic data file into a 14-column matrix, where each column holds one articulator direction.
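% Acoustic data (wavread returns samples scaled to the range [-1, 1))
% Note: wavread was removed from later MATLAB releases; audioread replaces it.
[acoustic_data,sample_rate,nbits]=wavread('jbsa1.wav');
acoustic_data=acoustic_data*10;                 % gain of 10 -> volts
% Kinematic data (one long vector holding 14 equal-length segments)
[kinematic_data,sample_rate,nbits]=wavread('jbsa1k.wav');
npts=length(kinematic_data);
kin_data=reshape(kinematic_data, npts/14,14);   % one column per segment
kin_data=kin_data*10;                           % gain of 10 -> centimeters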
Note the final step in both cases, which applies a gain factor of 10. This is necessary in MATLAB to convert these particular .wav files back to their original physical units (cf. "Physical units of the data", above).
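For readers not using MATLAB, the same conversion can be sketched in Python with the standard library only. This is a sketch under stated assumptions, not a supplied tool: it assumes the files are 16-bit mono PCM, and that the stored integers should be divided by 32768 (the scaling MATLAB's wavread applies to 16-bit data) before the gain of 10 is applied. The names `to_physical` and `read_physical` are hypothetical.

```python
import struct
import wave

FULL_SCALE = 32768.0  # 2**15: assumed normalisation for 16-bit samples
GAIN = 10.0           # gain factor described in the notes above


def to_physical(int_samples, gain=GAIN):
    """Convert raw 16-bit integer samples to physical units."""
    return [gain * s / FULL_SCALE for s in int_samples]


def read_physical(path, gain=GAIN):
    """Read a 16-bit mono .wav file; return (values, sample_rate)."""
    with wave.open(path, "rb") as wav:
        # Assumption: database files are 16-bit and single-channel.
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono data")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # .wav sample data is little-endian signed 16-bit ("<h").
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return to_physical(ints, gain), rate
```

Under these assumptions an integer sample of 20736 maps to 6.328125, which agrees with the maximum of 6.3281 volts quoted for jbsa1.wav. Comparing the maximum and minimum of the returned values against the figures in "Physical units of the data" is a quick way to confirm the scaling.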
Miscellaneous notes
Subject AP has no utterances with /lad/ or /spa/ as the target syllable, and has only 3 utterances with /li/ as the target syllable. Subject LC has only 3 utterances with /pa/ as the target syllable. Subject LK has only 3 utterances with /ha/ and /sa/ as the target syllable.
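As a sanity check after downloading, the number of acoustic files implied by these notes can be worked out. The short Python sketch below does the arithmetic. It assumes that subjects AP, LC and LK are all among the 6 acoustic speakers, that subject LK has 3 utterances for each of /ha/ and /sa/, and that no other utterances are missing; none of these points is stated explicitly above.

```python
N_SPEAKERS = 6
N_TARGETS = 13               # target syllables listed in the overview
UTTERANCES_PER_TARGET = 4

# Utterances missing relative to the nominal 4 per target syllable.
# Assumption: LK is short by one utterance for EACH of /ha/ and /sa/.
MISSING = {
    ("AP", "lad"): 4,
    ("AP", "spa"): 4,
    ("AP", "li"): 1,
    ("LC", "pa"): 1,
    ("LK", "ha"): 1,
    ("LK", "sa"): 1,
}


def expected_acoustic_count():
    """Nominal number of acoustic files minus the documented gaps."""
    nominal = N_SPEAKERS * N_TARGETS * UTTERANCES_PER_TARGET
    return nominal - sum(MISSING.values())


print(expected_acoustic_count())  # 312 nominal - 12 missing = 300
```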
A detailed description of this dataset, including figures of acoustic and kinematic data, can be found in chapter 5 of Patel, A.D. A Biological Study of the Relationship between Language and Music, 1996 Ph.D. thesis, Harvard University. (Available from University Microfilms).
The Neurosciences Institute
April 08, 2002, apatel