Digital audio files in acoustic analysis
Setting up MS Windows audio devices
You must first set up recording and playback devices for the Operating System
environment (MS Windows or Mac OS). There are a few possible scenarios:
- You have a separate audio device that you use for capturing audio and you
want to use professional audio editing software (see the list below)
- You have a separate audio device that you use for capturing audio and you
want to use Praat
- You have only one device on your system
Scenario 1
- You must first make sure that your recording device is NOT set up as the
default audio device for MS Windows applications. This prevents unwanted
sounds (such as mouse clicks, alerts, etc.) from leaking into your audio stream.
Follow these steps:
- Go to Control Panel
- Click on Sounds and Multimedia
- Click on the Audio tab
- Select your device from the drop-down menu
- You should then select your recording device in your audio editing software
(see the GoldWave example below).
- Your professional audio device will most likely have its own audio mixer
and router. Please follow the directions specific to your device. See the image
below for an example of setting up USB Pre and M-Audio Delta series hardware.



Scenario 2
If you want to use your professional (either internal PCI, or external USB,
Firewire) device with Praat, you will have to make this device the DEFAULT device
for your entire system. See the Sound and Multimedia dialog box above for reference.
This is not an ideal situation, as once your device is selected as default,
all kinds of unwanted system sounds (mouse clicks, alerts, notification sounds,
etc.) can inadvertently leak into the system. It is also not guaranteed that
your device will work directly with Praat, particularly if your device is set
up to sample audio at 96 kHz and a 24-bit resolution, which Praat is incapable
of supporting.
Scenario 3
This scenario should be avoided due to the danger of producing inferior quality
digital audio. If you don't have an external device to record audio, you can
simply go directly to Praat and use its audio recording function (see the
section below).
Recording (capturing) audio on a PC
Use professional audio editing software to capture digital audio.
While it is possible to capture digital audio streams (record audio) with Praat,
it is advised to use professional-grade audio editing software, such as
Macintosh users can use
Audio editing software has a number of advantages over the native Praat recording
module. Some of them include:
- sophisticated graphical user interface (GUI)
- ability to record audio at 24-bit/96 kHz
- sophisticated waveform editing tools
- compatibility with hardware and software technologies, such as DirectX, VST,
TDM, and others
The image below illustrates GoldWave's device selection dialog box. It boasts
a number of sophisticated options, such as the ability to select a particular
recording device (hardware) directly from within GoldWave.

What if I don't want to buy or install additional software?
In that case, you can still use Praat to capture digital audio streams. Here
is how:
- make sure you have selected the right recording and playback devices in
MS Windows (or MacOS) (more on that HERE)
- start Praat
- choose New, then Record mono Sound...
- choose the sample rate at which you want to record (more on that HERE)
- click the Record button (you should now see level meters work)
- click Stop when finished
- enter a file name and click Save to list
- your file is now added to the Praat Objects window
- it is IMPORTANT that the audio level does not reach the red zone; otherwise
your recordings will be clipped and distorted (see image below)
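
Clipping can also be checked after the recording has been saved. The following is a minimal sketch, assuming a 16-bit PCM WAV file. The function name count_clipped_samples is hypothetical and is not part of Praat or Akustyk. It counts samples that sit at the extreme values of the 16-bit range, which is a common symptom of a clipped recording.

```python
import struct
import wave

def count_clipped_samples(source):
    """Count samples at the 16-bit extremes (-32768 or 32767).

    `source` may be a file name or an open binary file object.
    Assumption: the file holds 16-bit PCM audio.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM files")
        raw = wav.readframes(wav.getnframes())
    # WAV files store 16-bit samples as little-endian signed integers.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return sum(1 for s in samples if s in (-32768, 32767))
```

A result of zero does not prove that the recording is clean, but a large count is a strong hint that the input level was too high.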

Choosing recording hardware
It is recommended that you use professional audio hardware to capture audio
streams on your PC.
Hardware/software solutions
Hardware/software solutions, such as Kay Elemetrics CSL, have many advantages.
They are designed to be used by speech scientists, audiologists, and linguists.
They are typically ready to use "out of the box." Both software
and hardware are of excellent quality and versatility. Tucker Davis solutions
are generally more flexible, as they're designed to be interfaced with the PC
by means of computer programs and scripts. Drag and drop functions are also
available.
Software only with custom hardware
Software only solutions are generally inexpensive (or free) and offer many
attractive functions. Praat and WaveSurfer work on several different platforms
and offer comprehensive signal analysis features. However, one cannot do
reliable analysis without high-quality, dedicated hardware. Unfortunately, most
commercially available hardware has been designed for the audio industry, not
for linguists.
Therefore, one should be very careful in choosing speech processing hardware.
As a general rule, one should stay away from hardware designed for computer
games and home recording studios.
Your recordings are analog and stored on magnetic tape
You will need the following pieces of hardware (also see the diagram below):
- high quality tape deck
- A/D (Analog-to-Digital) converter
- pre-amplifier
- digital audio interface, either:
  - a PCI digital audio card, or
  - a USB or Firewire digital audio interface

Your recordings are digital and stored on DAT tapes, CDs, or flash memory
If your recordings are on DAT tapes, you will need to have matching interfaces
between the output of the DAT deck and the input of the PC. Usually, this will
be done over an S/PDIF (coaxial RCA cable) or AES/EBU (balanced XLR cable) interface.
If your recordings are on flash memory, you will need a memory card reader.
Those usually connect to the PC via the USB interface. If your recordings are
on CD-ROM, you will simply put the CD in the CD-ROM drive and copy the digital
audio files to your PC.

You are recording via a microphone directly into the PC
Follow the diagram below for a typical setup for recording via a microphone
directly into the PC:

Sample rate, resolution, digital audio files
Sample rate and resolution (quantization) are two of the most important parameters
in the Analog-to-Digital (A/D) conversion process. Without going into technical
detail, it is important to remember that one should always choose the highest
sample rate and resolution available on the digital recorder or A/D converter.
Most typically, you will be recording speech at a sample rate of 48,000
Hz and a resolution of 16 bits (a common DAT standard).
However, more and more recording devices offer a resolution of 24-bit
and a sample rate of 96,000 Hz. While 96,000 Hz
can be overkill for speech recordings (unless you are recording for archival
purposes), the 24-bit resolution will certainly improve the
quality (amount of acoustic detail) of your recordings. Note that Praat does
not have the ability to handle 24-bit digital audio files, so all of your 24-bit
recordings will have to be converted to 16-bit before analysis.
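
The 24-bit to 16-bit conversion mentioned above can be sketched as follows. This is only an illustration, not the method used by Akustyk or by any particular audio editor. It assumes little-endian signed 24-bit PCM sample data (the layout used in WAV files) and simply drops the least significant byte of each sample. The function name pcm24_to_pcm16 is hypothetical. Professional tools usually apply dither before reducing the word length; this sketch does not.

```python
def pcm24_to_pcm16(raw24):
    """Convert little-endian 24-bit PCM bytes to 16-bit PCM bytes.

    Each 3-byte sample is truncated to its 16 most significant bits.
    Interleaved multi-channel data works too, since samples are
    processed one at a time.
    """
    if len(raw24) % 3 != 0:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes long")
    out = bytearray()
    for i in range(0, len(raw24), 3):
        # Byte i is the least significant byte; bytes i+1 and i+2 hold
        # the upper 16 bits, already in little-endian order.
        out += raw24[i + 1:i + 3]
    return bytes(out)

# The 24-bit sample 0x123456 becomes the 16-bit sample 0x1234.
print(pcm24_to_pcm16(bytes([0x56, 0x34, 0x12])).hex())  # prints 3412 (little-endian bytes)
```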
Prepare digital audio files for analysis
If you are dealing with 16-bit, 48,000 Hz audio files, you can use the Akustyk
batch converter. Simply select your input directory, output directory, and the
desired sample rate, and Akustyk will convert your files in batch. Make sure
beforehand that your recordings are not clipped (overloaded), because Akustyk
reports clipping as an error and interrupts the batch process.
What sample rate to use for analysis?
For the purposes of acoustic analysis, the most useful sample rate is 16,000
Hz. Sometimes you might want to go a little lower, to 10,000 Hz (especially
for the purposes of resynthesis), or higher, to 22,050 Hz (for children's
voices).

What file format to choose?
Praat and Akustyk give you the ability to work with several different file
formats. What all of these file formats (aiff, wav, au, nsp) have in common
is that they all store audio as PCM (Pulse Code Modulation) data. PCM files
store uncompressed, raw sample values, which means that no information is lost
when the file is saved. PCM formats differ considerably from compressed file
formats, such as MP3, AAC, ATRAC, WMA, and QuickTime, which use psychoacoustic
compression to decrease the file size. Compressed audio formats should never
be used in acoustic analysis. As long as you are using PCM files, it does not
matter which particular audio file format you use. Microsoft WAV, Apple AIFF,
Sun AU, and Kay NSP are all equally good (see the details below). Use the
Akustyk batch converter to convert between those formats with ease.
More on digital audio files
Digital audio files can be described by means of the following parameters:
SAMPLE RATE is the sampling frequency expressed in hertz
(Hz) and defines the number of times per second that the analog audio signal
has been measured (sampled). Sampling frequency determines the audio bandwidth,
or frequency response, that can be represented by the digital signal. Higher
sampling frequencies theoretically yield wider audio bandwidth. Compact Disc
Audio uses a sampling rate of 44,100 Hz; therefore, each second of audio is
made up of 44,100 samples per channel (each with a word length of 16 bits, see
BIT DEPTH below).
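
The link between sampling frequency and bandwidth can be made concrete: a digital signal can represent frequencies only up to half the sampling frequency (the Nyquist frequency). The short sketch below prints that limit for a few common rates. The function name max_representable_frequency is hypothetical.

```python
def max_representable_frequency(sample_rate_hz):
    """Return the Nyquist frequency, i.e. half the sampling rate."""
    return sample_rate_hz / 2

for rate in (10000, 16000, 22050, 44100):
    limit = max_representable_frequency(rate)
    print(f"{rate} Hz sampling -> frequencies up to {limit:.0f} Hz")
```

For example, a file sampled at 16,000 Hz can represent frequencies up to 8,000 Hz.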
BIT DEPTH defines the digital 'word length' used to represent
a given sample. Bit depth correlates to the maximum dynamic range that can be
represented by the digital signal. Larger bit depths theoretically yield more
dynamic range. The number of values a sample can take is a power of 2, so it
grows exponentially with bit depth, while the amount of data grows only
linearly. Compact disc audio, for example, uses 16-bit audio; therefore, each
sample is a 16-bit word that can take one of 2^16 (65,536) values. 24-bit audio
allows 2^24 (about 16.7 million) values per sample. The theoretical dynamic
range is roughly 6 dB per bit, so a 16-bit digital audio file has a dynamic
range of about 96 dB (6 dB x 16).
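
The figures for bit depth can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function names are hypothetical. It uses the standard approximation that the theoretical dynamic range of linear PCM is 20 x log10(2^n) dB for an n-bit sample, which works out to roughly 6 dB per bit.

```python
import math

def quantization_levels(bit_depth):
    """Number of distinct values an n-bit sample can take."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of n-bit linear PCM, in decibels."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (8, 16, 24):
    levels = quantization_levels(bits)
    print(f"{bits}-bit: {levels} levels, {dynamic_range_db(bits):.1f} dB")
```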
BIT RATE = Bit Depth x Sampling Frequency x Number of Channels. In the example
below, the data rate of 16-bit/44,100 Hz stereo audio is computed. Division by
1024 converts from bits to kilobits. Multiplying the result by 2 gives the rate
for stereo (two-channel) audio.
((16 x 44100) / 1024) x 2 = 1378 kbps (about 172 KB/s)
With the more common decimal convention (1 kbps = 1000 bits per second), the
same stream is 1411.2 kbps.
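
The same calculation can be written as a small function. This is a sketch only; the name bit_rate_bps is hypothetical. It returns the rate in bits per second, so that the choice between dividing by 1024 (as in the text above) and dividing by 1000 (the more common convention for kbps) is left to the caller.

```python
def bit_rate_bps(bit_depth, sample_rate_hz, channels=1):
    """Data rate of uncompressed PCM audio in bits per second."""
    return bit_depth * sample_rate_hz * channels

cd_rate = bit_rate_bps(16, 44100, channels=2)
print(cd_rate)             # bits per second
print(cd_rate / 1024)      # kilobits per second, dividing by 1024
print(cd_rate / 1000)      # kilobits per second, dividing by 1000
print(cd_rate / 8 / 1024)  # kilobytes per second
```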
FILE TYPE. There are two general kinds of file types: self-describing,
where the device parameters and encoding are made explicit in some form of header,
and 'raw', where the device parameters and encoding are fixed. Self-describing
file types generally define a family of data encodings, where a header field
indicates the particular encoding variant used. Headerless types define a single
encoding and usually allow no variation in device parameters (except sometimes
the sampling rate, which can be hard to determine other than by listening to
the sample).
Some of the most common types include:
- .au or .snd (NeXT, Sun)
- .aif(f), AIFF (Apple, SGI)
- .aif(f), AIFC (Apple, SGI)
- .iff, IFF/8SVX (Amiga)
- .voc (Soundblaster)
- .wav, WAVE (Microsoft)
- .sf (IRCAM)
- .snd, .fssd (Mac, PC)
- .raw (Mac, PC)
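
To illustrate what "self-describing" means in practice, the sketch below reads the parameters stored in the header of a WAV file using Python's standard wave module. The function name describe_wav is hypothetical, and the tiny in-memory file exists only so that the example runs without any external recording. A headerless (raw) file offers no such information; its parameters have to be known in advance.

```python
import io
import struct
import wave

def describe_wav(source):
    """Return the parameters stored in a WAV header.

    `source` may be a file name or an open binary file object.
    """
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,
            "frames": wav.getnframes(),
        }

# Build a three-sample, 16-bit, 16,000 Hz mono WAV file in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes = 16 bits per sample
    out.setframerate(16000)
    out.writeframes(struct.pack("<3h", 0, 1000, -1000))
buffer.seek(0)

info = describe_wav(buffer)
print(info)
```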
FILE FORMAT is the format in which a particular audio file
is encoded. The header fields indicate the particular encoding variant used.
The data encoding defines how the actual samples are stored in the file, e.g.
signed or unsigned, as bytes or short integers, in little-endian or big-endian
byte order, etc. Some file formats apply some kind of compression to the data,
e.g. Huffman encoding, or simple silence deletion.
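
The effect of the data encoding can be seen by decoding the same two bytes in several ways. This is a minimal sketch using Python's standard struct module; the byte values and dictionary keys are chosen purely for illustration. It shows why headerless data cannot be interpreted correctly unless the encoding is known.

```python
import struct

data = bytes([0x00, 0x80])  # two bytes of sample data

decodings = {
    # "<" = little-endian, ">" = big-endian, "h" = signed 16-bit,
    # "H" = unsigned 16-bit, "B" = unsigned 8-bit
    "signed 16-bit, little-endian": struct.unpack("<h", data)[0],
    "signed 16-bit, big-endian": struct.unpack(">h", data)[0],
    "unsigned 16-bit, little-endian": struct.unpack("<H", data)[0],
    "unsigned 8-bit (two samples)": struct.unpack("2B", data),
}

for name, value in decodings.items():
    print(f"{name}: {value}")
```

The four results differ widely (-32768, 128, 32768, and the pair 0 and 128), even though the underlying bytes are identical.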