Digital audio files in acoustic analysis

Setting up MS Windows audio devices

You must first set up recording and playback devices for the Operating System environment (MS Windows or Mac OS). There are a few possible scenarios:

  1. You have a separate audio device that you use for capturing audio and you want to use professional audio editing software (see the list below)
  2. You have a separate audio device that you use for capturing audio and you want to use Praat
  3. You have only one device on your system

Scenario 1

  1. You must first make sure that your recording device is NOT set up as the default audio device for MS Windows applications. This is to prevent unwanted sounds (such as mouse clicks, alerts, etc.) from leaking into your audio stream. Follow these steps:
    1. Go to Control Panel
    2. Click on Sounds and Multimedia
    3. Click on the Audio tab
    4. Select your device from the drop-down menu
  2. You should then select your recording device in your audio editing software (see the GoldWave example below).
  3. Your professional audio device will most likely have its own audio mixer and router. Please follow the directions specific to your device. See the image below for an example of setting up USB Pre and M-Audio Delta series hardware.

Scenario 2

If you want to use your professional (either internal PCI, or external USB, Firewire) device with Praat, you will have to make this device the DEFAULT device for your entire system. See the Sound and Multimedia dialog box above for reference. This is not an ideal situation, as once your device is selected as default, all kinds of unwanted system sounds (mouse clicks, alerts, notification sounds, etc.) can inadvertently leak into the system. It is also not guaranteed that your device will work directly with Praat, particularly if your device is set up to sample audio at 96 kHz and a 24-bit resolution, which Praat is incapable of supporting.

Scenario 3

This scenario should be avoided due to the danger of producing inferior quality digital audio. If you don't have an external device to record audio, you can simply go directly to Praat and use its audio recording function (see the section below).

Recording (capturing) audio on a PC

Use professional audio editing software to capture digital audio.

While it is possible to capture digital audio streams (record audio) with Praat, it is advised to use professional-grade audio editing software, such as

Macintosh users can use

Audio editing software has a number of advantages over the native Praat recording module. Some of them include:

  • sophisticated graphic user interface (GUI)
  • ability to record audio at 24-bit/96 kHz
  • sophisticated waveform editing tools
  • compatibility with hardware and software technologies, such as Direct, VST, TDM, and others

The image below illustrates GoldWave's device selection dialog box. It boasts a number of sophisticated options, such as the ability to select a particular recording device (hardware) directly from within GoldWave.

What if I don't want to buy or install additional software?

In that case, you can still use Praat to capture digital audio streams. Here is how:

  • make sure you have selected the right recording and playback devices in MS Windows (or MacOS) (more on that HERE)
  • start Praat
  • choose New, Record mono sound...
  • choose the sample rate at which you want to record (more on that HERE)
  • click the Record button (you should now see level meters work)
  • click Stop when finished
  • enter file name and click Save to list
  • your file is now added to the Praat Object window
  • it is IMPORTANT that the audio level does not reach the red zone; otherwise your recordings will be clipped and distorted (see image below)

Choosing recording hardware

It is recommended that you use professional audio hardware to capture audio streams on your PC.

Hardware/software solutions

Hardware/software solutions, such as Kay Elemetrics CSL, have many advantages. They are designed to be used by speech scientists, audiologists, and linguists. They're most typically ready to use "out of the box." Both software and hardware are of excellent quality and versatility. Tucker Davis solutions are generally more flexible, as they're designed to be interfaced with the PC by means of computer programs and scripts. Drag and drop functions are also available.

Software only with custom hardware

Software-only solutions are generally inexpensive (or free) and offer many attractive functions. Praat and WaveSurfer work on several different platforms and offer comprehensive signal analysis features. However, one cannot do reliable analysis without high-quality, dedicated hardware. Unfortunately, most commercially available hardware has been designed for the audio industry, not for linguists. Therefore, one should be very careful in choosing speech processing hardware. As a general rule, one should stay away from hardware designed for computer games and home recording studios.

Your recordings are analog and stored on magnetic tape

You will need the following pieces of hardware (also see the diagram below):

  • high quality tape deck
  • A/D (Analog-to-Digital) converter
  • pre-amplifier
  • digital audio interface
    • a PCI digital audio card
    • USB or Firewire digital audio interface

Your recordings are digital and stored on DAT tapes, CDs, or flash memory

If your recordings are on DAT tapes, you will need to have matching interfaces between the output of the DAT deck and the input of the PC. Usually, this will be done over an S/PDIF (coaxial RCA cable) or AES/EBU (balanced XLR cable) interface. If your recordings are on flash memory, you will need a memory card reader. Those usually connect to the PC via the USB interface. If your recordings are on CD-ROM, you will simply put the CD in the CD-ROM drive and copy the digital audio files to your PC.

You are recording via a microphone directly into the PC

Follow the diagram below for a typical set up to record via a microphone directly into the PC:

Sample rate, resolution, digital audio files

Sample rate and resolution (quantization) are two of the most important parameters in the Analog-to-Digital (A/D) conversion process. Without going into technical detail, it is important to remember that one should always choose the highest sample rate and resolution available on the digital recorder or A/D converter. Most typically, you will be recording speech at a sample rate of 48,000 Hz and a resolution of 16 bits (a common DAT standard). However, more and more recording devices offer a resolution of 24 bits and a sample rate of 96,000 Hz. While 96,000 Hz can be overkill for speech recordings (unless you are recording for archival purposes), the 24-bit resolution will certainly improve the quality (amount of acoustic detail) of your recordings. Note that Praat does not have the ability to handle 24-bit digital audio files, so all of your 24-bit recordings will have to be converted to 16-bit before analysis.
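
The 24-bit to 16-bit conversion mentioned above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming signed integer samples and plain truncation of the lowest 8 bits. The function name is hypothetical, and real converters typically add dither before reducing the word length.

```python
def convert_24_to_16(sample_24bit):
    """Reduce one signed 24-bit sample to 16 bits by dropping the lowest 8 bits."""
    if not -(2 ** 23) <= sample_24bit <= 2 ** 23 - 1:
        raise ValueError("sample is outside the signed 24-bit range")
    # An arithmetic right shift keeps the sign and discards 8 bits of detail.
    return sample_24bit >> 8


samples_24 = [8388607, -8388608, 256, 0]
samples_16 = [convert_24_to_16(s) for s in samples_24]
print(samples_16)  # [32767, -32768, 1, 0]
```

The full-scale 24-bit values map onto the full-scale 16-bit values, so the overall level is preserved; only the finest 8 bits of detail are lost.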

Prepare digital audio files for analysis

If you are dealing with 16-bit, 48,000 Hz audio files, you can use the Akustyk batch converter. Simply select your input directory, output directory, and the desired sample rate, and Akustyk will convert your files in batch. You should make sure beforehand that your recordings are not clipped (overloaded), because Akustyk reports clipping as an error and interrupts the batch process.
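
A simple way to screen recordings for clipping before a batch run is to count samples that sit at the extreme values of the PCM range. The sketch below assumes signed integer samples already loaded into a list; the function name is hypothetical and this is not how Akustyk itself detects clipping.

```python
def count_clipped_samples(samples, bit_depth=16):
    """Count samples at the minimum or maximum value of a signed PCM range."""
    max_value = 2 ** (bit_depth - 1) - 1   # 32767 for 16-bit audio
    min_value = -(2 ** (bit_depth - 1))    # -32768 for 16-bit audio
    return sum(1 for s in samples if s >= max_value or s <= min_value)


clean = [0, 1200, -3400, 15000, -20000]
overloaded = [0, 32767, 32767, -32768, 100]
print(count_clipped_samples(clean))       # 0
print(count_clipped_samples(overloaded))  # 3
```

A single full-scale sample is not necessarily audible distortion, but runs of consecutive full-scale samples are a strong sign that the input level was too high.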

What sample rate to use for analysis?

For the purposes of acoustic analysis, the most useful sample rate is 16,000 Hz. Sometimes, you might want to go a little lower, to 10,000 Hz (especially for the purposes of resynthesis), or higher, to 22,050 Hz (for children's voices).
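
The choice of analysis sample rate follows from the sampling theorem: a digital signal can represent frequencies only up to half its sample rate (the Nyquist frequency). A minimal sketch, with a hypothetical function name, shows the upper frequency limit for each of the rates mentioned above.

```python
def nyquist_frequency(sample_rate_hz):
    """Return the highest frequency (Hz) representable at a given sample rate."""
    return sample_rate_hz / 2


for rate in (10000, 16000, 22050):
    print(rate, "Hz sample rate ->", nyquist_frequency(rate), "Hz upper limit")
```

At 16,000 Hz the signal covers frequencies up to 8,000 Hz, while 22,050 Hz extends the range to 11,025 Hz.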

What file format to choose?

Praat and Akustyk give you the ability to work with several different file formats. What all of these file formats (aiff, wav, au, nsp) have in common is that they all use PCM (Pulse Code Modulation) audio encoding. PCM files store uncompressed, raw sample values, which means that the original quality is never compromised. PCM formats differ considerably from compressed file formats, such as MP3, AAC, ATRAC, WMA, QuickTime, which use psycho-acoustic compression to decrease the file size. Compressed audio formats should never be used in acoustic analysis. As long as you are using PCM files, it does not matter what particular audio file format you use. Microsoft WAV, Apple AIFF, Sun AU, or Kay NSP, all are equally good (see the details below). Use the Akustyk batch converter to convert between those formats with ease.

More on digital audio files

Digital audio files can be described by means of the following parameters:

SAMPLE RATE is the sampling frequency expressed in hertz (Hz) and defines the number of times per second that the analog audio signal has been measured (sampled). Sampling frequency determines the audio bandwidth, or frequency response, that can be represented by the digital signal. Higher sampling frequencies theoretically yield wider audio bandwidth. Compact Disc Audio uses a sampling rate of 44,100 Hz; therefore, each second of audio is made up of 44,100 samples per channel (each with a word length of 16 bits, which allows 65,536 distinct values, see BIT DEPTH below).

BIT DEPTH defines the digital 'word length' used to represent a given sample. Bit depth correlates to the maximum dynamic range that can be represented by the digital signal. Larger bit depths theoretically yield more dynamic range. The number of values a sample can take grows exponentially with bit depth (as a power of 2), while the amount of data grows only linearly. Compact disc audio, for example, uses 16-bit audio; therefore, each sample is represented by a digital word of 16 binary digits, which can take 2^16 (65,536) distinct values. 24-bit audio has a word length of 24 binary digits, or 2^24 (about 16.7 million) values per sample. Each bit adds roughly 6 dB, so the theoretical dynamic range of a 16-bit digital audio file is about 96 dB (6 dB x 16).
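
The relationship between bit depth, number of quantization levels, and theoretical dynamic range can be checked with a short calculation. This sketch uses the standard formula 20 x log10(number of levels), which works out to about 6.02 dB per bit; the function names are hypothetical.

```python
import math


def quantization_levels(bit_depth):
    """Number of distinct values a sample of the given bit depth can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range in dB: 20 * log10(number of levels)."""
    return 20 * math.log10(quantization_levels(bit_depth))


for bits in (16, 24):
    print(bits, "bits:", quantization_levels(bits), "levels,",
          round(dynamic_range_db(bits), 2), "dB")
```

For 16 bits this gives 65,536 levels and about 96.33 dB; for 24 bits, 16,777,216 levels and about 144.49 dB.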

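BIT RATE = Bit Depth x Sampling Frequency (per channel). In the example below, the data rate of 16-bit/44,100 Hz stereo audio is computed. Division by 1024 converts from bits to kilobits. Multiplying the result by 2 gives the rate for stereo (two-channel) audio: ((16 x 44,100) / 1024) x 2 = 1,378 kbps (172 KB/s). With the decimal convention of 1,000 bits per kilobit, the same stream is 1,411.2 kbps, the figure usually quoted for CD audio.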
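
The bit rate calculation can be reproduced as follows. This is a minimal sketch with a hypothetical function name; it uses the same division by 1024 as the text and assumes uncompressed PCM with no header overhead.

```python
def pcm_bit_rate(bit_depth, sample_rate_hz, channels=1):
    """Return the data rate of uncompressed PCM audio in bits per second."""
    return bit_depth * sample_rate_hz * channels


bps = pcm_bit_rate(16, 44100, channels=2)
print(bps)             # 1411200 bits per second
print(bps / 1024)      # 1378.125 kilobits per second (1024-bit kilobits)
print(bps / 8 / 1024)  # 172.265625 kilobytes per second
```

The same function gives 256,000 bits per second for a 16-bit mono file at the 16,000 Hz analysis rate.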

FILE TYPE. There are two general kinds of file types: self-describing, where the device parameters and encoding are made explicit in some form of header, and 'raw', where the device parameters and encoding are fixed. Self-describing file types generally define a family of data encodings, where a header field indicates the particular encoding variant used. Headerless types define a single encoding and usually allow no variation in device parameters (except sometimes sampling rate, which can be a pain to figure out other than by listening to the sample).

Some of the most common types include:

  • .au or .snd NeXT, Sun
  • .aif(f), AIFF Apple, SGI
  • .aif(f), AIFC Apple, SGI
  • .iff, IFF/8SVX Amiga
  • .voc Soundblaster
  • .wav, WAVE Microsoft
  • .sf IRCAM
  • .snd, .fssd Mac, PC
  • .raw Mac, PC
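
To illustrate what "self-describing" means in practice, the sketch below writes one second of silence as a WAV file in memory with Python's standard `wave` module and then reads the parameters back from the header alone. The chosen parameters (mono, 16-bit, 16,000 Hz) are only example values.

```python
import io
import wave

# Write a small WAV file into an in-memory buffer.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)        # mono
    writer.setsampwidth(2)        # 2 bytes per sample = 16-bit
    writer.setframerate(16000)    # 16,000 Hz
    writer.writeframes(b"\x00\x00" * 16000)  # one second of silence

data = buffer.getvalue()
print(data[0:4], data[8:12])  # b'RIFF' b'WAVE'

# Read the parameters back from the header.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    params = (reader.getnchannels(), reader.getsampwidth() * 8,
              reader.getframerate(), reader.getnframes())
print(params)  # (1, 16, 16000, 16000)
```

A headerless .raw file would contain only the 32,000 data bytes, and the reader would have to know the channel count, bit depth, and sample rate in advance.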

FILE FORMAT is the format in which a particular audio file is encoded. The header fields indicate the particular encoding variant used. The data encoding defines how the actual samples are stored in the file, e.g. signed or unsigned, as bytes or short integers, in little-endian or big-endian byte order, etc. Some file formats apply some kind of compression to the data, e.g. Huffman encoding, or simple silence deletion.
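
The effect of byte order and signedness on the stored data can be seen with Python's standard `struct` module. This is a small illustration using arbitrary example values, not a description of any particular file format's layout.

```python
import struct

sample = 1000  # 0x03E8 in hexadecimal

# The same 16-bit signed sample stored in two byte orders.
little = struct.pack("<h", sample)
big = struct.pack(">h", sample)
print(little)  # b'\xe8\x03'
print(big)     # b'\x03\xe8'

# Reading little-endian data as big-endian yields a wrong value.
wrong_order = struct.unpack(">h", little)[0]
print(wrong_order)  # -6141

# Reading a signed sample as unsigned also yields a wrong value.
negative = struct.pack("<h", -2)
wrong_sign = struct.unpack("<H", negative)[0]
print(wrong_sign)  # 65534
```

This is why a header that records the encoding is valuable: the raw bytes alone do not say how they should be interpreted.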