In order to perform acoustic analysis on recorded speech data or to deliver audio online, the audio signal has to be converted into a digital (PCM) audio file format, such as Wav or Aiff. Analog recordings have to be digitized, and digital recordings need to be transferred to a personal computer via a digital audio interface. This is an important, yet often underestimated, stage in the process of preparing audio data for analysis.
A/D conversion fundamentals
The main goal of A/D conversion (digitization) is to obtain the best possible digital representation of the original analog waveform. Without going into the technical details of the digitization process, one should choose a sample rate high enough to capture a broad range of frequencies and a bit depth that allows a wide dynamic range with a negligible amount of quantization noise. These goals can be achieved with a premium-quality, stand-alone A/D converter operating at a sample rate of at least 48,000 Hz with 24-bit resolution. It is crucial not to use a PCI multimedia sound card for this purpose: such cards are built from inferior-quality electronic components and, more importantly, allow electrostatic noise and distortion generated inside the computer to leak into the captured acoustic signal:

Spectrum of typical electrostatic noise generated by computer circuitry.
| | AES/EBU | S/PDIF (IEC-958) |
|---|---|---|
| Cabling | 110 ohm shielded twisted pair | 75 ohm coaxial or fiber |
| Connector | 3-pin XLR | RCA (or BNC) |
| Signal level | 3-10 V | 0.5-1 V |
| Modulation | biphase-mark-code | biphase-mark-code |
| Max. resolution | 24 bits | 24 bits |
The analog playback device (such as a TASCAM 122 mkIII cassette deck) should be connected to the A/D converter. One should make sure that the output levels of the tape deck match the input levels of the A/D converter. A balanced XLR line-level interface is recommended (sensitivity for 0 dBFS: +24 dBu at minimum gain, +7 dBu at maximum gain; 65k ohm impedance). If the tape deck does not have this kind of output, a signal level transformer (such as the Ebtech Line Shifter) and a preamplifier should be used.
The A/D converter then needs to be connected, for example via an S/PDIF interface, to a digital audio I/O card (such as the Midiman Delta DiO 2496). Such cards are typically PCI, though USB and FireWire devices are becoming common. The digital I/O card should be selected as the recording interface in the audio recording software (such as Sony Sound Forge on a PC or BIAS Peak VST on a Mac). The digital audio signal should be captured with this software and saved either as a Wav (PC) or Aiff (Mac) file at the sample rate and bit depth that the A/D converter was set to. It is also possible to capture the digital audio signal directly into acoustic analysis software, such as CSL or Praat, but this is not recommended, because specialized recording and processing software offers considerably more control over the incoming signal.
It should also be mentioned that the USB Pre may be used as a high-quality, stand-alone A/D converter. In this case, the digital audio signal is transferred to a PC via the USB interface, which eliminates the need to install a separate PCI digital I/O card and makes it possible to capture digital audio on a laptop. In addition, the USB Pre has a pair of tape-level inputs, to which a cassette deck can be connected directly.
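Whatever capture software is used, the resulting Wav file records the sample rate and bit depth in its header, so it is worth checking that they match the converter settings. The following is a minimal sketch, assuming Python's standard-library `wave` module and a mono signal already available as floating-point samples in [-1, 1]; the helper names `write_pcm24_wav` and `wav_format` are hypothetical and not part of any recording package. It writes a 24-bit file at a chosen sample rate and reads the format back.

```python
import io
import wave

def write_pcm24_wav(fileobj, samples, sample_rate=48_000):
    """Hypothetical helper: write mono float samples in [-1.0, 1.0] as 24-bit PCM Wav."""
    max_code = 2 ** 23 - 1                      # largest positive 24-bit value
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, x))              # clip to full scale
        code = round(x * max_code)
        frames += code.to_bytes(3, "little", signed=True)  # Wav stores little-endian
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)                       # mono
        w.setsampwidth(3)                       # 3 bytes = 24 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(bytes(frames))

def wav_format(fileobj):
    """Hypothetical helper: return (sample rate in Hz, bit depth, channel count)."""
    with wave.open(fileobj, "rb") as r:
        return r.getframerate(), 8 * r.getsampwidth(), r.getnchannels()

buf = io.BytesIO()                              # in-memory stand-in for a file on disk
write_pcm24_wav(buf, [0.0, 0.5, -0.5, 1.0], sample_rate=96_000)
buf.seek(0)
print(wav_format(buf))  # (96000, 24, 1)
```

Passing a real file path such as `"take01.wav"` (a hypothetical name) instead of the `BytesIO` buffer works the same way, since `wave.open` accepts either.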
| Input | Sensitivity for 0 dBFS, min. gain (typical) | Sensitivity for 0 dBFS, max. gain (typical) | Clip level (1% THD) | Impedance (actual) |
|---|---|---|---|---|
| MIC | -10 dBu | -53 dBu | -12 dBu (195 mV rms) | 2k ohm active-balanced |
| LINE | +24 dBu | +7 dBu | +24 dBu (12.3 V rms) | 65k ohm active-balanced |
| DI | +8 dBu | -9 dBu | +9 dBu (2.2 V rms) | 10k ohm unbalanced |
| TAPE | +8 dBu | -9 dBu | +9 dBu (2.2 V rms) | 110k ohm unbalanced |
Summary of typical signal level types.
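The dBu figures in the table are voltages relative to 0 dBu = 0.7746 V rms (1 mW into 600 ohm), and the sensitivity columns give the input level that just reaches 0 dBFS. Matching the tape deck to the converter therefore amounts to simple arithmetic. The sketch below reproduces the voltage column and shows where a source peak would land in the digital domain; the +4 dBu deck peak is a hypothetical example value, not a specification of any particular deck.

```python
import math

DBU_REF_VRMS = math.sqrt(0.6)   # 0 dBu = 0.7746 V rms (1 mW into 600 ohm)

def dbu_to_vrms(dbu):
    """Convert a level in dBu to volts rms."""
    return DBU_REF_VRMS * 10 ** (dbu / 20)

def peak_dbfs(source_peak_dbu, sensitivity_dbu):
    """Digital peak level when a source peak (dBu) feeds an input whose 0 dBFS point is sensitivity_dbu."""
    return source_peak_dbu - sensitivity_dbu

print(f"{dbu_to_vrms(24):.1f} V")            # 12.3 V, as in the LINE row
print(f"{dbu_to_vrms(-12) * 1000:.0f} mV")   # 195 mV, as in the MIC row
# Hypothetical tape deck peaking at +4 dBu into the LINE input, at min. and max. gain:
print(peak_dbfs(4, 24), peak_dbfs(4, 7))     # -20 -3
```

At minimum gain such a deck would leave 20 dB of the converter's range unused, while at maximum gain its peaks would sit 3 dB below full scale, which is why the gain setting, not just the connector type, has to be matched.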
Improving A/D conversion
There are a few simple, yet important ways in which the quality of the digital representation of an analog waveform can be improved.
1. Use a sample rate of 96,000 Hz.
In principle, if frequency response were the only issue, there would be no advantage in moving to formats with higher sampling rates. However, listening evidence suggests otherwise. Direct psychoacoustic comparisons of the same source material, recorded and reproduced at 44.1 kS/s, 96 kS/s and 192 kS/s, show that there is an advantage in going to the higher rates: it sounds better! The most common comment is that such recordings have better spatial resolution. What mechanism can be at work? It seems unlikely that we have all suddenly developed ultrasonic hearing capabilities.
Energy dispersion and anti-alias filtering.
Sharp filtering inevitably causes a ringing transient response, an effect known as the Gibbs phenomenon. The ringing contains energy, and although the energy in the input transient is concentrated at one instant, the energy from the anti-alias filter is spread over a much longer time: the audio picture is "defocused." We might argue that this energy is ultrasonic, but this is certainly not the case at 44.1 or 48 kS/s. Our bandwidth constraints mean that, to get good anti-aliasing, we must filter as steeply as we can and pass only the audio bandwidth, so the ringing falls right at the top of the audio band. A high sample rate gives us the extra bandwidth to use a gentler filter and contain the ringing (energy defocusing).
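The link between filter steepness and the duration of the ringing can be made concrete with Kaiser's well-known estimate of the order of a windowed FIR low-pass filter, M ≈ (A - 8) / (2.285 Δω), where A is the stopband attenuation in dB and Δω the transition width in radians per sample. The impulse response, and hence the time over which ringing is spread, lasts roughly M / fs seconds. The sketch assumes (our choice, not the author's) a 20 kHz passband, a stopband beginning at the Nyquist frequency and 100 dB of attenuation; real converters use their own filter designs, so the numbers only indicate the trend.

```python
import math

def kaiser_fir_order(fs, pass_hz, stop_hz, atten_db):
    """Kaiser's estimate of the FIR order needed for a given transition band."""
    delta_omega = 2 * math.pi * (stop_hz - pass_hz) / fs   # transition width, rad/sample
    return math.ceil((atten_db - 8) / (2.285 * delta_omega))

def ringing_ms(fs, pass_hz=20_000, atten_db=100):
    """Approximate impulse-response length (ms) of an anti-alias filter stopping at Nyquist."""
    order = kaiser_fir_order(fs, pass_hz, fs / 2, atten_db)
    return 1000 * order / fs

for fs in (44_100, 48_000, 96_000):
    print(f"{fs} Hz: about {ringing_ms(fs):.2f} ms")
```

Under these assumptions the filter at 44.1 kHz needs an order of about 138 (roughly 3.1 ms of impulse response), while at 96 kHz an order of about 22 suffices (roughly 0.23 ms), because the transition band widens from about 2 kHz to 28 kHz.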
The audio DVD standard.
In addition to gentler anti-aliasing and reduced energy defocusing, the 96,000 Hz sample rate is part of emerging digital audio standards: it is used in present-day recording studios, in consumer PCs (e.g., the new Sound Blaster Audigy cards), and in the DVD-Audio format.
2. Use 24-bit quantization
For the sampling theorem to apply exactly, each sampled amplitude value must exactly equal the true signal amplitude at the sampling instant. Real ADCs do not achieve this level of perfection. Normally, a fixed number of bits (binary digits) is used to represent a sample value, so the infinite set of values possible in the analog signal is not available for the samples: with R bits per sample, exactly $2^R$ sample values are possible. For high-fidelity applications, such as archival copies of analog recordings, 24 bits per sample (so-called 24-bit resolution) should be used.
The difference between the analog signal and the closest sample value is known as quantization error. Since it can be regarded as noise added to an otherwise perfect sample value, it is also often called quantization noise. Quantization noise limits the precision with which a real sampled signal can represent the original analog signal. This inherent limitation of the ADC process is often expressed as a signal-to-noise ratio (SNR): the ratio of the average power in the analog signal to the average power in the quantization noise. On the dB scale, the quantization SNR for uniformly spaced sample levels increases by about 6 dB for each bit used in the sample; for ADCs using R bits per sample, $\mathrm{SNR} \approx 6R - 5$ dB. Thus, 16-bit encoding allows about 91 dB, which is 20 to 30 dB better than the 60 to 70 dB that analog audio cassette players achieve with special noise reduction techniques. 24-bit encoding yields a theoretical SNR of about 139 dB; in practice, the achievable SNR is limited by the analog electronics of the hardware rather than by the word length.
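The "about 6 dB per bit" rule can be checked numerically by quantizing a test signal and measuring the power of the resulting error. The sketch below is a minimal illustration under our own assumptions: a sine at 90% of full scale, a plain rounding quantizer with R-bit signed codes, and hypothetical helper names. A near-full-scale sine scores a few dB above the 6R - 5 rule (theory for a full-scale sine is about 6.02R + 1.76 dB), since the constant offset depends on the signal's level and crest factor; the slope of about 6 dB per bit is the same.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1) to the nearest of 2**bits uniformly spaced signed levels."""
    q = 2 ** (bits - 1)                       # levels per polarity
    code = max(-q, min(q - 1, round(x * q)))  # integer code, clipped to the R-bit range
    return code / q

def quantization_snr_db(bits, n=10_000, amplitude=0.9, cycles=997):
    """Measured SNR (dB) of a quantized sine: signal power over quantization-error power."""
    signal_power = noise_power = 0.0
    for i in range(n):
        x = amplitude * math.sin(2 * math.pi * cycles * i / n)
        error = quantize(x, bits) - x
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 12, 16):
    print(f"{bits} bits: {quantization_snr_db(bits):.1f} dB measured, "
          f"{6 * bits - 5} dB by the 6R - 5 rule")
```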
3. Use appropriate anti-aliasing filters
Simply put, aliasing is a kind of sampling confusion that can occur during the digitization process. It is a direct consequence of violating the sampling theorem: the input signal must not contain frequencies at or above the Nyquist frequency (half the sampling rate). If higher frequencies are present, the sampler still produces samples at its fixed rate, but these samples create false information in the form of alias frequencies, which are folded back into the audio band. In practice, aliasing can and should be prevented, and the solution is rather straightforward: the input signal must be band-limited with a low-pass (anti-aliasing) filter that provides significant attenuation at the Nyquist frequency. The "archetypal" anti-aliasing filter has "brick-wall" characteristics, with a very steep slope from passband to stopband. Such a filter, however, produces unwanted ringing-type effects and should be avoided. Instead, the system should use an oversampling A/D converter (see below), which combines a mild analog low-pass filter, a high initial sampling frequency, and digital decimation down to the desired output sampling frequency.
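Where an unfiltered component ends up can be computed directly: a pure tone at frequency f, sampled at fs, is indistinguishable from a tone at f folded into the range 0 to fs/2. The sketch below (the helper name `alias_frequency` is our own) shows that a 30 kHz tone reaching a 44.1 kHz sampler without anti-aliasing would appear as a spurious 14.1 kHz tone, whereas at 96 kHz it stays below the Nyquist frequency and is not aliased.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency at which a pure tone f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz                            # sampling cannot tell f from f - k*fs
    return fs_hz - f if f > fs_hz / 2 else f    # fold into 0..fs/2

fs = 44_100
print(alias_frequency(30_000, fs))      # 14100
print(alias_frequency(30_000, 96_000))  # 30000 (below the 48 kHz Nyquist frequency)

# The sampled 30 kHz tone is numerically identical to a genuine 14.1 kHz tone:
tone = [math.cos(2 * math.pi * 30_000 * n / fs) for n in range(64)]
alias = [math.cos(2 * math.pi * 14_100 * n / fs) for n in range(64)]
print(max(abs(a - b) for a, b in zip(tone, alias)) < 1e-9)  # True
```

Once sampled, nothing distinguishes the two tones, which is why aliasing has to be prevented by filtering before conversion rather than removed afterwards.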
4. Dither
Dither is a small amount of noise added to the audio signal before quantization. It causes the signal to shift randomly with respect to the quantization levels, so that the quantization error is decorrelated from the signal and behaves like a low-level, benign noise floor rather than signal-dependent distortion. Dither does not prevent quantization error; instead, it allows the system to encode amplitudes smaller than the least significant bit.
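The claim that dither lets the system encode amplitudes smaller than one LSB can be illustrated with a constant level of 0.3 LSB. Without dither it always rounds to zero and is lost; with dither, individual samples toggle between neighbouring codes, and their average recovers the level. This is a minimal sketch with a hypothetical helper name; it assumes triangular (TPDF) dither spanning ±1 LSB, a common choice, generated with Python's `random` module.

```python
import random

def mean_of_quantized(value_lsb, n=100_000, dither=False, seed=1):
    """Average of repeated quantizations of a constant level given in LSB units."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        d = 0.0
        if dither:
            # triangular (TPDF) dither spanning +/-1 LSB: sum of two uniform variables
            d = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        total += round(value_lsb + d)   # quantize to the nearest integer code
    return total / n

print(mean_of_quantized(0.3))               # 0.0: the level is lost
print(mean_of_quantized(0.3, dither=True))  # close to 0.3
```

The price is added noise in each individual sample, which is the trade-off the text describes: quantization error is not removed, but it no longer depends on the signal.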
5. Oversampling
Oversampling is another technique aimed at improving the results of the digitization process. As noted above, a brick-wall filter may produce unwanted acoustic effects. In oversampling A/D conversion, the input signal is first passed through a mild analog low-pass filter, which provides sufficient attenuation at high frequencies. The signal is then sampled at a high frequency, which raises the Nyquist frequency far above the audio band, and quantized. Afterwards, a digital low-pass filter (e.g., a linear-phase FIR filter) removes everything above the Nyquist frequency of the target rate, so that no aliasing occurs when its output is downsampled to the desired output sampling frequency (e.g., 44,100 Hz). In addition to eliminating the unwanted effects of a brick-wall analog filter, oversampling increases resolution: the quantization noise is spread over a band extending far beyond the audio baseband, and the part of it removed by the digital filter no longer contributes, which makes the remaining in-band noise relatively small.
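The resolution gain from oversampling can be estimated with simple arithmetic, assuming the quantization noise is white: its power is spread evenly up to half the internal sampling rate, and the decimation filter keeps only the fraction below half the output rate. The in-band SNR therefore improves by 10·log10(oversampling ratio) dB, about 3 dB (half a bit) per doubling. This sketch covers plain oversampling only; practical delta-sigma converters add noise shaping, which pushes even more noise out of the audio band than this estimate suggests. The helper name is our own.

```python
import math

def oversampling_gain_db(fs_internal, fs_output):
    """In-band SNR gain when white quantization noise spread up to fs_internal/2
    is filtered down to fs_output/2 before decimation."""
    return 10 * math.log10(fs_internal / fs_output)

for factor in (2, 4, 64):
    gain = oversampling_gain_db(factor * 44_100, 44_100)
    print(f"{factor}x: {gain:.1f} dB, about {gain / 6.02:.1f} extra bits")
```

With 64-times oversampling, for instance, the estimate gives about 18 dB, or roughly 3 bits of additional resolution, before any noise shaping.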
6. Use high-quality, no-compromise hardware and software.
A/D conversion workflow
Below is an overview of a minimalist A/D conversion workflow: