I am going to define noise as any unwanted sound that is recorded in addition to the sound (or signal) we want. So, for example, the broadband noise generated by an analog cassette recorder and the sound of traffic inadvertently captured during a recording session are both considered noise, despite their different sources and spectral characteristics. The ratio of power between the signal and the unwanted noise will be defined as the signal-to-noise ratio (SNR). One of the most important goals of any speech recording session is to achieve the highest possible SNR.
As I wrote in the section on metering, sound intensity is typically measured on the decibel scale. The decibel scale has a reputation for being confusing. For many fieldworkers, the whole business of decibels, non-linearity, amplitude, loudness, etc., is the last thing they want to worry about while out in the field recording the last speaker of a dying language (see this NPR story). I strongly agree with this sentiment.
To simplify things, I have prepared an SNR demo (Figure 1) that illustrates the notion with speech sound files mixed with noise. Using Akustyk, I generated a speech-shaped noise file that matches the frequency envelope of the speech file. Then, I set the intensity of the speech file at 72 dB and mixed it (also using Akustyk) with the noise file at intensity levels varying from 0 to 80 dB SPL. This should illustrate the effect that different levels of noise have on speech intelligibility. You will also be able to see the gradual "smearing" of the speech-specific spectral detail as the level of noise increases, and to tell why analysis software might struggle with files that are full of unwanted noise.
Figure 1. A Flash demo of the effects of SNR on speech intelligibility
Figure 2. Spectrogram and waveform of an audio file containing speech-shaped noise as the noise level increases in 10 dB increments over time
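For readers who want to try a similar mix themselves, here is a minimal Python sketch of scaling a noise file to hit a chosen SNR before adding it to speech. It is not how Akustyk does it; the function names (`rms`, `snr_db`, `mix_at_snr`) are made up for this illustration, and a sine tone and white noise stand in for the real speech and speech-shaped noise files.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(signal, noise):
    """SNR in dB: 10*log10 of the power ratio, i.e. 20*log10 of the RMS ratio."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


def mix_at_snr(signal, noise, target_snr_db):
    """Scale the noise so the pair has the target SNR, then add the two."""
    gain = rms(signal) / (rms(noise) * 10.0 ** (target_snr_db / 20.0))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled_noise)]
    return mixed, scaled_noise


# Stand-ins for real audio (assumptions): a 200 Hz tone as "speech",
# uniform white noise as "noise", one second at 10,000 Hz.
random.seed(0)
rate = 10000
speech = [0.5 * math.sin(2 * math.pi * 200 * t / rate) for t in range(rate)]
noise = [random.uniform(-1.0, 1.0) for _ in range(rate)]

# Speech held at a nominal 72 dB; noise stepped from 0 to 80 dB in 10 dB steps.
for noise_level in range(0, 90, 10):
    target = 72 - noise_level
    _, scaled = mix_at_snr(speech, noise, target)
    print(f"noise {noise_level:2d} dB -> SNR {snr_db(speech, scaled):6.1f} dB")
```

Because both levels are on the same decibel scale, the SNR is simply the speech level minus the noise level, so the last step (noise at 80 dB) has a negative SNR.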
Sources of noise
Inherent noise of a recording chain
Inherent noise (self-noise) in theory
Every recording chain produces some noise due to electrostatic activity. This kind of noise is often referred to as "inherent noise" or "self-noise." Let's take a simple field recording chain as an example. Suppose we use the Fostex FR-2LE recorder and the Sennheiser HSP2 omnidirectional microphone. When matching a microphone with a recorder, it is important to make sure that the recorder (or its microphone pre-amplifier, to be exact) does not appreciably degrade the microphone's noise performance. The task is often difficult because of the lack of standard protocols and nomenclature in describing self-noise. To make matters worse, some manufacturers do not even publish noise floor data for their products.
The two noise values that manufacturers publish most often are:
the microphone's Equivalent Noise Level (ENL, measured in dB SPL, A-weighted)
the recorder's (pre-amplifier's) Equivalent Input Noise (EIN, measured in dBu)
For example, the ENL value for the Sennheiser HSP2 microphone is 28 dB SPL, and the EIN value for the Fostex FR-2LE is -129 dBu. Fostex actually does not publish the EIN value, but it was independently measured by the Avisoft Lab. How do we make sense of these two values? How can we tell if the HSP2 and the FR-2LE are a good match in terms of noise performance?
To make the calculation easier, I created a look-up table, which helps you convert the microphone's ENL value into dBu, so you can directly compare it to the EIN value of your recorder. Follow these steps:
1. Find the ENL value of your microphone (also referred to as "Equivalent Noise Level," "Self-Noise," "Equivalent Noise SPL," or "Noise Floor").
2. Find your microphone's sensitivity in mV/Pa.
3. Locate the values from steps 1 and 2 in the look-up table.
4. Move your finger along the Sensitivity row and down the ENL column.
5. The point where the two values meet is the microphone's ENL converted to dBu, A-weighted.
6. Find the manufacturer's published EIN value for your recorder and reduce it by 3-5 dB (to approximate an A-weighted value).
7. Compare the recorder's new EIN value with the microphone's ENL value (both in dBu).
8. The recorder's value should be at least about 10 dB lower than the microphone's to guarantee no appreciable degradation in noise.
In our example, the Fostex FR-2LE's A-weighted EIN is -129 dBu, while the Sennheiser's ENL value is -117.77 dBu, a margin of about 11 dB, so the recorder is just able to provide adequate gain without too much noise degradation. Microphones of higher sensitivity should give even better noise performance.
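If you do not have the look-up table at hand, the same conversion can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it takes 94 dB SPL as 1 Pa and 0 dBu as about 0.7746 V RMS, the function names are mine, and the 2 mV/Pa sensitivity is a hypothetical value chosen because it lands close to the -117.77 dBu figure quoted above (check your microphone's data sheet for the real number).

```python
import math

DBU_REF_VOLTS = 0.7746    # 0 dBu is about 0.7746 V RMS
SPL_AT_ONE_PASCAL = 94.0  # 1 Pa is about 94 dB SPL


def enl_to_dbu(enl_db_spl, sensitivity_mv_per_pa):
    """Convert a microphone's ENL (dB SPL) to its output noise level in dBu."""
    noise_pressure_pa = 10.0 ** ((enl_db_spl - SPL_AT_ONE_PASCAL) / 20.0)
    noise_volts = (sensitivity_mv_per_pa / 1000.0) * noise_pressure_pa
    return 20.0 * math.log10(noise_volts / DBU_REF_VOLTS)


def noise_margin_db(mic_noise_dbu, recorder_ein_dbu):
    """How far (in dB) the recorder's EIN sits below the microphone's noise."""
    return mic_noise_dbu - recorder_ein_dbu


# ENL of 28 dB SPL from the text; 2 mV/Pa is an assumed, illustrative sensitivity.
mic_noise = enl_to_dbu(28.0, 2.0)
margin = noise_margin_db(mic_noise, -129.0)
print(f"microphone noise: {mic_noise:.1f} dBu, margin: {margin:.1f} dB")
print("adequate match" if margin >= 10.0 else "recorder may add audible noise")
```

The 10 dB threshold in the last line is the rule of thumb from the steps above, not a hard limit.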
One word of caution. You may have concluded that if you have a microphone of sufficiently high sensitivity, you can use a noisier recorder and still get very good results. While this holds true in theory, in practice, noisier pre-amplifiers are typically cheaper, and therefore built of lower quality components. They are likely to be more prone to distortion, even at seemingly low gain levels (e.g., see my review of the Marantz PMD660). I encourage you to read the materials available at the Avisoft Lab website for some interesting noise floor comparisons among popular field recorders.
Finally, what about dynamic microphones? Since they can be considered virtually noiseless due to their design and operation principles, one might conclude that they should have decent noise performance with most field recorders. This, regrettably, is not true. Dynamic microphones are typically at least ten times less sensitive than condenser microphones, so they require significantly more pre-amplifier gain for equivalent signal levels. This inevitably results in higher noise. For example, the Sennheiser HMD25-1's ENL value is approximately -135 dBu, A-weighted, while its sensitivity value is 1 mV/Pa. It would require a recorder with an EIN value of -145 dBu or lower for clean gain. This is very hard to achieve with any field recorder. This is one of the reasons why I typically recommend condenser microphones for field recording.
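The arithmetic behind this paragraph can be sketched as follows. The function names are illustrative, and the 10 mV/Pa condenser is a hypothetical reference value used only to show what "ten times less sensitive" means in decibels.

```python
import math


def extra_gain_db(reference_sensitivity_mv_pa, mic_sensitivity_mv_pa):
    """Additional pre-amplifier gain needed to match the reference mic's output."""
    return 20.0 * math.log10(reference_sensitivity_mv_pa / mic_sensitivity_mv_pa)


def required_ein_dbu(mic_noise_dbu, margin_db=10.0):
    """Recorder EIN needed to stay a given margin below the microphone's noise."""
    return mic_noise_dbu - margin_db


# Hypothetical 10 mV/Pa condenser versus the 1 mV/Pa dynamic microphone from the text.
print(f"extra gain: {extra_gain_db(10.0, 1.0):.0f} dB")
# Microphone noise of -135 dBu from the text, with the 10 dB rule of thumb.
print(f"required EIN: {required_ein_dbu(-135.0):.0f} dBu")
```

A tenfold drop in sensitivity costs 20 dB of extra gain, which is why the pre-amplifier's own noise becomes the limiting factor.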
Inherent noise in the real world
The theoretical self-noise calculations are only a part of the picture. We should be able to have a more tangible idea of the noise levels captured by our recording chain in a typical speech recording scenario. Of course, the same theoretical principles apply, but we should try to estimate the actual noise levels when recording speech in a quiet environment at typical conversational levels (calibrated to peak values of around -12 dBFS). I have, therefore, set up a simple test to estimate real-world self-noise levels.
My setup is the same for all my tests. I use a sound-treated booth and record with the Sound Devices USBPre (one of the quietest field recorders) into a laptop computer running on battery power (to avoid possible ground loop problems). The microphone is at a distance of about 4 feet from the laptop computer, which is the only appreciable source of noise. The laptop uses SSD storage and its fan is disabled. Some level of noise from the environment (especially low-frequency rumble) is unavoidable.
The setup is meant to introduce a low but controlled source of noise into the environment, and to always calibrate the pre-amplifier's gain to the same signal RMS across all the tested microphones (the equivalent of an RMS of 70 dB SPL, measured on a typical, sentence-long voice recording sample at normal vocal effort and conversational loudness). You can read more about volume calibration techniques here.
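To check a test take against targets like these, it helps to measure its peak and RMS levels in the digital file. The sketch below is one simple way to do that in Python; it assumes samples are floats between -1.0 and 1.0, uses the convention that an RMS of 1.0 equals 0 dBFS, and the function names are my own. Note that it works in dBFS only: relating dBFS to dB SPL still requires an acoustic calibration of the recording chain.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (samples are floats in -1.0..1.0)."""
    peak = max(abs(x) for x in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def rms_dbfs(samples):
    """RMS level in dBFS, using the convention that an RMS of 1.0 is 0 dBFS."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def gain_needed_db(samples, target_peak_dbfs=-12.0):
    """Gain change (dB) that would bring the peak to the target level."""
    return target_peak_dbfs - peak_dbfs(samples)


# Synthetic stand-in for a speech take (assumption): a 1 kHz tone peaking at 0.25.
rate = 8000
take = [0.25 * math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate)]
print(f"peak {peak_dbfs(take):.1f} dBFS, RMS {rms_dbfs(take):.1f} dBFS, "
      f"gain needed {gain_needed_db(take):+.1f} dB")
```

Running the same measurement on a stretch of silence from the same take gives the noise floor, and the difference between the two RMS values is a rough estimate of the SNR actually achieved.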
The resulting noise in the digital audio file is a mix of environmental sound picked up by the microphone, the noise made by the electronics of the pre-amplifier, quantization noise from the A/D converter (a negligible amount in this case), and self-noise generated by the electronics of the microphone itself. All sources of noise except the microphone's self-noise and pre-amp noise are kept constant across all tested microphones. Pre-amplifier noise tends to be more severe at high gain settings, particularly on low-quality, inexpensive portable recorders. In this test, we measure not just the level of noise, but also its power spectrum (FFT). Figure 1 shows a comparison of self-noise spectra of Opus 55 MkII (left panel) and Shure Beta 53 (right panel) with the Sound Devices USBPre, tested according to the above principles. The spectra allow you to study self-noise levels and energy distributions of each microphone in a real-world recording situation.
Figure 1. Comparison of ambient noise spectra of Opus 55 MkII (left panel) and Shure Beta 53 (right panel)
A/C adapter noise
Recording directly onto a laptop computer is becoming more and more common among field recordists. Laptop computers offer potentially much more control over the incoming signal, they can do sophisticated real-time processing, display real-time spectrograms, have a lot of storage capacity, enable quick file transfer to cloud storage, etc. Even the so-called "pro" laptops lack the necessary components to record high-quality audio, so most of us use dedicated USB (or FireWire) audio recording interfaces. I use Sound Devices USBPre, but there are many other such devices on the market. As far as noise is concerned, using a USB interface significantly reduces noise due to electrostatic activity inside the computer, which plagues many desktop PCs and workstations.
Unfortunately, your laptop computer's A/C adapter is a common source of broadband noise (a.k.a., "static"), which inevitably leaks into your recordings. I have come across many A/C adapters that generate unacceptably high levels of noise. The adapters are typically based on the switched-mode power supply design, which has the unfortunate disadvantage of creating electromagnetic interference. Figure 2 shows a spectrum of the inherent noise of a laptop computer running on battery power (left panel) and with an A/C adapter plugged in (right panel). As you can see, the adapter adds a significant amount of noise (almost 3 dB) to the recorded signal. The obvious solution is to record on battery power only, or to purchase a high-quality A/C adapter for your laptop computer.
Figure 2. Comparison of laptop inherent noise with battery power (left panel) and an A/C adapter (right panel)
Sixty-Hertz hum (or 50-Hertz hum in regions with 50 Hz mains power) results from interference from electrical circuits and may enter the recording either through electromagnetic induction from power lines or via the recorder's own power supply (ground-based hum). Once recorded, 60-Hertz hum is difficult to remove without erasing some of the low-frequency components of speech. Figure 3 shows a spectrogram of the English word “bag” recorded in the presence of 60-Hertz hum. The hum is represented in the spectrogram by the thin, dark band along the bottom of the display window.
Figure 3. Recording of the word “bag” obtained in the presence of 60 Hertz hum
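If you suspect hum but cannot see it clearly in a spectrogram, one option is to measure the energy at the mains frequency directly. The sketch below uses the Goertzel algorithm, a standard way of evaluating a single frequency component without a full FFT. The function name and the synthetic test signal are assumptions for illustration, and the amplitude estimate is accurate only when the analysis window holds a whole number of cycles of the frequency being tested.

```python
import math


def tone_amplitude(samples, sample_rate, freq_hz):
    """Estimate the amplitude of one frequency component (Goertzel algorithm)."""
    n = len(samples)
    omega = 2.0 * math.pi * freq_hz / sample_rate
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the DFT at freq_hz.
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    # A sine of amplitude A yields a DFT magnitude of A * n / 2.
    return 2.0 * math.sqrt(max(power, 0.0)) / n


# Synthetic stand-in for a recording (assumption): a 200 Hz "voice" tone
# plus weak 60 Hz hum, one second at 8,000 Hz.
rate = 8000
recording = [
    0.5 * math.sin(2 * math.pi * 200 * t / rate)
    + 0.1 * math.sin(2 * math.pi * 60 * t / rate)
    for t in range(rate)
]

for mains_hz in (50, 60):
    amp = tone_amplitude(recording, rate, mains_hz)
    print(f"{mains_hz} Hz component amplitude: {amp:.4f}")
```

On this synthetic signal the 60 Hz estimate comes out close to the 0.1 that was mixed in, while the 50 Hz estimate is close to zero. On real speech, compare the result against a recording made on battery power before drawing conclusions, since voiced speech can have energy near these frequencies too.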
Let us consider the following four sources of unwanted ambient noise in a field recording: a police siren, an air conditioner, an orchestra, and a cafeteria. Each WAVE sound file is 15 seconds long, sampled at 10,000 Hz, 16-bit mono.
Each noise type has not only a different source, but also different acoustic characteristics. Of course, while we are concerned with the overall level of unwanted noise at the site of recording, we should also be aware of the spectral envelope of each type of noise. Such information will help us choose the appropriate strategy for dealing with noise. Figure 4 shows long-term average spectra (LTAS) of the four noise files. LTAS shows us the spectral envelope of each file averaged over time, which gives a description that is more characteristic of the file as a whole. Each noise source has distinctly different characteristics. The police siren and the air conditioner have a strong spectral focus around relatively narrow frequency regions, along with harmonicity (see Figure 5), while the orchestra and cafeteria have more of a broadband distribution. The interesting thing about the cafeteria noise is that it has an envelope closely resembling the -12 dB / octave slope of the speech spectrum. Also, take a closer look at the overlap of the siren's fundamental frequency with the formant tracks of the speech signal in Figure 5. Download the original WAVE file here.
Figure 4. Four sources of noise - each with different spectral characteristics
Figure 5. Speech and siren noise mixed at 70 dB each; note the overlap of the siren fundamental with formant tracks
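For those curious about how a long-term average spectrum is put together, here is a bare-bones Python sketch: cut the file into frames, compute the power spectrum of each frame, and average the results. It is not the procedure used to produce Figure 4. It uses a naive DFT (slow, but dependency-free), non-overlapping Hann-windowed frames, and illustrative function names; a real analysis would use an FFT library.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of each DFT bin (0 Hz to Nyquist) for one Hann-windowed frame."""
    n = len(frame)
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def ltas_db(samples, frame_len=250):
    """Average the power spectra of consecutive frames, then convert to dB."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    totals = [0.0] * (frame_len // 2 + 1)
    for start in starts:
        for k, p in enumerate(power_spectrum(samples[start:start + frame_len])):
            totals[k] += p
    # The small constant avoids log10(0) for empty bins.
    return [10.0 * math.log10(t / len(starts) + 1e-20) for t in totals]


# Synthetic stand-in for a noise file (assumption): a 1000 Hz tone at 10,000 Hz.
rate = 10000
frame_len = 250
samples = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(2500)]
ltas = ltas_db(samples, frame_len)
peak_bin = max(range(len(ltas)), key=ltas.__getitem__)
print(f"strongest band is centered at {peak_bin * rate / frame_len:.0f} Hz")
```

With a sampling rate of 10,000 Hz and 250-sample frames, each bin is 40 Hz wide, which is coarse but enough to tell a narrow-band source such as a siren from a broadband one such as cafeteria chatter.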
Noise reduction strategies
What level of noise is acceptable?
Noise can originate from the immediate recording environment as well as the recording equipment itself. Eliminating unwanted noise, though difficult, is not impossible, and special care must be taken to understand the nature of noise and to prevent noise from leaking into the recording. The recordist should spend some time before the recording session listening carefully to the location. With some practice, common sources of noise can be recognized and evaluated. If in doubt, a sound level meter (SLM) can be used to more accurately assess the level of ambient noise. A typical SLM is a small, hand-held device designed to measure ambient noise (see this RadioShack model on the left). SLMs vary in terms of the scale of operation, as well as accuracy and noise weighting options. Most SLMs include the dB SPL scale. One should try to find an SLM whose lower measurement threshold is 50 dB SPL or below. Some higher-end models can also perform a simple frequency analysis (e.g., 1/3-octave band analysis). Frequent use of an SLM can undoubtedly improve one's success and consistency in detecting excessive noise, especially ambient noise that is "embedded" in the background and that we often fail to notice. Because noise is so variable and recording techniques so diverse, there is no cut-and-dried rule as to what acceptable SNRs are. Yes, I know it is disappointing that I cannot give you at least a range of values, but there does seem to be an art to dealing with noise, and I am sure you will develop this art with practice.
Active noise reduction
There is a common belief in the recording industry that some types of noise can be dealt with quite effectively. Directional shotgun microphones, aggressive noise filtering, as well as a host of post-production digital signal processing (DSP) techniques can be used to clean up recordings. However, very few such methods apply to speech research, as they effectively alter the recorded signal, thus making it unreliable for acoustic analysis. There is a great marketing opportunity for pro audio hardware and software manufacturers to develop products that offer "the ultimate" noise reduction solutions. I would encourage you not to succumb to the temptation. My advice would be to avoid any form of DSP-based noise reduction or active noise canceling products and, instead, try to modify your recording technique to maximize your SNR. You may find some of the ideas discussed below useful in accomplishing this goal.
Microphone placement as a noise reduction technique
Regardless of microphone type, significant improvements in SNR can be achieved with proper microphone placement. The closer the microphone is placed to the talker's lips, the stronger the signal, and the lower the noise. A head-worn microphone will achieve the highest SNR when placed at about 3-6 cm from the talker's lips, slightly off to the side of the mouth. A hand-held microphone should be put on a stand and should be placed no further than 15 cm from the talker's lips. Often, a gooseneck adapter can be mounted on a standard table-top stand, which allows more flexible and precise microphone placement, without the need to use a full-size, boom microphone stand. A lavalier microphone should be clipped onto the subject’s clothing (looping the cord through the tie-clip) no further than 20 cm from the talker's lips (see my post on microphone placement for more information).
In situations where noise levels are particularly high (e.g., in a busy college cafeteria) one might consider using a cardioid head-worn microphone, such as the Sennheiser HMD25-1, as its use will eliminate most of the extraneous noise, while providing a high-fidelity recording with a relatively flat frequency response. Note that there are very few directional, close-talking microphones that will fulfill the requirement of capturing clean, uncolored sound. The Sennheiser HMD25-1 has been designed to be used in the broadcast industry. It is a low-impedance, low-sensitivity dynamic microphone, so it will require a quiet, high-gain preamplifier to work at its best. It has very good off-axis noise rejection and its low-end response has been tailored specifically to minimize proximity effect. The HMD25-1 (in fact the HMD414, which is the microphone unit that is part of the headset) has been a standard reference microphone in machine speech recognition research for some time (e.g., DARPA-related projects). The HMD25-1 headset typically comes with unterminated headphone and microphone leads, so it might be best to ask your supplier to terminate the leads for you before shipping (XLR for the microphone, and 1/4 inch for the headphones). For more information, see my review of this headset.
If you ask about noise at your local pro-audio dealer, you are likely to hear that you should use a so-called "shotgun microphone" (Figure 6). Shotgun microphones (a.k.a., "boom microphones") are designed to have a hyper-cardioid polar pattern and relatively high sensitivity. One of the common misconceptions is that a shotgun microphone is like a zoom lens on a camera and that it brings sound closer to the microphone. Some manufacturers even refer to boom microphones as "zoom microphones." The truth is, however, that a shotgun microphone works more like a camera with a cardboard tube fitted around a standard lens. It obtains a narrower "field of view", but the microphone-to-source distance, of course, remains unchanged. This type of design does nothing more than help reject (or attenuate) a great deal of off-axis noise, thus creating an auditory impression that the signal is somehow brought closer and amplified. One should bear in mind that, under the inverse square law, the signal level drops by approximately 6 dB with each doubling of the distance from the sound source. Still, using a shotgun microphone, particularly with the help of a skillful boom operator, may be a good option for recording in noisy environments, especially if we are concerned more with the overall intelligibility of the recording than with acoustic analysis per se (e.g., most oral history projects).
Figure 6. Azden ECZ-990 microphone with a 1/8-inch interface and a holder compatible with a camcorder "hot shoe." This microphone would be a very good replacement option for built-in omnidirectional microphones of a voice recorder or a digital camcorder; image courtesy of Azden Corp.
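The inverse square law mentioned above is easy to put into numbers. The short Python sketch below assumes an idealized point source in a free field, which is only a rough approximation very close to the mouth or in a reverberant room; the function name is my own.

```python
import math


def level_change_db(old_distance, new_distance):
    """Change in signal level when the microphone moves (free-field point source)."""
    return -20.0 * math.log10(new_distance / old_distance)


# Doubling the distance costs about 6 dB.
print(f"1 m -> 2 m:    {level_change_db(1.0, 2.0):+.1f} dB")
# Moving a microphone from 20 cm to 5 cm from the lips gains about 12 dB.
print(f"20 cm -> 5 cm: {level_change_db(20.0, 5.0):+.1f} dB")
```

Since distant ambient noise stays roughly the same when the microphone moves a few centimeters, most of that gain in signal level translates directly into a better SNR.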
Reducing low-frequency noise
What is low-frequency noise?
Some low-frequency noise can be easily heard in the recording environment. Refrigerators, air-conditioners, highway traffic, elevators, and electric fans are the most typical culprits. If possible, one should avoid recording in the vicinity of such sources of noise. Low-frequency, periodic noise shares the acoustic spectrum of recorded sounds with important frequency components of speech, such as F0 and F1. Noisy, highly attenuated recordings pose a special challenge to acoustic analysis, particularly when detailed analysis of voicing or nasalization is required.
One of the most effective methods of dealing with 60-Hertz hum is using battery-powered equipment. While selecting field recording equipment, it is important to make sure that it will work on battery power and will provide good battery life. Ideally, all elements of the recording circuit should be connected via a balanced XLR interface. If only an unbalanced interface is available, short, high-quality cables should be used. When using a stand-alone microphone preamplifier, it is best to select one that has balanced line outputs (e.g., Sound Devices MixPre). In addition, it might be a good idea to purchase a cable tester and a hum eliminator. Cable testers can be used to check cables for possible defects as well as to determine the type of connections available on recording equipment. Some cable testing devices can also be used as tone generators, which can be very useful in calibrating recording devices (see my post on tone calibration). Hum eliminators are just as useful. When connected between a preamplifier and a recorder, they can effectively detect and filter out 60-Hertz hum. One of the best hum eliminators I have tested is the Ebtech Hum-X. You can read my review here. If all of the above methods fail, a low-cut filter might be used.
If hum is not caused by a ground-related problem, a hum eliminator will not solve the problem. One of the simplest solutions is to "ground" the circuit by using your own body. You can use an anti-static wrist strap and connect it to a metal part of your laptop (e.g., an unused printer port). I have used this method on a few occasions and I keep the wrist strap in my equipment bag at all times.
Using a low-cut filter to reduce low-frequency noise
Some quality preamplifiers have a built-in analog high-pass filter, also known as a low-cut or bass roll-off filter, that will prevent low-frequency components from being recorded (see, for example, my review of Sound Devices USBPre for an explanation of how such filters work). The use of a high-pass filter is generally not recommended, as it alters the original signal, but when no other option is available, it may be possible to use a high-pass filter to good effect. Figure 7 shows a 6 dB/octave high-pass filter response graph. Frequencies that are attenuated by less than 3 dB are said to be within the filter's passband, while those attenuated by more than 3 dB are in its stopband. The frequency at which the signal is attenuated by exactly 3 dB is called the cut-off frequency (in this case 100 Hz).
Figure 7. A typical response of a high-pass filter used on field microphone pre-amplifiers, such as the Sound Devices MixPre
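To get a feel for what such a filter does to the low end of speech, the response can be estimated numerically. The sketch below assumes an idealized Butterworth-shaped high-pass filter, which for a 6 dB/octave filter is the same as a simple RC circuit; the actual response of any given pre-amplifier may differ, and the function name and `order` parameter are illustrative.

```python
import math


def highpass_attenuation_db(freq_hz, cutoff_hz=100.0, order=1):
    """Gain in dB of an idealized Butterworth high-pass filter.

    order=1 gives a 6 dB/octave slope; each extra order adds 6 dB/octave.
    """
    ratio = (cutoff_hz / freq_hz) ** (2 * order)
    return -10.0 * math.log10(1.0 + ratio)


# 6 dB/octave filter with a 100 Hz cut-off, as in the response graph above.
for freq in (25, 50, 60, 100, 200, 400):
    print(f"{freq:4d} Hz: {highpass_attenuation_db(freq):6.2f} dB")
```

The attenuation at the cut-off frequency comes out at about 3 dB, as the definition requires, and the slope approaches 6 dB per octave only well below the cut-off. Note that a gentle filter like this also takes a little off frequencies above the cut-off, which matters if you are measuring F0 of low-pitched voices.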
Winholtz and Titze (1997a) recommend that a 60 Hz, 24 dB/octave high-pass filter be used at all times (at least in a laboratory situation) to filter out some of the low-frequency building rumble. However, their application of this filter is rather unorthodox because the recording is first captured unfiltered on DAT tape and then high-pass filtered as the signal is re-digitized by a stand-alone analog-to-digital converter (ADC) while being played out of the analog outputs of the DAT deck (also, see 3.2). Typically, a high-pass filter should be used with the preamplifier (in a so-called "pre-fader" mode) or, if it is not available, a digital (DSP) filter may be used after the data stream has been encoded in a digital audio file.
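As an illustration of the digital option, here is a minimal first-order high-pass filter in Python, the discrete-time counterpart of a simple RC circuit. It is a sketch only: it gives a 6 dB/octave slope, far gentler than the 24 dB/octave filter recommended by Winholtz and Titze, and the function names and test tones are assumptions. Always keep the unfiltered original file.

```python
import math


def highpass_first_order(samples, sample_rate, cutoff_hz):
    """First-order (6 dB/octave) high-pass: discrete form of an RC filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in, prev_out = 0.0, 0.0
    for i, x in enumerate(samples):
        # The first output sample is passed through; later ones follow the recurrence.
        y = x if i == 0 else alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Synthetic test tones (assumptions): 60 Hz "hum" and a 2000 Hz tone at 10,000 Hz.
rate = 10000
hum = [math.sin(2 * math.pi * 60 * t / rate) for t in range(rate)]
tone = [math.sin(2 * math.pi * 2000 * t / rate) for t in range(rate)]

for name, sig in (("60 Hz hum", hum), ("2000 Hz tone", tone)):
    filtered = highpass_first_order(sig, rate, cutoff_hz=100.0)
    # Measure over the second half to skip the filter's start-up transient.
    change = 20.0 * math.log10(rms(filtered[rate // 2:]) / rms(sig[rate // 2:]))
    print(f"{name}: {change:+.1f} dB")
```

With a 100 Hz cut-off the hum is reduced by several decibels while the 2000 Hz tone passes almost unchanged, which shows both the benefit and the limits of such a gentle filter.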
Dealing with noise reduction is a little bit like having healthy eating habits. Try to use the most natural technique (e.g., microphone placement) and avoid heavily processed solutions (e.g., active noise reduction). Even though the sentence above is meant to be funny, noise is certainly no joke!