Perceptual Audio Demonstrations
A number of demonstrations were created to help instructors teach their students about the human auditory system and its limitations. In order to understand the demonstrations and the concepts behind them, some background material and detailed explanations are given for each topic.
Decibels
Decibels are a logarithmic unit of sound intensity. Mathematically, a decibel (dB) is defined as:

dB = 10 log10(P1/P2)

where P1 and P2 are powers in watts. Decibels can also be calculated using amplitudes, in which case the above equation becomes:

dB = 20 log10(A1/A2)

where A1 and A2 are amplitudes.
A logarithmic scale provides a relative measure of sound intensity. Based on powers of 10, decibel units compress the wide range of the human auditory response into a manageable range of numbers. Another reason a logarithmic scale is used when discussing sound is that the human auditory system itself works roughly logarithmically: our ears respond to ratios rather than differences, and the smallest change in loudness the human auditory system can distinguish is roughly 1 dB.
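As a rough sketch of the arithmetic behind the two definitions, the power and amplitude forms can be written in Python (the function names are illustrative, not part of the CD materials):

```python
import math

def db_from_power(p1, p2):
    """Decibel level of power p1 relative to power p2 (both in watts)."""
    return 10 * math.log10(p1 / p2)

def db_from_amplitude(a1, a2):
    """Decibel level of amplitude a1 relative to amplitude a2."""
    return 20 * math.log10(a1 / a2)

# Doubling the power adds about 3 dB; doubling the amplitude adds about 6 dB.
print(round(db_from_power(2.0, 1.0), 2))      # 3.01
print(round(db_from_amplitude(2.0, 1.0), 2))  # 6.02
```

Note that the two forms agree, since power is proportional to amplitude squared: 10 log10(A1²/A2²) = 20 log10(A1/A2).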
Based on this information, a series of three sound clips was developed to demonstrate decibel scales. This gives students an opportunity to hear incremental level steps and to become familiar with the decibel concept used throughout the compact disc.
In the first demonstration, track 1, we hear a 440 Hz tone (A4 on the musical scale). The tone is then reduced in 1 dB steps. On tracks 2 and 3 the tone is reduced in steps of 3 dB and 5 dB respectively. Students will notice only minor changes in loudness in the first demonstration. The second and third tracks will allow them to hear more significant changes in loudness.
Intensity vs. Loudness
Sound intensity is a measurable quantity relating to acoustic energy whereas loudness is a subjective quality affected by the human auditory response to sound.
The Sonic Research Studio at Simon Fraser University defines sound intensity as “The sound energy transmitted per unit time through a unit area, thereby being a measure of the magnitude of a sound.” This ‘magnitude of sound’ is measured as Sound Pressure Level (SPL), defined as:

SPL = 20 log10(P/Pref)

where Pref = 20 µPa is the reference pressure, corresponding to a reference intensity of 10^-12 W/m². It is also known that the intensity of sound decreases according to the inverse square law, I ∝ 1/r², where r is the distance from the sound source.
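A short numeric sketch ties the two facts together: the SPL formula converts a pressure in pascals to dB re 20 µPa, and the inverse square law means each doubling of distance drops the level by about 6 dB (function names are illustrative):

```python
import math

P_REF = 20e-6  # reference pressure, 20 µPa

def spl_db(pressure_pa):
    """Sound pressure level in dB relative to 20 µPa."""
    return 20 * math.log10(pressure_pa / P_REF)

def spl_at_distance(spl_at_1m, r_m):
    """Inverse square law: level drops by 20*log10(r) dB relative to 1 m."""
    return spl_at_1m - 20 * math.log10(r_m)

print(round(spl_db(1.0)))            # 94  (1 Pa is ~94 dB SPL)
print(round(spl_at_distance(94, 2.0)))  # 88  (~6 dB quieter at double the distance)
```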
So, how does sound intensity relate to perceived loudness? As mentioned above, loudness is subjective: what seems twice as loud to one person may not be twice as loud to another. In the 1930s, two employees at Bell Labs, Harvey Fletcher and Wilden A. Munson, set out to correlate sound intensity with loudness. They asked a group of people to judge when pure tones of two different frequencies were equally loud. They averaged the results and produced the graph now known as the Fletcher-Munson curves (Figure 1).
Figure 1. Fletcher-Munson Loudness Curves.
The Fletcher-Munson curves are also known as equal loudness curves. Any frequency at its given intensity along a curve is ‘as loud’ as any other frequency on that curve. These loudness levels are referred to as phons. Looking at the 10 phon curve, a 20 Hz tone at an intensity of approximately 75 dB SPL will sound as loud as a 1000 Hz tone at 10 dB SPL.
To illustrate the relationship between intensity and loudness based on the Fletcher-Munson curves, a demonstration was created that plays various frequencies at a constant SPL. Track 4 shows that the perceived loudness of tones varies at equal sound intensity.
A second demonstration was created to further illustrate the concept that loudness is perceived. In this demonstration students should be asked to pick the tone that sounds twice as loud as the reference tone. On track 5 a reference tone is played and then the same tone is played 5 dB higher. This is followed by the reference tone and then the tone 8 dB higher and finally the reference tone and then the tone 10 dB higher.
Pitch
When referring to sound and music, the term pitch has two common uses: pitch is considered an attribute of the human auditory system, and it is also used as a synonym for frequency. Although these two definitions may seem incongruous, it is possible to make a connection between them.
The ANSI definition of psychoacoustical terminology considers pitch the “auditory attribute of sound according to which sounds can be ordered on a scale from low to high.” This definition encompasses both usages: the human auditory system makes distinctions between frequencies (pitches) and can order them on a scale from low to high. In this case there is no inconsistency.
If we consider both definitions when discussing pitch the question becomes, “How do we quantify the perceptual attribute of pitch?” Pitch is usually linked to the fundamental frequency of a pure tone, but what happens if the time duration of the tone prevents it from being a “sound that can be ordered on a scale from low to high?” In what time frame does the tone change from a noise to something with pitch?
Three demonstrations were created to illustrate pitch as an auditory phenomenon relating to frequency. On tracks 6, 7, and 8 we investigate how pitch is perceived in time. Each track plays timed bursts of sound. Initially the tones will sound like pops or blips, but as the bursts increase in duration they will begin to have pitch. Each demonstration uses a different tone frequency.
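The structure of tracks 6-8 can be sketched as tone bursts of increasing duration separated by silence. The sample rate, burst durations, and gap length below are assumptions for illustration:

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate for this sketch

def burst(freq_hz, duration_ms):
    """A tone burst; very short bursts sound like clicks, longer ones gain pitch."""
    n = int(SAMPLE_RATE * duration_ms / 1000)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def burst_sequence(freq_hz, durations_ms, gap_ms=250):
    """Bursts of increasing duration with silent gaps, as on tracks 6-8."""
    gap = [0.0] * int(SAMPLE_RATE * gap_ms / 1000)
    clip = []
    for d in durations_ms:
        clip.extend(burst(freq_hz, d))
        clip.extend(gap)
    return clip

# Illustrative durations: the 1-2 ms bursts click, the longer ones carry pitch.
track6 = burst_sequence(440.0, [1, 2, 5, 10, 20, 50, 100])
```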
Timbre
Timbre is defined in many ways. The American National Standards Institute defines timbre as, “that attribute of auditory sensation in terms of which a listener can judge that two sounds, similarly presented and having the same loudness and pitch, are different.” Musically, timbre is defined as “the quality of a musical note which distinguishes different types of musical instrument.” And finally, timbre is defined as “everything that is not loudness, pitch, or spatial perception.”
Those three definitions cover the concept of timbre very well. When two sounds of the same loudness and pitch are played we can distinguish between them. Everyone will agree that a flute and a saxophone playing the same song don’t sound the same; this is due to the difference in timbre of each instrument. Timbre can be based on physical characteristics of instruments, such as airflow, embouchure, and many others. On track 9 you will hear examples of differences in timbre.
Binaural Masking
Binaural masking is the internal masking done by the human auditory system due to the fact that we have two ears, one on either side of the head. The processing and mapping of sound in the brain is still relatively poorly understood, so the best way to explain binaural masking is simply to demonstrate it. Headphones are required.
Five demonstrations have been created. The first track in this series, track 10, is a 440 Hz reference tone of one second. The next track, track 11, plays the same reference tone added to some noise in the left ear, and nothing in the right ear. The third demo, track 12, is the sine wave plus noise in the left ear and noise in the right ear. The fourth, track 13, is the sine plus noise in the left ear and the same (in-phase) sine plus noise in the right ear. Finally, track 14 is the sine plus noise in the left ear and the sine (out-of-phase) plus noise in the right ear. Figure 2 illustrates the four scenarios from above.
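The four stereo stimuli on tracks 11-14 can be sketched as left/right sample lists: a 440 Hz sine, the same noise in each ear, and an in-phase versus inverted (out-of-phase) copy of the sine. The sample rate and signal levels are assumptions for illustration:

```python
import math
import random

SAMPLE_RATE = 8000  # assumed sample rate for this sketch
DURATION_S = 1.0

def sine(freq_hz, invert=False):
    """A 1 s sine; invert=True flips its polarity (180 degrees out of phase)."""
    sign = -1.0 if invert else 1.0
    n = int(SAMPLE_RATE * DURATION_S)
    return [sign * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def noise(seed):
    """Reproducible uniform noise so both ears can share the same noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(int(SAMPLE_RATE * DURATION_S))]

def mix(*signals):
    return [sum(samples) for samples in zip(*signals)]

s = sine(440.0)
n = noise(1)                                   # identical noise in both ears
track11 = (mix(s, n), [0.0] * len(s))          # tone+noise left, silence right
track12 = (mix(s, n), n)                       # tone+noise left, noise right
track13 = (mix(s, n), mix(s, n))               # in-phase tone in both ears
track14 = (mix(s, n), mix(sine(440.0, invert=True), n))  # out-of-phase tone
```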
Figure 2. Binaural Masking Effects (Moore, 1964).
Although slightly out of order, the figure above attempts to describe what the listener will hear on tracks 11 through 14. Comparing track 11 and track 12, you should find that the sine wave is much easier to distinguish when noise is played in both the left and right ears. Again, exactly what is happening is unknown, but internally there seems to be some form of cancellation. If you compare tracks 13 and 14, you should notice that the out-of-phase sine wave is easier to hear than the in-phase sine wave. What is the probable cause? The two ears rarely receive a signal at exactly the same time, so after years of mapping phase-shifted sounds the human auditory system may simply not deal as effectively with perfectly in-phase tones.
Grouping and Segregation
The human auditory system groups and separates sounds in a very similar fashion to the human visual system. Due to this similarity, visual aids can help explain how the brain groups and segregates sounds.
Apparent motion is a phenomenon that occurs both visually and acoustically. Animation is an example of visual apparent motion. Animation is just a series of drawings viewed in rapid succession. The brain groups these images together and interprets them as a single moving image.
Apparent motion is possible in the auditory domain as well. The brain will group notes as a single melody if the notes are alternated in time at a slow rate. Figure 3 illustrates this visually. This is also acoustically demonstrated on track 15 of the Audiobox CD.
Figure 3. Grouping and Stream Segregation A.
In the audio demonstration four notes (C4, G4, F4, B3) are alternated slowly in what sounds like a melody. The human auditory system decides the sounds belong together and groups them into a single stream of four notes. As the time delay between notes is decreased, the brain begins to group the sounds differently. Figure 4 shows the new grouping of the four notes.
Figure 4. Grouping and Stream Segregation B.
On track 16 the notes are sped up and we begin to hear rhythmic beats played as a melody. The auditory system is now hearing two groups of two notes.
Finally, on our last track, track 17, the time delay is decreased again. We no longer hear a melody; we hear only the rhythmic beats. Our auditory system is now hearing four groups of one note each. This is illustrated in Figure 5.
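The variable in tracks 15-17 is only the inter-onset interval, which can be sketched as a timing schedule for the four-note pattern (the note frequencies use equal temperament with A4 = 440 Hz; the millisecond values are illustrative, not the CD's actual timings):

```python
# Equal-temperament frequencies for the four-note pattern (A4 = 440 Hz)
NOTES = {"C4": 261.63, "G4": 392.00, "F4": 349.23, "B3": 246.94}

def cycle(order, note_ms):
    """Onset schedule for one pass through the pattern: (note name, onset in ms)."""
    return [(name, i * note_ms) for i, name in enumerate(order)]

# Track 15: slow alternation -> heard as one four-note melody.
slow = cycle(["C4", "G4", "F4", "B3"], 400)
# Track 17: fast alternation -> heard as four separate one-note streams.
fast = cycle(["C4", "G4", "F4", "B3"], 80)
```

Only the timing changes between the tracks; the regrouping happens entirely inside the listener's auditory system.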
Figure 5. Grouping and Stream Segregation C.
Another audio phenomenon that falls under the Grouping and Segregation umbrella is separation. The human auditory system separates sounds based on some of the attributes mentioned earlier, mainly pitch and timbre. It also separates sounds based on amplitude.
To illustrate the separation phenomenon, a series of demonstrations was created covering pitch, timbre and amplitude separation. When two songs of the same pitch are interleaved, our brain is unable to distinguish between the two melodies. The brain requires some noticeable difference between the melodies, such as a shift in pitch, timbre or amplitude, to tell the two apart.
In the separation demonstrations “Camptown Races” and “Yankee Doodle” have been interleaved and played at the same pitch. Track 18 begins with the two melodies at the same pitch. Each time the interleaved melody is played one of the songs is shifted in pitch until eventually the two melodies become distinguishable.
The second demonstration, track 19, plays the two melodies at the same pitch but with different timbres; the two melodies are instantly distinguishable. The third demo, track 20, adjusts the amplitude of the two songs while leaving the pitch constant. All three demonstrations help to solidify the concept of separation that takes place in the human auditory system.
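The interleaving and pitch-shifting in track 18 can be sketched as list operations on note frequencies, where a shift of n semitones multiplies each frequency by 2^(n/12). The note lists below are illustrative placeholders, not the actual transcriptions used on the CD:

```python
def interleave(melody_a, melody_b):
    """Alternate notes from two melodies: a1, b1, a2, b2, ..."""
    out = []
    for a, b in zip(melody_a, melody_b):
        out.extend([a, b])
    return out

def shift_semitones(melody, n):
    """Shift every note frequency by n semitones (equal temperament)."""
    return [f * 2 ** (n / 12) for f in melody]

camptown = [392.0, 392.0, 330.0, 392.0]  # illustrative opening notes only
yankee = [349.0, 349.0, 392.0, 440.0]    # illustrative opening notes only

# Same register: the interleaved notes fuse into one unrecognizable stream.
mixed_same = interleave(camptown, yankee)
# One melody shifted up an octave: the two streams segregate and become recognizable.
mixed_shifted = interleave(camptown, shift_semitones(yankee, 12))
```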
Masking
The masking of tones happens in both the frequency and the time domain. This distinction is an important one to make when discussing masking and the limitations of the human auditory system, particularly if continuing on to advanced topics such as MP3 encoding.
Time domain masking (forward and backward masking) happens when a tone is masked by a sound that precedes it or by a sound that occurs shortly after it. Forward masking is more powerful than backward masking, meaning the time duration, t, between the masking and masked tones can be larger for forward masking than for backward masking. The time duration for forward masking can be as large as 20-30 ms, while t for backward masking cannot be much larger than 10 ms. See Figure 6.
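The timing relationship can be sketched as sample-index spans for the masker and the probe tone, with the gap t between them; forward masking puts the masker first, backward masking puts it second. The sample rate and durations are assumptions for illustration:

```python
SAMPLE_RATE = 8000  # assumed sample rate for this sketch

def masking_pair(masker_ms, gap_ms, probe_ms, forward=True):
    """(masker span, probe span) as (start, end) sample indices.
    forward=True: masker precedes probe (forward masking);
    forward=False: probe precedes masker (backward masking)."""
    def span(start_ms, length_ms):
        s = int(SAMPLE_RATE * start_ms / 1000)
        return (s, s + int(SAMPLE_RATE * length_ms / 1000))
    first_ms, second_ms = (masker_ms, probe_ms) if forward else (probe_ms, masker_ms)
    first = span(0, first_ms)
    second = span(first_ms + gap_ms, second_ms)
    return (first, second) if forward else (second, first)

# Forward masking can survive a 20-30 ms gap; backward masking only ~10 ms.
masker, probe = masking_pair(200, 10, 50, forward=True)
```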
Figure 6. Time Domain Masking.
The first forward masking demonstration, track 21, plays a masking tone and then a tone a semitone lower, with a 100 ms delay in between. Notice that you can hear both tones even though the second tone is decreased in 3 dB increments. The second forward masking demo, track 22, plays the same two tones with a time delay of 10 ms. Masking occurs in this demonstration. How many steps are audible before the second tone is masked?
The backward masking demonstrations illustrate a similar phenomenon. The initial tone is going to be masked by the tone that follows. In the first demonstration, track 23, a time delay is set to 100 ms. You should be able to hear the first tone throughout. The time delay is then decreased, but still above the 10 ms range (track 24). This is a grey area. Does masking occur in this demonstration? Finally, in the third demo, track 25, the time delay is below 10 ms. Masking occurs. How many steps are audible?
As mentioned earlier, masking in the frequency domain also occurs. Pure tones that are close together in pitch mask each other better than those widely separated. To demonstrate this fact, a single tone is played, followed by the same tone and a higher frequency tone. The higher frequency tone is reduced in intensity first by 12 dB, then by steps of 5 dB. The sequence above is repeated twice. The second time the frequency separation between the tones is increased (track 26).
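The track 26 stimulus can be sketched as a full-scale masker mixed with a higher-frequency probe whose level starts 12 dB down and then drops in 5 dB steps. The sample rate and the specific frequencies (440 Hz masker, C5 probe) are assumptions for illustration:

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate for this sketch

def two_tone(f_masker, f_probe, probe_db):
    """One second of a full-scale masker plus a probe attenuated by probe_db."""
    amp = 10 ** (probe_db / 20)  # dB -> linear amplitude factor
    n = SAMPLE_RATE
    return [math.sin(2 * math.pi * f_masker * i / SAMPLE_RATE)
            + amp * math.sin(2 * math.pi * f_probe * i / SAMPLE_RATE)
            for i in range(n)]

# Probe starts 12 dB below the masker, then drops in 5 dB steps (as on track 26).
steps = [two_tone(440.0, 523.25, -12 - 5 * k) for k in range(5)]
```

Repeating the sequence with a wider frequency separation (a more distant probe) lets listeners hear that nearby frequencies are masked more effectively.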
Pure tones also mask higher frequencies better than they mask lower frequencies. Tracks 27 and 28 attempt to mask high and low frequencies respectively. The higher frequencies will be masked more easily than the lower frequencies.
Finally, a tone of greater intensity masks a broader range of tones than a tone of less intensity. This is demonstrated on track 29. A single tone is played, followed by the same tone and a higher frequency tone. The higher frequency tone is reduced in intensity first by 10 dB, then by steps of 3 dB. The sequence above is repeated twice, the second time increasing the intensity of the single tone by 28 dB.
Many, if not all, of these masking phenomena are exploited in MP3 encoding. From a teaching perspective it is suggested that these demonstrations be used prior to teaching MP3 encoding; it is essential to understand the basic techniques being used before ‘mathing up’ the concept.