Linguists who specialize in phonetics frequently analyze speech sounds using spectrograms. Spectrograms are useful for linguistic analysis because they allow you to see multiple speech signals simultaneously. For example, you can see component frequencies, glottal pulses, voicing, vowel formants, and place of articulation, all on a single spectrogram. With a little practice, you can even estimate what a speaker is saying just by reading the spectrogram.
As valuable as spectrograms are, they can initially be a bit overwhelming. To understand what's going on in a spectrogram, you need some background information.
A spectrogram is a graph of a sound wave's component frequencies over time. Component frequencies are the range of frequencies present in the sound.
To clarify, when you hear a single sound, you're really hearing many different frequencies stacked on top of one another. These stacked frequencies are the wave's components. The lowest component is the fundamental frequency (F0), which you perceive as the sound's pitch.
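As a rough illustration of components and the fundamental frequency, the sketch below builds a short synthetic wave from three stacked sine waves and recovers them with a plain discrete Fourier transform. The sample rate, the component frequencies (100, 200, and 300 Hz), their amplitudes, and the 0.05 detection threshold are all assumptions chosen so the numbers come out cleanly; real speech is far messier.

```python
import cmath
import math

SAMPLE_RATE = 2000   # samples per second (assumed for this toy example)
NUM_SAMPLES = 200    # 0.1 s of audio, which gives 10 Hz between frequency bins

# Hypothetical wave: three stacked sine components with decreasing amplitude.
COMPONENTS = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.25)]  # (Hz, amplitude)

samples = [
    sum(amp * math.sin(2 * math.pi * hz * t / SAMPLE_RATE) for hz, amp in COMPONENTS)
    for t in range(NUM_SAMPLES)
]


def magnitude_spectrum(signal, sample_rate):
    """Return (frequency_hz, magnitude) pairs from a naive DFT."""
    n = len(signal)
    pairs = []
    for k in range(n // 2 + 1):
        total = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        pairs.append((k * sample_rate / n, abs(total) / n))
    return pairs


spectrum = magnitude_spectrum(samples, SAMPLE_RATE)

# Keep only the bins that carry real energy (threshold is an assumption).
found_components = [hz for hz, mag in spectrum if mag > 0.05]
fundamental_hz = min(found_components)

print(found_components)
print(fundamental_hz)
```

The lowest frequency that survives the threshold is the fundamental. The higher components are whole-number multiples of it here because the toy wave was built that way.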
Fig. 1 - The spectrogram shows time on the x-axis, frequency on the y-axis, and amplitude as differences in color or darkness.
A spectrogram shows time on the x-axis and frequency on the y-axis. That means the bottom of the spectrogram is the lowest frequency, and the top is the highest frequency. Moving left to right on the spectrogram represents moving forward in time.
A spectrogram also shows a third dimension: amplitude (loudness). Differences in amplitude are shown as differences in color or darkness on the spectrogram. The darker lines are frequencies with higher amplitude, while the lighter areas are frequencies with lower amplitude.
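To make the three dimensions concrete, here is a minimal sketch of how a spectrogram grid could be computed: the signal is cut into short overlapping frames, and each frame is turned into one column of amplitudes. This is a simplified short-time Fourier transform, not what any particular analysis program does internally. The function name `spectrogram`, the Hann window, the frame sizes, and the test tone (200 Hz switching to 600 Hz) are all assumptions made for illustration.

```python
import cmath
import math

SAMPLE_RATE = 2000  # samples per second (assumed)


def tone_sample(t):
    """Hypothetical test signal: 200 Hz for the first half, then 600 Hz."""
    freq = 200.0 if t < 200 else 600.0
    return math.sin(2 * math.pi * freq * t / SAMPLE_RATE)


samples = [tone_sample(t) for t in range(400)]  # 0.2 s of audio


def spectrogram(signal, sample_rate, window_len, hop):
    """Return (times, freqs, grid); grid[i][k] is the amplitude at times[i], freqs[k]."""
    freqs = [k * sample_rate / window_len for k in range(window_len // 2 + 1)]
    times, grid = [], []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = signal[start:start + window_len]
        # Taper the frame with a Hann window to reduce edge effects.
        tapered = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * i / window_len))
            for i, x in enumerate(frame)
        ]
        column = [
            abs(sum(
                x * cmath.exp(-2j * math.pi * k * i / window_len)
                for i, x in enumerate(tapered)
            ))
            for k in range(len(freqs))
        ]
        times.append((start + window_len / 2) / sample_rate)  # frame centre, in seconds
        grid.append(column)
    return times, freqs, grid


times, freqs, grid = spectrogram(samples, SAMPLE_RATE, window_len=100, hop=50)

# For each time step, find the frequency with the highest amplitude
# (the "darkest" point in that column of the picture).
loudest_hz = [freqs[max(range(len(col)), key=lambda k: col[k])] for col in grid]
print(loudest_hz[0], loudest_hz[-1])
```

Each entry of `times` is a position on the x-axis, each entry of `freqs` is a position on the y-axis, and the numbers in `grid` are what a display would draw as color or darkness.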
The word spectrogram comes from the word spectrum.
A spectrum is a plot of a wave's components at a given point in time.
You can think of a spectrum as a single snapshot of a spectrogram. If you want to think about it another way, a spectrogram consists of lots and lots of spectra lined up next to each other. Each large "spike" visible on the spectrum is one of the darker horizontal lines visible on the spectrogram.
Fig. 2 - A spectrum is like a slice of a spectrogram laid on its side, with frequency on the x-axis and amplitude on the y-axis.
There are two types of spectrograms: wide-band spectrograms and narrow-band spectrograms.
The most common type of spectrogram used for analysis is a wide-band spectrogram. This kind of spectrogram looks "fuzzier," with lots of vertical lines. In speech, these vertical lines represent glottal pulses: the repeated opening and closing of the glottis. These glottal pulses represent voicing in speech sounds. A wide-band spectrogram helps you to see how a sound changes over time.
To view a wide-band spectrogram in your analysis software, set the "window length" to 0.005 s.¹
A narrow-band spectrogram looks like a series of thin horizontal stripes, sort of like a filet of fish. These thin stripes are the wave's components. On a narrow-band spectrogram, it's easy to see the differences in amplitude between the individual components.
To view a narrow-band spectrogram, set the "window length" to 0.05 s, or even 0.5 s.¹
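The arithmetic behind these two window lengths can be sketched as follows. The sketch relies on a rough rule of thumb, namely that the frequency resolution of an analysis window is about 1 divided by its length; the exact figure depends on the window shape your software uses. The speaker's fundamental frequency of 100 Hz is an assumed value, and the function name `describe_window` is hypothetical.

```python
def describe_window(window_length_s, f0_hz):
    """Estimate what a given window length can resolve for a voice at f0_hz.

    Uses the rule of thumb bandwidth ~ 1 / window length (an approximation).
    """
    bandwidth_hz = 1.0 / window_length_s
    pitch_period_s = 1.0 / f0_hz  # time between glottal pulses
    return {
        "bandwidth_hz": bandwidth_hz,
        # Components sit f0_hz apart, so the bandwidth must be narrower than that.
        "resolves_components": bandwidth_hz < f0_hz,
        # Individual pulses show up only if the window is shorter than one period.
        "resolves_glottal_pulses": window_length_s < pitch_period_s,
    }


ASSUMED_F0_HZ = 100.0  # hypothetical speaker

wide_band = describe_window(0.005, ASSUMED_F0_HZ)
narrow_band = describe_window(0.05, ASSUMED_F0_HZ)

print("wide-band:", wide_band)
print("narrow-band:", narrow_band)
```

Under these assumptions, the 0.005 s window is short enough to catch each glottal pulse but too broad in frequency to separate components 100 Hz apart, which is why it shows vertical lines. The 0.05 s window reverses the trade-off, which is why it shows thin horizontal stripes.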
Fig. 3 - The same audio clip looks different on a wide-band vs. a narrow-band spectrogram. The graph above the spectrogram is the waveform of the sound.
It's possible to estimate what a person is saying just by looking at the spectrogram of the utterance. You'll get some hands-on practice with this in a bit. In the meantime, here are some signals that linguists look for when analyzing a spectrogram.
These signals don't tell you everything about an utterance, but they can help you make educated guesses.
Remember those dark horizontal stripes that you see on the spectrogram during vowels? Those stripes are the vowel's formants. Relative formant values help you determine the vowel's place of articulation, or the position of the vocal tract when producing the vowel. The most relevant formants for linguistic analysis are the first three formants: F1, F2, and F3.
Fig. 4 - The red lines on this spectrogram indicate the vowels' formants.
The lowest formant, F1, tells you inversely how high a vowel is. The lower the F1, the higher the vowel. F1 is the dark line closest to the bottom of the spectrogram. The high vowels are sounds like [i], as in bee or sheep, or [u], as in soup or blue. These vowels will have the lowest F1 value. Low vowels are sounds like [a], as in box or party. These vowels will have the highest F1 value.
Vowel height refers to how high the tongue is in the mouth when producing a vowel. If you pay attention to the position of your mouth, you can feel that your tongue is higher when you say sheep than when you say shop.
The next formant, F2, tells you how far back a vowel is. The lower the F2, the further back the vowel. The frontmost vowels are sounds like [i] and [e], as in plate. These have the highest F2. The back vowels are sounds like [u] and [o], as in pole or order. These have the lowest F2 value.
Backness refers to the horizontal position of the tongue when producing a vowel. If you say the word boot, you'll notice that your tongue is pushed toward the back of your mouth and that the back part of your tongue carries the most tension. Compare that to the word beet, where your tongue is pushed forward and the front part of your tongue is tense.
This table summarizes the relative F1 and F2 values for the five vowel sounds present in most languages.
Vowel | F1 Value | F2 Value |
i (high front) | low | high |
e (mid front) | mid | high |
a (low central) | high | mid |
o (mid back) | mid | low |
u (high back) | low | low |
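The table can be read as a simple lookup, sketched below. The dictionary mirrors the table's relative labels. The function `level`, and especially the cutoff values in Hz, are hypothetical and chosen only to make the example run; real formant values vary a lot between speakers, so treat the cutoffs as placeholders rather than measured norms.

```python
# Relative (F1, F2) labels from the table, mapped to the vowel they suggest.
VOWEL_BY_FORMANTS = {
    ("low", "high"): "i",
    ("mid", "high"): "e",
    ("high", "mid"): "a",
    ("mid", "low"): "o",
    ("low", "low"): "u",
}

# Hypothetical cutoffs in Hz (illustrative only, not measured norms).
F1_CUTOFFS = (400.0, 600.0)
F2_CUTOFFS = (1200.0, 1800.0)


def level(value_hz, cutoffs):
    """Label a formant value as 'low', 'mid', or 'high' relative to two cutoffs."""
    low_cutoff, high_cutoff = cutoffs
    if value_hz < low_cutoff:
        return "low"
    if value_hz < high_cutoff:
        return "mid"
    return "high"


def guess_vowel(f1_hz, f2_hz):
    """Return the vowel the table suggests, or None if the pattern is not listed."""
    key = (level(f1_hz, F1_CUTOFFS), level(f2_hz, F2_CUTOFFS))
    return VOWEL_BY_FORMANTS.get(key)


# Made-up readings, as if measured off a spectrogram by eye.
print(guess_vowel(300.0, 2300.0))
print(guess_vowel(700.0, 1400.0))
print(guess_vowel(320.0, 900.0))
```

A combination that is not in the table, such as a high F1 with a high F2, returns `None`, which is a reminder that the five-vowel table is a simplification.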
The next highest formant is F3. F3 doesn't tell you much about most vowels, but it plays a unique role in r-colored vowels. R-sounds, like in the General American pronunciation of bird, have a very low F3 value compared to other sounds. This makes these sounds easy to spot on a spectrogram.
You might notice that a fourth formant line is visible on a spectrogram. Higher formants, including F4, F5, etc., appear in speech sounds. However, these formants don't reveal as much about speech sounds as F1-F3 and are not commonly considered in linguistic analysis.
Lastly, formant transitions can help you identify the place of articulation of neighboring consonants. The formants of a vowel shift as the speaker moves into or out of a neighboring consonant. The direction of these shifts can help you determine where in the vocal tract the consonant is articulated. For example, moving from a vowel into a [k] sound would result in a rising F2 and a lowering F3 (this is called a "velar pinch" on a spectrogram).
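The velar pinch pattern can be expressed as a small check on two formant tracks, sketched below. The function name `is_velar_pinch` and the track values are hypothetical: the numbers are made-up readings in Hz taken at evenly spaced moments toward the end of a vowel, not measurements from a real recording. The check only compares the first and last readings, which is a deliberate simplification.

```python
def is_velar_pinch(f2_track_hz, f3_track_hz):
    """Return True if F2 rises while F3 falls across the transition."""
    f2_rises = f2_track_hz[-1] > f2_track_hz[0]
    f3_falls = f3_track_hz[-1] < f3_track_hz[0]
    return f2_rises and f3_falls


def gap_hz(f2_track_hz, f3_track_hz):
    """Return the distance between F3 and F2 at each reading."""
    return [f3 - f2 for f2, f3 in zip(f2_track_hz, f3_track_hz)]


# Hypothetical vowel-to-[k] transition: F2 climbs while F3 drops.
f2_into_k = [1700.0, 1850.0, 2000.0, 2150.0]
f3_into_k = [2700.0, 2600.0, 2500.0, 2400.0]

# Hypothetical transition into some other consonant: both formants drop.
f2_other = [1700.0, 1600.0, 1500.0, 1400.0]
f3_other = [2700.0, 2650.0, 2600.0, 2550.0]

print(is_velar_pinch(f2_into_k, f3_into_k))
print(gap_hz(f2_into_k, f3_into_k))
print(is_velar_pinch(f2_other, f3_other))
```

The shrinking gap between the two tracks is what gives the pattern its "pinched" look on the spectrogram.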
Now for some practice analyzing a spectrogram. The example spectrograms in this explanation have all visualized the same utterance. Zoom in on the first quarter of the utterance: what do you see?
Fig. 5 - You can guess what sounds you're looking at by analyzing certain signals on the spectrogram.
You've made some educated guesses—what word are you looking at here? As it turns out, this spectrogram shows a speaker saying the word Mary!
Try repeating this analysis on the rest of the utterance for some extra practice! You can see the answer below.
Fig. 6 - The segments of Mary loves raspberries, annotated both in the Latin alphabet and the International Phonetic Alphabet.
This spectrogram shows a speaker saying Mary loves raspberries!
A spectrogram is a graph of a sound wave's component frequencies over time. Component frequencies are the range of frequencies present in the sound.
A spectrum is a plot of a wave's components at a given point in time. You can think of a spectrum as a single snapshot of a spectrogram.
There are two types of spectrograms: wide-band spectrograms and narrow-band spectrograms. A wide-band spectrogram helps you see how a sound changes over time, while a narrow-band spectrogram helps you see the differences in amplitude between components.
The vowel formants visible on a spectrogram help you see the place of articulation of vowels. Formant transitions into and out of neighboring consonants help you see the place of articulation of those consonants.
Spectrograms are useful for linguistic analysis because they allow you to see multiple speech signals at once. For example, you can see component frequencies, glottal pulses, voicing, vowel formants, and place of articulation all on a single spectrogram.
Flashcards in Spectrogram
What is the definition of a spectrogram?
A spectrogram is a graph of a sound wave's component frequencies over time. Component frequencies are the range of frequencies present in the sound.
The "pitch" you hear in a sound wave is the wave's _____.
lowest component
The perceived pitch of a wave is also called the _____.
fundamental frequency
What dimension is on the x-axis of a spectrogram?
time
What dimension is on the y-axis of a spectrogram?
frequency
What dimension is shown on a spectrogram as differences in color or darkness?
amplitude