The discussion of how humans process the speech input signal requires a precise understanding of the analysis of sound waves, and that is the focus of this short e-lecture. We will first look at simple and then at complex sounds, discuss the phenomenon of resonance, and finally say something about the variation of the speech signal.

Let's start with a very simple form of sound, represented by a simple tone: for example, an electronically generated tone at standard pitch A. This is what it sounds like. Now, in order to represent such a sound visually, we have two types of representation: on the one hand the so-called waveform view, and on the other the spectrogram, or spectrographic view. What do these views display? Well, the waveform view is a two-dimensional view with time information on the horizontal axis. From this we can derive frequency information in terms of cycles per time unit. The amplitude is shown on the vertical axis. The spectrographic view is three-dimensional: we again have time information on the horizontal axis, normally expressed in milliseconds, and frequency information in hertz on the vertical axis. The third dimension, in inverted commas, is the amplitude, displayed in decibels by means of different colors or different degrees of darkening: the darker, the more intense.

Now, what can we see on this spectrogram? We can see precise frequency information. For example, we can see that our simple tone had a frequency of, and we know this value, 440 hertz. This is the frequency of our sound input, and it is referred to as f0 or F0 or, in full, the fundamental frequency. If we enlarge these two views, you will see that an enlargement of the spectrogram doesn't really help: it shows the same information. If we enlarge the waveform view, however, we can see the cycles per time unit.
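The relation between cycle duration, frequency, and amplitude that the waveform view displays can be sketched in a few lines of Python. This is only an illustration; the sampling rate and the amplitude value are assumptions, not values from the lecture.

```python
import numpy as np

# A minimal sketch: synthesize the standard pitch A as a simple tone,
# a 440 Hz sine wave, and read off the two quantities the waveform
# view displays: the cycle duration and the amplitude.
fs = 44100                       # sampling rate in Hz (assumed value)
f0 = 440.0                       # fundamental frequency of the simple tone
t = np.arange(0, 0.01, 1 / fs)   # 10 ms of time axis (horizontal axis)
amplitude = 0.8                  # peak elongation from the zero line (assumed)
wave = amplitude * np.sin(2 * np.pi * f0 * t)

period = 1 / f0                  # duration of one cycle in seconds
print(f"one cycle lasts {period * 1000:.2f} ms")
print(f"cycles in this 10 ms window: {f0 * 0.01:.1f}")
```

With 440 cycles per second, one cycle lasts roughly 2.3 ms, which is exactly what the enlarged waveform view lets you measure.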
So this would be one cycle here, and you see the amplitude in terms of the distance from the maximal elongation to the zero line.

What happens if we feed the simple tone produced by a sound source through a resonating body, for example the body of a guitar, the vocal tract, or the tube of a flute? Well, let's take my flute as an example. This is what it sounds like when resonances are created by the flute body. Oh, I got carried away a little bit. So let's just play and record one single sound, the standard pitch A again, and look at the acoustic analysis of it.

Well, here we are again: the waveform view to the left, with an enlargement, and the spectrographic information. What can we see? The waveform now displays a complex sound wave, which is fully periodic. You can see the periodicity here in the enlargement: this is one cycle again, but you see it is now a combination of several simple sounds. However, apart from the intensity information and, if we zoom in, the precise amplitudes, it is similar to that of a simple sound wave. The spectrographic information, the spectrogram, is different. It does not only display the fundamental frequency, which is here, f0, 440 Hz again, but also multiples of the fundamental frequency. So here are the multiples: this is one, this is one, this is one, and this is one. And furthermore, we have additional effects, such as the representation of noise, which was probably created by turbulent airflow at the hole of the flute here. Now, since frequency information and information about additional sound effects, noise, pulses, etc., is fundamental for the phonetic analysis of speech, the spectrogram is the primary means of acoustic analysis in phonetics. But what about these additional frequencies, these multiples?
Well, whenever a sound is not just a simple input signal but is filtered, intensified, and damped by the parts of a resonating body, we get these resonance frequencies or, as they are called, formants or formant frequencies. The formants are, if you wish, the echoes from the resonating body.

Let us now compare our musical instrument with speech. I have produced the following two vowels and analyzed them acoustically: E and R at 440 Hz. E, R. Now here is the analysis. Again, the values for the fundamental frequencies are the same, at least I tried to use similar tones, but the formants have different values. This is due to the different shapes of the resonating body, that is, my vocal tract. So here is our F0 value. We know it is about 440 Hz; at least I tried to produce that. Then we have the value for F1, which in the E is part of the big bundle here at the bottom, whereas for R it is much higher: it is about here, around 1000 Hz. And then we have our second formant, F2, which in the E is extremely high, and in the example of the R is somewhere in the middle.

Now, how can we explain that? Let us imagine the vocal tract to be analyzed just like the resonating body of a flute. We have the sound input here at this end of the flute, and we have the orifice here. We have exactly the same in the vocal tract: this is our sound input, and this is the orifice, our sound output. Now, what happens if we produce an E? Well, something like this: we produce a constriction very much at the front of the mouth. In other words, we produce a very large back cavity and a relatively small front cavity. The situation is different if we produce an R. Then the constriction occurs in the middle, with a back cavity and a front cavity relatively equal in size. So the interesting thing is that we can now associate the front and back cavities with parts of the vocal tract. The back resonance cavity is associated with F1.
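The flute analogy can be made quantitative with the standard uniform-tube model of the vocal tract, which the lecture does not spell out in these terms: a tube closed at the glottis and open at the lips resonates at odd multiples of c / 4L (quarter-wavelength resonances). The tube length and speed of sound below are assumed textbook values, not measurements from the lecture.

```python
# Sketch of the uniform-tube (quarter-wavelength) resonator model.
def tube_formants(length_m, c=350.0, n=3):
    """First n resonance frequencies (Hz) of a uniform tube
    closed at one end and open at the other.

    length_m : tube length in metres (assumed ~0.175 m for an adult male)
    c        : speed of sound in warm, moist air (assumed 350 m/s)
    """
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]

# A neutral, schwa-like vocal tract of 17.5 cm:
print(tube_formants(0.175))   # approximately 500, 1500, 2500 Hz
```

Shortening or lengthening the cavities shifts these resonances, which is exactly why the constriction placement for E versus R produces different formant values.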
You see everything in red here has to do with F1. What about the front cavity? Well, that is associated with F2. With this information, we can now create patterns for all vowels. If you produce an E, you have a very large back cavity, which means a very low frequency for F1, and a very small front cavity, which means you create a high value for F2.

If we now use real present-day English words, then the formant patterns are even more complex. Here I produced two words, SEE and CAR, standard words of present-day English. This time I didn't use the 440 Hz fundamental frequency input value, but I produced a sound which can be used in normal speech, something like SEE and CAR, with a falling tone in each case. That is represented in my fundamental frequency, which is somewhere here, F0, and somewhere here, F0: a value falling from something like 180 Hz down to 120 Hz. Now, F1, our first formant, in the E, well, as we know, is probably here; it's a very low value. For R, it's somewhere here. And then we have a very high value for F2 in the E as in SEE, and a medium value for F2 in CAR. Now, these spectrographic representations involve some complications. For example, here we have friction noise, an alveolar fricative; here we have a velar plosive, you see this little spike here, and the puff of air which represents the aspiration component. And this phonetic context seems to influence the representation of the formant patterns of the vowel.

Well, this is even more complex if we look at connected speech or longer passages. Here we have a complex stretch of speech, in this case "the Virtual Linguistics Campus", and now we have the whole vocal tract working as one resonant system. The fundamental frequency of vocal fold vibration varies between 100 and 200 Hz, and while the formant patterns of the vowels can still be identified, there are additional problems. Speech sounds can often not be clearly identified. For example, where is the E here?
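The three-dimensional spectrographic view used throughout this lecture (time, frequency, intensity) can be computed with a short-time Fourier transform. A minimal sketch, in which the tool, the sampling rate, and the window length are assumptions rather than anything stated in the lecture:

```python
import numpy as np
from scipy.signal import spectrogram

# Sketch: compute a spectrogram of a synthetic 440 Hz tone standing in
# for a recorded vowel, then locate the strongest frequency component.
fs = 16000                              # assumed sampling rate in Hz
t = np.arange(fs) / fs                  # 1 s of signal
wave = np.sin(2 * np.pi * 440.0 * t)    # stand-in for a steady vowel

f, times, Sxx = spectrogram(wave, fs=fs, nperseg=512)
# horizontal axis: time frames; vertical axis: frequency bins;
# "third dimension": intensity, usually shown as degrees of darkening
strongest = f[Sxx.mean(axis=1).argmax()]
print(f"strongest frequency bin is near {strongest:.0f} Hz")
```

On a real recording of SEE or CAR, the dark horizontal bands of such a spectrogram are the formants, and the lowest periodic component is the falling F0 contour described above.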
Well, is this really the E, or is it the E plus the velar nasal as in "linguistics"? Speech sounds are influenced by their acoustic context. As we see here, for example, the formants seem to emerge from a certain position, so they are curved in a way; they are not straight as in our steady-state vowels. And then we have additional complications such as background noise, distortions, and missing segments. All this complicates the acoustic analysis of speech. And there are further problems: speakers have different vocal tract shapes, and they use different input sources, women versus men, for example. And, as already pointed out, the phonetic context influences the formant patterns. These phenomena of variation will be discussed in another e-lecture about the nature of the input signal.