The Fourier transform describes the frequency content of a signal as a whole. It does not account for the fact that frequencies come and go over time. When we speak, the sound of the letter S contains higher frequencies than the sound of the letter O, but the Fourier transform does not tell us when the S and the O occur within the signal. To solve this problem, voice recognition algorithms make use of the short-time Fourier transform (STFT). The signal is broken down into short blocks and an FFT is run on each block. The spectra of the individual blocks are then laid out side by side on a graph known as a spectrogram, with time on the x-axis, frequency on the y-axis, and the intensity of each frequency denoted by colour. The voice recognition algorithm then compares the spectrum of each block with a dictionary of the spectra of known sounds to work out what you are saying.
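
To make the block-by-block idea concrete, here is a minimal sketch in Python using NumPy. It is an illustration under stated assumptions, not how any particular voice recognition system works: the test signal (a 500 Hz tone followed by a 3000 Hz tone, loosely standing in for an "O" and an "S"), the 8000 Hz sample rate, the 256-sample block size, the Hann window, and the helper name `stft_magnitude` are all choices made for this example.

```python
import numpy as np


def stft_magnitude(signal, block_size):
    """Split the signal into consecutive blocks and return |FFT| of each.

    The result has shape (n_blocks, block_size // 2 + 1): one row per
    time block, one column per frequency bin.
    """
    n_blocks = len(signal) // block_size
    # Drop any leftover samples that do not fill a whole block.
    blocks = signal[: n_blocks * block_size].reshape(n_blocks, block_size)
    # Assumption: a Hann window tapers each block's edges to reduce leakage.
    window = np.hanning(block_size)
    return np.abs(np.fft.rfft(blocks * window, axis=1))


# Assumed test signal: one second sampled at 8000 Hz, low tone then high tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 500 * t)    # stands in for the "O"
high_tone = np.sin(2 * np.pi * 3000 * t)  # stands in for the "S"
signal = np.where(t < 0.5, low_tone, high_tone)

block_size = 256
spectrogram = stft_magnitude(signal, block_size)

# Frequency (in Hz) of each column, and the strongest frequency per block.
freqs = np.fft.rfftfreq(block_size, d=1 / sample_rate)
dominant = freqs[spectrogram.argmax(axis=1)]

print(dominant[0], dominant[-1])
```

Each row of `spectrogram` is the spectrum of one block, so plotting the rows side by side with colour for magnitude gives the time-frequency picture described above. A single FFT over the whole signal would show both tones but not which came first; `dominant` shows the low tone in the early blocks and the high tone in the later ones. The block that straddles the change contains a mixture of both.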