The technology that allows you to carry a million songs around in your pocket also allows apps to quickly identify a piece of music, and makes this video compact enough to stream even in high resolution. What might surprise you is that the foundation for all of this was laid down by a French scientist called Joseph Fourier about 200 years ago. Before I get to that, I'm afraid I'll have to take you back to high school for a moment. I know, I know, but bear with me. In mathematics, a function is like a recipe. It is a rule that takes a number, referred to as a variable, performs some operations on it, and returns a new number, the output. One example of a function takes a temperature in Celsius, multiplies it by one number and adds another, and returns the temperature in Fahrenheit. Usually we denote the input variable by x, so the function x squared plus one means take a number, multiply it by itself, and add one. We can get a curve from a function by evaluating it for a bunch of different values of x, marking the results as points, and then joining them together. If, once we have evaluated the function, we add another step to multiply or divide the result by a number, this will stretch or squash the curve in the vertical direction. If we multiply or divide x by a number before it goes through the function, it will stretch or squash the curve in the horizontal direction. One important mathematical function is sine from trigonometry, which takes an angle and returns a ratio of two sides of a right triangle. That detail doesn't matter here; what is interesting is the curve it makes. It looks like a wave, with peaks and troughs. Just as before, we can use two numbers to manipulate this wave. One stretches it vertically and is usually called the amplitude: the height from the middle up to a peak or down to a trough. The other stretches it horizontally and is usually called the frequency: how frequently the peaks and troughs appear. 
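To make the two knobs concrete, here is a small Python sketch (my own illustration, not anything from the original recording); the function name `wave` and the sample numbers are just choices for demonstration:

```python
import math

def wave(t, amplitude, frequency):
    # Evaluate a sine wave at time t (in seconds): the amplitude scales
    # the curve vertically, the frequency (in Hz) squashes it horizontally.
    return amplitude * math.sin(2 * math.pi * frequency * t)

# Doubling the amplitude doubles every value of the curve...
a = wave(0.1, 1.0, 3.0)
b = wave(0.1, 2.0, 3.0)
# ...while changing the frequency changes how often the peaks and
# troughs repeat: a 3 Hz wave completes a full cycle every 1/3 second.
```

Doubling the frequency instead of the amplitude leaves the heights alone but packs twice as many peaks into the same stretch of time.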
The higher the frequency, the more often the wave repeats itself. You are probably familiar with the idea that sound consists of waves. When this sound is made, a curve of the air pressure over time is a perfect sine wave. The amplitude determines how loud it is. For example, as the sound travels further, the wave's amplitude gets smaller, the changes it makes to the background air pressure get smaller and smaller, and the sound gets quieter. This is the same sound, but quieter. The frequency determines the pitch of the sound, which in the last case was 500 Hz, or 500 peaks and troughs in a second. The higher the frequency, the more rapidly the wave oscillates, and you perceive a higher pitch. This is the same sound at double the frequency. You may also be familiar with the fact that electronic devices which produce sound usually have a magnet and a coil. A voltage is varied across the coil, which pulls a membrane in and out to create the pressure waves you just heard. The software which controls the coil must receive a stream of numbers corresponding to the voltage to be set at every step in time. This video, for example, has a sample rate of 44.1 kHz, meaning that it breaks every second into about 44,000 equal steps, and at each step sends a number to the speaker to control the voltage, or at least it did when I made the video. 44,000 steps per second with 4 bytes per step comes to roughly a megabyte of data every 5 seconds. This amount of data would make it quite difficult to keep songs on storage media or stream voice chat over the internet. This is how the WAV file format that sound recorders often use works. You might have noticed that it produces very large files. But here is a better idea. I told you that the sound I played was a pure sine wave. To reproduce it, we can store or send just two numbers, the amplitude and the frequency, and the computer can reproduce the curve by simply calculating the sine function at every point in time. 
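A sketch of that better idea in Python (mine, for illustration): reproduce a whole second of a pure tone from just the two stored numbers by evaluating the sine function at every sample step.

```python
import math

SAMPLE_RATE = 44_100  # samples per second, as in CD-quality audio

def synthesize(amplitude, frequency, seconds):
    # Rebuild the full stream of samples from just two stored numbers
    # by evaluating the sine function at every step in time.
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * frequency * k / SAMPLE_RATE)
            for k in range(n)]

tone = synthesize(0.5, 500.0, 1.0)  # one second of a 500 Hz tone
# The raw stream is 44,100 numbers; the "compressed" form is just (0.5, 500.0).
```

Sending those two numbers instead of the 44,100 samples is the entire trick.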
The sounds I played are all well and good for something like a retro computer game, but they're not very realistic. You don't hear that sort of beep in nature. However, you can always add two mathematical functions to make a new one. Here is a curve for two sines with different amplitudes and frequencies added together. You can keep going, though. I could give you 100 pairs of amplitudes and frequencies, and you could add up all the individual functions of time that arise as a result. Tedious and boring for a human, but for a computer this is very straightforward. Here is just such a sound, made out of 100 different sine waves. Sounds a bit more realistic. To store a second of it, you still only need 200 numbers, as opposed to the 44,000 we had before. So that is the mathematical bit of magic behind the MP3 file format, and why MP3 files are so compact. Rather than store the value of the sound signal at every single step in time, we store a smaller bunch of frequencies with amplitudes, and the computer generates the sound by repeatedly using the sine function. Hopefully, I've managed to explain how to turn pairs of amplitudes and frequencies into a sound you can play. But if you already have a sound, how do you work out what frequencies and amplitudes you need? In other words, how do you produce an MP3 file from a recording in the first place? Well, in 1822, Joseph Fourier developed the mathematical background to do this, and in 1965, James Cooley and John Tukey developed an efficient computer algorithm which takes any signal, such as a sound, and calculates the amplitudes of the sine waves required to reproduce that signal again. The operation this algorithm computes is called the discrete Fourier transform, after arguably its main originator, although many others contributed to this work. But here's a catch. If you did break up an arbitrary signal into individual sine waves, in basic terms, you would need the same amount of information, the same amount of data, to store it. 
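The adding-up step is short enough to sketch directly (again my own illustration; the pair list and sample rate are arbitrary choices): each (amplitude, frequency) pair contributes one sine wave, and the sound is their sum.

```python
import math

def synthesize_sum(pairs, seconds, sample_rate=8000):
    # Add together one sine wave per (amplitude, frequency) pair.
    # The whole sound is described by just 2 * len(pairs) numbers.
    n = int(sample_rate * seconds)
    samples = [0.0] * n
    for amplitude, frequency in pairs:
        for k in range(n):
            samples[k] += amplitude * math.sin(
                2 * math.pi * frequency * k / sample_rate)
    return samples

# Two sines, the lower one twice the amplitude of the higher one:
pairs = [(1.0, 200.0), (0.5, 400.0)]
sound = synthesize_sum(pairs, 0.01)
```

With 100 pairs instead of two, the same loop produces the richer sound while still needing only 200 stored numbers per second.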
That's because to reproduce a signal perfectly, you would need lots and lots of sine waves. But in practice, you can make a pretty good approximation with only a limited number. When I talk in a low register, you can't hear any high-pitched sound, right? In mathematical terms, the sines with high frequencies have very low, almost zero, amplitudes. So when recording a short clip of a sound with low pitch, only low frequencies need to be stored. The opposite is of course true if a clip had only high frequencies, such as Pavarotti stubbing his toe. When making an MP3 out of a long song or speech, it is first broken up into shorter frames, and the Fourier transform is applied to each one in turn. The point of MP3s is to keep only a limited number of sines, those which happen to have the largest amplitudes in each frame. Generally speaking, it works. This is why MP3 is referred to as a lossy form of compression. You do end up throwing away some of the sine waves required to faithfully recreate the original sound, and therefore what you end up storing is slightly different from the original. That can make a difference if you want high audio fidelity, but usually lossy compression is good enough. With all that in mind, here is another way of looking at sound. Running up the screen, I will have an axis denoting the frequency. In the horizontal direction, I will have an axis with the corresponding amplitude. Remember the first sound I played? It was a pure sine wave with just one frequency. To represent it, we draw a curve which takes a value of zero everywhere, except for a spike at the frequency of that one sine. Now, if we have a sound made of two sine waves, one of them twice the amplitude of the other, we get a big spike and a little spike at the appropriate frequencies. I also played a sound made up of a hundred sine waves. Actually, some of the waves that made that sound were really close together in frequency, so those spikes become slightly wider. 
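Those spikes can be computed rather than drawn. Here is a sketch using NumPy's discrete Fourier transform (the signal and the normalization are my own choices for illustration): a sound made of two sines, one twice the amplitude of the other, really does come out as a big spike and a little spike.

```python
import numpy as np

# One second of sound at 1000 samples per second, made of two sines:
# amplitude 1 at 50 Hz, and amplitude 2 at 100 Hz.
rate = 1000
t = np.arange(rate) / rate
sound = 1.0 * np.sin(2 * np.pi * 50 * t) + 2.0 * np.sin(2 * np.pi * 100 * t)

# The discrete Fourier transform recovers the amplitude at each frequency;
# dividing by rate/2 converts the raw spike heights back into amplitudes.
spectrum = np.abs(np.fft.rfft(sound)) / (rate / 2)

# A little spike of height 1 at 50 Hz, a big spike of height 2 at 100 Hz,
# and essentially zero everywhere else.
```

This is exactly the amplitude-against-frequency curve described above, computed from the raw samples alone.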
In fact, that sound began as a recording of my voice. Here is what the original, with all the frequencies kept in, sounds and looks like. This is what the term Fourier transform refers to. I've taken the wavy curve of a sound signal and transformed it into a curve of amplitude against frequency. Bear with me for a moment, but I'm going to color this curve by how large the amplitude is. The spikes where the amplitude is high become bright in color, and where the amplitude is low it becomes dark. This allows me to squash the graph into a thin strip representing just a short interval of time. If I once again have time running along the horizontal direction, I can visualize frequencies as they change in a much longer sound. For example, listen to this short piece of music as I do just that. This graph is called a spectrogram. It shows how different frequencies come in and out as different notes are played. Remember that bright colors mean that waves with the corresponding frequencies have a large amplitude at the times indicated on the horizontal axis. This spectrogram looks remarkably similar to staff music notation. The first note being played is an F at about 350 Hz, and then it stops and the note G is played. You can see that the vertical spacing in frequency corresponds to the difference in pitch between the notes. With a more complicated piece of music, you could also see the horizontal difference in duration between notes of different lengths. Spectrograms are the principle behind apps which can recognize a song based on just a short clip. I mentioned how MP3s might compress sounds by keeping in only the most important frequencies. It is possible to use the way that Fourier transforms decompose a sound into its constituent sine waves to deliberately remove, or filter out, certain frequencies. If there is an annoying high-pitched noise, it's possible to set its amplitude to zero and then recreate the sound from the rest of the sine waves. 
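That filtering step is only a few lines with NumPy's forward and inverse transforms (my own sketch; the tone, hum, and band edges are invented for the demonstration): transform, zero out the unwanted amplitudes, transform back.

```python
import numpy as np

def remove_band(samples, sample_rate, low_hz, high_hz):
    # Decompose the sound into sine waves, set the amplitudes of every
    # frequency in the unwanted band to zero, then rebuild the sound
    # from the sine waves that remain.
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), 1.0 / sample_rate)
    spectrum[(freqs >= low_hz) & (freqs <= high_hz)] = 0.0
    return np.fft.irfft(spectrum, len(samples))

# A clean 440 Hz tone with an annoying 50 Hz hum mixed in...
rate = 8000
t = np.arange(rate) / rate
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * np.sin(2 * np.pi * 50 * t)
# ...comes out clean after filtering out everything from 40 to 60 Hz.
filtered = remove_band(noisy, rate, 40.0, 60.0)
```

The same function with different band edges removes whole ranges of frequencies at once.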
Whole ranges of frequencies can be filtered out in this way. It also makes it easier to simply cut out periods of time with undesired noise. My videos don't have good audio quality by any measure, but I do filter them a little bit. Like many others, I use a program called Audacity. Normally, it shows sounds in waveform mode, which is just the sinusoidal signal. It only tells you how loud the sound is in a given period, but nothing else about it. However, putting the view into spectrogram mode is much more instructive. My voice produces bright lines in the spectrogram, whereas when I breathe out, there is a smattering of medium amplitudes across many frequencies. The spectrogram view makes it very easy to spot whenever I've taken a breath, and I therefore endeavor to remove that sort of clutter from my audio. These sorts of spectral methods allow music to be auto-tuned and apparently can even be used for guiding missiles. Spectrograms should be familiar to anyone interested in radio technology, acoustics and analog electronics. Books and movies like The Hunt for Red October depict the use of passive sonar, quietly listening for the sound of enemy submarines carried through the water. The noises produced by the propellers and engines are unique to each class of submarine and can easily be distinguished when visualized on a spectrogram. So far I've talked about functions of one variable, which in the case of sound waves is time. It's also possible to have functions of two or more variables, like x and y. The function x squared times y cubed means take the number x and square it, take the number y and cube it, and multiply the results together. By multiplying the sine of x by the sine of y, each with its own frequency, it's possible to have a wave in two dimensions. An image on a computer is composed of pixels with three colors: red, green and blue. Taking just the red for now, this is stored as numbers on a two-dimensional grid. 
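A two-dimensional wave on a pixel-like grid can be built exactly as described: multiply the sine of x by the sine of y, each with its own frequency. A quick sketch (grid size and frequencies are my own arbitrary choices):

```python
import numpy as np

# A 64x64 grid of coordinates, like the pixel positions of an image.
x = np.arange(64) / 64
y = np.arange(64) / 64
X, Y = np.meshgrid(x, y)

# Multiply the sine of x by the sine of y, each with its own frequency:
# the result oscillates 3 times across and 5 times down the grid.
wave2d = np.sin(2 * np.pi * 3 * X) * np.sin(2 * np.pi * 5 * Y)
```

Plotted as brightness values, `wave2d` looks like a checkerboard of smooth bumps and dips, which is the kind of building block the next step decomposes images into.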
Whereas before we split time into steps and had a number representing the sound at every step, we now split an image into pixels in both directions and have a number representing the brightness at every pixel. You can probably see where I'm going with this. It's possible to decompose that brightness data into sine waves of different amplitudes, corresponding now to a pair of frequencies, using a two-dimensional Fourier transform. A lossy format such as JPEG will then compress the data by throwing away, and not recording, those sine waves which have low amplitudes. This might seem a little weird. With sound, we had something that basically looks like a wave already. But surely a picture, unless it's taken at the seaside, doesn't in general contain waves. Firstly, a JPEG breaks up larger images into smaller pieces, just like an MP3 breaks up a song into frames. Within these small pieces, it turns out that the way the colors change can be represented as sine waves. JPEGs with a low quality, and therefore a small file size relative to the size of the image, store very few of the waves required to reconstruct the image. This leads to artifacts. Here are JPEGs of an apartment building in high and low quality. If we zoom in on a vertical edge of the building, we can see some of these artifacts. There appear to be vertical lines of the wrong color next to the edges. To explain this, let's take a horizontal slice from the original image to see the brightness values of the individual pixels. On the right, the pixels are bright and therefore have a large value. Then there is a sudden drop, and the pixels on the left are dark, or have low values. The mathematical function which defines this curve is, unsurprisingly, called the step function. If we decompose this curve into its constituent sine waves but only keep a few of them, this is what it will look like. The curve oscillates. 
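That oscillation can be reproduced in a few lines (my own sketch; the 128-pixel slice and the choice to keep 5 waves are arbitrary): decompose a step, keep only the largest-amplitude sine waves, and rebuild.

```python
import numpy as np

def truncated_reconstruction(signal, keep):
    # Decompose the signal into sine waves, keep only the `keep` waves
    # with the largest amplitudes, and rebuild from those alone.
    spectrum = np.fft.rfft(signal)
    kept = np.zeros_like(spectrum)
    top = np.argsort(np.abs(spectrum))[-keep:]
    kept[top] = spectrum[top]
    return np.fft.irfft(kept, len(signal))

# A step function: 64 bright pixels, then 64 dark ones.
step = np.array([1.0] * 64 + [0.0] * 64)
approx = truncated_reconstruction(step, 5)
# Instead of dropping cleanly, the rebuilt curve overshoots past 1,
# dips below 0, and ripples near the edge -- the same artifact seen
# next to sharp edges in low-quality JPEGs.
```

The ripples shrink as `keep` grows, which is exactly the quality-versus-size trade-off a JPEG encoder makes.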
If only we had kept more sine waves, their peaks would have filled in where this curve's troughs are, and vice versa. Images of text often produce artifacts, because letters pretty much are high-contrast lines with sharp changes in brightness. On the flip side, most images are not filled with a large number of sharply contrasted edges, so the JPEG method works pretty well. There is also lossless compression, which, as the name suggests, stores an image compactly but without altering the data in any way. For example, if an image of a blue rectangle on a black background is saved as a PNG, rather than storing the value of every single pixel, the file would store the following instructions: set every pixel to black, and then go between two horizontal and vertical positions and set every pixel to blue. In this case there are no compression artifacts, but generally PNG files come out larger than JPEGs. Finally, everything I've talked about can be used to compress movies. The sound can be treated like an MP3 or similar. The visual component consists of sequential frames, some of which can be stored in a similar way to JPEGs. Usually, though, not much changes between any two frames. A car might move forward by a short amount while the background remains the same. Therefore it's more economical to store only the change in pixels from the previous frame to the next. Fourier methods can again be used to store these changes efficiently. Typically there will be a key frame, which is stored in its entirety, and then up to 30 seconds of these little changes. When you see garbled footage, this is usually due to the data encoding those changes being corrupted. Eventually, when the next key frame is reached, the video looks good again. In summary, I've talked about the mathematics behind a huge chunk of internet traffic: most audio, images and video. All of these are stored and transmitted efficiently by first decomposing the signal into waves. 
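The key-frame-plus-changes idea can be sketched in a few lines (my own toy example; real video codecs are far more elaborate): store the first frame in full, then only the pixel differences between consecutive frames.

```python
import numpy as np

def encode(frames):
    # Store the first frame in full (the key frame), then for every later
    # frame store only its change from the frame before it.
    key = frames[0]
    deltas = [frames[i] - frames[i - 1] for i in range(1, len(frames))]
    return key, deltas

def decode(key, deltas):
    # Rebuild the footage by applying each stored change in turn.
    out = [key]
    for d in deltas:
        out.append(out[-1] + d)
    return out

# Three 4x4 "frames" in which a single bright pixel moves one step right
# while the background stays the same:
frames = [np.zeros((4, 4)) for _ in range(3)]
for i, f in enumerate(frames):
    f[1, i] = 1.0
key, deltas = encode(frames)
decoded = decode(key, deltas)
# Each delta has only two nonzero pixels, so it is far more economical
# to store than a full frame.
```

If one delta is corrupted, every rebuilt frame after it is wrong until the next key frame arrives, which is why garbled footage snaps back to normal.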
Mathematically, waves are the sine function from trigonometry with a particular amplitude and frequency. A Fourier transform is the mathematical tool which allows this decomposition to take place. It calculates the combination of amplitudes and matching frequencies of the sines which, when added together, reproduce the signal. Thank you for watching.