So, I think most of you have used different multimedia applications, and you've seen examples of them in other lectures, maybe in ITS 323 or similar courses last semester or a previous semester. We're going to talk about a selection of such applications and the issues of sending audio and video across the Internet. We'll give some general characteristics of what we mean by multimedia applications, then look at which aspects of the network impact performance, and how we measure the performance of multimedia applications: things like delay, sending rate or throughput, and jitter or delay variation. Then we'll go through several example protocols for sending audio and video across the Internet, for example voice over IP. We may skip over streaming stored audio and video, or mention it briefly, and then finish with IPTV, sending high-quality TV across the Internet. Let's briefly introduce some of the requirements of multimedia applications, and then go to a quick example. So, what do we mean by multimedia applications? Well, there are different ways we can characterize them. One is based on the direction of communications. For example, streaming is one-way or unidirectional communications. Streaming audio and video across the Internet is an example: there's an online radio station and you listen to that radio via the Internet. Radio, music and video are the typical examples. The video may be recorded, like YouTube, or it may be live, like watching some sporting event as it happens. It's one-way in that there's usually some source that generates the content, and that content is streamed in one direction to the viewers or listeners. We often distinguish between stored and live because they have different requirements in terms of performance, especially delay. Stored is like YouTube. In YouTube there is really a set of servers, but let's think conceptually of a single YouTube server at Google, who owns YouTube, with all the videos stored on it. When you upload a video, you send it from your computer and it's stored on the YouTube server. When you want to view the video, the video stored on the server is streamed one-way to your computer. Sometimes when you view a video on YouTube and press play, it may take several seconds before the video starts playing. What happens is some buffering takes place: when you press play in your client, that is in your web browser, the YouTube server starts streaming the video. It's already stored at the server, and there may be a delay of several seconds, even up to 10 seconds, before it starts. But once it starts, it usually plays smoothly. We'll look at the causes of it not playing smoothly later. That's an example of stored video: the video is already created and stored in a file on the server. Live audio or video, in contrast, is content generated at the server at approximately the same time as you want to view or listen to it. That's, for example, a live sporting event: someone is filming the event, and as the content is generated, it is streamed to the viewers. The main difference is the performance requirements. With live audio and video, we need a much smaller delay from when it leaves the server until it is received by your computer.
Being live means what you view or listen to should be happening at approximately the same time as it's actually happening in real life. That's the idea. If there's a football match and you're watching it on the Internet, it happens at this point in time and then it's streamed to your computer. If it takes 10 minutes from when it happens until you receive it, that's not so good for the user: when someone kicks a goal, maybe your friend, who is watching at the game itself, calls you before you even see the goal being kicked. That's not very useful. So with live streaming of audio or video, we usually need a very small delay from when it happens until you receive it, on the order of seconds, preferably less. Whereas with stored audio and video, watching a movie for example, from when you press play until you receive it, you can usually tolerate a delay of multiple seconds. As long as it plays smoothly once it starts on your computer, the client, that's the main requirement. So there are different requirements in terms of performance. The other type of multimedia application is two-way, bi-directional communications, where people talk to each other across the network: voice calls and video conferencing. These are interactive applications, users interacting with each other, and they again have different performance requirements. When you talk to someone over a phone, whether across a normal telephone network or across the Internet, the delay from when you say something until they hear it should typically be on the order of hundreds of milliseconds or less. If that delay is multiple seconds, it becomes almost impossible to have a conversation: you say something, there's a delay, and while your data is still being transmitted the other person, waiting for you to say something, starts talking, so people end up talking at the same time. So interactive applications have even stricter requirements on performance, especially delay. Sometimes we refer to them as real-time applications, because we usually have strict delay requirements. With stored audio and video, in contrast, the delay requirements are not so important. So with a voice or video call, or even live audio and video streaming, we usually want a small delay and a small jitter. We'll talk about what we mean by jitter: essentially, the variation of the delay. And what about other, non-multimedia applications, meaning web browsing or downloading files? In those applications we normally care about reliability. You download a file, you want to get the exact same file as on the server; you don't want any difference. Reliability is essential, but you can tolerate some delays. You start your download of a one-gigabyte file; a delay on the order of seconds from when you press start until it actually starts transferring is not a problem. Varying delays are also not a problem when downloading a file. With web browsing, delays on the order of seconds start to become a problem because the interaction time is not good. Whereas with multimedia applications, delay is usually the main factor that impacts performance: they are usually delay-sensitive applications.
If the delay is too large, that is, the time from when the server sends the data until you receive it is too large, then sometimes the application becomes unusable; users will not find it convenient. If either the delay or the jitter, the variation of the delay, becomes too large, multimedia applications do not work so well. Whereas web browsing, file downloads and email require reliability: the file at the server and the file at your computer when you download it must be exactly the same, 100% reliable. With multimedia applications, we can often tolerate packet loss or data loss. Say there's a 10-minute video at the YouTube server streaming to your computer, and you're watching the video. You can still view the video even if some of the data of the original video doesn't arrive at your computer. Even if some data is lost, the application is still usable. What does the user see? Maybe some pixels in the video are not displayed correctly: a wrong color, or a slight artifact in the video for a short amount of time. But usually we can tolerate some loss in multimedia applications, whereas in normal data applications data loss is not tolerated. Most of you have seen this, or something similar to it, in other courses. Voice communications is a special case of audio communications: voice, someone talking. For the majority of voice communications, the frequency range of the analog signal is from about 300 hertz up to 3400 hertz, or roughly from zero up to 4000 hertz for the human voice. So when we want to send the human voice across a network, we need to take that analog data and encode it as digital data to send in our packets. We'll look at different ways to do that. But note the frequency range: on the order of hundreds of hertz up to 3000 or 4000 hertz, so a bandwidth of less than 4000 hertz, is the most typical form of voice. In fact, voice can have a larger bandwidth than that, going to lower frequencies and even up to tens of kilohertz. With music, the dotted line here, we have a larger range of frequencies, a larger bandwidth. So we have different audio signals, and when we want to send these sources of content across a network, we need to encode them into some digital format and transmit that digital format across the network using some protocol. So we'll briefly mention the ways in which we can encode analog into digital, and then look at some protocols for sending it across a network. I think most of you know a simple way to encode analog voice communications; one example is pulse code modulation. If this represents our analog input, what we do is take samples of the signal, of the input strength, and map those samples to some discrete value. That discrete value can be represented as a binary value, and that's our digital data. So we sample at these points and map each sample to some value, here 1, 9, 15, 10, 5, 2, 2, which are represented as four-bit binary values. We've mapped our analog input to digital output, and this digital output can then be stored in a packet and sent across the Internet. So we need some way to first convert the analog data to digital data, and then send that across our network.
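As a rough illustration of that mapping, here is a minimal sketch in Python, assuming a 4-bit quantizer over a 0-to-16 input range; the sample values roughly match the slide's example:

```python
def pcm_encode(samples, n_bits=4, v_min=0.0, v_max=16.0):
    """Quantize analog sample values into n-bit PCM codewords."""
    levels = 2 ** n_bits                  # 4 bits -> 16 discrete levels
    step = (v_max - v_min) / levels       # width of one quantization level
    codes = []
    for v in samples:
        level = min(int((v - v_min) / step), levels - 1)  # clamp to top level
        codes.append(format(level, f"0{n_bits}b"))        # e.g. level 9 -> '1001'
    return codes

# The sample values from the slide: roughly 1, 9, 15, 10, 5, 2, 2
print(pcm_encode([1.1, 9.2, 15.0, 10.0, 5.0, 2.0, 2.0]))
# ['0001', '1001', '1111', '1010', '0101', '0010', '0010']
```

Each analog sample simply becomes the binary index of the level it falls into, and those bit patterns are what go into the packets.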
So at our source, for voice communications, say our user has a mobile phone. The user talks, providing analog input to the mobile phone, and if we want to send it as digital data, the phone has to convert that analog input into some digital form, into bits. It takes the analog input and produces some digital output, which can then be transmitted; with a mobile phone, it's transmitted using digital transmission. If it's voice over IP software, you talk into the microphone on your computer, the computer converts that into some digital form, which is then sent in packets across the network. So we're focusing on how to do the conversion first. Can we send this digital data as an analog signal? Yes, we can; it depends on the link. Say instead of a mobile phone, this is your computer and you're using Skype or similar software on it: you talk into your microphone, the computer converts that into digital data, and then it's sent over your network connection. If you're using ADSL for your home Internet access, it's sent from your computer to a modem, and the modem sends it as an analog signal across the telephone network. But the data it is sending is still digital. Okay. So at the source, we convert the analog input to digital form and then send it across the network. Whether it's delivered across the network as analog or digital signals is not so important here; what matters is how we convert it at the source. And of course at the destination, we receive some digital input, and we need to take that and convert it to get an analog output. That's our goal. And of course, the analog output at the destination should be the same as the analog input at the source: if I'm talking here, what the person hears on their speakers should be the same, or similar; it should convey the same information. One way to do this conversion from analog input to digital form is, for example, pulse code modulation. We take samples of the signal at fixed points: we record the input value, map it to some discrete value which can be represented in binary, and hence we have our digital output. We send that digital output across our network. The receiver takes the received binary values, maps them back to the discrete values, and outputs, say on a speaker, audio at particular levels according to those discrete values. So we play back what we receive. There are two main things we care about when evaluating the performance of this process. One is the quality received at the destination: given some quality of input, what quality of signal or output do they get? The other is what is required to send this data across the network: how much capacity do we use? With pulse code modulation, what impacts the quality of the output? The frequency at which we capture the samples, the rate at which we record the values. This is our input; what we'd like at the output is a perfect reproduction of this shape, this signal. But at the input we're recording not every possible value along this line, just a selection of the values. So when we reproduce the analog output at the destination, we take that selection of values and use it to create the output. So what would our output look like, in a very simple form?
If this is the source input, the digital data produced, these bits, are sent across the network. The receiver receives these bits, so they receive 0, 0, 0, 1, and those four bits map to level one, so they produce an output at that level for some period of time. On the speakers of the receiver's computer, for example, they output audio at that level for a period equal to the sample period here. Then they receive the value 1, 0, 0, 1, which is level nine, so they output a value at level nine. Then 15 is the next value received, then 10, 5, 2 and 2. So this is an example of what the receiver would produce as analog output, because what the receiver does, in the simplest terms, is take the received digital data, map it to a particular level, and play that back; in this case, creating an analog or audio output on the speakers. What we want is for the output to be identical to the input; if they were identical, we'd say that's 100% quality, perfect quality. Is our output the same as the input in this case? No. The input is a smooth line; the output is this square, rectangular-shaped line. So how do we increase the quality? The sampling rate, the frequency at which we take these samples, impacts the quality: the more frequently we sample, the closer the analog output will be to the original input. The other way to increase the quality of our audio is the size of each sample, this value here. Here we had four bits for every sample. Why four? Because we divided the space here into 16 levels. If we divided the same space into, say, 32 levels, which we could represent with five bits, then the level we get for each sample is closer to the real value. The real value here was 1.1, or 9.2. If we have more levels, the sample value will be closer to the real value, and when we reproduce it, the reproduced value will be closer to the real value. So to increase the quality at the receiver: increase the sampling rate or sampling frequency, and/or increase the number of levels, that is, the size or length in bits of each sample. That's summarized here. The two factors that impact quality are how often we sample, the rate, and how big each sample is in bits. The larger the sample size and the higher the sampling rate, the better the quality at the receiver. What's a good sampling rate? Well, it depends on the input content. In general, the Nyquist sampling theorem tells us it should be twice the highest frequency component. With voice, if the highest frequency component of your voice is four kilohertz, the sampling rate should be twice that, which is eight kilohertz. So there's a theorem to tell us the recommended value for sampling different types of data, assuming we know the frequency range, bandwidth or spectrum of that data. What should the sample size be? People have done different tests depending on the input, and again determined appropriate values for good-quality voice: typically seven or eight bits. An eight-bit sample size and a sampling rate of 8,000 samples per second usually gives good-quality voice, and it's the typical choice for telephones especially. Let's look at some examples of that. The first example is in these pictures, which show the output at the receiver.
The input was a perfect sine wave, just a simple sine wave, and these plots show what we get at the output with different sample rates and numbers of levels. The first picture uses 50 samples per second. We have a perfect sine wave as input and we take 50 samples per second. It was a one-hertz sine wave, so from here to here is one second, and with 50 samples per second there are 50 bars in this plot. And 100 levels means that from the bottom point to the top point we divide evenly into 100 different levels. We take samples of our perfect sine wave input, and this is what we get as output. You can see the same in the other five plots, with different values for the number of levels and the number of samples per second. In the top row, the sampling rate is the same, 50, 50, 50, but the number of levels changes: 100 down to 10 down to 4. You can see graphically that as you decrease the number of levels, the quality at the receiver goes down. Why do I say the quality goes down? Because the output, the yellow plot, is further away from the original input. The difference between input and output gets larger as we go to the right, as we reduce the number of levels. This one looks like a sine wave; this one looks much less like a sine wave. So when we reproduce the output, it's less like the original input. The second row shows the same number of levels, 10 levels, with a decreasing sampling rate: 50, 10, down to 4 samples per second. Again, as we decrease the sampling rate, the output is further from the original sine wave; that is, the quality at the receiver goes down. So we want a high sampling rate and a high number of levels to give a high-quality output.
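As a rough sketch of that experiment, the following Python quantizes a 1 Hz sine wave and measures how far the stair-step playback is from the input, for the same (rate, levels) pairs as the plots. The mean-absolute-error metric is just one simple way to put a number on quality, chosen here for illustration:

```python
import math

def quantize(v, levels):
    """Map a value in [-1, 1] to the midpoint of one of `levels` equal steps."""
    step = 2.0 / levels
    q = min(int((v + 1.0) / step), levels - 1)
    return -1.0 + (q + 0.5) * step

def mean_error(rate, levels, grid=1000):
    """Average |input - output| over one period of a 1 Hz sine wave,
    holding each quantized sample for a whole sample period,
    as the stair-step playback does."""
    total = 0.0
    for g in range(grid):
        t = g / grid
        sample_t = int(t * rate) / rate        # time of the most recent sample
        held = quantize(math.sin(2 * math.pi * sample_t), levels)
        total += abs(math.sin(2 * math.pi * t) - held)
    return total / grid

for rate, levels in [(50, 100), (50, 10), (50, 4), (10, 10), (4, 10)]:
    print(f"{rate:2d} samples/s, {levels:3d} levels -> "
          f"mean error {mean_error(rate, levels):.3f}")
```

The error grows whenever either the sampling rate or the number of levels goes down, which matches what the plots show visually.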
Let's look at a different example. This is a website, there's a link to it on our course website, where they've done some demonstrations. They recorded 5 or 10 seconds of voice, then took that analog input and used different methods to convert it to digital. One of them uses PCM, but with different sampling rates and sample sizes, and the formats produced are listed here. For example, the first one has a sampling rate of 11,025 hertz, that is, they sample 11,025 times per second, using 16-bit PCM, which means each sample produces a 16-bit number. In terms of the earlier slide, each sample is not 4 bits but 16 bits, and there are about 11,000 samples every second. They take the original analog input, record it using that sampling rate and sample size, and save it as a WAV file so we can play it back and hear it. Then they repeat it with different values, for example 8,000 hertz with 16-bit PCM, or reducing the sample size down to 8 bits. In fact, you don't have to use PCM; there are variations on how to encode the analog input to a digital output. PCM is a simple one, but there are others: μ-law, for instance, is similar to PCM but uses logarithmic coding. You can also apply other algorithms which include some compression of the data. So in general there's some encoder that takes the analog and produces a digital output, and at the receiving end there's a decoder that takes the digital and produces the analog. Together we have an encoder and a decoder, or a codec, which is the general name. We'll list some of the codecs later, but the demonstrations use different codecs: PCM, μ-law, a delta PCM variant, and GSM, the original codec used by GSM mobile phones. When you talked on a GSM mobile phone, a particular codec was used to take your voice and encode it in a digital format. And, as you may have used yourself, MP3 is another codec, typically used to take music and save it in digital form. So they've also used MP3 to encode the audio, among others. Let's play some of them back and see if you can recognize which ones are the highest quality, and then we'll look at some of the performance characteristics. Let's start with the top one. I've opened the first WAV file, which uses 11,025 samples per second with 16-bit samples, and we'll see if our audio works: "Thank you for installing Express Dictate. You will find Express Dictate will improve the way you dictate by letting you dictate when you're away from the office. Send your dictation to your typist immediately over the internet and keep track of your work in the typing queue." What we're going to do is play several of these, and you try to distinguish which one sounds better; it's the same person talking each time. Let's play that one again, and then go down to the GSM one. [plays the same recording in both formats] Can you hear the difference? Let's try a couple more, including a lower-quality one, which comes out roughly as: "If you're installing extra dictates, you will find that the dictates will improve the volume of your dictates by letting you take when you're away from the office. Time to dictate into your type is remarkably over the internet and keep track of your work from the type." That one is much more obviously poorer quality than the others. We've only played three at the moment: that last one is in fact MP3, the first was PCM with 11,025-hertz sampling, and the middle one was GSM. So yes, you need to listen carefully to distinguish the quality, but at least among these three you can notice that this one is worse than the other two.
I think if you listen to them one after the other, immediately or several times over, you'll be able to detect which one is best. Sometimes it's hard to determine. Between the first few it may be quite difficult: they use 11,000 and 8,000 hertz with the same sample size, and for those first two it would be quite difficult to work out which is better. But compare them with the cell-phone-quality one we played. One more time. [plays the first recording twice more] This one should be the best quality of the ones we've played so far. I can detect it, I don't know if everyone can, but this one, compared to the other two at least, is the best quality. You cannot see it from this plot, that's not the idea; it's from listening. And the reason it's the best quality is that it has the highest sampling rate, the highest number of levels or bits per sample, and no compression, which is another factor involved. Let's do the calculations for the PCM one we played at the start. With 11,025 samples per second and 16 bits per sample, that's equivalent to 176.4 kilobits per second, because 16 multiplied by 11,025 gives about 176,000 bits per second. So, for example, how much space does a one-minute audio file take using this codec? We have 11,025 samples per second, each sample is 16 bits, and one minute is 60 seconds. How many bits do we get? You need your calculator for this one: 11,025 times 16 times 60 is about 10.6 megabits. Divide by 8 to convert to bytes, and it's approximately 1.3 megabytes. So storing one minute of audio using this codec takes about 1.3 megabytes of space. Similarly, to send audio across a network using this codec, with no other form of compression, we need to send 176 kilobits per second to get the audio from the source to the destination. We're generating 176 kilobits per second at the source as we sample the incoming audio, and we need to deliver it to the destination so they receive 176 kilobits per second and generate the audio output at the same rate, the same frequency, at which it was generated here. So for this voice application to work, where we're communicating across the network, we must be able to send the digital data across the network at 176 kilobits per second, and it must be received at that rate. If it's received at a slower rate, say 100 kilobits per second, then either they're not receiving all of the data, or they're receiving it with some delay, and it will not play back the same. If you don't receive the data, it won't play back what was said; if you receive it with some delay, it's as if the voice is slowed down, like slowing down the playback speed.
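To make that arithmetic explicit, here is a quick sketch, assuming uncompressed single-channel PCM as in the demo file:

```python
# Required sending rate and storage for uncompressed PCM audio.
def pcm_rate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

rate = pcm_rate_bps(11_025, 16)                # the first WAV file from the demo
print(rate / 1000, "kbit/s")                   # 176.4 kbit/s sending rate
print(rate * 60 / 8 / 1e6, "MB per minute")    # ~1.32 MB to store one minute
```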
So an important measure of performance for voice, and for multimedia applications generally, is the sending rate required to deliver the data across a network. The smaller the sending rate, the better. Now, this is what confuses some people: normally when we talk about throughput, we want a high throughput. But here we're talking about getting some data from A to B, and we want to use as small a portion of the network as possible to do it. So the applications which use a smaller sending rate to get the audio from A to B are better in terms of that performance metric; the smaller the sending rate, the better, when we compare these codecs. The first one, which gave us the highest quality, requires a sending rate of 176 kilobits per second. If we drop the sampling rate down to 8,000 hertz, everything else the same, then the calculation changes to 8,000 and we get 128 kilobits per second as the required sending rate. So the second one is better than the first in terms of the required sending rate, the bit rate, but worse in terms of quality. There's a trade-off here: we want high quality, but a small bit rate or sending rate. The cell-phone-quality one we played is the original codec used in GSM mobile phones; I'm not sure if it's still used in the current generation of phones. It has a sampling rate of 8,000 hertz, but the codec doesn't just take samples and produce binary values, it also applies compression: an algorithm takes the original digital form and reduces the number of bits needed to store that data. The sending rate required is just 13 kilobits per second. We could hear the difference in quality between the GSM codec and the original PCM codec, so it wasn't as good in quality, but it has a much smaller sending rate, which is better. Another way to think of it: one minute of audio with the first PCM codec uses 1.3 megabytes of your hard disk; encoded with GSM, it uses only about 100 kilobytes, which is better. The last one we listened to used a different codec, again with compression: MP3. That one requires just 8 kilobits per second, even better in terms of sending rate, but even worse in terms of quality. So that's the trade-off in this case: we want the sending rate or bit rate as low as possible, and the quality as high as possible, and we choose depending on what our requirements are. [tries to open the lowest-rate file; the player cannot play it, so plays the next one:] "Keep brainstorming, express dictate. You will find express dictate will improve the way you dictate by letting you dictate when you're away from the office. Send your dictation to your typist immediately over the internet and keep track of your work in the type..." I suspect the lowest one is not much worse than that. That one's pretty bad for voice communications, but quite good in terms of sending rate. So there are different codecs available.
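To recap the trade-off numerically, here is a small sketch using the sending rates quoted above; the megabytes-per-minute figure is just rate times 60 seconds divided by 8:

```python
# Sending-rate vs. storage trade-off for the codecs discussed
# (rates in kbit/s, as quoted in the lecture).
codecs = {
    "PCM 11.025 kHz, 16-bit": 176.4,
    "PCM 8 kHz, 16-bit":      128.0,
    "GSM":                     13.0,
    "MP3 (low-rate)":           8.0,
}
for name, kbps in codecs.items():
    mb_per_min = kbps * 1000 * 60 / 8 / 1e6    # one minute of audio, in MB
    print(f"{name:24s} {kbps:6.1f} kbit/s -> {mb_per_min:.2f} MB per minute")
```

GSM at 13 kbit/s works out to roughly 0.1 MB per minute, which matches the 100-kilobyte figure above.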
Returning to our lectures, we're not going to go through all of the codecs, but just give an overview. Okay, we were first talking about voice communications, people talking. The same concepts apply for non-voice audio communications, music for example, except that music generally has a larger bandwidth and users generally require or desire a higher quality output. When you're talking to someone on the phone, the audio quality can be at one level and you'll accept that, but if you're listening to music, you usually want a higher quality output. Other than that, the same approaches apply. What about video? What is video? Video is a set of images that change at some rate to give the illusion of motion. We have fixed images and we change them, and when we watch, our eyes think there's some motion happening. Those images are often called frames: still images. Of course, video is usually accompanied by audio as well. How do we measure the amount of information contained in video? We look at how big each frame is. It's an image, so we can measure the width and the height of that image or frame, and with a digital image each point is called a pixel. Each pixel is some color, and we represent a color using a binary value; the number of bits we use to represent that color is the depth of the color of an image. So if you consider just an image: the number of pixels is simply width times height, each pixel is a single color represented by a binary value, and the number of bits representing that color is the bits per pixel, or the depth of the color. For example, take an image which is 800 by 600 pixels: the width is 800 pixels and the height is 600 pixels, so there are 800 dots across and 600 dots, or pixels, down. Each pixel is a binary value indicating some color, and we use different lengths; for example, 24-bit color means each pixel is represented by a 24-bit number. So we can calculate the amount of information stored in that image: 800 by 600 by 24 bits, which with the calculator is about 11.5 megabits. So if you have just an image, not a video yet, of 800 by 600 with 24-bit color, there are 11.5 megabits to store. And a video is simply multiple images, multiple frames, that change at some rate, a frame rate, fast enough that it looks like motion. It's considered that normally around 15 frames per second are needed to trick your eyes into seeing smooth motion. There are different standards for the frame rate: one is 25 frames per second, another common one is 30 frames per second. So we take our image and change it 25 times per second, and the number of bits per second is 25 times our 11.5 megabits, which is 288 megabits per second. So if you have a video with a resolution of 800 by 600, a color depth of 24 bits, and 25 frames per second, then that video needs to be sent at 288 megabits per second.
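The same calculation in a short sketch, using the figures from this example:

```python
# Raw bit rate for the 800 x 600, 24-bit color, 25 frames/s example.
width, height, bits_per_pixel, fps = 800, 600, 24, 25

bits_per_frame = width * height * bits_per_pixel
print(bits_per_frame / 1e6, "Mbit per frame")          # 11.52 Mbit per image
print(bits_per_frame * fps / 1e6, "Mbit/s raw video")  # 288 Mbit/s
```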
Who has streamed a video from a server to their computer, for example a YouTube video? What's the typical resolution in YouTube? Anyone remember? I think 480 is one value; you can actually choose the resolution, which makes the image larger or smaller in YouTube. So 480 is common, and 720 is a bigger one. It's the height that they specify: 480 pixels, or 720, and the width is related to that, depending on whether it's widescreen and so on. So with YouTube we're talking about similar dimensions to our example. A 720 video is slightly larger, but the story is the same: with 25 frames per second and the same color depth, to send raw 800-by-600 video from a server to your computer, you'd need to send 288 megabits per second. Who has an Internet connection of 288 megabits per second? Anyone's home Internet? What do you use? Wi-Fi? What's the data rate? Usually on the order of 10 megabits per second, maybe higher in some cases. Yet 288 megabits per second would be needed to send this raw video to your computer, and it's not even a very high resolution. With video, compression is also used: before it's sent, it is in fact compressed, and it can be compressed significantly, so it's much smaller than this. The compression is usually performed by the codec. So for both audio and video, to save space when you store it, or to save network resources when you send it across the network, compression is commonly used. Some examples of the raw data rates required. With PCM voice, with an eight-kilohertz sampling rate, we normally need 64 kilobits per second. That's okay; we can send 64 kilobits per second across a network. With PCM audio, music, we usually have a larger sampling rate, and we usually use stereo, two channels, with 16 bits per channel, giving 1.4 megabits per second, which is the quality of CDs normally: not MP3, but a normal CD. Standard-definition digital TV, for example 720 by 576 pixels at 24 bits per pixel, needs around 250 megabits per second. And with high-definition TV, at a larger resolution, we're talking about 1.2 gigabits per second. So for raw video, the data rates are so large that most networks cannot support them. Similarly, if you tried to save a file of raw video, you'd quickly use up your hard drive: 1.2 gigabits per second is 150 megabytes per second. Take a movie which runs for two hours, multiply by roughly 7,200 seconds, and you've got about a terabyte just to store one movie. That's the raw data rate. So audio, and especially video, usually use some form of compression: it reduces the amount of storage needed and what needs to be transmitted. There are two general types of compression: lossy and lossless. Lossy compression is when we reduce the quality of the information we store. We take the audio, apply lossy compression, and the result is an output of lower quality than the original: we lose some information, and we lose quality. Lossless compression is when we take our input, compress it, and when we decompress it, we get exactly what we had originally: we don't lose any information, and we don't lose quality. What does zip use, lossy or lossless? Lossless. Why? Because when you unzip, you get the same information. With zip, you take a file, apply zip's compression algorithm, and get a smaller file; when you decompress, you get the exact same original file, with no information lost. Lossless. And MP3, which is used for encoding and compressing audio, what does it use? Lossy.
If you take a normal CD, not MP3s, but an original audio CD, and you take a song on it and encode it using MP3, the size of the file will be much smaller than the original file in WAV format on the CD. That's because MP3 takes the original input and reduces the size by throwing away some information. We lose some information, so when you decompress it, when you play it back, you do not get the original input: the quality is reduced. But the advantage of lossy compression is that it can compress much further; with lossless compression, the amount we can reduce the size by is much less. For example, how big is a typical MP3 of, say, a four-minute song? Anyone want to guess, approximately how many megabytes? About three to six; let's say roughly one megabyte per minute, though it depends on the algorithm used. So a four-minute song with MP3 compression comes down to around four megabytes. What if you don't use compression? How large would the song be if you record it from a CD? For a single four-minute song, we use the CD rate: a CD uses 44,100 samples per second, 16 bits per channel, two channels for stereo, which is about 1.4 megabits per second. Four minutes is 240 seconds; multiplied by 1.4 megabits per second, that's around 340 megabits, which is roughly 42 megabytes, call it 45. So take a four-minute song from an original audio CD and save it in WAV format, which can use these sampling parameters, and you'll get about a 45-megabyte file. Compress that song using MP3 and you may be down to four megabytes, about 10% of the original size. So in this simple example, MP3 lossy compression gives about 10% of the uncompressed size. Which one is better quality, the CD or the MP3? The CD, the WAV file: there's no compression there, it's the original quality. With MP3 the quality is reduced, but the file size is significantly reduced. With lossy compression, the final size is typically on the order of 5% to 25% of the original; here it was about 10% in our example. It depends on the input and the algorithm: take a 100-megabyte file and you may be able to reduce it down to a 5-megabyte file, depending on the algorithm. With lossless compression, you compress much less: take a 100-megabyte file with lossless compression and maybe you end up with a 50-megabyte file, 50% of the original size. An example of lossless audio: anyone who gives me an example of a lossless audio algorithm or codec can leave and finish for the day. Anyone? Okay, FLAC, you may have seen that: the Free Lossless Audio Codec. Apple has one too, I think it's called ALAC, Apple Lossless Audio Codec. They take the original file, say our 45-megabyte file for our four-minute song, and compress it, let's say down to half the size, about 22 megabytes. But when you play it back, the quality is identical to the original; there's no loss of information. So FLAC is an example of a lossless compression algorithm: we don't lose quality, but of course we end up with file sizes larger than if we use lossy compression. That's the trade-off there.
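As a quick sketch of that comparison, assuming the roughly one-megabyte-per-minute MP3 estimate from above:

```python
# Uncompressed CD audio vs. MP3 for a four-minute song.
cd_rate_bps = 44_100 * 16 * 2          # samples/s x bits x stereo ~= 1.4 Mbit/s
song_seconds = 4 * 60
cd_mb = cd_rate_bps * song_seconds / 8 / 1e6
print(round(cd_mb, 1), "MB uncompressed")      # ~42 MB (rounded to 45 above)

mp3_mb = 4.0                           # ~1 MB per minute, the estimate above
print(round(mp3_mb / cd_mb * 100, 1), "% of the original size")  # ~9.4%
```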
You can go, and everyone else can go too. Let's stop there. What we'll do next week is give a demonstration of some of these different codecs; here's a list of some of them, and I'm sure you've seen a lot of these and many others. Have a look at your media files at home, your movies and your music. Usually you can see the properties and check what codec they use, what data rates they use, and the different parameters. So have a look at some of your own files and see what algorithms and parameters they use, and we'll summarize them next week.