Hello, welcome to this lecture on digital communication using GNU Radio. My name is Kumar Appaiah, and I belong to the Department of Electrical Engineering, IIT Bombay. In this lecture, we are going to take a look at quantization. In particular, this concerns the conversion of real-world signals, which can potentially take an infinite set of values, into a finite set of values that can be represented easily in bits, so that you can communicate them using the digital communication techniques you have learned so far.

As we just discussed, real-life signals are real valued. For example, whenever you speak and measure the volume of your voice, your voice can essentially be represented as a set of real numbers, and that is a very wide spectrum of numbers: even if you scale your voice to lie between minus one and one, there is an infinite set of values between minus one and one that your voice can take. Of course, one aspect is that you need not represent it 100% accurately. You can potentially make a simpler representation that carries most of the information. For example, your telephone, your mobile phone at least in the modern sense, is essentially a device where your voice is converted to a smaller set of values. Similarly, the lecture being delivered to you, the video as well as my voice, is limited in the set of values it can take. Nevertheless, you are still able to discern what is being said and what is being seen.

The practical reason is that we need to send data over finite bitrate channels. You cannot send an arbitrarily large amount of data: you have seen in the context of channel capacity that, practically speaking, channels have limited capacity, and even if you allow for errors, the amount of data you can send per second is finite. So the question is, how do you convert this real-valued data into finite bits while minimizing the error? Now, minimizing the error can have many connotations. In the context of my speech, it could be that you just want the words to remain discernible. In the context of an image, you may want to capture only those colors or points in the image which, when replayed to you, look very close to the original image even though some information is lost. So the question of how we convert real-valued data to finite bits is the context of this lecture. We will look at quantization, which in very plain terms means restricting the set of values that a particular variable can take, and we will look at optimal quantizers, meaning the best that a quantizer can do given some constraints.

So let us take a very naive example. Say you have a continuous waveform, like the wave shown here, and I ask you to quantize it, that is, to limit the set of values it can take. In this particular scenario, we assume that the number of bits you are given to represent the signal at any instant is only one bit, and the choice I have made is minus half and half as the representation. So as we start over here, when the signal is closest to zero, the value it takes is zero. Now, when I say one bit, there is a slight error in judgment there.
It is actually one bit plus zero: we are also taking zero as a level. So we have one bit in addition to zero, which is slightly more than a bit, but we will go with it. As the signal gets closer to plus half, you set the value to plus half; then, as it crosses the boundary and comes closer to zero, you set the value back to zero, and so on. So this is three-level quantization, and you can see that three levels give a very poor representation of the waveform.

Now allow for five levels, in this case two bits plus the zero level. The values we are taking are zero, plus and minus one fourth, and, I think, plus and minus three fourths. As you can see, even though the result is still jagged, it is a somewhat better representation than the previous one, because you capture much of the part where the signal rises over here and the part where it falls over here. But at the peaks you are essentially missing it. Of course, you can play around with the levels, but you will not get too far. As you increase the number of bits, say to three, four and five bits, you can see that the representation becomes much more accurate. If you squint and look closely, you will still see a slight amount of jaggedness. But look closely at six bits, seven bits, eight bits: at eight bits there are 256 levels, and with 256 levels this particular waveform is represented to a reasonable accuracy. In that sense, if you are willing to spend more bits, your representation will be better. Of course, there are some caveats on how you scale the data to make it fit this representation better.

Naively speaking, there is this general notion of CD quality audio. If you go to the references and see how CD quality audio is defined (there is, of course, a relationship with the sampling rate and so on, which I will skip), CD quality audio typically uses 16 bits of quantization, and a slightly less ambitious quantization, say for your phone, will use eight or twelve bits. So eight to sixteen bits is the usual range for quantizing audio: for speech-related information, typically eight or twelve bits; for music, you will likely go to 16 bits, and since it is stereo, it is actually 16 bits per channel. These are some rough numbers to keep in mind, and you will be able to understand the decisions behind them when you inspect the standards that govern this quantization.
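To see this effect for yourself, here is a minimal Python sketch (my own illustration, not part of the lecture material) that quantizes a test sine wave to 2^m uniformly spaced levels and prints the mean squared error. Unlike the lecture's plots, it does not keep zero as an extra level, and the function name and the midpoint reconstruction rule are choices made for this sketch.

```python
import numpy as np

def uniform_quantize(x, m_bits, lo=-1.0, hi=1.0):
    """Quantize samples in [lo, hi] to 2**m_bits uniformly spaced levels."""
    levels = 2 ** m_bits
    step = (hi - lo) / levels
    # Map each sample to the midpoint of the step it falls in.
    idx = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 3 * t)  # a toy stand-in for a continuous waveform

for m in [1, 2, 4, 8, 16]:
    xq = uniform_quantize(x, m)
    print(f"{m:2d} bits -> MSE = {np.mean((x - xq) ** 2):.2e}")
```

You should see the error shrink by roughly a factor of four per extra bit, consistent with the step-size-squared-over-12 behavior derived later in this lecture.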
Now, the overview of how a quantizer is to be viewed is given here. We want a representation of a scalar source (sorry for the spelling mistake on the slide). Typically, the scalar source is taken to be a random variable; that is, my speech is taken to be a random variable. You may argue that this is a tricky assumption, but the reasoning is this: since whatever I am going to speak is not determined a priori, you do not know what I am going to say, so we model it as a random variable. The more controversial assumption, so to speak, is that the random variables are independent and identically distributed. Here you might object: whenever I utter a sound, the sound from one millisecond to the next is going to be very, very close. Okay, after a second or two it may be different, but sounds close in time are going to be similar, so there is a high amount of correlation over short time intervals in what I am speaking. In that sense, you may argue that the assumption of independent and identically distributed samples is not valid. To address this, what people typically do is take away the correlation: you know there is some correlation in my speech, you note it down and subtract it out, so that what remains looks independent and identically distributed, or at least uncorrelated. So the IID assumption is valid once you take the correlations into account. It is almost as if you keep the correlations in a notebook and remove them, so that at the receiver, or wherever you are reconstructing the signal, you put the correlation back and recover the original. In fact, speech quantization and speech compression form a wide topic that makes heavy use of this idea, and you can check that several other areas, like image compression, use similar techniques. So the assumption is that the scalar source we are compressing is independent and identically distributed.

The chain is as follows. The input waveform, with correlation removed, is passed to a sampler, and the sampler samples the data. At what rate? To decide this, you need to study what type of signal you are sampling. If it is something like a machine vibration, a few tens of hertz, you sample at hundreds of hertz. If the signal you are sampling is something like speech, then depending on the quality you want, you sample at something like eight kilosamples per second, or, for CD quality, at 44.1 kilosamples per second. This gives an accurate enough representation over time. Remember, we need to discretize over time as well as in amplitude; the sampler takes care of the time part, and I am sure you have seen the aspects related to sampling in digital signal processing. The amplitude quantization is done by the quantizer: we restrict the values on the vertical axis to a finite set, like you saw in the previous waveform, and any in-between value is mapped to the closest level. So the quantizer gives you a finite set of values; for example, with 8 bits there are only 256 possible values. The encoder converts these to 0s and 1s. We have skipped modulation and all those things; then there is potentially a channel, we have seen all that, and I am again lumping equalization, error correction, everything into the decoder. The decoder gives you back the bits, 0, 1, 0, 0, 1, and then there is a table lookup: there are 256 values, each mapped to a pattern of 0s and 1s, and given a pattern of 0s and 1s, you find out which amplitude value it was. Then you need an analog filter. Why? Because what you have are sampled values: sample 0, sample 1, sample 2, all the quantized values the transmitter has sent. The filter essentially draws the line between them and converts this to a continuous-time waveform such as speech.

So when I am speaking to you, my voice is sampled by the sampler after this microphone, and then there is a quantizer sitting in the system which quantizes and compresses the values. In this case, the channel can be assumed to be just the internet: over the internet, or over, let us say, a disk, the data is delivered to you. Then the quantized values are taken out and converted back to the finite set of amplitudes. These amplitudes are just a set of numbers; they are converted to a waveform, and as you would have seen in DSP, you interpolate them using an analog filter, which results in the speech you are able to listen to. So it is 100% guaranteed that the speech you are listening to is not the same as the speech I am speaking right now. But it is extremely close, close enough that you can discern what I am saying without losing almost any word. Since the quantization is good enough, every word I speak is something you are able to comprehend.
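As a toy illustration of this chain, here is a sketch in Python. It assumes an ideal channel, skips modulation and the final interpolation filter, and all the names and parameter values are my own rather than anything from the lecture.

```python
import numpy as np

# Toy chain: sample -> quantize -> bits -> table lookup -> samples.
BITS = 8
LEVELS = 2 ** BITS                      # 256 amplitude values
table = np.linspace(-1, 1, LEVELS)      # codebook: index -> amplitude

fs = 8000                               # 8 ksamples/s, telephone-grade rate
t = np.arange(fs) / fs
x = 0.7 * np.sin(2 * np.pi * 440 * t)   # stand-in for a speech waveform

# Quantizer: map each sample to the index of the nearest codebook entry.
idx = np.argmin(np.abs(x[:, None] - table[None, :]), axis=1)

# Encoder: indices -> bit patterns (binary strings, for illustration only).
bits = [format(int(i), f"0{BITS}b") for i in idx]

# Decoder + table lookup: bit patterns -> indices -> amplitude values.
x_hat = table[[int(b, 2) for b in bits]]

print("MSE after quantization:", np.mean((x - x_hat) ** 2))
```

A real system would pack those bit patterns into a modulated stream and interpolate x_hat back to continuous time with an analog filter.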
Let us now look closely at the aspects of scalar quantization. Suppose we want to quantize a random variable x whose PDF is f_X(x). Typically, the way we do it is this: we have to decide the quantization levels and the quantization boundaries. The quantization levels a_1, a_2, ..., a_{m+1} are the values the quantizer can output, and the quantization boundaries b_1, b_2, ..., b_m (b is for boundaries) delimit the decision regions. Suppose your signal falls to the left of b_1; say a_1 is 0 and b_1 is one half, and you get 0.4. It is to the left of one half, so you quantize to a_1. Suppose instead the value you get is 0.6; it is to the right of b_1, so you quantize to a_2. So b_1, b_2, b_3 delimit the decision regions, and a_1, a_2, a_3, a_4 are the quantized levels. Note that the levels need not be uniformly spaced, and the b_j need not be midpoints. Think of your BPSK demodulation: there you had equiprobable symbols, so the decision threshold sat at the midpoint of the decision region. In quantization, by contrast, if you study how voice patterns emerge, certain values are more likely than others, so voice quantizers are typically non-uniform, with unequal gaps.

You have to choose the a_i and b_j so as to minimize an error, and the typical choice is the squared error. Squared error makes sense because it is related to power: it is almost like minimizing the energy of the difference between what you have and the quantized signal. So we minimize the mean squared error, and by convention we set b_0 = minus infinity and b_{m+1} = plus infinity, the boundaries to the left of b_1 and to the right of b_m that you do not see. Our aim, then, is to minimize E[(x - Q(x))²], where Q is the quantization function determined by the a's and b's. In other words, if I choose a_1 to be 0 and b_1 to be 0.5, then Q(0.4) = a_1 = 0 and Q(0.6) = a_2. So Q is the quantization function, and your aim is to minimize the squared error.
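In code, a scalar quantizer is just a table of levels together with a sorted list of boundaries. Here is a minimal sketch; the helper name Q and the example values, including the choice a_2 = 1, are mine, purely for illustration.

```python
import numpy as np

def Q(x, levels, boundaries):
    """Scalar quantizer: m boundaries b_1 < ... < b_m split the real line
    into m + 1 decision regions, each mapped to one of the m + 1 levels."""
    x = np.asarray(x)
    # side="right" puts a sample equal to b_j into the region above b_j.
    j = np.searchsorted(boundaries, x, side="right")
    return np.asarray(levels)[j]

levels = [0.0, 1.0]   # a_1, a_2 (a_2 = 1 is a made-up value)
boundaries = [0.5]    # b_1
print(Q([0.4, 0.6], levels, boundaries))   # -> [0. 1.]
```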
So if you now expand in terms of the PDF, you have to minimize ∫_{-∞}^{∞} f_X(x) (x - Q(x))² dx, and you can split this as the summation Σ_{j=0}^{m} ∫_{b_j}^{b_{j+1}} f_X(x) (x - a_{j+1})² dx. That is, you split it at the boundaries: minus infinity to boundary 1, boundary 1 to boundary 2, boundary 2 to boundary 3, and so on up to boundary m to infinity, and within each region the quantized value determines the error. Minimize this by playing around with the b's and a's; that is the optimal quantization problem for the scalar case.

So let us take a practical example and see how this works. Take a uniform random variable; rather than 0 to 1, let us take uniform on minus 1 to 1, so the PDF f_X(x) is one half on [-1, 1]. The question is: how do you choose two levels, a one-bit quantizer, so as to minimize the squared error? If you choose both levels on the positive side, you are going to get a lot of error whenever the value falls negative; both on the negative side does not work either; and if you choose one level here and another on the right that is not equally far away, you are also going to have an issue. Because of the symmetry of the problem, you can guess that it is good enough to choose -a and a as the levels. Why -a and a? The PDF is symmetric, and by intuition the signal is positive exactly half the time and negative half the time, so whenever it is positive, choose +a, and whenever it is negative, choose -a. This turns out to be the optimal strategy, and finding the optimal a is our aim. You can verify this mathematically as well, but I am leaving it to you as an exercise. What I am pointing out is that when you have a symmetric distribution, say uniform on minus 1 to 1 or even Gaussian, the optimal quantization levels have to be symmetric about zero.

So what remains is to find a. We want to minimize E[(x - Q(x))²] = ∫_{-∞}^{∞} f_X(x) (x - Q(x))² dx. In this case, since the boundary is at 0 as I just argued, this equals ∫_{-1}^{0} (1/2)(x + a)² dx, because the quantized value there is -a, plus ∫_{0}^{1} (1/2)(x - a)² dx, because the quantized value there is a. Now I am going to play a trick here: in the first integral, substitute y = -x. Because of the negative sign, the limits swap, and the first integral becomes ∫_{0}^{1} (1/2)(y - a)² dy. Adding the two up, you get J = ∫_{0}^{1} (x - a)² dx, and you have to minimize this by choosing a optimally. You can make some guesses and things like that, but let us just differentiate: dJ/da = ∫_{0}^{1} -2(x - a) dx. Setting this to zero, the -2 cancels, and you get ∫_{0}^{1} x dx = a. Since ∫_{0}^{1} x dx is one half, a = 1/2. You could also perform the integral first and then differentiate, but I am lazy, so I just did this. So a equal to one half is the optimal quantization level: minus half and half are the optimal one-bit quantization levels. When we do this in GNU Radio, you will see that choosing other levels results in a higher error.
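Here is a quick numerical check of this result (a sketch of my own, not the GNU Radio experiment the lecture refers to): sweep a few candidate values of a for the one-bit quantizer and estimate the MSE from uniform samples.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200_000)          # samples of X ~ Uniform(-1, 1)

def mse_one_bit(a, x):
    """MSE of the one-bit quantizer with levels -a and +a, boundary at 0."""
    xq = np.where(x >= 0, a, -a)
    return np.mean((x - xq) ** 2)

for a in [0.25, 0.4, 0.5, 0.6, 0.75]:
    print(f"a = {a:4.2f} -> MSE ~ {mse_one_bit(a, x):.4f}")
```

The minimum shows up at a = 0.5, where the estimated MSE is close to 1/12 ≈ 0.0833, matching the Δ²/12 expression discussed next.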
In fact, there is something we can verify, and this I leave to you as an exercise: with 2 bits, the optimal levels are -3/4, -1/4, 1/4, 3/4, and with m bits in general they are (-(2^m - 1) + 2k)/2^m for k = 0, 1, ..., 2^m - 1. So it is like this: you take -(2^m - 1), keep adding 2 to the numerator, and divide everything by 2^m, which gives the levels ±1/2^m, ±3/2^m, ..., ±(2^m - 1)/2^m. Again, this is something you can easily verify by using the symmetry of the distribution, and you can prove the optimality. The mean squared error is Δ²/12, where Δ = b_{j+1} - b_j, the gap between two boundaries; for this uniform case, you can equally take it as the gap a_{j+1} - a_j between two quantization points.

Let us just find J here for the one-bit case. What is J? As we proved above, J = E[(x - Q(x))²] = ∫_{0}^{1} (x - 1/2)² dx. If you evaluate this integral, you get (x - 1/2)³/3 between 0 and 1. Substituting 1 gives (1/2)³, substituting 0 gives -(1/2)³, so J = (1/8 + 1/8)/3 = 1/12. This makes sense, because here Δ = 1 and Δ²/12 = 1/12. So this is the optimal error for scalar quantization of a uniform minus 1 to 1 random variable.

Now let us do one more case, the Gaussian. For a Gaussian random variable we can take x ~ N(0, 1); you can of course generalize to any other case, but it is very easy for N(0, 1). Once again it is a symmetric distribution, so for one bit the optimal quantization points are ±a and the optimal boundary is 0, by the same arguments I made just earlier. Therefore, our aim is to minimize ∫_{0}^{∞} (x - a)² e^{-x²/2}/√(2π) dx. You could get rid of the √(2π), but it pays to keep it. Now how do you do this? Differentiate with respect to a, and you have to do this differentiation carefully: you get ∫_{0}^{∞} -2(x - a) e^{-x²/2}/√(2π) dx, and setting it equal to zero and splitting the integrals gives ∫_{0}^{∞} x e^{-x²/2}/√(2π) dx = ∫_{0}^{∞} a e^{-x²/2}/√(2π) dx. The advantage of keeping the √(2π) is that the right-hand side becomes a/2, because the integral of an N(0, 1) density from 0 to infinity is one half. On the left-hand side you have to perform the integral a little carefully: since there is an x dx, let y = x²/2, so that dy = x dx, and the integral becomes ∫_{0}^{∞} e^{-y}/√(2π) dy = 1/√(2π). Equating the two sides, a/2 = 1/√(2π), so a = 2/√(2π) = √(2/π), which is approximately 0.8.
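We can run the same numerical check for the Gaussian case (again a sketch of my own, separate from the GNU Radio flowgraph mentioned for the next lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)         # samples of X ~ N(0, 1)
a_opt = np.sqrt(2 / np.pi)               # ~0.7979, derived above

def mse_one_bit(a, x):
    """MSE of the one-bit quantizer with levels -a and +a, boundary at 0."""
    xq = np.where(x >= 0, a, -a)
    return np.mean((x - xq) ** 2)

for a in [0.5, 0.7, a_opt, 0.9, 1.1]:
    print(f"a = {a:5.3f} -> MSE ~ {mse_one_bit(a, x):.4f}")
```

The smallest error appears at a = √(2/π), where the MSE works out to 1 - 2/π ≈ 0.3634; every other choice of a does worse.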
This, in fact, is the optimal point, and in GNU Radio we will set this optimal point and verify that it indeed minimizes the error. Now, if you want to go to optimal quantizers in general, there is something called the Lloyd-Max algorithm. We will just briefly cover it: it is an iterative algorithm that can jointly obtain both the optimal quantization levels and the boundaries. You choose initial quantization levels a_1, a_2, ..., a_{m+1}, and set the b_j to the midpoints (a_j + a_{j+1})/2. Then we set the new a_j to be the conditional mean of x given that x lies between b_j and b_{j+1}. Why the conditional mean? If you remember your probability, the conditional mean always minimizes the mean squared error, and therefore setting each level to the conditional mean of its region is the right thing to do. That gives you new a_j's; again find the new b_j's, again find the new conditional means, and keep repeating this process. If you keep repeating it, you will find that after some time the values of the a_j's and b_j's do not change much and the mean squared error does not reduce further. At that particular point you say, okay, I have the optimal quantizer. This algorithm is decades old (Lloyd's work dates from the late 1950s, though it was formally published only in 1982, and Max's from 1960), and it has been proved optimal under many circumstances. The algorithm captures the spread of the random variable: every time you set a level to the conditional mean, you are reducing the squared error, and each iteration does slightly better at reducing the MSE, until the MSE stops reducing and that is your minimum MSE. You can go and refer to the original papers by these authors. One aspect I want to mention is that this is closely related to K-means clustering, a very common algorithm used in pattern recognition and machine learning: there, too, you are essentially performing this conditional-mean step after assigning each point to its closest centroid. If you study the Lloyd-Max algorithm and K-means clustering, you will find a lot of connection between them.
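Here is a minimal sketch of the Lloyd-Max iteration, written to run on empirical samples rather than on the PDF itself, which makes it exactly one-dimensional K-means. The function name, the quantile initialization and the iteration count are my choices, not something prescribed by the lecture.

```python
import numpy as np

def lloyd_max(samples, num_levels, iters=100):
    """Lloyd-Max on empirical samples: alternate midpoint boundaries
    with conditional-mean (centroid) updates of the levels."""
    # Start from quantiles so that every decision region is non-empty.
    levels = np.quantile(samples, (np.arange(num_levels) + 0.5) / num_levels)
    for _ in range(iters):
        boundaries = (levels[:-1] + levels[1:]) / 2     # b_j = midpoints
        region = np.searchsorted(boundaries, samples)   # assign each sample
        for j in range(num_levels):
            members = samples[region == j]
            if members.size:                            # conditional mean
                levels[j] = members.mean()
    return levels

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)
print(lloyd_max(x, 2))   # ~[-0.80, 0.80], matching ±sqrt(2/pi) from above
```

With more levels, say lloyd_max(x, 4), you get the non-uniform spacing that a Gaussian source calls for, which a uniform quantizer cannot provide.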
One more aspect I want to mention is that rather than quantizing scalars like x, you may want to quantize tuples, that is, groups of random variables x_1, x_2, ..., x_n, jointly. The advantage of doing this is that it is more efficient at exploiting correlations within the vectors. For example, say you have a particular image and you apply a transformation to get some vectors. It could be that some vectors have a lot of correlation among their components, and quantizing them independently may lose the benefit of that correlation. So you quantize them as vectors: rather than performing the quantization operation in the scalar case, you perform it on the n-dimensional space, and this, it turns out, results in Voronoi regions. That is, if you have quantization points laid out like this, then all points that fall within this region are mapped to this point, and all points that fall in that region are mapped to that point. This kind of separation of the space into several regions gives the Voronoi regions, which determine where you quantize to, and at the optimum you are again minimizing some error such as the squared error. Vector quantization is very popularly used in scenarios where you have large amounts of data with correlations built in that you are able to capture.

So, to summarize: quantization is an essential step to take your real values to a finite-precision representation. But remember that whenever you have a practical signal, there is quantization error, and an optimal quantizer minimizes that quantization error. A scalar quantization approach is of course suitable in many applications, but you may want to use vector quantization when you want to take groups of data and quantize them together rather than as scalars; in geometric terms, you go to higher-dimensional spaces. For further reading, this is a deep and mature topic: you can look at block quantization, rate distortion theory and entropy-coded quantization, and you can refer to various sources on compression and information theory to understand more about this topic. In the next lecture, we are going to perform these operations as a simulation in GNU Radio and observe how the errors manifest in histograms. Thank you.