So, this is lecture 42 and we are talking about coding, and we are using this very simple BPSK over AWGN model for coding. So how is the whole thing going to work? Instead of mapping every bit to a symbol and sending it across, I am going to take k bits and then map them to a codeword: I do an encoding according to some code, I get a codeword C which is n bits, and then you do a 0-to-1, 1-to-minus-1 BPSK mapping. So you do a mapping, and then this goes through a channel which is modeled as just addition of noise, and then you receive a received vector R. So, this one I will call the symbol vector S, which is n symbols, and R is the received vector, which has n real values. Noise is added to the plus 1 or minus 1; you cannot do anything more than that. So, what we have to do now is, at the receiver, you have to figure out how to do decoding. You have to decode this and produce some C hat or M hat; you can think of it either way, because this is a 1-to-1 mapping from message to the code. What is the code? The definition of the code is the set of all C. So, how many codewords will you have? 2 power k. In general, if you look at all n-bit sequences, you have 2 power n n-bit vectors, but not all of them will be in the code; only some small subset of them will be in the code. So, I have put "code" here in this box, but remember, this is what is called an encoder. You have 2 power k vectors on the left and 2 power k vectors on the right; how many possible ways are there in which you can do a 1-to-1 mapping? Nobody knows the answer offhand, but at least everybody agrees there is more than one.
There are in fact a huge number of ways in which you can do a 1-to-1 mapping: take permutations of the assignment, and you get (2 power k) factorial of them. So, for a single code, there can be millions of encoders if k is large. In practice, this k and n will be fairly large. In most systems, maybe in wireless systems, it is a little smaller for various reasons, but at least hundreds; so you have to think of k and n as hundreds. So, if you have 100 bits, how many possibilities are there? 2 power 100. You have to map each of those into a codeword, so encoding itself can be a complex operation, and you have to think of all kinds of ways of doing it. Those are the kinds of numbers involved. The examples we will see will be very simple examples where I take k equals 1, k equals 4 and so on. Most people do not use such codes in practice; k is always thought of as a large number. So, those are all things that I talked about in the last class. Now, a couple of other definitions. Rate of the code is R equals k by n. The units are good to keep in mind: the way I have written it down, it is bits per symbol, and that is a reasonably good way of putting down the units. Tomorrow you might look at a coded system where you have, say, a 16-QAM constellation or a 256-QAM constellation, a huge constellation. Even there, for an uncoded system you know how many bits per symbol you can carry; for a coded system it will be less than that, but people always say bits per symbol as the unit. For BPSK it is very clear: k by n will be less than 1. So, that is the rate. And then the important definition of Eb over N0. This worked out, for this BPSK system, to be 1 over 2 R sigma squared. So, you remember the key point I made: if R changes, something changes in this equation.
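As a quick sanity check of that counting argument, here is a tiny Python sketch (the function name is mine, not from the lecture): for a fixed code with 2 power k codewords, every permutation of the message-to-codeword assignment is a valid 1-to-1 encoder, so there are (2 power k) factorial of them.

```python
import math

# Number of distinct one-to-one encoders for a fixed code with 2**k codewords:
# every permutation of the message-to-codeword assignment is a valid encoder.
def num_encoders(k):
    return math.factorial(2 ** k)

print(num_encoders(1))   # 2
print(num_encoders(3))   # 40320 -- "more than one" by a wide margin already at k = 3
```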
So, typically you will hold Eb over N0 constant when comparing, say, a coded system and an uncoded system. You hold Eb over N0 constant at, say, 3 dB or 6 dB or some such thing. For a 6 dB Eb over N0, if R is 1, you get a certain sigma; if R is half, you get a higher sigma. So, sigma changes depending on what your rate is. I gave you an explanation for why that can be expected: you are letting in more noise by doing a faster clocking. So, that is the way of thinking about it. But that is the important basis for comparison, because it equates the actual energy per information bit. If you do not divide by R, you are not equating the correct thing, so you are not comparing the right things. Like I said, when you do coding you pay two penalties: one is in terms of power, the other is in terms of rate. Both of them are nicely captured in this Eb over N0. So, if you make a plot of bit error rate versus Eb over N0, that is an accurate comparison between several different systems, without coding and with coding of different rates. All those things are very nicely compared; it is all there in energy per information bit. You can think of it as rate, but there is a bit of power also in computing the energy. The point is you should not just write SNR. If you only look at signal energy divided by noise energy, it is misleading you; you are not getting the right picture. There might be situations where SNR is important to you; if you only care about SNR and Eb over N0 does not matter, then maybe one can argue for SNR, but it is not really a very good way of comparing systems. Any other question on this? This kind of model is very important because, when people evaluate coded systems, computing bit error rate analytically is pretty much impossible. For today's complex error control codes, you cannot compute bit error rate analytically.
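The Eb over N0 to sigma relation above can be sketched in a few lines (a minimal illustration; the function name is my own choice):

```python
import math

def sigma_from_ebn0(ebn0_db, rate):
    """Noise standard deviation for unit-energy BPSK,
    from Eb/N0 = 1 / (2 * R * sigma^2)."""
    ebn0 = 10 ** (ebn0_db / 10)             # dB to linear
    return math.sqrt(1 / (2 * rate * ebn0))

# Holding Eb/N0 at 6 dB: a lower-rate code sees a larger sigma.
print(sigma_from_ebn0(6.0, 1.0))    # uncoded, R = 1
print(sigma_from_ebn0(6.0, 1 / 3))  # rate-1/3 code: sigma is root-3 times larger
```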
So, what you do is you set up a simulation like this. You come up with an Eb over N0, which hopefully captures a real quantity in an actual system, and then you run simulations. And you see how much your Eb over N0 curve shifts when you do coding and when you do not do coding. That is your coding gain, and that is what you can expect in a real system. So, the difference is what is important; like I said, I have been talking about it all the time. So, in this model, you might get one plot for the uncoded system, and for a coded system you might get another plot. Remember, this is BER versus Eb over N0. Typically, the y axis is in log scale and the x axis is in dB. These things are important: when you present a BER versus Eb over N0 curve, you should always have the x axis in dB and the y axis in log scale. Don't ask me why; at this point, you should know the importance of dB and log scale. I think in the lab I had a huge shock when people were presenting spectrum plots with the y axis from 0 to 20,000 and the x axis from 0 to 200,000. By your final year, you should be professional about presenting your results. So, when you present a spectrum, what should the axes be? Hertz on the x axis and dB on the y axis. Why is dB so important? Why can't you use a linear scale? Because of resolution: on a linear scale, between 10 power minus 4 and 10 power minus 5 you cannot see anything; only in dB can you see something. So, that is important. Another thing about presenting curves, which I realized people have not understood after 4 years of engineering: you should not present a spectrum plot which looks jagged like this. Do you expect a real system to have a spectrum which looks like this?
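A minimal Monte-Carlo sketch of such a simulation, for the uncoded BPSK baseline (pure standard library; the function name, bit count, and seed are my own choices, not from the lecture):

```python
import math
import random

def ber_uncoded_bpsk(ebn0_db, nbits=200_000, seed=1):
    """Simulated BER of uncoded BPSK over AWGN at a given Eb/N0 in dB.
    Uses Eb/N0 = 1 / (2 * R * sigma^2) with R = 1."""
    random.seed(seed)
    sigma = math.sqrt(1 / (2 * 10 ** (ebn0_db / 10)))
    errors = 0
    for _ in range(nbits):
        bit = random.randint(0, 1)
        s = 1.0 if bit == 0 else -1.0          # BPSK map: 0 -> +1, 1 -> -1
        r = s + random.gauss(0.0, sigma)       # AWGN channel
        errors += (0 if r > 0 else 1) != bit   # slicer at the receiver
    return errors / nbits

# At 6 dB this should land near the theoretical Q(sqrt(2 * Eb/N0)), about 2.4e-3.
print(ber_uncoded_bpsk(6.0))
```

Sweeping `ebn0_db` over a range and plotting the result on a log-scale y axis gives exactly the kind of BER curve described above.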
I saw a lot of people giving me such plots, but it should not look like that. How should it look? It should look smooth, and this BER curve also should look smooth. What do you do to make it look smooth? If it is not smooth, then you have to window and average. These are standard tricks that most engineers are expected to know; surprisingly, many people did not know them in the lab, so hopefully you learn them at least now. Alright. So, these curves are usually generated by simulation, like I said. Your Eb over N0 model is important because if this difference is 2 dB, say, at some bit error rate, say 10 power minus 5, then even in the real system you can expect the same 2 dB improvement. The absolute numbers on the x axis may not mean much; in your model they will be very different from the real system. But if you did the modeling right, the difference will have a very close match with the real difference you see in a physical system. That is why these models are important. The reason why such block diagrams are so nice, and why the whole theory of digital communication is so successful and so clean, is these strong mathematical models: you can do the whole thing with mathematical models, and when you go to real practice, you see the same gains. Such strong results you will not have in many other areas; that is why this subject is so mathematical, so clean and so nice. So, that is a bit about the model; let us keep going. The first few codes I am going to show you are bad codes, in the sense that they do not provide coding gain. But they are good lessons to start off with, and then we will build up on them and see a few good codes. The first type of codes we will see are repetition codes.
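The window-and-average trick for smoothing a spectrum can be sketched as follows (a Welch-style sketch under my own naming and parameter choices, using numpy; not the only way to do it):

```python
import numpy as np

def averaged_spectrum_db(x, seg_len=256):
    """Smooth a spectrum estimate by windowing and averaging:
    split the signal into segments, window each one,
    and average the per-segment periodograms."""
    segs = [x[i:i + seg_len] for i in range(0, len(x) - seg_len + 1, seg_len)]
    win = np.hanning(seg_len)
    psd = np.zeros(seg_len)
    for s in segs:
        psd += np.abs(np.fft.fft(s * win)) ** 2
    psd /= len(segs)
    return 10 * np.log10(psd + 1e-12)   # present the y axis in dB
```

Plot the first half of the returned array against frequency in hertz; with enough segments averaged, the jagged single-shot periodogram becomes a smooth curve.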
So, the most obvious thing to do is repeat a bit when you do not know any better. Here is the encoder for the repetition code. You pick k equals 1, so you have just one message bit m. You pick n equals 3, and your codeword is c1 c2 c3. So, this is my repetition code. What am I going to do? I am going to repeat. So, I will make a table with m and c; it is really simple: 0 goes to 0 0 0 and 1 goes to 1 1 1. If I apply BPSK, I get a symbol vector s1 s2 s3, and if you want, you can write s also: 1 1 1 and minus 1 minus 1 minus 1. Then this goes through the noisy channel, you get r, which is r1 r2 r3, and you are supposed to build a decoder here to get back your estimate of the message. That is the way a repetition code will work. This is a specific example for n equals 3; you can vary n. For instance, I could have an n equals 5 repetition code. What would I do then? Just repeat 5 times. You can have any other n if you want; n equals 3 is the simplest example, and you can easily illustrate some properties with it. Now, traditionally, you could not process real-valued data in your receiver. What do I mean by real data? These real numbers do not really exist in hardware; you have to quantize them. So you have maybe an 8-bit quantizer or a 6-bit quantizer, and you give that to the receiver. Sometimes it might happen that you do not have any such thing and all you have is a 1-bit quantizer. In that case, you have to put a slicer there and make a decision, symbol by symbol, which is immediately suboptimal. All these 3 received values are correlated, they are not independent, and you have to use all of that data jointly to get one estimate of your message. If you are going to immediately, independently quantize each of them to 1 bit, it immediately becomes suboptimal.
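The encoder and the BPSK map described above fit in a couple of lines; a sketch (function names are my own):

```python
def repetition_encode(m, n=3):
    """Rate-1/n repetition encoder: one message bit -> n copies."""
    return [m] * n

def bpsk_map(codeword):
    """BPSK mapping from the lecture: 0 -> +1, 1 -> -1."""
    return [1 - 2 * c for c in codeword]

print(repetition_encode(0))            # [0, 0, 0]
print(bpsk_map(repetition_encode(1)))  # [-1, -1, -1]
```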
But maybe, because of complexity restrictions in your receiver, you are forced to do that. If you do that, you are said to employ a hard decision decoder. The hard decision decoder is the easiest to study, so we will do that first. What do I mean by a hard decision decoder? I am going to slice, symbol by symbol, which is a suboptimal thing to do, and I get b1, b2, b3. What will these b1, b2, b3 be? They will be bits. So, I have gone from my codeword to a same-length vector of bits. But what are the possibilities for b1, b2, b3? If we call this vector b: my codeword was only one of two possibilities, but b can be any one of eight possibilities. That is the difficulty. So, now we have to decode from b; you build what is called the hard decision decoder. This is the first type of system we will study. Later on we will see that this is really, really bad, and we will move on to a soft decision decoder which works with r1, r2, r3 together without doing any suboptimal slicing. We will do that soon enough, but the first thing we will see is this. So, you do that to produce C hat or M hat. It is good to make tables whenever you have situations like this, so let us make a table of b. How many vectors do I have for b? Eight. I will make this table out of order because it is convenient. If you just stare at a b for a while, you know immediately what the corresponding C hat has to be; it is easy and intuitive to make a guess. What will be the C hat for the first one, b equals 0 0 0? It is 0 0 0. What rule am I using? I am just seeing, in some sense, which codeword my vector b is closest to. So, you see that and you make a decision. You might wonder about the optimality of such an approach; it turns out such an approach is optimal in this case.
The hard decision itself is, again, suboptimal, but within the class of hard decision decoders, this is optimal. It turns out this is what you have to do; I will give you general rules for how to do this later on, but for now this is a simple enough example. So, the first thing we have to do now is: I have a coded system like this, I have to find its probability of error as a function of Eb over N0, and then compare that with the plot of the uncoded system versus Eb over N0 and figure out my coding gain, or, as it will happen in this case, coding loss. So, I am going to give you a couple of minutes to figure out the probability of bit error for this decoder; it is very easy. To help you a little bit, first fix m equals 0. Do not play around with both possibilities; fix m equals 0, so you are sending 0 0 0. Now you have to find the probability that each bi will be in error; it turns out you can do it that way. What is the probability that b1 will be in error, or b2, or b3? I will call that p, and it will be Q of 1 by sigma. Do you see that? Sigma squared is my noise variance here. What is this p? The probability that bi is not equal to ci. Of course, this seems like it is conditioned on ci equals 0, but even if you condition on ci equals 1 you will get the same answer. So, this is the probability of error after the hard decision. Once you have that, it is easy to figure out the probability of error after your decoder. When will you make errors? When 2 of them go in error, or all 3 of them go in error. What is the probability that 2 of them go in error? It can happen in 3 possible ways. So the answer is 3 p squared times 1 minus p, plus p power 3. This is the probability of error.
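The nearest-codeword rule from the table is just a majority vote over the sliced bits; a sketch (my own naming), with the symbol-by-symbol slicer made explicit:

```python
def hard_decision_decode(r):
    """Hard decision decoding of the n = 3 repetition code:
    slice each received value to a bit, then decode to the nearest codeword,
    which for the repetition code is simply a majority vote."""
    b = [0 if ri > 0 else 1 for ri in r]   # symbol-by-symbol slicer
    return 1 if sum(b) >= 2 else 0         # majority of the three slices

print(hard_decision_decode([0.9, -0.2, 1.1]))   # -> 0
print(hard_decision_decode([-0.5, -1.3, 0.4]))  # -> 1
```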
If you want, you can simplify this a little bit: 3 p squared minus 2 p power 3. Now, such an expression might be difficult to deal with, so what is the best way of simplifying it? Just look at the leading term. If sigma becomes really low, clearly the p squared term will dominate over p power 3, so it is enough if you look at 3 p squared. Roughly, if I want an expression in terms of Q, it will be 3 times Q squared of 1 by sigma. One might need an approximation for Q; you can use the exponential approximation if you want. But remember, I have to write p in terms of Eb over N0; that is the important thing. So, let us keep this expression for a while and see what happens if we do not write p in terms of Eb over N0 and simply compare in terms of 1 by sigma squared, which is your SNR; you will see it goes wrong. So, the probability of error for the coded system works out roughly as 3 times Q squared of 1 by sigma. What is the probability of error for the uncoded system? It is Q of 1 by sigma. If you just stare at these two expressions, it looks like you are doing better: the probability of error for the coded system looks better than for the uncoded system, particularly at low sigma. So, if you plot against 1 by sigma squared, which is SNR, it is very misleading. But what happens if you convert to Eb over N0? What is Eb over N0 in this case for the coded system? It is 3 by 2 sigma squared, since R is 1 by 3. What is it for the uncoded system? It is 1 by 2 sigma squared. At a fixed Eb over N0, the coded system's sigma squared goes up by a factor of 3 because of the rate factor, so 1 by sigma decreases by a factor of root 3. If you do this conversion and substitute back, the probability of error for the coded system is going to be 3 times Q squared of the square root of 2 by 3 Eb over N0.
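You can check numerically that the leading term 3 p squared is already a good approximation once sigma is small, writing Q via the standard complementary error function (the sigma value here is my own pick for illustration):

```python
import math

def Q(x):
    """Gaussian tail probability, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

sigma = 0.5
p = Q(1 / sigma)                       # per-slice error probability
pe_exact = 3 * p**2 * (1 - p) + p**3   # 2 or 3 of the 3 slices in error
pe_approx = 3 * p**2                   # leading term for small p
print(pe_exact, pe_approx)             # within a few percent at this sigma
```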
And for the uncoded system it is going to be Q of the square root of, you know this already, 2 Eb over N0. This Q squared is a little bit confusing, so maybe if you do an approximation for Q as an exponential you will see how it works out. But if you plot these two things, you will actually see a loss because of coding. If you plot Pe versus Eb over N0, with the uncoded curve here, your coded curve will actually be marginally worse. You can try this in MATLAB if you want. So, what does it mean? It means that because you did this coding, you are spending too much energy and you are not getting back enough gain for all the energy you spent. That is what has happened. But if you plot with respect to just SNR, it looks okay; there is no problem there. So, when will Eb over N0 matter and when will SNR matter, physically? When R is 1, both are the same, I agree. But I am talking about a physical system: under what constraints should you worry about Eb over N0, and under what constraints about SNR? Remember, when energy is a significant constraint for you, then you have to worry about plots with respect to Eb over N0. For instance, if you are working with a handset which has a battery, all the power comes from the battery, and you do not want to drain the battery too fast; then Eb over N0 becomes important. But if you are working only with SNR, it means the energy comes for free, pretty much. If that is the situation, then maybe SNR is a reasonable comparison, but nobody will use SNR. In terms of coding gain, if you want to quantify the actual gain that a system is giving you, you have to use Eb over N0. All right. So, this is what happens with the hard decision decoder.
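If you do not have MATLAB handy, the same comparison in Python shows the coded curve is worse at every Eb over N0 value tried here (using the leading-term approximation for the coded case, as derived above):

```python
import math

def Q(x):
    # Gaussian tail probability via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

for ebn0_db in (2, 4, 6, 8):
    ebn0 = 10 ** (ebn0_db / 10)
    pe_uncoded = Q(math.sqrt(2 * ebn0))
    pe_coded = 3 * Q(math.sqrt(2 / 3 * ebn0)) ** 2   # hard-decision repetition
    print(ebn0_db, pe_uncoded, pe_coded)             # coded is the larger one
```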
So, this kind of system, a repetition code with the hard decision decoder, is really bad. Now, there could be two reasons why this is bad; compared with the uncoded system, it has become worse. What are the two reasons? One is that maybe the code was bad: the repetition code 000, 111 is not a good code, and maybe you should pick some other code of the same rate. The other is that you are doing suboptimal decoding: maybe the decoding was bad. So, what we will do next is use the same code with the optimal soft decision decoder. There also you will see there is no coding gain; in fact, the coding gain is 0, so the coded curve will lie right on top of the uncoded curve. That we can show very nicely. Once we show that, then we know that the code itself is bad. The reason why the code is bad is basically that k equals 1 is too small; you do not have enough flexibility there. We will see that as we go along. So, the next thing we will see is a soft decoder for repetition codes. What should you do for a soft decoder? You have r1, r2, r3. A soft decoder is going to work directly on this r and produce a C hat or M hat. There are several ways of thinking about it, but we have already seen these detection problems. When you receive r, the transmitted symbol vector S has two possibilities: 1, 1, 1 or minus 1, minus 1, minus 1. What is the optimal rule, the ML rule? Nearest neighbor. That is what we saw; it is the same as ML here. So, there is no difference between this and decoding on a constellation. Make sure you understand what I am saying: so far we have looked at constellations as one symbol per unit time; it was only either plus 1 or minus 1 that you sent per unit time.
Now it is as if my constellation has expanded over three dimensions. For bit 0, I am sending plus 1, plus 1, plus 1 over three dimensions. My constellation is a three-dimensional picture, but I still have only two points in my transmit constellation. My received point can be anywhere, and the optimal rule is definitely to look for the nearest neighbor in Euclidean distance. You look at the distances and pick the nearest; clearly that is the optimal rule, no problem with that, you already derived it. So, that is what I am going to use. You apply the ML rule: compute the norm of r minus (1, 1, 1) squared and compare it with the norm of r minus (minus 1, minus 1, minus 1) squared, and see which is smaller. Let us do some simplification here; it should simplify to something nice, I hope. The first one is nothing but r1 minus 1 squared plus r2 minus 1 squared plus r3 minus 1 squared. You see what I am doing: this is my distance from one point, that is my distance from the other point, and I am seeing which one I am closer to. The other one is r1 plus 1 squared plus r2 plus 1 squared plus r3 plus 1 squared. Go ahead and simplify, and you will find it is just a question of whether r1 plus r2 plus r3 is greater than or less than 0. If it is greater than 0, you decide C hat is 0, 0, 0; if it is less than 0, you decide C hat is 1, 1, 1. So, this is the optimal soft decision decoder: you look at r1 plus r2 plus r3. Is there a question? Let me give you an example. Suppose r is minus 10, 1, 1. What is your soft decision decoder going to tell you, and what will the hard decision decoder tell you? Clearly these two decoders are not the same, and one is better than the other.
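The resulting rule fits in one line; here is a sketch together with the lecture's example r = (minus 10, 1, 1), where the hard and soft decoders disagree:

```python
def soft_decision_decode(r):
    """ML soft decoder for the repetition code over AWGN: nearest neighbor
    in Euclidean distance reduces to the sign of r1 + r2 + r3."""
    return 0 if sum(r) > 0 else 1

# Hard decision would slice to (1, 0, 0) and vote 0, but the soft decoder
# trusts the very reliable -10 sample and decides 1.
print(soft_decision_decode([-10, 1, 1]))   # -> 1
```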
Anyway, you can easily cook up situations to show that one is not the same as the other. So now we have the soft detector, the ideal detection that you can do for this error control code. The next question is: what is the probability of error for this? That is the computation you have to do. Again, fix m equals 0, which means the transmitted symbol vector is 1, 1, 1. Then what will be the PDF of r1 plus r2 plus r3? You can easily show that, in this case, r1 plus r2 plus r3 will be distributed normal with mean 3 and variance 3 sigma squared. Do you see that? Once you condition on a given transmitted codeword, r1, r2 and r3 become jointly normal, in fact independent. If I do not do that conditioning, it is clearly not Gaussian. If I do not condition on the codeword, what is the distribution of r1? It is a Gaussian mixture: two Gaussian densities weighted and added. So I cannot say anything simple about the sum of such variables; I would have to do a painful convolution, and I do not want to do all that. But if I condition on one codeword being transmitted, r1 plus r2 plus r3 becomes very easy: normal with mean 3 and variance 3 sigma squared. So, when will I make an error? If the transmitted symbol vector was 1, 1, 1, that is, the bit was 0, I make an error if r1 plus r2 plus r3 is less than 0. Given that r1 plus r2 plus r3 is distributed as normal with mean 3 and variance 3 sigma squared, what is the probability that it is less than 0? Q of 3 by root 3 sigma. So, that is the probability; it is a very easy computation to do. You see the probability of error goes as Q of 3 by root 3 sigma, which works out to Q of root 3 by sigma. All right.
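A quick simulation check of that conditional distribution, conditioning on the all-plus-one symbol vector (sigma, the seed, and the sample count are my own choices):

```python
import random
import statistics

random.seed(0)
sigma = 0.8
# Condition on the transmitted symbol vector (1, 1, 1) and sample r1 + r2 + r3.
sums = [sum(1.0 + random.gauss(0.0, sigma) for _ in range(3))
        for _ in range(50_000)]
print(statistics.mean(sums))      # close to the mean 3
print(statistics.variance(sums))  # close to 3 * sigma**2 = 1.92
```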
So, you might argue that I only conditioned on m equals 0. If you fix m equals 1, what will happen? r1 plus r2 plus r3 will be normal with mean minus 3 and variance 3 sigma squared; only the mean flips sign, the variance of course stays 3 sigma squared. But now the error event is r1 plus r2 plus r3 greater than 0, so once again you get Q of 3 by root 3 sigma, and you can show that. For the type of code I have chosen, it is enough if you freeze one codeword and do the computation; one can show that that is enough. So, this is the probability of error. Let us do a conversion to Eb over N0. What is Eb over N0? 3 by 2 sigma squared. Use that here, and you get the probability of error to be Q of root 2 Eb over N0. So, after all this effort, what have you done? You have found a probability of error which is the same as what you would have for the uncoded system. If you now plot BER versus Eb over N0 for the soft detector, you just get the coded curve lying right on top of the uncoded one. Nothing in the analysis I did is specific to n being equal to 3. If I replace n by any other number, say 10, the probability of error will be Q of root 10 by sigma; instead of 3 you will get 10, and if you do the Eb over N0 substitution, you will get the same thing. Whatever n is, the probability of error in terms of Eb over N0 will be Q of root 2 Eb over N0. In terms of sigma it will change, because the relationship with sigma has changed. So, the conclusion clearly is: the repetition code is not good enough; it provides no coding gain. I am not saying it is never used; it is used in many systems, for instance people use something called ARQ, which is in some way just a repetition.
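A simulation sketch confirming that the soft-decoded repetition code sits on the uncoded curve (function name, bit count, and seed are my own choices):

```python
import math
import random

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def ber_soft_repetition(ebn0_db, n=3, nbits=100_000, seed=2):
    """Simulated BER of the rate-1/n repetition code with the soft decoder.
    sigma follows from Eb/N0 = 1 / (2 * R * sigma^2) with R = 1/n."""
    random.seed(seed)
    sigma = math.sqrt(n / (2 * 10 ** (ebn0_db / 10)))
    errors = 0
    for _ in range(nbits):
        bit = random.randint(0, 1)
        s = 1.0 if bit == 0 else -1.0
        total = sum(s + random.gauss(0.0, sigma) for _ in range(n))
        errors += (0 if total > 0 else 1) != bit   # sign of r1 + ... + rn
    return errors / nbits

ebn0_db = 4.0
print(ber_soft_repetition(ebn0_db))            # simulated, rate 1/3
print(Q(math.sqrt(2 * 10 ** (ebn0_db / 10))))  # theory: same as uncoded BPSK
```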
It causes a loss of capacity, in the sense that it does not really provide coding gain, but it is used because it is a very practical tool and at some level it really works; it just provides no coding gain. So, why did I do the repetition code in such detail? There are a couple of important reasons. The first is to understand the encoding process and the definition of Eb over N0 and how it works with sigma; that is an important change when you go to a coded system. The other is that your probability of error changes depending on the decoder, and typically soft decoders are better than hard decoders. All these lessons we have learnt even though we looked at a bad code, and they are definitely true in general. So, the definition of Eb over N0 has to be handled very carefully, then you have to build a good code, and on top of that you have to decode it properly; you cannot just do any kind of decoding you want, or you will not get any gain. Any questions, any point that you are not so clear about? One question that I ask all the time is this: in such a system, if you look at r1, r2, r3 as random variables, are they independent or not? Somebody says r1 and r2 are independent. Okay. It is a question that confuses people. Clearly, they are not independent; if they were independent, then hard decisions would arguably be fine, right? So, what is the distribution of r1, r2, r3? Will you be able to write down the PDF? Forget about r3; just take r1, r2. What is the two-dimensional PDF? Two Gaussians centered on (1, 1) and (minus 1, minus 1), mixed together. If you do r1, r2, r3, it will again be a mixture of two Gaussians, but in three-dimensional space, which is a little bit harder to picture.
So, you see clearly r1, r2 are not independent. Independent random variables do not have such joint PDFs: the joint density would have to factor into a product of marginals, with the same marginal shape along each axis, but this mixture density changes all over the place, so it cannot factor that way. So, r1, r2, r3 are not independent. But I am able to conveniently use independence in the probability of error computation. How did I do that? Once you condition, it becomes independent. See, it is a mixture of Gaussians centered on minus 1, minus 1, minus 1 and plus 1, plus 1, plus 1; but once I condition, it is only one Gaussian, and then you have all your independence properties back. Each of those Gaussians is a nice uncorrelated Gaussian on its own; it is only because the two are mixed together that the variables become dependent. So, once you condition on the transmitted codeword, everything becomes independent, and you can use a lot of simple analysis. Then comes the question: if they are conditionally independent, why can't we do hard decisions? Because at the receiver there is no conditioning: when you are receiving, you do not know what was transmitted, and if you knew what was transmitted, you would not need to do anything. These notions might seem silly, but people make mistakes with them; I have seen all kinds of answers for questions like this, particularly in exam problems. So, I think we will stop here. In the next class we will look at the Hamming code, which is a standard example of a good code that provides a simple coding gain; it is a nice code to look at.