So here is lecture 17 for the course. This is the point where we break away from the classical notions and look at more modern things in error control coding. From quiz 1 to quiz 2, the main topic is going to be LDPC codes. What are LDPC codes? The name expands to low-density parity-check codes. Many people will agree that, to date, these are the most powerful codes that have been found. The way they are described and the way you analyze them are completely different, and you need a lot of background which is different from the background you needed for classical codes. What did we need for classical codes? We needed to know about binary vector spaces and some finite fields; there is a simple connection between minimum distance and the parity check matrix, all the constructions were based on that, and minimum distance pretty much played the most important part. How was your decoder designed? Your decoder design was basically an implementation of the syndrome decoder, and you worried about correcting up to the error correcting capability. If t errors occur and the code can correct t errors, I have an implementation which is reasonable and I am happy. So the whole emphasis was more or less on pure error correction: I can correct this many errors, and I can do so much. But if you go back and look at the classical beginnings of coding and information theory, those are something else. People worried about something called capacity, which has a very different kind of definition and is not quite the same as error correction. So the main shift is this: from now on we will think of coding gain as the important parameter. So far we worried about error correction: supposing so many errors happen, how do I correct them? From now on we will worry about coding gain in a communication system. This is very crucial, and it will mean we focus on other things. The way we construct codes will be different, and the way we decode them will be very, very different, and a little closer to practice, a little closer to actual digital communications. I will explain a lot of things with respect to the BSC also. So far the only channel model we have looked at is the binary symmetric channel model, which is quite a powerful channel model, and some of the explanations I will do for the BSC again, since it is easy to explain with the BSC. But another channel model which will be of interest to us is what is called the AWGN channel model. What is AWGN? Additive white Gaussian noise. I think most people here will be very familiar with this channel model, and one can view the BSC as a suboptimal approximation of the AWGN channel. I will talk about it soon enough, but AWGN is a very realistic channel model. In fact, most system design today is done with respect to an AWGN channel model. Even when you have the most complicated wireless system out there, which has a different model, you always design assuming that for a sufficiently short interval of time the channel is actually AWGN. So pretty much most of the design is for AWGN, and we should look at it. There is one more thing that one needs to talk about with AWGN. In a realistic system, at one level you deal with bits.
Bits enter the channel and bits come out, but in a realistic communication system those bits need to be converted to signals, some actual signal that goes through the channel. In digital communication, when you read the theory, you describe signals in something called signal space. You can always decompose the signals you use for representing the bits in terms of some basis, so you have a representation in signal space, and in fact, for every signal that you put out, you have an abstract constellation on which you represent it. If you have bit zero, you say: I transmit this point on my constellation. If you have bit one, I transmit some other point on my constellation. That point in the constellation translates into some pulse shape, some actual signal, which goes through some filtering, some channel, and then maybe some receive filtering. At the end you can also do some correlation-type processing and actually get back another point on your constellation. So that constellation completely captures the transformation from bits to the signal, and then back from the noisy signal to a signal space point once again. These constellation and signal space ideas, I am sure you are learning somewhere or have learnt before, so I am not going to go through them. I will simply say we will use BPSK modulation and the corresponding signal space. There are more complicated modulation schemes, QAM and all those things, but we will do only BPSK in this course; it is enough for the coding part. So how is the constellation typically drawn? Typically you think of the passband signal as something where you can use two dimensions per signal, but BPSK uses only one, so you are not really worried about the other dimension in BPSK. I will take the two points in this constellation as −1 and +1: +1 will correspond to bit 0, and −1 will correspond to bit 1. That will be my modulation. I am assuming all these terms are clear to you; maybe coding gain is not too clear, and I will explain what it is soon enough. So we will slowly shift from the BSC channel model to an AWGN channel model, and correspondingly we will use BPSK modulation for transmission on this AWGN channel. Is there anyone here for whom all these things sound really strange, who has never heard this before? No, right? Good; usually we have some students from computer science who get scared when we put these things out, so it is good that we do not have such problems in this class. [On a question about the BCH lectures:] Did I not do that for the BCH course? I did the division part; I said division can be implemented with an LFSR. If you want the hardware pictures, look at the books. There is no decoding using LFSRs; I did not say anything about decoding using LFSRs. There is an interpretation for that, but we will not look at those things, only the encoding part. So yes, this is pretty much the end of the classical codes part; I will not go back and revisit them again. So how does the whole picture look now? You have message bits that go into an encoder. Think of this as a linear code, with a systematic encoder if you will. So you have a message vector m, and the encoder is going to put out a codeword c. Presumably, if m is k bits and this is an (n, k) code, what will c be? n bits. So these are all things you can imagine.
Any code you want, you can imagine. Let us not jump so far; I will come to that slowly. I am just describing the picture now, just to get the model through. So far what we have been assuming is that this codeword goes through a binary symmetric channel. Now I want a more realistic model. So the first thing I do is convert these bits into signals, but I will just stick to the BPSK constellation description; I won't give you the actual signal description and all that. So I do BPSK. What does my BPSK do? It sends 0 to +1 and 1 to −1. Remember, the inputs are bits 0 and 1, while the outputs are actual real numbers, +1 and −1. A convenient way of describing this operation is 1 − 2b. You see this? If b is 0 you get +1; if b is 1 you get −1. So what you get out here is 1 − 2b. So if you were, for instance, to write a MATLAB program where you want to do BPSK modulation, your b will be a vector of bits, and BPSK modulation is just 1 − 2b. It's very easy to do BPSK modulation in MATLAB (see the sketch below). Out of this you get a vector which I will call the code symbol vector; we just need a name for it, and I'll call it s. So what is s now? It's exactly like the n bits, except that instead of 0 I say +1, and instead of 1 I say −1. Technically, s belongs to {+1, −1}ⁿ: a sequence of n symbols, each entry either +1 or −1. How many different code symbol vectors are possible? 2^k. Remember, these are code symbol vectors, not the set of all possible ±1 vectors; only 2^k symbol vectors are possible according to my code. This is going to go through an AWGN channel. What does an AWGN channel do? It's an additive white Gaussian noise channel, so it adds a noise vector n. This is always a little tricky, because n is always used for block length, and the noise is also best called n; calling it any other letter is confusing. So I'm going to abuse this notation a little, but whenever I refer to the noise as a vector I'll put a bar below it, so hopefully it will be clear; in context it's very unlikely we'll mix up the block length n and the noise vector n, but look out for it. What is this noise vector? It is a length-n vector of real numbers, each generated according to a Gaussian distribution with zero mean and some variance. And what else is true? What does "white" mean? Any two entries of this vector have zero correlation, and since they are Gaussian with zero correlation, they are also independent. All that, I'm assuming you would have seen before. In a realistic system, where does this noise get added, and why is this a good model? Receiver electronics noise: it's a very good model for the receiver electronics noise, and there is good theory about why this works out as independent Gaussian and all that. All of that is buried into this n. So if you have to write a formal description, the noise vector is normally distributed with mean 0 and covariance σ²Iₙ: n ~ N(0, σ²Iₙ).
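Since MATLAB came up just now, here is a minimal sketch of the chain just described, bits to BPSK symbols to the AWGN channel output. The particular bit vector and the noise level are my own illustrative choices, not from the lecture:

```matlab
% Minimal sketch: bits -> BPSK symbols -> AWGN channel output.
c     = [0 1 1 0 1];              % a codeword (illustrative bits)
s     = 1 - 2*c;                  % BPSK map: 0 -> +1, 1 -> -1
sigma = 0.8;                      % noise standard deviation (assumed)
noise = sigma * randn(size(s));   % i.i.d. N(0, sigma^2) entries: "white"
r     = s + noise;                % received vector, a point in R^n
```

The noise is called `noise` rather than `n` here precisely because of the block-length clash the lecture warns about.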
So what is this zero? It's the mean vector; it indicates that each coordinate of the random vector n has zero mean. And why did I write σ²Iₙ? Iₙ is the n × n identity matrix, so σ²Iₙ is the covariance matrix: the variance of each coordinate is σ², and the cross-correlations are all zero. I'm assuming you've seen this before somewhere in your lifetime. So that's the noise. What you get after the noise gets added is your actual received vector r. What is r now? r belongs to ℝⁿ, because the noise is a random vector of n real numbers; once you add it to s, you could get any r. Up to this point we have not done anything suboptimal. One can show that, based on the signaling and the filtering and so on, this vector r is a sufficient statistic; there is nothing suboptimal here, this is fine. Now, there are several things you can do at the receiver. Traditionally, if you're really constrained by hardware, what people used to do is simply put a comparator here: if the received value rᵢ is greater than zero, you say it was bit 0, and if it was less than zero, you say it was bit 1. Once you do that suboptimal comparison, you get a BSC: from the codeword to the received vector, it becomes a binary symmetric channel. What you're doing is taking the real number rᵢ and extracting only one bit of information from it; you're doing one-bit quantization by putting a comparator at zero. With advances in technology, it's possible to do more than one-bit quantization today. It's easy to get very high-speed converters which go from analog to digital and give you more than one bit; you can get maybe 5 bits, 6 bits, and it's not too difficult to think of even getting 8 bits today at a reasonably fast rate. So if I were to do something very close to practice, I should actually have a model which converts this real number into 8-bit quantized values and have a discrete channel model for that. But that's too clumsy, and it turns out it's not really necessary. What you do instead is imagine that the receiver can deal with the entire real vector r. In practice, the way to think about it is that I would have a quantizer here with a high enough number of bits that it doesn't really matter, and in theory I can deal with r as a real vector itself. So in all my receivers, I will assume that I can process this real vector r; in practice it is a finely quantized version of that, and that's okay today because you have the technology. Now, why is the comparator suboptimal? This is a question that comes up every time, and I've tried to explain it carefully before, but let me try once again; maybe this time it will really sink in. If I did not have a code, you're right, it is optimal.
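Continuing the sketch above, the traditional hardware-constrained receiver is just a comparator per coordinate; this one-liner is the hard-decision front end just described:

```matlab
% One-bit quantization (hard decision) on the received vector r
% from the sketch above: r_i > 0 -> bit 0, r_i < 0 -> bit 1.
b_hat = double(r < 0);
```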
If the system is uncoded, if each symbol coming out here was completely independent of the previous symbol, then you're right, it would be optimal to do just one-bit quantization. But here, with the symbol vector s, each individual symbol might be uniform, but the entire vector s is not uniformly distributed over {+1, −1}ⁿ. In fact, only 2^k of those vectors have nonzero probability; the remaining vectors have zero probability, they cannot even occur. So the successive symbols inside s are not independent; they are dependent. Each symbol might be uniformly distributed, I have no problem with that, but one symbol and the next symbol are dependent. And when you make the suboptimal decision, you are ignoring that dependence; you're simply deciding on each symbol as if it were independent. That is the suboptimal nature. Maybe we'll see some examples and it will be clearer at that point. Okay, so what do you do next? Suppose I have to process r in a receiver; it's completely different from the previous problem we've been handling. So far, what was our model for the received vector in the BSC? The codeword plus a binary error vector, and you had statistics on that error vector. What was the distribution for an error vector e of weight w? It was p^w (1 − p)^(n−w). You were able to use that distribution and come up with the syndrome decoder and so on. Now you go from s to r, which is a much more complicated model. Of course, the noise has a simple enough description, but it's all in the continuous domain, so you have to use some more of the probability you've been learning. But the principles are all the same: you still do something called maximum likelihood decoding, except that the distribution becomes more difficult. Ideally, we would like a decoder here that produces ĉ directly; you can think of doing maximum likelihood decoding here. All those things are possible, but this is the picture. I haven't described what the decoder will look like; we'll see as we go along what the ideal thing to do is, what the best thing to do is, and how to go about it. But this is how the overall picture looks, and I want you to be comfortable with it. So I'm going to ask you a few questions which will tell you whether you're comfortable with all that I've written here. Here is an example. Suppose I take n to be 3, so the block length is 3. I haven't told you what the code is; maybe we'll take it to be the repetition code {000, 111}. The symbol vector transmitted could be what? (+1, +1, +1) or (−1, −1, −1). I want you to spend some time and write down the pdf of the vector r; that's the first thing. Try to write down f(r) in this situation. It's a very simple situation; please write it down. Assume that the noise variance is some σ², and that the codewords are used with uniform probability. Give me the pdf f(r); I'm not going to give you any more information. Let's see. Okay, so there is a point here which you have to pay a lot of attention to. If I were to write r as (r₁, r₂, r₃): are r₁ and r₂ independent or not?
Can I say r₁ and r₂ are independent? Think about it closely and answer the question. One needs to be careful when you write this down; that's why I asked you to write it. This lies at the heart of the question Manigandar asked: why isn't it optimal to simply use a comparator on each rᵢ? Are r₁ and r₂ independent? Will they work out to be independent? One needs to be careful, and you'll have to write it down carefully. The best way of thinking about it is the following: you have to condition. On what? On s. If you don't condition on s, you won't get that independence between r₁ and r₂. Once you condition on s, once you fix s, then yes, r₁ and r₂ become independent, because you know what s is being transmitted, and the noise that gets added is independent; there's no problem with the noise. But the s's that you transmit are correlated across coordinates. That's why only once you fix s does everything become independent, and then you can use your standard Gaussian formula. So it's important to condition on s; without conditioning on s, it's difficult to write this probability down. The best way to do it is to write f(r) = ½ f(r | s = (1, 1, 1)) + ½ f(r | s = (−1, −1, −1)). If you did not write this down, you would go wrong; if you try to average each rᵢ independently, you'll go wrong. In fact, that case would correspond to the uncoded situation, with 3 bits all sent equally probably. So how do you simplify this now? As I said, once you condition, once you say what I'm transmitting is (+1, +1, +1), I know how to write r: r₁ = 1 + n₁, r₂ = 1 + n₂, r₃ = 1 + n₃. For that, it's very easy to write down the pdf: r becomes normal with mean (1, 1, 1) and covariance σ²I₃. And in the other case, r is normal with mean (−1, −1, −1) and covariance σ²I₃. So now it's easy to write down a simple enough expression. In fact, since the coordinates are conditionally independent, you can decompose each conditional pdf: write down each coordinate's pdf separately and multiply. So you'll have f(r) = ½ (1/(√(2π) σ))³ exp(−[(r₁ − 1)² + (r₂ − 1)² + (r₃ − 1)²]/(2σ²)) + ½ (1/(√(2π) σ))³ exp(−[(r₁ + 1)² + (r₂ + 1)² + (r₃ + 1)²]/(2σ²)). Clearly, when I write it down like this, you can easily see that r₁, r₂, r₃ are not independent: there is no way to factor this into a product of three terms, one per marginal. You can find the marginals; think about how you would find the marginal distribution of r₁, and the product of the three marginals will not be equal to this. So that answers Manigandar's question perfectly. It is definitely not optimal to deal with each rᵢ independently: because I'm transmitting only two codewords, the individual rᵢ's are not independent.
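As a check on this computation, here is a short sketch that evaluates that mixture pdf for the repetition code example; the value of σ and the received vector are my own illustrative choices:

```matlab
% pdf of r for the n = 3 repetition code example:
% f(r) = 1/2 f(r | s = (1,1,1)) + 1/2 f(r | s = (-1,-1,-1)).
sigma = 1.0;                          % assumed noise standard deviation
r     = [0.2 1.1 0.3];                % an illustrative received vector
g = @(r, s) (1/(sqrt(2*pi)*sigma))^3 * ...
    exp(-sum((r - s).^2) / (2*sigma^2));   % N(s, sigma^2 I_3) pdf
f_r = 0.5*g(r, [1 1 1]) + 0.5*g(r, [-1 -1 -1]);
% f_r does not factor into a product of three marginals:
% r1, r2, r3 are dependent unless you condition on s.
```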
Given that a particular vector is transmitted, yes, they're independent; otherwise they're not independent, and you have to deal with it very carefully. So this is my overall pdf for the random vector r, and buried in it are those two conditional pdfs, which are very crucial. [A student asks about the decoder.] Yes, I'm going to come to it. The reason I wrote the overall pdf was to convince people that r₁ and r₂ are not independent; otherwise people would never get convinced of that. So, as the question suggests, go back to my derivation of the maximum likelihood decoder; I stumbled through it sometime last month, you might remember. What was crucial in the decoder? The conditional pdf of r given that the symbol vector was something. And then how do I pick my maximum likelihood codeword? The codeword which maximizes the conditional pdf. So the ML decoder needs only the conditional pdfs, and the conditional pdfs are nice enough. The overall pdf looks a little nasty, but the conditional pdf, you'll see, is very nice. So let me generalize and write down the conditional pdf in the general case; now we are getting out of the example. What is the conditional pdf I'm interested in? Suppose I receive r, so (r₁, r₂, …, rₙ) is my received vector. The conditional pdf I'm interested in is f(r | s = (s₁, s₂, …, sₙ)). You'll see it's very easy to write down a simple formula for this, just based on the previous computation. You get (1/(√(2π) σ))ⁿ, a constant which occurs for every s and is independent of s, and then e to the power of minus what? Right: I'll use a convenient shorthand based on vector distance, ‖r − s‖², exactly what you said, divided by 2σ². So f(r | s) = (1/(√(2π) σ))ⁿ exp(−‖r − s‖² / (2σ²)). This is my conditional pdf in general. Should I worry about f(r | c = some particular codeword), or is this good enough? It's not a big deal: the codeword-to-symbol conversion is just 1 − 2c. Whether I write it as c or 1 − 2c, it's the same thing; BPSK modulation is a simple conversion, so I can output either ĉ or ŝ. The only change is writing +1 instead of 0 and −1 instead of 1; I could even decode ŝ, for instance, so don't worry about ĉ versus ŝ. So what will my ML decoder be now? The maximum likelihood decoder, which minimizes my error rate; it's the optimal decoder in that sense. I'll write it with ĉ: ĉ = argmax over u ∈ C of f(r | s = 1 − 2u). No, I'm not going to look at that other case. So this is the maximum likelihood decoder; there's no problem about it. It is optimal when all the codewords are equally likely.
If the codewords are not equally likely, there is a further adjustment you have to do, but we are not going to worry about it. So I'll say this decoder is optimal if all codewords are equally likely, and we'll only look at that case; we don't care about the case where some codewords are more likely than others. In fact, everywhere I'll make this assumption and never really state it: you have to assume that all codewords are equally likely. Is this argmax rule clear? This is what you do. For something other than BPSK, there is a very similar model; you can always do ML, there's no problem, but it gets more confusing to write down. So let me not worry about that and simply write it for BPSK. The same thing applies to something like QPSK, for instance, but beyond QPSK it gets a little complicated, and people usually don't do the most optimal thing in that case; maybe I'll comment on it if we have time. So this is what you do in the optimal case. We're now going to simplify this a little, and you'll see the simplification results in a minimum distance decoder, where instead of the Hamming distance of the BSC case we get the Euclidean distance, the vector distance. It comes about for very similar reasons; there's no real difficulty here, and we'll do the derivation quickly. For convenience, I'll make this definition: S = {1 − 2u : u ∈ C}. What is this? The set of all possible symbol vectors. C is the set of all codewords; applying 1 − 2u for u ∈ C, I get the set of all possible symbol vectors. The ML rule can now be written as ŝ = argmax over u ∈ S of f(r | s = u). Notice there's a slight abuse of notation here: if you want to be very precise and careful mathematically, you have to use different notation for a random variable and the value it takes. We won't do that; I'm going to assume you're comfortable with it. So this r, for instance, stands for the random variable as well as for the value it takes in a particular instance. Now let's do the simplification; it's quite easy. For u ∈ S, you know the formula: f(r | s = u) = (1/(√(2π) σ))ⁿ exp(−‖r − u‖² / (2σ²)). I'm trying to maximize this over all u, and here is a term, (1/(√(2π) σ))ⁿ, which is independent of u, so I can drop it. Then notice what's happening: I have e to the power of −‖r − u‖²/(2σ²). If I want to maximize e^(−x), what should I do to x? Minimize it. So this is the same as ŝ = argmin over u ∈ S of ‖r − u‖². Why can I drop the 2σ²? It's a positive constant, and it won't change where the minimum or maximum occurs; that's all. So this is my maximum likelihood decoder, and it's very simple, even in this case.
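Here is how that argmin rule looks as a brute-force search; the two-codeword repetition code below stands in for a general code C, and the received vector is illustrative:

```matlab
% ML (soft) decoding as a minimum Euclidean distance search.
C  = [0 0 0; 1 1 1];            % repetition code; any codeword list works
S  = 1 - 2*C;                   % rows are the symbol vectors in {+1,-1}^3
r  = [0.2 1.1 0.3];
d2 = sum((S - r).^2, 2);        % ||r - u||^2 for each row u of S
[~, idx] = min(d2);             % nearest symbol vector wins
c_hat = C(idx, :);              % the ML codeword estimate
```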
So what do I do, given a received vector r? I look at the distance to each possible symbol vector and pick the symbol vector which gives me the minimum distance. It's useful to visualize this in ℝⁿ. How do you visualize it? Suppose this is ℝⁿ, the n-dimensional real space. Where are my code symbol vectors? They are some stars: one star here, another star here, another star there; you'll have many stars like this, call them u₁, u₂, u₃ and so on. You always transmit a star, but what happens? A random noise vector gets added, and you can get any received vector r; this point could be r. Then what do you do to find the most likely codeword? Look at the distances from the stars and pick the nearest star. It's a very simple geometrical view. There's a very similar view for the binary symmetric channel. What happens there? Instead of ℝⁿ you have the binary vector space, the stars are still your codewords, and your received vector r can be any binary vector. You still look for the nearest star, but there the distance is the Hamming distance, the number of places in which two vectors differ. Here the distance is what's called the Euclidean distance, the usual squared distance. So that's the geometrical view; it's pleasing, simple, and intuitive. Let's go back to the example we had. What was my example? n = 3, and my set of symbol vectors is {(1, 1, 1), (−1, −1, −1)}. I'm going to give you some received vectors r and ask you what the codeword should have been. Let's do this: r = (0.2, 1.1, 0.3). It's very easy: you can do the calculation, but since all the coordinates are positive, r is going to be closer to (1, 1, 1) than to (−1, −1, −1); you don't have to work hard to see it. So in this case ŝ, the maximum likelihood ŝ, is (1, 1, 1). Of course, it could be wrong: in the worst case, if the noise was really high, you could have been pushed to the other side, but that's the maximum likelihood answer; you can't do anything better. Now suppose I keep the 0.2 and 0.3 the same and push the middle coordinate to −1.1, so r = (0.2, −1.1, 0.3). What happens? It's tougher, right? You have to do the calculation; 2.1 squared is 4.41, and you can use that. What are the two squared distances? The distance to (1, 1, 1) is 5.54, and the distance to (−1, −1, −1) is 3.14. So what would you pick? (−1, −1, −1). Now, if you had done the BSC-style suboptimal estimation, if I had used the comparator and suboptimally decided each rᵢ separately, I would have concluded, using Hamming distance, that the transmitted symbol vector was (1, 1, 1). So you see: the soft, actual ML gives you one answer, and the BSC ML gives you another answer; they can give two different answers. And which do we know to be optimal? The soft one.
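To see the disagreement concretely, here is a sketch comparing the two decoders on r = (0.2, −1.1, 0.3):

```matlab
% Soft vs hard decoding for r = (0.2, -1.1, 0.3), repetition code.
r  = [0.2 -1.1 0.3];
S  = [1 1 1; -1 -1 -1];
d2 = sum((S - r).^2, 2);     % -> [5.54; 3.14], so soft picks (-1,-1,-1)
b_hat = double(r < 0);       % hard slicing gives bits [0 1 0]
% Hamming distance of 010: 1 to codeword 000, 2 to 111, so the hard
% decoder outputs 000, i.e. symbols (+1,+1,+1), a different answer.
```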
By the way, this is the distinction between hard and soft decoders. If a one-bit comparator is used first, and then you do whatever decoding you want, it is called a hard decision decoder, or a hard decoder: with this received vector r, for each rᵢ I make a suboptimal decision first, whether it is 0 or 1, and then put everything together and do my syndrome decoder, bounded distance decoder, or whatever decoder I want. A decoder that uses the real-valued r is a soft decoder. Of course, the soft decoder is the optimal one and the hard decoder is suboptimal, but because of complexity you might have to do hard decision decoding. And, like I said, in real life, with finite quantization, you lie somewhere between the ideal soft decoder and the hard decision decoder. In fact, in most cases, with four or five-bit quantization you can get very close to the ideal soft decoder, so the quantization won't hurt you all that much. Any questions on how this worked? Okay. The next thing we are going to do is look at a slightly more complicated example, just to convince you that it is not easy to do ML soft decision decoding. For one, the ML hard decision decoder itself is very difficult; we gave up on it long ago. If you have to look at the received word and find the Hamming distance to each possible codeword, there are just too many of them, 2^k; when k and n become large, it's very difficult. Now, instead, you have to find the Euclidean distance to each symbol vector, and when k and n become large, that is just as difficult. So the point of this example is: the ML decoder is difficult to implement for large n and k. And I've been saying all along that only when you go to large n and large k do you get really good performance; it's good to go to large n and large k. We want to do that, and we want to be able to do the optimal decoders, but clearly it's very difficult. Let's take a slightly more complicated example. I'll say the codewords are 0101, 1010, 1110, and so on. First, a simplification which you might find useful. The squared distance ‖r − u‖² is Σᵢ (rᵢ − uᵢ)², i from 1 to n. How does it simplify further? Σ rᵢ² + Σ uᵢ² − 2 Σ rᵢuᵢ. This is a standard trick. Now, notice the first term is independent of u. And the second term: whether uᵢ is +1 or −1, what is uᵢ²? +1. In fact, Σ uᵢ² equals n, so it is also independent of u. So these two terms can be dropped. If I want to minimize the left-hand side, what should I do? Maximize Σ rᵢuᵢ. So my ĉ can be written simply as ĉ = argmax over u ∈ C of the dot product ⟨r, 1 − 2u⟩; Σ rᵢuᵢ is exactly that dot product. So this is a simplification of the ML decoder, and it relies heavily on BPSK; if it is not BPSK, it won't work. One can use this for the codeword example; once you use it, maybe it's not too hard. If I say the received vector is (0.5, 0.5, −0.7, −0.9), what is ĉ?
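For reference, here is a sketch of that correlation computation. Note: the lecture lists three codewords but refers to four; the fourth codeword 1111 below is my assumption, chosen so that the result matches the answer given next.

```matlab
% Correlation form of the ML decoder for the four-codeword example.
% The fourth codeword (1111) is an assumption; see the note above.
C = [0 1 0 1; 1 0 1 0; 1 1 1 0; 1 1 1 1];
S = 1 - 2*C;                 % BPSK symbol vectors, one per row
r = [0.5 0.5 -0.7 -0.9];
dots = S * r';               % dot products <1 - 2u, r> per codeword
[~, idx] = max(dots);        % largest correlation wins
c_hat = C(idx, :)            % -> 1 1 1 1
```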
See, it's a simple enough code, not even a difficult code. Hamming distance is much easier, right? If I give you a received vector, you can very quickly compute Hamming distances at this length; it's not too hard. But when you have to do the correlation, it's just not the easiest thing in the world: you have to do the ±1 additions and subtractions. 1111: people agree it's 1111. So that's the point. You see it involves work: even to do the dot products, you have to convert each codeword into its symbol vector, then do the dot product, and check which one gives the maximum dot product among the four. It's work; you can't throw it away all that easily. So leave alone doing this for k = 500; let's forget about it, it's over, finished, you can't even think about it. You can't do it. That's very, very clear. We'll continue from here in the next class. And if all this is very strange to you, please go back and read up a little on digital communication. If you're doing a course on it already, I think you'll be fine.