Okay, so this is lecture 4 of error control coding, and we've been talking about generator and parity check matrices. We're going to give one last example to drive home a point and then we'll proceed. This is the example I said we'll carry through for a while to introduce some basics. So let's see: the first row of the parity check matrix is 110 100, the second row is 011 010, and the third row is 101 001, right? So each of the parity checks represents an equation: P0 is M0 plus M1, P1 is M1 plus M2, and P2 is M0 plus M2. So what are the parameters of this code? The block length is 6, that's very clear. What is the message length? 3, right. In this case you have to be careful: n minus k, you know, is 3, which is the rank of the parity check matrix. From n and n minus k you have to figure out k. In this case it's very simple, n minus k happens to equal k, but in other cases it might be different. So it's a (6, 3) code. Now suppose you have to list the code words of this code. One way of doing it is to convert to the generator matrix and take all linear combinations of the rows, or you can do it just with the parity check matrix. Where do you know the message will be? The message is going to be in the first three positions: M0, M1, M2. The first three columns correspond to the message bits and the last three columns correspond to the three parity bits.
So you just try all eight possible messages in the first three positions and then calculate the parities that satisfy each of the checks. You can do it by hand; it's the same as forming the generator matrix [I | P transpose] and taking linear combinations of its rows, the exact same thing done a different way. So you can do that and list out all the code words of the code. If we call the code C, what are the code words it contains? The first code word of course is 000 000, and then you can keep listing. Suppose I put 001 as the message. Notice which parities will be 0 and which will be 1: with 001 as the message, P0 will be 0 and P1 and P2 will both be 1. Do you quickly see that? It's a useful trick to learn early on when you're looking at parity check matrices: how to quickly figure out what the code word should be. Once M2 is 1, P1 has to be 1 and P2 has to be 1, and the other parity is 0; that's the only way to satisfy the checks. The next message is 010; which parities will be 1 and which will be 0? 110, right. So you see the parity vector is actually a combination of the message columns of the parity check matrix, which is a very nice way of looking at it. Then for message 011 the parities are 101. Do you see how I quickly figured that out? For 011 you XOR the M1 column and the M2 column together; that gives 101, and that has to be the parity vector. So that's a quick way of figuring these out. Then message 100 gives parities 101, 101 gives 110, 110 gives 011, and 111 gives 000.
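The codeword listing above can be checked mechanically. Here is a short sketch (the matrix layout follows the example, but the code and its variable names are my own):

```python
from itertools import product

# Parity-check matrix H of the (6,3) code, columns ordered as
# (m0, m1, m2, p0, p1, p2).
H = [
    [1, 1, 0, 1, 0, 0],  # p0 = m0 + m1
    [0, 1, 1, 0, 1, 0],  # p1 = m1 + m2
    [1, 0, 1, 0, 0, 1],  # p2 = m0 + m2
]

codewords = []
for m0, m1, m2 in product([0, 1], repeat=3):
    # Compute each parity directly from its check equation (mod 2).
    p0 = (m0 + m1) % 2
    p1 = (m1 + m2) % 2
    p2 = (m0 + m2) % 2
    c = [m0, m1, m2, p0, p1, p2]
    # Every codeword must satisfy H c^T = 0 (mod 2).
    assert all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)
    codewords.append(c)

for c in codewords:
    print("".join(map(str, c)))
```

Running this lists all eight codewords, matching the parities worked out by hand (for instance, message 101 gives parities 110).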
Notice that's a quick way of writing it down; it's just a simple trick. If you're not comfortable with this, write out the generator matrix and take all combinations of rows, that's another way of doing it; but this trick will help you out in some cases. Alright, so that's all I wanted to do about generator matrices and parity check matrices. If there's something that's disturbing you, now is a good time to ask me. If there's something you've thought about that didn't make sense, ask; and if you're in the habit of looking at your notes long before the exams, you might notice something. If there's nothing, I'm going to keep moving along and we'll move to the decoder side. Alright, a couple of administrative announcements. I have a reasonable list of registered people with me. I have to get the website started; once it starts, I'll put up all the homeworks, all the homeworks for the whole course from last time, and maybe I'll add some problems as we go along. That will be a good source of problems. Other than that, if a cheap book is available out there, I'll encourage you to pick up problems from it and look things up, but the homeworks should be a good source of problems for you to test what you've been learning. So let's move on to the decoder side. The decoder side will give us a lot of interesting insights into what else is important in a code. So far, for a linear code, pretty much the only parameters we've seen are the block length and the dimension, and that's only so far we can go. What else is important in a code? That's where the decoder enters the picture. So I'm going to redraw this diagram. We had the message m, which was encoded by, say, an (n, k)
code, a linear code if you want. Then we got a code word C, which went through, let's suppose, a binary symmetric channel with transition probability p, and then you got a received vector R, and now you have to decode. I'll write down a description for the decoder, and it'll be a little different this time. Previously I said the decoder has to figure out what m was. To really get to the core of the problem, we'll define the decoder as something which produces C hat, an estimate of the transmitted code word. The advantage now is that this takes you from code word to code word, which means the encoder does not matter at all: you can replace the encoder and the decoder will not change; all you have to change is how you go from C hat back to m hat. It's also very important because it captures the most important property: the code word is what goes through the channel, so anything you want to optimize should be at the code word level. How you go from the message to the code word is usually not very relevant; the code word is what's important. So, as we saw before, the message m is k bits, one of 2^k possibilities, and the code word C is n bits; but even though it's n bits, it's still one of only 2^k possibilities, because the encoding is a one-to-one map. We said the code is a list of 2^k vectors. But what do we know happens after the binary symmetric channel? You can get any of 2^n possibilities. But what do you think will happen? Do you think all the 2^n possibilities will be equally likely?
Yeah, it cannot be: if they were all equally likely there would really be no point in transmission and reception, you might as well give up. They will not be equally likely; something structured will happen. What do you think the most likely received vectors are? Think about the binary symmetric channel: it flips a bit with probability p, and usually p is less than half; you always take p to be less than half. Let me ask an interesting question: what do you do if p is greater than half? Yeah, just rename 1 as 0 and 0 as 1 at the other end, and you'll have another channel with p less than half. So p greater than half is really not a different case; p less than half is what's interesting. Once you have p less than half, what are the most likely received vectors? The code words, right? Do you see that? The code words were the transmitted vectors, and since p is less than half, you're unlikely to flip many bits, so you're most likely to receive some code word itself. Then after the code words, what will be most likely? Vectors which are one bit flip away from a code word. Why? Every flip is a less probable event, so what's more likely is the code words, then vectors one bit flip away from a code word, then two bit flips away, and so on. If you have a vector which is many flips away from every code word, it's much less likely that you'll receive it. So if you write down the probability distribution of R over the set of 2^n n-bit vectors, it will clearly not be uniform, and it will peak at the code words. At each code word it will take the maximum probability, which will also be the same at every code word. All these things you can figure out from the probabilities of the binary symmetric channel.
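To see the peaking concretely, here is a small sketch that computes the distribution of R for the (6, 3) code from the earlier example, assuming each codeword is transmitted with probability 1/8 over a BSC with p = 0.1 (the specific numbers and names here are my own illustration):

```python
from itertools import product

# The eight codewords of the (6,3) example code: (m0, m1, m2, p0, p1, p2).
code = [(m0, m1, m2, (m0 + m1) % 2, (m1 + m2) % 2, (m0 + m2) % 2)
        for m0, m1, m2 in product([0, 1], repeat=3)]

p = 0.1  # BSC crossover probability, assumed < 1/2
n = 6

def dist(u, v):
    # Number of positions where two binary vectors disagree.
    return sum(a != b for a, b in zip(u, v))

def p_received(r):
    # P(R = r) when each codeword is sent with probability 1/8:
    # average of p^d (1-p)^(n-d) over codewords, d the distance to r.
    return sum(p ** dist(r, u) * (1 - p) ** (n - dist(r, u))
               for u in code) / len(code)

probs = {r: p_received(r) for r in product([0, 1], repeat=n)}
best = max(probs, key=probs.get)
# The maxima of this distribution sit exactly at the codewords.
print(best in code)
```

Every codeword gets the same (maximal) probability, and vectors far from all codewords get very little, exactly as argued above.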
So now what is the task of the decoder? The decoder is basically a function from {0,1}^n to what? Now that I say C hat, the output will actually be in the code, so let me call the code C: the decoder goes from {0,1}^n to C. Without any other consideration, how many such functions can you have? What do I mean by a function from {0,1}^n to C? A function takes an input which is an n-bit vector, and every time you give it an n-bit vector it has to give out some vector from the code. Without any other consideration, in how many ways can you do this? To each n-bit vector I can assign any one of the 2^k code words, and I have to make that choice 2^n times, so the count is (2^k) raised to the power 2^n, which is 2^(k times 2^n). That's the answer, but that's a lot of functions. You can easily rule out a lot of them, right? If you receive a code word, what do you think C hat will be? If R equals a code word, your C hat will be the exact same code word; you wouldn't assign it to anything else. So all those other functions go; theoretically they're there, but you don't have to worry about them. But what you would really like is to pick the function which gives you the least error. I don't want to pick just any function; I want to pick that function, or maybe there is more than one such function, which can also happen and we'll see cases of it, one of those functions which gives me the least possible error. What's error now? My figure of merit is very important.
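The count above, one of 2^k choices made independently for each of the 2^n possible received vectors, can be written down directly and checked for tiny parameters (a sketch with made-up numbers):

```python
def num_decoders(n, k):
    # A decoder maps each of the 2^n received vectors to one of the
    # 2^k codewords, each choice made independently, so the number of
    # possible decoders is (2^k)^(2^n) = 2^(k * 2^n).
    return (2 ** k) ** (2 ** n)

# Tiny check: n = 2, k = 1 gives (2^1)^(2^2) = 2^4 = 16 decoders.
print(num_decoders(2, 1))  # 16
```

Even for modest n the count is astronomically large, which is why ruling most functions out matters.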
Probability of C not equal to C hat is the error; this is the probability of block error. Yes, give me one minute, let me write this down. Yes? "I can't understand why you chose C hat instead of m hat, and why C hat is more optimal." I didn't say C hat was more optimal. First of all, do you accept that this is one valid way of setting it up? There's nothing wrong with it, so you can work with C hat. The point is that it frees me up from the encoder. I could put down m hat here, but then I'm limiting myself by the encoder choice, and I'd have to do a joint encoder-decoder design. You'll see that even if you do that, this is a good enough thing to do: you can decide on the decoder first, since the encoding is anyway a one-to-one mapping; you can decide which code word was transmitted and then go back to the message. It's also optimal; it doesn't lose anything. But what it does in the meantime is tell you what's really important: the set of code words, and not how you went from the message to the code word. You can do that mapping in any way you want. You'll see later on, as we go along, that this brings out a very important point. Even if you start with those other assumptions, you'll end up with something like this, so don't worry about it. So, out of all these functions that you can possibly have, I want to pick the one which minimizes my probability of block error. From here you'll see we'll go to maximum likelihood decoding, which will take us to minimum distance decoding for the BSC. All those things will happen; we'll get there slowly. But this is my objective.
I want to pick the function which minimizes my probability of block error; that's my main objective. So what I'm going to do next is a little derivation to show that something called distance, the Hamming distance between two binary vectors, is important. If you're already convinced it's important, you don't have to pay much attention to this derivation. This is the derivation which ties the optimal decoding function to the distance properties; it just derives that distance is the right measure for the binary symmetric channel. So let's look at this probability that C hat is not equal to C. When I deal with these probabilities, when can I even talk about them? I need random variables; I have to define the probability space; I have to do all that carefully. What I'll do is skip all that: I'll simply use the same notation I have for code words, and whenever I need random variables I'll use the same symbols. That's an abuse of notation. Strictly, you should never write the probability of X when X stands for both a random variable and a vector; you can't use them interchangeably. You have to say the probability that a random variable equals some value; that's what makes sense. But I won't do that; I'll simply go through this without worrying about it. So, first thing: this is the probability that C hat is not equal to C. One of the 2^k code words was transmitted, so the first thing I want to do is condition on which code word that was, over the 2^k possibilities. So I'm going to write this as a summation over U belonging to C, over all code words,
of the probability that C hat is not equal to U given C equals U, times the probability that C equals U. Is that fine? It's a very simple expression, just conditioning on which code word was actually transmitted. And what do you think the probability that C equals U is going to be, unconditionally? Is it reasonable to say it's 1 over 2^k? Yes: if my messages are all equally likely, then this is 1 over 2^k. And any minimization I do of the probability that C hat is not equal to C should actually be a minimization of the conditional probability, right? Do you accept that? You don't really care about the other factor, since it's a constant, it comes out of the sum. So you really need to worry about minimizing the conditional part. If you're not convinced, think about this: minimizing the probability that C hat is not equal to C is equivalent to minimizing the probability that C hat is not equal to U given C equals U, for each U. Work that out and you'll get this. Is there a question here? Yes: what is capital C? Capital C is the set of all code words; I'm sorry, thanks for reminding me. I thought I wrote that down on the previous page, maybe it's not clear. This C is the code, the set of all code words. Even the way I write it, the distinction may not be very clear: if I put a bar below, it's a vector; if I write a capital script letter, it's the code. So I'm actually interested in minimizing this conditional probability. It makes no sense, for any one code word, to pick something that doesn't minimize that conditional term; you could go back, pick again, and show the total would be smaller. It's a technical thing you can show, but it works out: you can easily see from the expression that this is the best thing to do.
For each U, you have to minimize that conditional probability. Now I'm going to bring in the binary symmetric channel and try to evaluate this conditional probability. Okay, let me be careful here. Actually, I've gotten myself into a bit of a soup by not bringing R into the picture properly: R, the received vector, has to enter somewhere, and so far it hasn't. I'm sorry for this; I know where I went wrong. The conditioning should be on R: I need to get R into the picture first and then write the minimization. I jumped a step; apologies for that.
So let's condition on R first. Yeah, this is where the argument comes in. Suppose you want to minimize the probability that C hat is not equal to C. You can show that you have to minimize the probability that C hat is not equal to C given R, for each R. There's no U here; I think that's where the earlier mistake came from. There's no sense in not minimizing this for each R: if you have to minimize the total, it's best to minimize the conditional for each R, and if you don't, you can show there's a better choice. So minimizing the overall error probability is the same as minimizing this conditional probability for each R. Now, once you do this, you can start with that expression, and it's not too bad to deal with. Let me look at it closely once again: minimizing the probability that C hat is not equal to C given R is the same as maximizing what? Maximizing the probability that C hat equals C given R. This is just one minus that, so there's no problem; I can do the maximization instead. From here I can proceed very easily. Let's use Bayes' rule and try to evaluate this. What happens when you write it with Bayes' rule? The probability that C equals C hat given R is the probability of receiving R given C equals C hat, times the probability that C equals C hat, divided by the probability of R. That's the expression. Alright, so now I'm trying to maximize this.
Notice which terms are going to be important here. This probability of R, does it depend on what C hat is? It doesn't: whatever C hat is, the probability of R is not going to change. Similarly, the probability that C equals C hat is also a constant when code words are equally likely. So all these terms drop out, and maximizing the probability that C equals C hat given R is the same as maximizing the probability of R given C equals C hat. That gives me what's called the maximum likelihood decoder. If that argument was not very convincing, apologies; I was not very well prepared for this, and it shows. So you want to maximize the probability of R given C equals C hat, over all C hat belonging to C. R is your received vector; you try all the C hats in C and pick the one which maximizes the probability of R given C equals C hat. I made a messy way through it, but I think we finally got the right answer. So how do you pick your C hat at the decoder? You go through each and every code word in your code, calculate this probability of R given C equals C hat, and pick the C hat which gives you the maximum probability. This is the maximum likelihood decoder. The notation C hat inside the probability is a little misleading, since C hat should denote the final output of the decoder, so a better way of writing it is the following. And I'm sorry: so far I have not used the assumption that P is less than half; I have not come to that at all. P will play a role soon. A better way of writing it is this notation: C hat is the argument of the maximum, over U belonging to C, of the probability of R given C equals U.
This is a very compact and nice way of writing down the maximum likelihood decoder. The final decoded code word is the argument of the maximum. What do I mean by argument of the maximum? The U which maximizes that expression, the probability of R given C equals U. Now you might wonder: is that probability evaluatable? Can you find the probability of R given C equals U? You can, and I'll show you how; it's not very difficult. I'm sorry, just give me one minute; back to the earlier question about minimizing each conditional term. Yes, you can show it, since all the terms are positive. Even though they are not equal, they are all positive and they all add up. So if you have a decoder in which one conditional is not at its minimum, I can go to that point and adjust the decoder so that the conditional becomes minimum, and then the total becomes smaller. That's a fine argument; it's not wrong. What's the confusion? That won't happen: notice that everything is conditioned on a given R. Given that you received R, I have a policy for that R, and it won't change anything else. If R is different, my decoder's output for that R is a separate choice; the choices are not tied together. Do you understand what I'm saying? Let's take the rest of this outside of class, but I'm convinced this is correct. For a particular R, I have a rule which gives me my C hat; for a different R, I have another choice which gives me the C hat there. The policies are independent.
So the choice I make at one R does not affect the conditional probability at another R; everything is given R, so it's very clear. Don't worry about it: you have to minimize each of these conditionals to minimize the total. Alright, so let's look at this probability once again, closely: the probability of R given C equals U. How am I going to evaluate it? What's being transmitted is U. What is U? The vector U, say U0, U1, up to U(n-1); this was the vector that was transmitted. It goes into the BSC with error probability P, and what comes out is R: R0, R1, up to R(n-1). Is that clear? That's the situation. What is the BSC going to do? It takes U0, flips it with probability P, and retains it with probability 1 minus P. So each Ri will be equal to Ui with probability 1 minus P, and equal to Ui bar, the complement of Ui, with probability P. So given R and U, the probability of R given that the transmitted code word is U is very easy to evaluate. How do you evaluate it? You look at each position of R: if the bit was flipped, you write down a factor P; if it was not flipped, you write down a factor 1 minus P; then multiply all of those together. So the probability of R given U, I'll simply write U, is P raised to the power of the number of places where R and U disagree, times 1 minus P raised to the power of n minus that number. Is that clear?
This number of places of disagreement between two vectors is what's called the Hamming distance between two binary vectors, and you see it plays a very key role in decoding over a binary symmetric channel. That's why the Hamming distance makes sense and is introduced here. So let's introduce it: the Hamming distance, with the notation d sub H of (U, V), is the number of places of disagreement between U and V. Yes? You want me to repeat the probability? Okay, let's take an example. Suppose U is 0, 0, 0 and R is 1, 1, 0. What's the probability of R given U? In the first position you transmitted a 0 and received a 1; the probability of that is P. In the second position you transmitted a 0 and received a 1; probability P again. In the third position you transmitted a 0 and received a 0; probability 1 minus P. Now, how do I know all three events are independent? That's the assumption of the binary symmetric channel: whatever happens to the first bit does not influence the other bits. So the total probability of R given U is the product of the three individual conditional probabilities: P times P times 1 minus P, which is P squared times 1 minus P. Is that clear? It's the same thing I wrote down for the general case: P to the power of the number of places of disagreement, and 1 minus P to the power of n minus that, where n is the total length. A very simple term. So that's the Hamming distance definition. With the Hamming distance notation, I can write: the probability of R given U is P to the power d_H(R, U), times 1 minus P to the power n minus d_H(R, U). Is that clear? So I'm slowly getting to what you were suggesting the decoder should do.
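The per-bit product above fits in a two-line function; here is a sketch (the function names are mine), checked against the worked U = 000, R = 110 example:

```python
def hamming_distance(u, v):
    # Number of positions where the two binary vectors disagree.
    return sum(a != b for a, b in zip(u, v))

def likelihood(r, u, p):
    # P(R = r | C = u) over a BSC(p): a factor p for every flipped bit
    # and 1 - p for every unflipped bit, bits being independent.
    d = hamming_distance(r, u)
    return p ** d * (1 - p) ** (len(r) - d)

# The worked example: U = 000, R = 110 gives p * p * (1 - p).
p = 0.1
print(likelihood([1, 1, 0], [0, 0, 0], p))  # equals p**2 * (1 - p)
```

The same function works for any block length n, since it only counts disagreements.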
It should maximize the likelihood of receiving R given that a particular code word was transmitted, and that will eventually turn out to be related to the Hamming distance between R and each code word U under consideration. So let me rearrange this a little. Notice that d_H(R, U) shows up in both exponents, so I can collect the terms: you can write the likelihood as (1 minus P) to the power n, times (P over (1 minus P)) to the power d_H(R, U). Here is where P being less than half plays a role. Since P is less than half, which is greater, P or 1 minus P? 1 minus P. So P over (1 minus P) is a fraction less than 1. Remember that; we'll use it soon enough. Now let me remind you what we're trying to do: find the code word which minimizes my probability of error, which I said has to be the argument of the maximum, over all U in C, of the probability of R given C equals U. This we agreed. Now I've shown this probability is a constant I don't care about, since (1 minus P) to the power n does not change if I change U, there's no dependence on U, times the only factor that changes with U. So the likelihood is proportional to (P over (1 minus P)) to the power d_H(R, U), and I'm trying to maximize that. Maybe I'll take logarithms to move the Hamming distance out of the exponent. What kind of function is log? An increasing function, so the maximization remains a maximization after taking log. So this is the same as the argument of the maximum over U belonging to C of the logarithm, which is d_H(R, U) times log(P over (1 minus P)). Can I ignore that factor? I have to be very careful there, because it could be negative.
If it's positive, yes, I can ignore it. (1 minus P) to the power n I know is definitely positive. But I have to hold on to this log(P over (1 minus P)) until I'm sure whether it's positive or negative. Which is it? It's negative, because P over (1 minus P) is less than 1. So what should I do to the maximum? Convert it into a minimum, since the factor is negative. So finally my C hat becomes the argument of the minimum, over U belonging to C, of the Hamming distance between R and U. So that was a quick derivation; it was meant to be quick. As I said, if you're already convinced that the Hamming distance between the received word and the code words is what matters, you don't really need the derivation. But this is how it works theoretically: it's optimal to do minimum distance decoding based on Hamming distance. Let me go back through the proof once more and remind you of all the assumptions we made. The first place we made an assumption was here. What was the assumption? All code words are equally likely. I said that factor was a constant and threw it out; that happens only when all the code words are equally likely. In practice, in many situations, you might have some idea about this probability; you might know that some code words are more likely than others. In a communication system you have several stages of coding, and at many stages you might know the code word is more likely to have been something in particular. In those cases it is not optimal to drop this term; you have to remember that. So that's the first simplification we made. Once that is clear, you come to the likelihood and simply try to maximize it. Up to there, there is no assumption other than the code words being equally likely.
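The end result, that ML decoding over a BSC with p less than half reduces to minimum Hamming distance decoding, fits in a few lines (a sketch; the function names are my own):

```python
def hamming_distance(u, v):
    # Number of positions where the two binary vectors disagree.
    return sum(a != b for a, b in zip(u, v))

def ml_decode(r, code):
    # C hat = argmin over codewords u of d_H(r, u): over a BSC with
    # p < 1/2 this is exactly the maximum-likelihood choice.
    return min(code, key=lambda u: hamming_distance(r, u))

# Tiny illustration with the n = 3 repetition code {000, 111}:
code = [(0, 0, 0), (1, 1, 1)]
print(ml_decode((1, 1, 0), code))  # two bits agree with 111 -> (1, 1, 1)
```

Note that `min` breaks ties by taking the first codeword at minimum distance; the derivation above says any tie-breaking rule is equally good for equally likely codewords.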
How did we bring in the distance? We brought in the distance only because we were dealing with a binary symmetric channel. If the channel were different, you would probably get some other distance, or maybe something else entirely, depending on the conditional distributions. Anything can happen. Here, the conditional distribution was nice, binary and symmetric, so we got this wonderful Hamming distance simplification. So as we go along towards the latter half of the course, we will relax some of these assumptions and generalize. At that point I will come back and revisit this, and you should not be surprised. You should not be surprised that this minimum distance decoder, which you thought was optimal all along, is suddenly not optimal anymore; it's only optimal under these two assumptions. But nevertheless, this is a great result. Imagine: from that huge number of possible functions I had for the decoder, what have I come to? Just a specific method of finding what C hat should be. Eventually we will see we can simplify it slightly further for linear codes, etc. But for now, this is a nice formula to keep in mind. So let me illustrate this with examples; later we will also work it for the 6-3 code. Once you get convinced and you are comfortable with this notion, we will look at the Hamming distance more closely and figure out the relationship between the distance between codewords and performance. Distance will be a very important measure for a code; we will iron that out as we go along. So let's start with one of the simplest examples out there: the n equals 3 repetition code. My code just consists of two vectors, 000 and 111. Notice I am not even talking about the encoder; the encoder is probably clear to you. The code is enough. For the decoder, I have to start with the code; the encoder is an afterthought. So let's see my decoder.
How will I describe the decoder? What's the best way of describing a decoder? I could have a table, right? A table which tells me how you go from R to C hat. So that's my decoding table. Let's say I put R here and C hat here. So what's my method? My method is going to be ML, maximum likelihood decoding, over the binary symmetric channel. Suppose I say R is 000. How will you do maximum likelihood decoding? I know it's trivial, but let's go through the process. I will find the Hamming distance between 000 and each of the two codewords and pick the codeword which gives me the minimum Hamming distance. Here there are only two codewords: you are zero away from one codeword and three away from the other. It's very clear; you're going to say C hat is 000. Likewise, spend a couple of minutes and keep doing it on your own. Don't look at the screen; I will do it also, but spend a couple of minutes and try to do it yourself. It's quite simple. So you see the maximum likelihood decoder also makes very simple, direct, intuitive sense. You're transmitting 000 and 111. If you receive something with two or more ones, what are you going to say it is? 111. Otherwise you're going to say it's 000. It makes very clear, intuitive sense. Now I'm going to ask you a slightly more difficult question. How will you compute the probability of error for the maximum likelihood decoder? It says "maximum likelihood decoder"; it sounds like the perfect decoder. Will it make an error? Of course it will. Will it make an error with non-zero probability? Yes, of course, and you can even compute that probability. What is it that you cannot do? You cannot find any other decoder which will achieve a lower probability of error than this decoder. That's the only guarantee. It doesn't say the probability of error is zero.
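The decoding table described above can be sketched in a few lines of Python. This is just an illustration of the procedure; the function names are mine, not from the lecture:

```python
# Maximum-likelihood (nearest neighbor) decoding table for the n = 3
# repetition code over a binary symmetric channel with p < 1/2.
from itertools import product

CODE = ["000", "111"]

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor_decode(r, code=CODE):
    """Return the codeword closest in Hamming distance to r."""
    return min(code, key=lambda u: hamming(r, u))

# Build the full decoding table: all 2^3 possible received words.
table = {"".join(bits): nearest_neighbor_decode("".join(bits))
         for bits in product("01", repeat=3)}
```

Printing the table reproduces the majority rule from the lecture: anything with two or more ones decodes to 111, everything else to 000.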
It only says you can never find any other mapping from R to C hat which will give you a lower probability of error. That's what it says; you should remember that. Even when R is 000, there can be an error with some probability: all three bits could have been flipped. I want you to spend a couple of minutes and try to find the probability of error for this decoder. It's an interesting exercise, and when I do it for the general case, it will help if you have thought about it on your own. It's a simple situation. This is once again a test for you: do you know enough probability? If you see yourself going around in loops, then something very basic is missing. For evaluating the probability of error, it's better to do the conditioning the other way from the derivation: when deriving the decoder, we conditioned on the received vector, which gave a nice decoding rule; for evaluating the probability of error, you should condition on which codeword was transmitted. That's a very simple way of doing it, and it will give you an answer. Or, if you're very sharp, you'll immediately see that the only way an error can happen is if the channel makes two errors or three errors. You can use that shortcut and come up with the answer. If you do that, you'll see the probability of error is the probability that the channel makes two errors, which can happen in three different ways, each with probability p squared times (1 minus p), plus the probability of three errors, which can happen in only one way, with probability p cubed. It's simple to add up: you'll get 3 p squared minus 2 p cubed. Okay, have we achieved anything? By coding and decoding in this way, what have we achieved?
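The calculation above, conditioning on the channel's error pattern, can be checked numerically. A minimal sketch, assuming nothing beyond the lecture's setup:

```python
# Probability of block error for ML decoding of the 3-repetition code,
# computed by enumerating error patterns, then compared with the
# closed form 3p^2 - 2p^3 derived in the lecture.
from itertools import product

def repetition_error_probability(p):
    """P(decoder output != transmitted codeword); by symmetry this is
    the same for either codeword, so condition on which bits flip."""
    total = 0.0
    for flips in product([0, 1], repeat=3):
        # An error occurs iff 2 or 3 bits are flipped (majority wrong).
        if sum(flips) >= 2:
            prob = 1.0
            for f in flips:
                prob *= p if f else (1 - p)
            total += prob
    return total

p = 0.01
exact = repetition_error_probability(p)
closed_form = 3 * p**2 - 2 * p**3
```

With p = 0.01 both come out near three times 10 to the minus 4, matching the estimate in the lecture.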
Exactly: p squared is going to be less than p. Why? Because p is less than 1. That's enough. But there are apparently disturbing things: there's a factor of 3 multiplying, and then there's a minus 2 p cubed term. You're happy about the minus term, right? You don't care; it only reduces things, and in any case p cubed will not play much of a role when p squared is in the picture. What are you typically thinking of p being? Not 0.4 or 0.45; then you're really in bad shape. You're thinking of p being something like 0.01: maybe 1% of the time you have errors, and you're putting in a code to fix it. With p equal to 0.01, what is my probability of block error going to be? About three times 10 to the power minus 4. I can safely ignore the other term; it won't hit me too hard. So I've achieved something: I've gone from p to roughly p squared, which is better. What is the penalty I have paid? Rate is 1 by 3. So you notice the rate is 1 by 3 and the probability of error has become on the order of p squared. You might wonder, is this a good enough trade-off? Should I go to rate 1 by 3 for a probability of error of p squared? It depends on the situation; you have to evaluate it in reality and see if this is a good trade-off. In many cases it will not be a very good trade-off; we'll see later on that this is in fact a bad trade-off. But at least it is a trade-off: you have something; it's not like you have nothing to work with. So the next example we're going to see is slightly more complicated: the 6-3 example. And I'm going to ask you to do this, but not the complete table, just a partial table. Let's take the 6-3 example. I was told I can cut and paste very easily, so I'm going to go back to where I wrote down all the codewords of this code.
So I say copy, and I go back here and say paste. There you go; brilliant. So this is my example code. Now, if I have to make a table for the optimal decoder between R and C hat, it's a lot of work. First of all, how many entries will the table have? 64. That's a lot. But on top of that, figuring out which C hat to choose for each R is also non-trivial; it's quickly getting non-trivial, unlike the previous case. Some cases will be trivial. Suppose R is a codeword itself, say the all-zero word: the codeword that gives the minimum Hamming distance from all-zeros is all-zeros itself, since no other codeword can be at distance zero from it. So you can easily decide that the decoded word is all-zeros. Likewise, for R equal to each of the codewords, you can happily write down what the answer will be; no problem. But suppose I give you some other R. One cannot always immediately see what the closest codeword will be. Maybe in this particular case you can quickly see it, but in general, if I keep throwing R's at you, you can't quickly do it; it's a little bit non-trivial. Now, suppose I go to a (1000, 500) code: I give you a G which is 500 by 1000 and define some encoder, and you know what the optimal decoder is. You would have to go through all 2 to the power 500 codewords, and obviously you can't do that; it doesn't even enter the picture. So while we know the optimal method, we don't know if it can be implemented efficiently.
So efficient implementation of ML decoding, you'll see, will be a problem that stays with us for a long time. It's a very difficult thing to do in general, though it can be done in many cases. Efficient implementation is going to be non-trivial. By the way, the decoder that we've been doing, the arg min over U belonging to C of the Hamming distance between R and U, is called the nearest neighbor decoder; one can even call it the minimum distance decoder, but nearest neighbour, or neighbor, depending on whether you like British or American spelling, is the usual name. We didn't officially name it before; we'll name it the nearest neighbor decoder. It happens to be the maximum likelihood decoder for the binary symmetric channel. That's the decoder. All right. Now, there's a very useful graphical tool to understand the nearest neighbor decoder; a graphical tool in the sense that there's a picture you can draw which will tell you what the nearest neighbor decoder is doing. You'll see the role the Hamming distance plays, and from there we'll define some important quantities for a code. Yes, I'll talk about the cube, or the sphere if you want to call it that; we'll come to that slowly. What we'll imagine is that there is some space where the set of all n-bit vectors is sitting. So here's a graphical view of nearest neighbor decoding, which is very important; this will give you a nice idea of what's happening. I'm going to draw a big circle here, but you have to imagine the circle as a space which contains all the n-bit binary vectors. It's not a continuous region or anything like that; it's got all the n-bit binary vectors. How many points does it have? 2 to the power n. So all those points are there.
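The brute-force nature of the nearest neighbor decoder is easy to see in code. A sketch for the 6-3 example, assuming the parity relations given earlier in the lecture (p1 = m0 + m1, p2 = m1 + m2, p3 = m0 + m2, all mod 2):

```python
# Nearest neighbor decoding for the (6,3) example code by brute force.
from itertools import product

def codewords_6_3():
    """All 8 codewords, message bits first, then the three parities."""
    words = []
    for m0, m1, m2 in product([0, 1], repeat=3):
        p1, p2, p3 = (m0 + m1) % 2, (m1 + m2) % 2, (m0 + m2) % 2
        words.append((m0, m1, m2, p1, p2, p3))
    return words

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor(r, code):
    """Scan all 2^k codewords; fine for k = 3, hopeless for k = 500,
    which is exactly the point made above."""
    return min(code, key=lambda u: hamming(r, u))

code = codewords_6_3()
```

For example, the received word 000001 is at distance 1 from the all-zero codeword and at least 2 from every other, so it decodes to all-zeros.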
Maybe I'll draw with some other color; actually, let me not bother with it. It's got 2 to the power n points, so maybe I'll put them as dots. There are lots of them all around. If you're an expert in rangoli or kolam, you can do it in a nice pattern, but that doesn't matter for us. So those are the 2 to the power n n-bit binary vectors. Out of these 2 to the power n vectors, my code consists of how many vectors? 2 to the power k. Since my codewords are very important, I'm going to denote them with stars. Those are my stars: my codewords are my stars. Among all these stars you have Shah Rukh Khan, who is the all-zero codeword, and then the other people; you can name them any way you like, but these are my codewords. I know that when I transmitted, the transmitted vector was actually one of these stars; it was not just any point in this space. But when I receive, what can I receive? Any point in this space. So suppose I received this point. What do I have to do when I decode? I find the Hamming distance from the stars: how far am I from this star, from this star, from this star, and so on. I find all these distances and then I go to the nearest codeword; that is nearest neighbor decoding. So I decide this is my transmitted codeword: if this was my received vector R, my decoded codeword is going to be C hat. This is a graphical way of visualizing what is actually happening: I am somewhere in this space, and I look for the nearest star.
And then I say that is my transmitted codeword. Did you have a question? No, there is no difference between the stars and the other dots. The stars are my designated codewords; they are again n-bit vectors, just like the others. There is nothing different about them except that I have chosen them to be in my code, so they are special. Stars are also human beings, right? Shah Rukh Khan is also a man. There is no difference except that they have been chosen to be the codewords. So this is what I am doing, and this view is very important. It's a very nice graphical visualization of what is actually happening. Even when you go to n equals 1000 and k equals 500, this picture can be in your head: there will be very many stars and many more regular dots, but you can always picture this. Yes? Total number of bits? No, don't think of it that way. If I am looking at n-bit vectors, how many total n-bit vectors are there? 2 to the power n; that is the correct way of looking at it. Now, what do I mean by 2 to the power k out of the 2 to the power n? My codewords are only 2 to the power k of those vectors. For instance, I wrote a code down: 000 and 111. How many possible 3-bit vectors are there? 8, right? Out of the 8, I chose 2 to be in my code, 000 and 111. Understand? There is no codeword which is k bits in size; all codewords are n bits. 000 and 111 are both 3 bits. There is no k-bit codeword; the k-bit things are messages. So k bits out of the n bits will carry the message, but that's different.
I still don't understand what your question is; maybe we can talk about it outside. All right. So now, in this picture, you can easily see that the distance between the stars will be very important. You're starting off at a star: you know that when you transmitted, it was a codeword. And when you receive, you can be anywhere. But why does the distance between the codewords themselves matter? If you stray too far away, you can get closer to some other codeword. So whether your decoder succeeds or not depends very much on the distance between the codewords. In the worst case, there might be so many errors that you jump all the way to another codeword, but on average you will not stray very far from the transmitted codeword, and as long as the stars are far apart, you can still decode correctly. So the distance becomes an important factor, and that motivates the definition of minimum distance. This is a very important parameter. For a code, n and k are important, and you should know them very well, but the next important property is the minimum distance of a code. I'll write down the definition and then relate it to this picture, and you'll see what an important role it plays, in nearest neighbor decoding at least. Here's the definition; a very technical definition. It's usually denoted d: d equals the minimum over all u, v belonging to the code, u not equal to v, of the Hamming distance between u and v. Of course I need u not equal to v: if u were allowed to equal v, the minimum would trivially become zero, and I don't want that. So, the minimum over all pairs of distinct codewords. Graphically, what's the interpretation? It's the distance between the two closest stars.
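The definition can be transcribed directly: minimize the Hamming distance over all pairs of distinct codewords. A minimal sketch:

```python
# Minimum distance by its definition: the smallest Hamming distance
# over all pairs of distinct codewords ("the two closest stars").
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(code):
    # combinations() yields each unordered pair exactly once, and never
    # pairs a codeword with itself, so the u != v condition is automatic.
    return min(hamming(u, v) for u, v in combinations(code, 2))
```

For the repetition code {000, 111} this gives 3; for the four-word code {0000, 0011, 1100, 1111} used later it gives 2.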
Right, the distance between the two closest stars. So with this graphical picture in view and this definition, I'll make a statement next. It'll be very clear; I almost don't have to prove it. This is the statement about error-correcting capability: t equals the floor of (d minus 1) by 2 errors made by the channel are correctable, for a code with minimum distance d, under nearest neighbor decoding. Why is that true? First, what do I mean by errors made by the channel? It's the number of bits that the channel flips: you transmit n bits, and the number of bits actually flipped is the number of errors made by the channel. So why is it clear that if at most floor of (d minus 1) by 2 errors are made by the channel, you can always decode correctly with nearest neighbor decoding? You started at a particular star, any two stars are at least a distance d apart, and you moved strictly less than halfway to any other star. So there is no way another codeword can be closer to you than your transmitted codeword if at most floor of (d minus 1) by 2 errors occurred. There's also a mathematical way of proving it, by contradiction: if the decoder went to another codeword, there would have to be two codewords less than distance d apart, which is a violation. It's very easy to write that down too; that's where the floor of (d minus 1) by 2 comes from. So that's where the error-correcting capability comes from. You have stars at least distance d apart; draw spheres of radius t around them, and what will happen? They'll never overlap.
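The non-overlap claim can be checked directly for a small code. A sketch, using the (3, 1) repetition code as the example; `sphere` is my own helper name:

```python
# Error-correcting capability t = floor((d-1)/2), and a direct check
# that Hamming spheres of radius t around the codewords never overlap.
from itertools import combinations, product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def sphere(center, radius):
    """All bit tuples within the given Hamming distance of center."""
    n = len(center)
    return {v for v in product((0, 1), repeat=n)
            if hamming(v, center) <= radius}

code = [(0, 0, 0), (1, 1, 1)]          # the (3,1) repetition code
d = min(hamming(u, v) for u, v in combinations(code, 2))
t = (d - 1) // 2                        # floor((d-1)/2)
spheres = [sphere(c, t) for c in code]
```

Here d = 3, t = 1, and each radius-1 sphere contains the codeword plus its three one-bit-flip neighbors; the two spheres are disjoint, as the argument above guarantees.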
And if you make at most t errors, you will never leave that sphere, which means you'll always come back to the original codeword that you transmitted. All these pictures should be in your mind. It's very nice to write these expressions, but if you don't have the picture in your mind, you'll never truly grasp it. That's what's happening in a nearest neighbor decoder: you're going to the closest codeword, and if the channel never made more than t errors, you will always get back the right codeword. Of course, the decoder can still go wrong. When will it go wrong? When the channel makes more errors than that: if you cross into the sphere of another codeword, then you will get into trouble. So what people usually do is come back to this picture and draw spheres of radius t, each associated with the codeword at its center. It's like a solar system with planets and stars, each planet associated with a star: any time you land on a planet, the decoder takes you back to that planet's star. That's all; it's a very simple picture to keep in your mind. All right, let's go back to the earlier example once again, the (3, 1) repetition code. You see, that's exactly what was happening there. Let me draw the graphical view for the (3, 1) code. What are my stars? My two stars are 000 and 111. What are my other vectors? There are only six of them. I'll just place them like this; the placement doesn't mean anything. I'll say 001 is here, 010 is here, 100 is here, and the other three dots: 110 here, 101 here, 011 here. This is my graphical picture for n equals 3, the (3, 1) code, C equals {000, 111}.
Maybe this will clarify the earlier question about the stars: there are two codewords, k equals 1, and the number of codewords is 2 to the power 1, which is 2. All right, so what is d? Given this code, you can find d: what's the minimum separation between any two codewords? There are only two codewords here, so obviously d will be 3; it's very easy to see. What's the error-correcting capability? (d minus 1) by 2 is itself a whole number here, so you don't have to do any flooring: you get t equals 1, which means the code can correct one error. Draw a sphere of radius 1 around 000. Which vectors will it include? These three: 001, 010, 100. If you receive anything in this sphere, what will your decoded codeword be? 000. And draw a sphere of radius 1 around 111: if you receive any vector within this sphere, your decoded codeword is 111. Go back and look at the table; that's exactly what we did. Yes, that can happen: in this picture, you never got a dot outside of these spheres, but in the general case you will have dots outside the spheres; all kinds of things can happen. We'll slowly see some more examples with this picture, and you'll see how it works out. Is it clear? So this is a simple case. Let's take a slightly more complicated example where this picture becomes more interesting. I'll take n equals 4, because that's pretty much the most I can manage; beyond that it's very tough to draw. Shall we take k equals 1 or k equals 2? You like k equals 2? Okay, so a (4, 2) code. What do you want the code to be? We'll take the all-zero word, 0011, 1100, and 1111. That's a reasonable-looking code.
Is it linear? Be careful before you answer. Yes, it is linear. So how do you check? If I give you a set of 2 to the power k binary vectors, how will you check whether it's a linear code or not? One way is to check that the sum of any two codewords is again a codeword; that can be quite involved for a big code, but it works. Or you can start with a basis and take all linear combinations and compare. Believe me, this is a linear code. So that's our code, a linear code. Let's try the graphical picture; you'll see a lot of interesting things emerge from it. First, what's the minimum distance? That's important; let's find out. The minimum distance is 2, right? Everybody agrees? There are codewords which are a distance 2 apart. So what's the error-correcting capability? Zero. So technically you cannot correct any error. It's a strange situation. But can you still have a maximum likelihood decoder? Yes, of course. Nothing prevents you from going to the nearest star, except that you're no longer guaranteed to reach the transmitted star even if the channel makes only one error. You can see which received word will cause confusion: 0010, for instance. There are two codewords the same distance away from it. So what will the maximum likelihood decoder do? You can actually pick either one, and you will not lose anything in terms of the probability of block error; it will be the same. That's what I said.
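The closure check mentioned above is mechanical for a small code. A sketch; `is_linear` is my own name for it:

```python
# Checking that a binary code is linear: it must contain the all-zero
# word and be closed under bitwise XOR (mod-2 sum) of any two codewords.
def xor(u, v):
    return tuple((a + b) % 2 for a, b in zip(u, v))

def is_linear(code):
    words = set(code)
    n = len(next(iter(words)))
    if tuple([0] * n) not in words:     # a linear code contains 0
        return False
    return all(xor(u, v) in words for u in words for v in words)

C = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
```

For this C every pairwise XOR lands back in the code (for example 0011 XOR 1100 = 1111), so it is linear; swap 1111 for 1110 and the check fails.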
That's what I meant when I said there can be multiple decoding functions which give you the same probability of error, but you can never go below it. So you can pick any one you like. You see, more complicated situations are emerging; you can have fancier cases this way. Is that clear? Let's try another example; maybe this will be more interesting: n equals 4, k equals 1. Let me take C to be {0000, 1110}. That's just for fun; there's probably no real use to this code, but I want to show you that some more interesting things can happen with the shapes of these regions. It's good to see these things now, because later on, when I use them, you should be comfortable. So what's d? d equals 3, you agree? And what's t? t is 1. You agree with me? Is that clear? Now, if I draw a sphere of radius 1 around 0000, which vectors are in there? 0000 itself, no problem, plus anything with one bit flipped: 0001, 0010, 0100, 1000. Only one flip is allowed, not more. If I draw a sphere of radius 1 around 1110, what will I get? First 1110 itself, then 1111, 1100, 1010, 0110. So if you receive any vector in the first set, what will your decoded codeword be? 0000, no problem. If you receive any vector from the second set, what will your decoded codeword be? 1110, no problem. Now you notice that doesn't exhaust all possible 4-bit vectors. How many have we gone through? Only 10. There are six other 4-bit vectors which are not within either sphere, but they still have to decode to something; you can't just leave them hanging.
They will each be closest to some codeword, and you have to add them to one side or the other depending on which they are closer to. So to the 0000 side, what will you add? 0011, 0101, 1001. Are those three correctly added to this set? How did I decide? They are closer to 0000 than to 1110; you can check each one. The remaining vectors get added to the other side: 0111, 1011, 1101. These three go with 1110, and that completes a maximum likelihood decoder. Is that clear? In many cases, you'll see, people will be happy if you can decode just within the spheres and forget about the vectors lying outside them. You can see why you typically will not leave the sphere: if there are only so many errors, you'll stay inside a sphere of suitable radius, and it's enough to decode within that. Then you don't have to do the complete maximum likelihood decoding; but of course it's no longer exact maximum likelihood decoding. So, who's going to give me the probability of error for this decoder? Saying "same as before" is not an acceptable answer; give me the expression. Like he points out, you can be clever and say the last bit will not matter, but how would you actually do the computation without knowing that? What's the probability of error? Do you understand how to do this? I think some people understood and some have no idea. Suppose I transmit 0000: when will I make an error? If I receive anything from the other region. So I have to compute the probability that 0000 will be received as any one of those vectors; that's quite a few cases.
For instance, receiving a particular vector at distance 3 from 0000 has probability p cubed times (1 minus p). You have to add up all those probabilities. Likewise, if you transmit 1110, you can receive any one of the vectors in the other region; add up all those probabilities. Then take half times this plus half times that, and you'll get the final probability of error. It will all work out, as he points out, to 3 p squared minus 2 p cubed. It has to: the fourth bit doesn't really matter. But you should be comfortable with these calculations: why it makes an error, and how you actually compute whether it makes an error or not. Any questions? If something is disturbing you, now is a great time to ask. This is very important: later on, you'll see that even for the 6-3 code, I cannot draw examples like this very nicely. What happens for the 6-3 code? My space will have 64 different dots, and how many stars? Eight codewords. And then I have to keep drawing spheres around them; it will not work easily, it will be tough. In these small cases it's easy, and you should understand it from this point of view. So now, once again, let's go back to the 6-3 code which I have beautifully saved. Don't feel a terrible urge to write down everything I have; all of this is going to go on the net. If you find yourself painfully writing down stuff, just look at the screen instead. What is the minimum distance of this code? It's a lot of work, right? You have to take pairs of distinct codewords and look at their distances, and distance comparisons are not that quick. For instance, if you look at the distance between these two, what is it? Two or three?
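The region-building and the error probability sum above can be verified by brute force over all 16 received words. A sketch for the code {0000, 1110}:

```python
# ML decoding regions and exact block-error probability for the n = 4
# code {0000, 1110}: assign every 4-bit word to its nearest codeword,
# then sum the channel probabilities of landing in the wrong region.
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

code = [(0, 0, 0, 0), (1, 1, 1, 0)]

# Decoding regions: every 4-bit word goes to its nearest codeword.
# (No ties occur for this particular code, so the choice is unambiguous.)
regions = {c: [] for c in code}
for r in product((0, 1), repeat=4):
    regions[min(code, key=lambda u: hamming(r, u))].append(r)

def error_probability(p, transmitted):
    """P(received word falls outside the transmitted word's region)."""
    total = 0.0
    for r in product((0, 1), repeat=4):
        if min(code, key=lambda u: hamming(r, u)) != transmitted:
            k = hamming(r, transmitted)      # number of flipped bits
            total += p**k * (1 - p)**(4 - k)
    return total

p = 0.01
pe = error_probability(p, (0, 0, 0, 0))
```

Each region ends up with 8 of the 16 words, and the sum collapses to 3 p squared minus 2 p cubed, confirming that the untouched fourth bit plays no role.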
Three. So you have to look at all the other distances too; it's not trivial. Now we'll use the fact that it's a linear code to simplify this computation. For linear codes, the computation can be simplified. What is my minimum distance? The minimum over all u, v belonging to C, u not equal to v, of the Hamming distance between u and v. Let's make an observation about the Hamming distance. The Hamming distance between two vectors is what? The number of places where they disagree, the number of places where they differ. You will see that the Hamming distance between u and v is the same as what's called the weight of u plus v. What is weight? The weight is the number of ones. Weight is defined for a single binary vector: it equals the number of ones in it. The Hamming distance between two binary vectors is the weight of their bitwise XOR. Why is that true? Wherever they differ, the XOR will be one; wherever they agree, the XOR will be zero. You're adding mod 2. But what do I know about a linear code? The code is linear, so u plus v is some other codeword w in my code. Can that codeword w be the all-zero codeword? In this minimization it cannot be, because I insisted that u should not be equal to v. So you notice this is the same as the minimization over all w in the code, w not equal to zero, of the weight of w. This works only for linear codes; remember that. If you use this simplification, at least for this specific case of a linear code, finding the minimum distance is easy. What do you have to do? Find the nonzero codeword with minimum weight. What is the minimum weight of a nonzero codeword here? There are several codewords achieving it, and that weight happens to be 3. So you can easily find that the minimum distance is 3.
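The equivalence just argued, minimum distance equals minimum nonzero weight for a linear code, can be confirmed on the 6-3 example. A sketch, again assuming the parity relations stated earlier in the lecture:

```python
# For a linear code, minimum distance = minimum weight of a nonzero
# codeword: one pass over the codewords instead of all pairs.
from itertools import combinations, product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def codewords_6_3():
    # Assumed parity relations from the lecture:
    # p1 = m0+m1, p2 = m1+m2, p3 = m0+m2 (mod 2).
    return [(m0, m1, m2, (m0 + m1) % 2, (m1 + m2) % 2, (m0 + m2) % 2)
            for m0, m1, m2 in product((0, 1), repeat=3)]

code = codewords_6_3()
zero = (0,) * 6

# Minimum weight of a nonzero codeword (weight = number of ones)...
d_by_weight = min(sum(c) for c in code if c != zero)
# ...agrees with the pairwise definition of minimum distance.
d_by_pairs = min(hamming(u, v) for u, v in combinations(code, 2))
```

Both computations return 3, matching the value read off the codeword list in the lecture.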
So it's also typical to use this notation: an (n, k, d) code. n is the length, k is the dimension, d is the minimum distance. Okay? So that's one small simplification for the minimum distance of linear codes. But you see that even this is not useful in the general case. Suppose n equals 1000 and k equals 500, and I give you a huge generator matrix. How will you use even this rule? How many non-zero code words will you have to generate? 2^500 - 1. It's not possible. Okay? In fact, people have shown that in the worst case, finding the minimum distance given the generator matrix is what's called an NP-hard problem. People have proven that. So it's a difficult problem; you cannot really find that answer in general. For small cases, you can look at it and figure it out. Okay? All right, so that's the thing about minimum distance. So now you know d equals 3 for my 6-3 example code. In my 6-3 example code, d is 3, so what is my error-correcting capability? The floor of (d - 1)/2, which is 1. Okay? I know the error-correcting capability is 1. Now go back and use the sphere idea. What does this mean? You include all the vectors that are a distance 1 away from each of these code words, and there should be no overlap. You can test that if you want; if you're getting really bold, check it, and there will be no overlap. Okay? So let's try to at least write down a few of these guys. See, it's very easy for me here; doing this on the blackboard would be quite non-trivial. Okay? So you write down all the vectors that are a distance 1 away from this one. What will you get? 0 0 0 0 0 1 all the way to 1 0 0 0 0 0, right? And maybe I'll write this one alone: for 1 1 1 0 0 1, down to 0 1 1 0 0 1. Likewise, you'll get a bunch of vectors. Okay? So how many total vectors would you have covered this way, just including vectors that are a distance 1 away? How many do I have here? 6, right? I can choose to flip any one of the 6 bits.
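If you do want to test the no-overlap claim, it takes only a few lines. A sketch in Python, with the code word list written out from the example's parity equations:

```python
from itertools import product

# The eight code words of the (6,3) example code (d = 3, so t = 1).
code = [(0,0,0,0,0,0), (0,0,1,0,1,1), (0,1,0,1,1,0), (0,1,1,1,0,1),
        (1,0,0,1,0,1), (1,0,1,1,1,0), (1,1,0,0,1,1), (1,1,1,0,0,0)]

def ball(c, radius=1):
    """All length-6 binary vectors within the given Hamming distance of c."""
    return {v for v in product([0, 1], repeat=6)
            if sum(a != b for a, b in zip(c, v)) <= radius}

balls = [ball(c) for c in code]
covered = set().union(*balls)

print([len(b) for b in balls])   # each ball holds 1 + 6 = 7 vectors
print(len(covered))              # 56: the balls are disjoint, 8 vectors left over
```

Since 8 balls of 7 vectors cover 56 distinct vectors, no two balls intersect, and exactly 64 - 56 = 8 vectors lie outside every ball.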
6 plus 1 is 7, and 7 times 8 is 56. Okay? So I have covered only 56 vectors. How many total do I have? 64. So there are 8 other vectors which are outside of the spheres, and I also have to decide how I'm going to decode those. Okay? You see what we did in the last case: someone came up with all those 6 extra vectors that were outside, and we could easily decide whether each one has to go to the all-zero vector or to 1 1 1 0. For n equals 4, k equals 1, it was trivial. For the 6-3 code, it's not so easy. For the 56 vectors, you know what to do, right? Presumably, you can write it down. Then you'll have to figure out what the 8 other vectors are and actually find the nearest code word for each of them. Okay? And you'll see that, surprisingly, for linear codes a beautiful thing happens. What do you think will happen, by the symmetry of the whole thing? Those vectors will distribute evenly among the code words, so each code word ends up being the decoding for 8 different received vectors. Okay? That will wonderfully happen. So, beginning with next class, we'll try to prove that and come up with some simplifications of this process of nearest neighbor decoding for linear codes. Okay? That's the end of this lecture. We'll meet again.
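The even split can be checked computationally. The sketch below groups the 64 vectors by which parity checks they violate (the vector H times r, a device the lecture has not formally introduced yet) and corrects each group with one fixed lightest error pattern. The key assumption is that ties among equally near code words are broken consistently per group; with that, every code word is the decoder output for exactly 8 received vectors:

```python
from itertools import product
from collections import Counter

# The eight code words of the (6,3) example code.
code = [(0,0,0,0,0,0), (0,0,1,0,1,1), (0,1,0,1,1,0), (0,1,1,1,0,1),
        (1,0,0,1,0,1), (1,0,1,1,1,0), (1,1,0,0,1,1), (1,1,1,0,0,0)]

# Parity check matrix: rows check p1 = m0 + m1, p2 = m1 + m2, p3 = m0 + m2.
H = [(1,1,0,1,0,0), (0,1,1,0,1,0), (1,0,1,0,0,1)]

def checks(v):
    """Which of the three parity checks v violates (H times v, mod 2)."""
    return tuple(sum(h[i] * v[i] for i in range(6)) % 2 for h in H)

# One fixed lightest error pattern per check pattern: scanning candidates
# in order of weight keeps the first (lightest) pattern seen for each group.
leaders = {}
for e in sorted(product([0, 1], repeat=6), key=sum):
    leaders.setdefault(checks(e), e)

def decode(v):
    e = leaders[checks(v)]                      # guessed error pattern
    return tuple((a + b) % 2 for a, b in zip(v, e))

counts = Counter(decode(v) for v in product([0, 1], repeat=6))
print(sorted(counts.values()))   # [8, 8, 8, 8, 8, 8, 8, 8]
```

There are 8 possible check patterns and 64 vectors, so each group has 8 members, and adding a fixed error pattern maps each group one-to-one onto the code, which is the symmetry the lecture is previewing.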