Okay, so that was lecture five. Let's take stock of what we have so far. We have some sense of what an (n, k, d) code is. Let's say it is linear and binary. Throughout this course we will only look at linear codes, so you don't have to worry about the term "linear": I will simply say "code" and linearity is assumed. But as we go along we will also see codes that are non-binary, so it is good to emphasize "binary" once in a while.

So what is an (n, k, d) code? How would you describe one? It is a set of codewords, and each codeword has n bits. How many codewords are there? 2^k, and they form a k-dimensional subspace of the n-dimensional binary vector space. And what does the d signify? Any two codewords in the code are at distance at least d apart: you have to do at least d flips to go from one codeword to another codeword.

That is just a description. What we don't have is a way to construct (n, k, d) codes. We have a way of constructing (n, k) codes if you don't care about distance; you can come up with any number of those. But how do you know a code has a particular distance d? The only technique we have seen so far is: list out all the codewords, find the codeword of minimum weight, and decide for yourself what the minimum distance is. We have nothing to do the opposite. Suppose I desire an (n, k, d) code for some n and some k; can I actually construct that code? And what does it mean to construct? What should I come up with? A generator matrix or a parity check matrix: either k basis vectors for my subspace, or n − k basis vectors for the dual subspace. One of those two things. Typically, when minimum distance is involved, people try to construct the parity check matrix, and there is a wonderful reason for that: there is a connection between minimum distance and the parity check matrix, which we will see in this class.

But there was also one more thing we saw: the decoder. In fact, how you decode is also a question that is still open; we only know the simplest possible solution. We know the optimal thing to do across a binary symmetric channel, which is nearest neighbor decoding. But what is the only way we have of implementing it? Brute-force exhaustive search: you go through one codeword after another and figure out which codeword is closest to your received word. And as I said, it is really not possible to implement that in hardware for any reasonable n and k.

So you might ask: why do you really want large n and k? It seems to be too much trouble. It turns out, and we may not see it in this class, that there is a lot of good theory here; if you read the books you will see that you can get large d only when you go to large n and large k. So why would you like large d, or to be precise, large d over n? Why would you want good d/n?
Well, if you want to correct more errors, you need large d. And it is easier to construct codes with large d/n when n and k become large. For instance, you can think of a simple bound: d can never exceed n. There are better bounds than that, but I am giving you a very simple way of thinking about it: only when n becomes large can d become large, so large n and large k are what make codes useful.

So those are the two questions we will try to answer: how do you construct (n, k, d) codes, and how do you decode? The decoding question does not have a simple answer. In fact, people have shown that it is really, really hard to decode a general (n, k) code with maximum likelihood decoding, that is, nearest neighbor decoding: in the worst case it is NP-hard, it has been proven. So you can't really decode very well in general. But if you allow for some approximations or some suboptimal decoding, you can do it, and we will see that much later; it will take quite some time before we see an efficient decoder for a very practical code. On the way, there is another decoder called the syndrome decoder for linear block codes, which is a stepping stone towards that final simple and efficient implementation. The syndrome decoder is not really efficient to implement for large n and large k, but we will see it because it presents a critical idea in moving towards a simple implementation for linear codes. We will also see how to construct codes; I will come to that after the decoder, since decoding is fresh in our minds. I will finish the syndrome decoder part and then move towards the constructions and the connections between them.

For now, we will assume somebody gives us an (n, k, d) code: somebody tells us these are the parameters, and we try to decode it as well as possible. So that is the next thing we will see, the syndrome decoder. Essentially, this is the nearest neighbor decoder implemented for linear codes.

So let me begin with an (n, k, d) code C, and let's say H is a parity check matrix. A quick question to check whether we have been grasping things: why did I say "a" parity check matrix as opposed to "the"? If I say "the" parity check matrix, it means there is only one, but I said "a". Why? Yes, row swaps, maybe, but even otherwise: any set of n − k basis vectors for your dual space will give you a matrix; it's not a big deal. So you can have many parity check matrices for the same code, and many generator matrices for the same code. There are lots of possibilities. That is why I said H is a parity check matrix.

So I am going to describe the syndrome decoder, but I will change the way we look at the channel a little bit. So far we have been looking at the channel as a BSC(p), but only bitwise: what it does to each bit. Now we will look at what it does to the entire vector. It is a simple change in notation, but it is exactly the same channel.
So what I will say is: I have a codeword c going into the channel, and the channel is going to put out a received vector r. I am going to model what the channel does as introducing an error vector. What does the error vector do? It adds to the codeword: r = c + e. Does that make sense? So far I have been thinking of the channel as a binary symmetric channel and describing what it does to one bit. Now I am saying the entire codeword is received as r, and I collect all the information about what the channel did to each bit into one error vector.

So what does r = c + e mean at the bit level? r_i = c_i + e_i. What does it mean when e_i is one? There is an error in position i. So the right model for the error vector is: the probability that e_i is one is p, and the probability that e_i is zero is 1 − p. And what else should I say to incorporate the BSC completely? Yes, IID: e_i and e_j are independent for i ≠ j. Once I say that, the description of the error vector is fully complete.

So now we can come up with the probability distribution of the error vector. Which error vector is most likely, given that p is less than half? The all-zero one. What is the probability of the all-zero error vector? (1 − p)^n. Do you see? I have defined probabilities for each individual bit of the error vector, and I am saying e_i and e_j are independent, so you can come up with a nice probability distribution for e itself. For a general e, the probability is p raised to the number of ones in e, times (1 − p) raised to n minus the number of ones. And what did we define the number of ones in a vector to be? The Hamming weight. So I can write this as a nice formula: P(e) = p^wt(e) (1 − p)^(n − wt(e)). That is the probability distribution of the error vector, and this channel is exactly the same as the binary symmetric channel; there is no difference between them. We have just looked at it in vector form as opposed to the scalar form we used previously. Is that clear?

So now the decoder. What does the decoder do? It looks at r and has to find the codeword closest to r. The nearest neighbor decoder is going to minimize, over all codewords u, the Hamming distance between r and u; well, it takes the argument of that minimum, so let's keep that in mind. Now r is the transmitted codeword plus e, and the decoder knows r. How do we describe the decoder? It is trying to find c. An equivalent thing to do is to try to find e. At first sight that seems like a stupid thing to do: c is one of only 2^k possibilities, while e is one of 2^n possibilities. Why would you want to find e over c? What is the advantage?
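The answer will turn on the distribution P(e) we just computed. As a quick aside, here is that formula as a minimal Python sketch; the function name and the 0/1-list representation of vectors are illustrative choices, not anything from the lecture:

```python
def error_vector_probability(e, p):
    """P(e) = p^wt(e) * (1 - p)^(n - wt(e)) for a BSC(p) error vector.

    `e` is assumed to be a sequence of 0/1 ints; wt(e) is its Hamming weight.
    """
    n = len(e)
    w = sum(e)  # Hamming weight: the number of bit errors
    return p**w * (1 - p)**(n - w)

# For p < 1/2, lower-weight error vectors are always more likely:
p = 0.1
assert error_vector_probability([0, 0, 0, 0, 0, 0], p) \
     > error_vector_probability([1, 0, 0, 0, 0, 0], p) \
     > error_vector_probability([1, 1, 0, 0, 0, 0], p)
```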
Yes, it's equivalent; I agree with you. Finding e is equivalent to finding c, because once you find e, r + e gives you c. But why would you want to go towards e and not c? The big difference is the probability distribution. What do we know about the distribution of c? Equally likely among all 2^k codewords. What do we know about the distribution of e? It is heavily biased. If p is less than half, the all-zero vector is most likely, then the weight-one error vectors, then the weight-two error vectors, and so on. You can try them in that order, without going through all 2^k codewords. How many weight-one vectors of length n are there? n of them, not 2^k. So maybe you hit it right there and you are done. Since the error is mostly of small weight, the transmitted codeword is mostly close to r, and you can get away with looking only for small-weight error vectors, which is more efficient in the long run. That is the fundamental idea: you move your focus from c towards e.

So now I try to minimize the distance between r and u. What is d_H(r, u) in terms of our formula? For u = r + e it is the same as wt(r + u), which is the weight of the corresponding error vector e. So let me write this down carefully. The minimization is the same as minimizing, over all e such that r + e belongs to C, the weight of e. Did I write that correctly? Right: I can't allow arbitrary e. Even though a priori all 2^n vectors are possible for e, given r I can eliminate possibilities, because I know r + e has to be a codeword; it can't be an arbitrary vector. So even there, there is some simplification, and in general you can relax this a little and simplify the expression in other nice ways.

So that is the advantage: there are many ways of doing this minimization over the weight of e, and the probability distribution of e favors you. It is biased towards some patterns; you try those first and hopefully you succeed fast enough. Is that clear? Yes, exactly, that is why the general problem is hard: eventually this alone will not give you an asymptotic improvement. We will come to that later. That is why I said it is an in-between thing: thinking this way helps you design decoders, but I am not saying it directly gives you the simplest decoder. That is the principle people work with.

Yes? Yes, capital C is the code. No, it will be only 2^k: the e's that you get here, given r, are 2^k in number. Hopefully you don't have to exhaust all of them. How will you search them? You won't exhaust all of them; you have to be smart about it. You know the lower-weight error vectors are more probable, so you should try them first. Hopefully you finish well before 2^k. [A student asks whether this amounts to searching over all the vectors you can reach from r.] Something like that, one could say that. Any other questions?
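To summarize the decoder as described so far, one way to write it in symbols, with C the code and wt(·) the Hamming weight:

```latex
\hat{e} \;=\; \arg\min_{e \,:\, r + e \,\in\, \mathcal{C}} \operatorname{wt}(e),
\qquad
\hat{c} \;=\; r + \hat{e}.
```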
All right, so let's try to see how to do this. There is one confusing aspect here: how do you get at all the e's that satisfy this condition, r + e ∈ C? That seems to be the only thing which is not yet clear. Is there a question? Two power n? Yes, a priori e could be any of 2^n vectors, but I have r, and I know r + e has to be a codeword, whatever was transmitted and whatever error happened to give me this r. Given r, it becomes 2^k possibilities. If you don't know r, I agree; but given r, there are only 2^k.

An upper bound on the weight of e, the largest weight you have to look at? I don't know; that does not depend only on the minimum distance. For instance, you could say: I will only look at error vectors e of weight t = ⌊(d − 1)/2⌋ or less. But that is not the optimal thing to do: the nearest neighbor decoder always produces an answer, and even if the received word is outside every sphere, it has to associate it with one codeword. Still, you could say that, and in fact that is the idea which finally simplifies the decoder. People say: I will only look at error vectors of weight up to t. That restricted problem is the only one you can solve with some practicality; if you want to solve the entire problem, that is NP-hard, et cetera.

Okay. So let us look at this set of all error vectors e for which, given r, r + e is a codeword. That is very easy to characterize. You know r = c + e. What I will do is multiply the whole equation by H: I compute H r^T. That is the definition of the syndrome: s is defined as H r^T. Remember, this is something the decoder can compute: it knows H and it knows r, so s = H r^T can be computed. But look at the right-hand side of the equation. What happens when you expand? You get H c^T + H e^T, and H c^T is zero because c is a codeword; that term vanishes. So the syndrome you calculate is actually H e^T.

So to answer the question, the set of all e's for which r + e belongs to the code, you have to find all e's satisfying this equation. Any e that satisfies H e^T = s will be such that r + e ∈ C. Do you see that? It is a very simple argument: H r^T = s, and if H e^T is also s, then H(r + e)^T = s + s = 0, so r + e is a codeword. That is the important part.

Does it agree with our 2^k possibilities? Let's look at the equation very closely. How long is the vector s? H is (n − k) × n, so s is a vector of length n − k.
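In symbols, the syndrome argument just given is the following chain (everything is over GF(2), so s + s = 0):

```latex
s \;\triangleq\; H r^{T} \;=\; H (c + e)^{T} \;=\; \underbrace{H c^{T}}_{=\,\underline{0}} + H e^{T} \;=\; H e^{T},
\qquad
H e^{T} = s \;\Longrightarrow\; H (r + e)^{T} = s + s = \underline{0} \;\Longrightarrow\; r + e \in \mathcal{C}.
```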
So let me write that down very clearly. The equation is (s_1, s_2, ..., s_{n−k})^T = H e^T. Why did I go up to n − k? Because H is (n − k) × n, and e = (e_0, e_1, ..., e_{n−1}) is a length-n vector. So if I try to solve this equation, how many solutions do I get? It has n variables and n − k equations, all of them independent. How many free variables after Gaussian elimination? k free variables. What values can those free variables take? Binary values, zero or one. So obviously you get 2^k possibilities, and this ties up with what we expect. All solutions of this system are what you need.

So let me restate the decoder, the syndrome decoder; now you see where the name comes from. It minimizes, over all e such that s = H e^T, the weight of e, and it takes the argument of this minimization to be ê. And what is ĉ? Let me write that down explicitly, since I think people get confused: ê is the minimizer, and ĉ = r + ê. There are various ways of looking at this addition: you can look at the places where ê is one, go to r, and flip those places; you get your ĉ, it's the same thing. This is how the decoder is going to work. There is no real implementation here yet; this is just the idea. In short, it works out to finding the minimum weight solution of a system of linear equations, s = H e^T. Out of the 2^k solutions, you pick the one with least weight, and that is your ê. Is that clear?

Okay, so let's see an example of how you do this for the (6, 3) code. In fact, I can call it a (6, 3, 3) code: we know the minimum distance is three; we saw that just by listing out the codewords. The parity check matrix was

H = [ 1 1 0 1 0 0
      0 1 1 0 1 0
      1 0 1 0 0 1 ]

So let's see if we can find the minimum weight solution easily. Let's say my received vector, after transmitting some unknown codeword, is r = 010101. I want you to think about how you would go about finding the minimum weight solution of this equation. What is the first thing you have to find? s. What is s for this r? You do H r^T, and you get s = (0, 1, 1)^T. Is that clear? And what are we trying to solve here? Let me write the equation down fully; in this case it is useful to think of it that way. Solving equations like this, where there are more unknowns than equations, what do you do?
You find the null space, then you find a particular solution, and the particular solution plus any vector in the null space gives the set of all solutions of the equation. So we could use approaches like that to find all the solutions, but we want to do it more efficiently: the better thing is to see if we can find the least weight solution very quickly. So what do you do? Write the whole thing out with unknowns e_0, e_1, e_2, e_3, e_4, e_5. One of the best ways of solving this is trial and error, in the right order. If I tell you to solve this by trial and error, what is the first error vector you try? All zeros. Why? Because that is the most likely. But will all zeros work? No, it is not a solution here. Then what do you try? All error vectors of weight one. If you try those, will you get a solution? Yes, there is one: the weight-one error vector that nicely gives the answer is e = 001000.

How do you see that? Whenever you multiply a matrix by a vector on the right, what you are actually doing is taking a linear combination of the columns of the matrix. That is the way to think about it. So which linear combination gives me (0, 1, 1)^T? I want weight one, which means I want a single column equal to (0, 1, 1)^T, and that is exactly the third column. So e = 001000 is the error vector. There are more systematic ways of doing it, and if you listed out all the null space vectors you would also get the answer; I am not saying you won't. But this is probably the quicker way: do trial and error, start with the most likely solution, then go to the next most likely, and so on. Is this clear?

So what is ĉ? You have to flip the third bit, so it becomes ĉ = 011101. Is that a codeword? It has to be, unless we made some mistake. This is how syndrome decoding works: you tried weight zero, it didn't work; then you tried weight one. And this is what happens on each received vector r.

Now, suppose you ask: what is the probability of error of this decoder? That seems like a somewhat non-trivial calculation; you have to figure out several other things. There is a notion of something called a syndrome table: you list each syndrome and map it to its ê, and so on. There are ways of doing that, and I am not sure I should go into it here. It is a good thing to know, it is important to know, but what I have shown is the basic way the decoder works; the syndrome table is something that comes later, and I am going to make it self-study. If you are interested, please go through it and look at what the syndrome table is. It will give you the important idea of cosets, and finally it will help you calculate the probability of error of this decoder. But as it turns out, it is not so crucial later on; it is enough to know how to work with each codeword individually.
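Going back to the worked example, here is that trial-and-error procedure as a minimal Python sketch, using the (6, 3) parity check matrix from above. The function names are illustrative, and the search is exactly the brute-force idea described in the lecture, not an efficient implementation:

```python
from itertools import combinations

H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def syndrome(H, v):
    """s = H v^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, v)) % 2 for row in H)

def syndrome_decode(H, r):
    """Try error vectors in order of increasing weight (most likely first)."""
    n = len(r)
    s = syndrome(H, r)
    for w in range(n + 1):                       # weight 0, then 1, then 2, ...
        for support in combinations(range(n), w):
            e = [1 if i in support else 0 for i in range(n)]
            if syndrome(H, e) == s:              # H e^T = s, so r + e is a codeword
                c = [(ri + ei) % 2 for ri, ei in zip(r, e)]
                return c, e

r = [0, 1, 0, 1, 0, 1]
c_hat, e_hat = syndrome_decode(H, r)
print(e_hat)  # [0, 0, 1, 0, 0, 0], the weight-one solution found above
print(c_hat)  # [0, 1, 1, 1, 0, 1], i.e. r with the third bit flipped
```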
Mostly, working with each codeword individually is what helps you as we go along. So we will not spend too much time trying to find the probability of error of this decoder and understanding it thoroughly; we will keep jumping ahead. This is something I am not doing this time, though I usually do it in every coding theory class, so you might want to look at notes from last year or at the books; books usually have a good discussion of this section. So the syndrome table is self-study. Obviously, all error vectors of weight one will be correctable; you will be able to correct all of those very easily. And there will be some error vectors of weight two that you will also correct, but not all of weight two can be corrected. All those things are important.

The only comment I will make before I move ahead is a brief one on complexity. Sorry, yes? No, this code has minimum distance three, and the syndrome decoder is the exact same thing as the nearest neighbor decoder. Please remember that the syndrome decoder is the maximum likelihood decoder: if you find the minimum weight error vector satisfying s = H e^T, you are doing maximum likelihood decoding for the linear code. I never made any approximations. Of course, if the nearest neighbor decoder can correct t errors, this will also correct t errors. There is nothing wrong with it; it is a perfectly fine decoder.

So, a brief comment on complexity. Suppose you have an (n, k) code and a syndrome decoder. One would expect that you have to try all error vectors of weight up to t = ⌊(d − 1)/2⌋. You have to try at least that many; in fact, if none of those works, you might have to try more, depending on how n and k are organized. You might get lucky and finish earlier, but in the worst case you have to try all error vectors of weight up to t. So how many error vectors do you actually have to try? Of course you try the all-zero vector; that is one. How many error vectors of weight one? (n choose 1). Of weight two? (n choose 2). And so on, up to (n choose t). So you have to try 1 + (n choose 1) + (n choose 2) + ... + (n choose t) error vectors.

What order is this? Roughly how big is this summation? You might think n^t. Why would you say n^t? Because (n choose t) = n(n − 1)(n − 2) ··· (n − t + 1)/t!, so you have t factors involving n. But that is a little misleading. A very good approximation here, using Stirling's approximation, works out to about 2^(n · h(t/n)), where h is the binary entropy function; the factor h(t/n) multiplying n in the exponent is between zero and one.
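As a rough numerical sanity check of that approximation (the specific n and t below are illustrative; only math.comb and the binary entropy function are needed):

```python
from math import comb, log2

def h(x):
    """Binary entropy function, in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

n, t = 1000, 50
count = sum(comb(n, i) for i in range(t + 1))  # 1 + C(n,1) + ... + C(n,t)
print(log2(count))   # ~ 282 bits
print(n * h(t / n))  # ~ 286 bits: same exponential order, linear in n
```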
So you can get a good approximation of about that size, and what it means, basically, is that the number of error vectors you have to try grows exponentially with n. If n becomes a thousand, you have to try on the order of two to the power of a thousand times a constant smaller than one, which is again an enormous number. So the syndrome decoder, while interesting theoretically, is very difficult to implement in practice by brute force. You cannot do the solution of s = H e^T by brute force; maybe in particular examples there is some trickery involved, but in general the syndrome decoder cannot be implemented by brute force for large n. So you need smarter ways of solving s = H e^T: how can you solve it and find the minimum weight solution in a smart way, rather than by trying all the possibilities? This is what we will see towards the end of this month, or maybe in a month or two. I will leave it at this; we will come back and pick up from here when we look at good codes that have a good decoder as well.

All right, so let's see one more example, maybe with a slightly complicated looking parity check matrix. I will pick this one [on the board]. What are the parameters of this code? This is a parity check matrix, remember, so it has to be (n − k) × n. What is n? n is eight. What is n − k? Four: these four rows are linearly independent, which we can easily check. So n − k is four, and k is also four. Is that clear? Usually I have been writing parity check matrices as [P | I]; here the order has been changed, the identity comes first, but it doesn't matter: you can reorder the columns in any way you want, it is the same code. So I could put the message bits here, and then the parity bits would come there. For instance, with the message vector here, the parity vector comes there. What is the relationship between p_0 and the message bits? p_0 = m_0 + m_2 + m_3. You see that? Those are the parity check equations you can write down. It is a simple parity check matrix.

So let me just give you a random r and let's see if we can correct it. I will pick this one; I think I picked it correctly. What is the syndrome? If you compute it, it works out to (0, 0, 0, 0); check that very carefully. Once the syndrome works out to all zeros, what is the minimum weight solution of s = H e^T? The all-zero vector. So ê is the all-zero vector and r itself becomes ĉ. No problem there.

So let's change r a little bit and try another example. Compute the syndrome again; this time it works out to be nonzero: (1, 0, 0, 1). Maybe we should use some hexadecimal notation or something. Okay, so what is ê now? You see that the all-zero vector obviously will not do it.
And any vector of weight one will also not do it. What kind of syndromes can a weight-one error vector actually generate? Exactly the columns of H. And none of the columns is equal to this syndrome, so weight one will not do it either. You have to go to weight two, and weight two will be optimal; you don't have to worry about optimizing beyond that. But you will see that you can find multiple ê's. One possibility is ê = 10010000. What is another possibility? ê = 01000001. Everybody agrees? So you have several possibilities, but any of them is okay: you won't lose in probability of error whichever one you choose.

It can get more complicated than this; it is not always easy to find the minimum weight solution. You think the identity part looks very easy, but the other part can play very misleading tricks: when you add two columns there, strange things can suddenly happen.

If you get two different error vectors of the same weight? Yes, you are right: r is then exactly the same distance away from multiple codewords, because r + ê has to be a codeword and the distance between r and that codeword equals the weight of ê. It sits exactly between two codewords.

How did I rule out one-bit error vectors? One way is brute force: try 1 followed by all zeros, then 01 followed by zeros, and so on; you will never get this syndrome. But the easier way of looking at it is this: when I multiply a matrix on the right by a column vector, I am actually taking a linear combination of the columns of the matrix. If my weight is only one, only one coefficient is nonzero, which means all I am doing is picking out a column of H. If I use 1 followed by seven zeros, my answer is the first column; 01 followed by six zeros gives the second column; 001 and five zeros gives the third column, and so on. And if none of these columns is actually equal to the syndrome, I will never produce my syndrome with a weight-one e. So I need at least weight two, and I found weight two, so that is settled.

Then, yes? Yes, definitely: if you find any situation where there are multiple error vectors of the same weight w, then the error correcting capability has to be strictly less than w. It can't be equal to w, because we found one case where you are right in the middle: the error could have been this one or that one, and you are not able to correct it. Within the error correcting capability, you should get only one candidate error vector. So this code has an error correcting capability of one.

All right. It is good to go into the details here, but I have seen, traditionally, that you don't use much of the syndrome decoder and cosets later on. The only things you use are the fact that you compute the syndrome and find the minimum weight error vector from it, and that you can stop at weight t if you want to.
I mean, stopping at t is not the optimal thing, but you can stop at t if you want to do suboptimal decoding. Those are the only notions that carry over. Computing the error probability of the syndrome decoder turns out to be time wasted in lectures, so I don't want to spend too much time on it. Please read it on your own; it is a very interesting area, and you will learn about cosets and all that.

Okay, so in the remaining time that we have, I am going to go back to one of the other questions I asked. What was the other question? The first question was: how do you decode? We don't really have an answer for it yet, but we at least know one pathway, the beginnings of a path; we haven't gone down it, and we will come back to it later. The other question is equally important: suppose I don't even know how to construct a code with a particular minimum distance. There is just no point trying to correct errors if I don't even know whether they are correctable. That is the more fundamental issue, so we will go back to that. For that, the most crucial thing is the connection between minimum distance and the parity check matrix. It seems like a very unobvious connection, but it is actually quite simple. Minimum distance and H, yes.

[A student asks what happens when the received word leaves the decoding sphere.] You may not go into the sphere of another codeword; you might be somewhere in the middle of nowhere. There are vectors which are inside those spheres, but there are also vectors outside, which belong to nothing. If you do nearest neighbor decoding, such a vector might still go to one codeword, or it might be equidistant from two and get assigned to one of them, but the optimal decoder has to do that: it cannot just give up when the received word is outside every sphere. If you give up, you are not doing the optimal thing. In fact, for instance, one of the most popular families of codes actually used in practice, the Reed-Solomon codes, are really, really useful, and one fascinating property you can see with them is this: if you leave the sphere of the transmitted codeword, then with very, very high probability you never get into the sphere of another codeword; you will always be floating in the middle of nowhere. So a decoder that only looks inside the spheres essentially never makes a wrong decoding: either it decodes, or it knows "I am not inside any sphere" and gives up. That is used to a lot of advantage in practical codes: you either know how to correct, or you know when you cannot correct. All those games you can play later in practice.

Okay, so the crucial thing we have to see next is the relationship between minimum distance and the parity check matrix H. This is exploited a lot in constructions, and you have to spend a lot of time thinking about it. It is a very simple concept, but if you don't get it inside your head, it will never stay there, and you will lose out a lot on the simple and elegant way in which these codes are constructed.

So the first thing I will do is prove a statement. Suppose I have a codeword c belonging to a code C. What do I know about c and H? I know H c^T = 0. Why did I put a bar below this zero? Because it is a vector with multiple zeros, not a single zero.
So now I am going to use, once again, the point I have been making all along: whenever you multiply a matrix on the right by a vector, all you are doing is taking a linear combination of its columns. Let me beat that point in a little. Let me call the columns of H h_0, h_1, and so on up to h_{n−1}; I number them from zero so they line up with the bits of the codeword. And I will write them as h_i^T. Why the transpose? So far we have been talking about vectors as row vectors; these are column vectors, so I put the transpose just for consistency. If you don't like it, just ditch it. So those are my n columns, and I multiply on the right by a codeword (c_0, c_1, ..., c_{n−1})^T, getting zero. Let me actually show you how that multiplication happens. It is the same as saying

c_0 h_0^T + c_1 h_1^T + ... + c_{n−1} h_{n−1}^T = 0.

Whenever I have a codeword, this has to be true. Now, suppose I say the weight of c is w, for some w. What does that mean? Out of the n bits in c, exactly w are ones and the rest are zeros. And let me say the w ones in c are at positions i_1 through i_w: the i_1-th bit of c is one, the i_2-th bit is one, and so on till the i_w-th bit; the other bits are all zero. For instance, one possibility is positions 1 through w, so the first w bits are one and the remaining are zero. I am just setting up clear notation for where the ones are.

Now, if I use this knowledge in the equation above, what happens? I can throw away all the positions where c is zero and retain only the positions where it is one. And there I know the coefficient is one, so I don't even have to retain it, but I will keep it for a while, because that is the way it generalizes:

c_{i_1} h_{i_1}^T + c_{i_2} h_{i_2}^T + ... + c_{i_w} h_{i_w}^T = 0.

What does this mean? Just because I had a codeword of weight w, with ones at positions i_1 through i_w, the columns of H at positions i_1, i_2, ..., i_w have to add up to zero. Or, if you want to state it more generally, those columns have to be linearly dependent. In the binary case, these are essentially the same thing: linear dependence of a set of columns means some nonempty subset of them adds up to zero, and here the support of the weight-w codeword gives you w columns that add up to zero themselves. So in the binary case you get the slightly stronger statement: those w columns sum to zero. So there is a relationship between the parity check matrix and the weight of a codeword.
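In symbols, the chain in the last two paragraphs, with i_1, ..., i_w the support of the weight-w codeword c:

```latex
\underline{0} \;=\; H c^{T} \;=\; \sum_{i=0}^{n-1} c_i \, h_i^{T}
\;=\; h_{i_1}^{T} + h_{i_2}^{T} + \cdots + h_{i_w}^{T}.
```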
If I have a weight-w codeword, it means a certain w columns of H are linearly dependent. So now, how do I define minimum distance using H? This is also an if-and-only-if relationship; do you see that? What do I mean? If I have a weight-w codeword, then those w columns are linearly dependent. Now go back the other way: suppose there is a set of w columns of H which add up to zero. Do I have a weight-w codeword? Yes, very much: just put ones in those places; the columns add up to zero, so you get a codeword of weight w. It works both ways.

Since it works both ways, one can nicely define the minimum weight of a nonzero codeword, which for a linear code is the minimum distance, in terms of the columns of H. What is it? It is the minimum number of columns of H that are linearly dependent. That is the relationship I was looking for, and it is really wonderful and nice: the minimum distance of C equals the minimum number of linearly dependent columns of H.

This is a slightly abstract result; I will apply it in some definite cases, and we will see a lot of examples to drive the point through. But here is something very important to think about: what is the rank of a matrix? Believe me when I tell you, the rank is completely different from this quantity. It is related to it, and it gives you a very loose upper bound, but it is not the same thing at all. Finding the rank of a matrix is very easy; finding this quantity is very, very difficult. What does rank actually tell you? If I say the rank of a matrix is r, it means there are r columns which are linearly independent, and any set of r + 1 columns is linearly dependent. It tells you both of those, but it tells you nothing about the minimum. There could be an all-zero column in the matrix, which by itself is a linearly dependent set of size one. Rank doesn't care; it just ignores that column, goes to the other columns, and says the rank is still r. But for minimum distance, that single column is everything. And this minimum is a highly non-trivial thing to calculate for a matrix: in the worst case, again, people have shown it is NP-hard. So it is not very easy to find; keep that in mind. I want you to reflect on this. When I come back and give you some examples, it will be clearer, but think about it more: it is very different from the rank. I will leave a small illustration below, and with that, this is the end of this lecture.
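To make the definition concrete, here is a hedged brute-force Python sketch that finds the minimum number of linearly dependent columns directly. As just noted, the problem is NP-hard in general, so an exhaustive search like this is only for small examples:

```python
from itertools import combinations

def min_distance(H):
    """Minimum distance of the binary code with parity check matrix H:
    the smallest number of columns of H that sum to zero over GF(2)."""
    n = len(H[0])
    cols = [tuple(row[j] for row in H) for j in range(n)]  # columns of H
    for w in range(1, n + 1):                              # smallest w first
        for subset in combinations(cols, w):
            if all(sum(bits) % 2 == 0 for bits in zip(*subset)):
                return w                                   # w dependent columns found
    return None  # only reachable for a degenerate H

H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
print(min_distance(H))  # 3, matching the (6, 3, 3) code from the example
```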