So, this is lecture 44. We are talking about coding; I gave you a brief idea about minimum distance and went through it real fast, so I am going to do a brief review of the generator matrix, the parity check matrix and all of that, and then proceed with minimum distance. So, when you think of an (n, k) linear block code — maybe I did not use the word "block" before; it is called a linear block code because encoding happens in blocks of n bits. Everything is a block code in practice, but the traditional name is block code. So, for an (n, k) linear code, there are various ways of describing it, and the most elegant is through the generator matrix in systematic form: G = [I_k | P], where P is a k × (n − k) matrix. Corresponding to this there is a description in the dual space: H = [P^T | I_{n−k}]. Remember G is a k × n matrix; each row belongs to the code — it is a codeword — and the row space is the entire code. H is the parity check matrix, which is (n − k) × n. You can see how the dimensions work out: P^T is (n − k) × k, and I_{n−k} is (n − k) × (n − k). So, how do you describe the code? In two ways. Using the generator matrix, the code is the set of all c = mG, where m is a k-bit vector. The equivalent way, in terms of the parity check matrix, is the set of all c in {0, 1}^n such that Hc^T = 0. Both specify exactly the same code: one in the language of a basis and the generator matrix, the other in the language of the dual space and the parity check matrix.
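The relationship between the two descriptions can be sketched in code. The matrix below is a hypothetical (6, 3) systematic generator, chosen purely for illustration; the point is only that H = [P^T | I_{n−k}] follows mechanically from G = [I_k | P], and that every row of G satisfies the parity checks.

```python
import numpy as np

def parity_check_from_generator(G):
    """Given a systematic generator matrix G = [I_k | P] over GF(2),
    return the parity-check matrix H = [P^T | I_{n-k}]."""
    k, n = G.shape
    P = G[:, k:]                                  # the k x (n-k) parity part
    return np.hstack([P.T, np.eye(n - k, dtype=int)]).astype(int)

# Hypothetical (6, 3) generator matrix, for illustration only.
G = np.array([[1, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])
H = parity_check_from_generator(G)

# Every row of G is a codeword, so G @ H^T must vanish mod 2.
assert np.all((G @ H.T) % 2 == 0)
```
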
So, one can imagine immediately, looking at this definition, that the generator matrix can be used for encoding — you see that very easily. What do you think the parity check matrix can be used for? A process called error detection. Suppose you have an n-bit vector and you want to find out whether it is a codeword or not: you compute Hc^T and see if it evaluates to 0. It looks difficult to do with the generator matrix, because naively you would generate all the codewords and test them one after the other. Well, not really — how would you do it with the generator matrix, if you did not know the parity check trick? You would look at the first k bits: in systematic form they must equal the message, so you can re-encode and compare. It is the exact same computation, but it is cleaner to think of it in terms of the parity check matrix. So, that is the definition. A related third quantity, which is very important, is the minimum distance d. People typically talk of an (n, k, d) code — d is so important that it is put into the parameters themselves. The definition is very, very important: it is the smallest Hamming distance between any two distinct valid codewords in your code. Since the code is linear, you can simplify and evaluate it differently: list all the nonzero codewords and find the minimum weight among them. Remember there can be more than one codeword of minimum weight, but that does not matter; the minimum nonzero weight equals the minimum distance. Both are equivalent ways of computing it. So, that is one thing.
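The two uses just described — encoding with G and error detection with H — can be sketched as follows, again using a hypothetical (6, 3) code purely for illustration.

```python
import numpy as np

# Hypothetical (6, 3) code, for illustration only.
G = np.array([[1, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])
H = np.hstack([G[:, 3:].T, np.eye(3, dtype=int)]).astype(int)

def encode(m, G):
    """Encoding with the generator matrix: c = m G (mod 2)."""
    return (np.asarray(m) @ G) % 2

def is_codeword(v, H):
    """Error detection with the parity check matrix:
    v is a codeword iff its syndrome H v^T is the zero vector."""
    return not np.any((H @ np.asarray(v)) % 2)

c = encode([1, 0, 1], G)
assert is_codeword(c, H)                 # a valid codeword passes the check
corrupted = c.copy()
corrupted[0] ^= 1                        # flip one bit
assert not is_codeword(corrupted, H)     # the single bit flip is detected
```
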
See, the other way to think about coding, which is very important from a decoding point of view, is what coding does to the constellation. Typically you always keep BPSK in mind as your constellation and think of the code as something sitting outside of it, but in reality that is not the right picture. Why not? Why can I not think of just BPSK as the constellation for the decoder in a coded system? Because each symbol is not independent of the others, so I cannot just decode BPSK symbol by symbol — unless I am doing hard decision decoding. If I am doing hard decision decoding, I can make suboptimal symbol-by-symbol decisions on BPSK and then decode. That is one way of doing it. But if you want soft decision decoding, that is, optimal decoding, you have to think of a larger constellation in n dimensions. An (n, k) code induces a larger n-dimensional constellation: you take the n bits together and map them into symbols. In n dimensions you will have 2^k points — that is what is actually happening. If you were doing independent uncoded BPSK in n dimensions, you would have 2^n points, and there would be no need for any of this: you could happily decode symbol by symbol. We are not doing that. You are picking only 2^k of the 2^n possibilities in this large-dimensional signal space, and which 2^k you pick is given by the generator matrix and the parity check matrix. To do optimal decoding, you have to think of this big signal space. So, when I want to do soft decision decoding, what should I do? Once I have the big constellation, soft decision decoding — optimal ML decoding — is very easy to define. What do you do? Yes: nearest neighbor.
Given a received vector, evaluate its distance from all 2^k signal points and pick the point closest to you. Forget about complexity for a while; we will say, fine, whatever k is, we will do it. So, implementing the soft decoder, at least in words, is very easy. That is what I am going to write down next. But before that I want to give some examples for minimum distance. That is the last thing we saw, and I went through it quickly, so let us do the minimum distance examples first; then we will see soft decision decoding, and you will see it is very easy to write down. Minimum distance is also written d_min — it is just shorter to write. So, what I am going to do is give you a generator matrix and ask you for the three parameters n, k and d. That seems reasonable, right? So, here is a generator matrix. What is n? 6. What is k? 3 — that is the easiest to figure out. Next is the minimum distance, and there are actually two ways of finding it. The first way is to enumerate all the codewords. How many codewords will you have here? Eight. It is not too difficult to list out the eight codewords here; list them all and pick the one with minimum weight. So, let us do that. The first codeword is the all-zero word, and then — I am quickly listing them out; let me know if I make a mistake. Those are my eight codewords: I just took all possible linear combinations of the rows.
So, from here the minimum distance is very easy to evaluate: look at the nonzero codewords. The minimum weight is 3 — but how many minimum-weight codewords are there? Four. In fact, you can list out the entire weight distribution: there is one codeword of weight 0, as there always will be, four codewords of weight 3 and three codewords of weight 4. So, that is how the code looks. The other way of figuring out the minimum distance is to look at the parity check matrix and quickly find the smallest-weight vector that gives 0 when you multiply. So, here is the parity check matrix. For this route you have to be very careful: several of your intuitions will fail, because minimum distance is not an intuitive quantity like the rank. Rank is a completely different property of a matrix; minimum distance is a very complicated quantity and cannot be computed easily. The best way is to start from small weights: rule out codewords of weight 1, then weight 2, then weight 3, and so on, eliminating everything until you succeed. Some bounds you can get quickly, but beyond that it is very dangerous — so do not guess beyond that; that is the moral of the story. So, here, there cannot be any codeword of weight 1. What is the only way you could have a codeword of weight 1? There would have to be an all-zero column in the parity check matrix; unless you have an all-zero column, you can never have a codeword of weight 1. How would you have a codeword of weight 2? Two columns would have to be identical. So, those are the two things to look for.
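The enumeration just described can be sketched in a few lines. The generator matrix on the board is not in the transcript, so the one below is an assumed (6, 3) matrix chosen to reproduce the stated weight distribution (one word of weight 0, four of weight 3, three of weight 4).

```python
import itertools
import numpy as np

# Assumed (6, 3) generator matrix, for illustration; it reproduces the
# weight distribution stated in the lecture.
G = np.array([[1, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])
k, n = G.shape

# Enumerate all 2^k codewords c = m G (mod 2).
codewords = [(np.array(m) @ G) % 2
             for m in itertools.product([0, 1], repeat=k)]
weights = sorted(int(c.sum()) for c in codewords)

# Minimum distance = minimum nonzero weight (linearity).
d_min = min(w for w in weights if w > 0)
assert d_min == 3
assert weights.count(3) == 4 and weights.count(4) == 3
```
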
So, just look at the matrix and see if there is an all-zero column. If there is, you can immediately have a codeword of weight 1. Do you see that? Suppose the first column were all zero — then the vector 100…0 would be a codeword. Remember: when you have a matrix and you multiply with a vector on the right, you are taking a linear combination of the columns of the matrix. That is very important. When you multiply with a vector on the left, what are you doing? A linear combination of the rows. These are the things to keep in mind. So, when I multiply on the right with a binary vector, what am I doing? I am only XORing columns — there is no scaling involved; in this linear combination there is only addition. Using those facts, it is very easy to see. So, let us go back to the matrix. Now the question is: when will I have a codeword of weight 2? Two columns must be identical — only then are those two columns linearly dependent, so that adding them gives 0. This is binary, not non-binary; if you are used to non-binary fields this will seem a little confusing, but over the binary field that is the only way it can work. The next is weight 3, which is a little more complicated — you have to look at it and check. But here you can easily come up with codewords of weight 3: for instance, put a 1 in one position, and then, looking at the identity part, select the columns that cancel out the ones you got. That is a smart way of doing it. So, you see from here as well that the minimum distance is 3.
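The weight-1, weight-2, weight-3 search over the columns of H can be written down directly: d_min is the size of the smallest set of columns that XOR to zero. The H below is the hypothetical (6, 3) parity check matrix used for illustration, not the board matrix.

```python
import itertools
import numpy as np

def min_distance_from_H(H):
    """Smallest number of columns of H that XOR to zero:
    weight 1 = an all-zero column, weight 2 = two identical columns,
    and so on. Exponential search, fine for small codes."""
    n = H.shape[1]
    for w in range(1, n + 1):
        for cols in itertools.combinations(range(n), w):
            if not np.any(H[:, cols].sum(axis=1) % 2):
                return w
    return None  # only the zero code has no such set

# Hypothetical H = [P^T | I_3] of a (6, 3) code, for illustration.
H = np.array([[1, 0, 1, 1, 0, 0],
              [1, 1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 1]])
# No zero column, no repeated column, but three columns sum to zero.
assert min_distance_from_H(H) == 3
```
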
But whichever way you cut it, for a large general code, finding the minimum distance has in fact been proven to be an NP-hard problem — which means there is no known efficient algorithm, and an easy algorithm is unlikely to exist. So, that is the story there. This much you should be comfortable doing: given a generator matrix or a parity check matrix, you should be able to list out all the codewords, figure out n, k and d, or go between the generator and parity check matrices and find d from there. That is a simple enough thing to do. So, I think that is about minimum distance. Any questions on how I did this? All right. So, coming back to the general picture: suppose I am looking at soft ML decoding — soft maximum likelihood decoding. Like I said, you have a k-bit message which goes through an encoder, and you get an n-bit codeword. Then you do symbol-by-symbol BPSK to get a symbol vector of n symbols. So, you have to be careful when you think of your decoding: you cannot decode symbol by symbol if you want to do soft ML; you have to look at the entire symbol vector as a whole and then decode it. Noise adds to it and you get r, which belongs to R^n — n real numbers. Then you have to build a soft ML decoder and produce, say, ĉ, or ŝ, or m̂. All of these are equivalent: produce any one of them and you have done the decoding. The actual formula is really, really simple. You know your actual constellation is n-dimensional and you know the list of all symbol vectors you have, so writing it down is very easy. Suppose I denote my code by C; each codeword c belongs to C. So, when I go from codeword to symbols, what am I doing? Every 0 goes to +1 and every 1 goes to −1.
So, corresponding to the code, I will also have a set of code symbols: just as I have 2^k codewords, each codeword, converted through BPSK, gives me a symbol vector. So I have a set of 2^k symbol vectors, and these are my actual constellation points in n dimensions. This set I will call S. There is a very succinct way of describing S in terms of C, which is also slightly misleading because, the way I write it, it is an integer operation and not a modulo-2 operation: S = 1 − 2C, that is, componentwise s_i = 1 − 2c_i. Do you see that? If c_i is 0, s_i is +1; if c_i is 1, s_i is −1. It is a very simple piece of notation for BPSK modulation; it does not really matter how I write it. So, keep this in mind. Once I know all this, writing down ĉ is very, very easy: ĉ = arg min over c in C of the distance between r and 1 − 2c — squared or not squared, whichever. So, that is the definition, and it is simple: I have written down minimum distance decoding. I look at each and every symbol vector, evaluate its distance from the received vector r, and pick the symbol vector — or the codeword — that gives the smallest distance. This is easy to write down, but one can easily see it is very difficult to evaluate in practice. If k is 500 or so, you have 2^500 symbol vectors; you cannot do this directly. There are codes for which this is easier, but in general it cannot be done very easily. So, suppose you do manage to do soft ML decoding: how well will you do? Can you get an estimate of your probability of error? That is the next question.
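The brute-force soft ML rule just written down can be sketched directly: map every codeword to its BPSK point 1 − 2c and pick the one nearest to r. The (6, 3) generator below is again a hypothetical example, not the board matrix.

```python
import itertools
import numpy as np

def soft_ml_decode(r, G):
    """Brute-force soft ML decoding: for every codeword c, form the
    BPSK point s = 1 - 2c and return the codeword whose point is
    closest to r in Euclidean distance."""
    k = G.shape[0]
    best_c, best_d = None, np.inf
    for m in itertools.product([0, 1], repeat=k):
        c = (np.array(m) @ G) % 2
        s = 1 - 2 * c                          # BPSK: 0 -> +1, 1 -> -1
        d = float(np.sum((r - s) ** 2))
        if d < best_d:
            best_c, best_d = c, d
    return best_c

# Hypothetical (6, 3) code, for illustration only.
G = np.array([[1, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])
c = (np.array([1, 1, 0]) @ G) % 2
r = (1 - 2 * c) + np.array([0.3, -0.2, 0.4, -0.1, 0.2, -0.3])  # mild noise
assert np.array_equal(soft_ml_decode(r, G), c)
```
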
So, once again we will use the same union bound technique of pairwise error probabilities. Suppose I transmit a particular codeword; I am at a certain point in my n-dimensional constellation. I have to figure out my nearest neighbors and the distance to them; then Q(d_E/2σ), multiplied by the number of nearest neighbors, is a reasonable estimate of my probability of error, where d_E is that nearest-neighbor distance. It is a pairwise-error-probability type of computation. Of course it is not exact — accurate evaluation is very difficult — so we make an approximate estimate. Now, what do you think controls that distance? The d_min of the code. The minimum distance plays a very crucial role in determining what the distance will be. So, can you compute that distance in terms of d_min? Suppose I give you an (n, k, d) code: what is the distance between nearest neighbors in the n-dimensional signal constellation? Let us work it out. Suppose I have a binary vector u = (u_1, u_2, …, u_n) and a binary vector v = (v_1, v_2, …, v_n). What is the distance between 1 − 2u and 1 − 2v? Wherever the two binary vectors agree, there is no contribution. Wherever they disagree, there is a squared contribution of 2² = 4, and you add these up. How many such contributions are there? As many as the Hamming distance between u and v. So I can write the squared Euclidean distance as 4 · d_H(u, v). Note that it is d_H, not d_H squared, that enters: it is simply the number of positions in which u and v differ that matters.
So, for each position in which they differ, there is a contribution of 4 to the sum; where they are the same, the terms cancel and there is no problem. +1 versus −1 gives a difference of 2, and 2² is 4. So, this is the magic formula for distance. Now that you know this, what is the answer to my question — the distance between closest neighbors? It is going to be 2√d. That is the distance. The factor of 2 just comes from my ±1 alphabet: if it were ±a, you would get 4a² per position and 2a√d overall. So, 2√d. The next thing you need for estimating the probability of error is a rough estimate of how many nearest neighbors you have, and that is a difficult thing. For a general code, d itself is difficult to find; on top of that, counting the minimum-weight codewords can be even harder. So you take it as a constant K: K is the number of minimum-weight nonzero codewords — let me be very clear, nonzero. What this means is that from the all-zero codeword I have K nearest neighbors. Now, by linearity, from any other codeword I also have exactly K nearest neighbors: I can simply add that codeword to each minimum-weight codeword and get the same picture. So you have K nearest neighbors for any particular transmitted codeword. A good estimate of the probability of error is then K · Q(2√d / 2σ), that is, K · Q(√d / σ). Remember this is for BPSK with ±1; if you had something else, other formulas would apply.
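The distance formula can be checked numerically on any pair of binary vectors: the squared Euclidean distance between the BPSK points is exactly 4 times the Hamming distance.

```python
import numpy as np

def hamming(u, v):
    """Number of positions in which the binary vectors differ."""
    return int(np.sum(np.asarray(u) != np.asarray(v)))

u = np.array([0, 1, 1, 0, 1])
v = np.array([1, 1, 0, 0, 1])
s_u, s_v = 1 - 2 * u, 1 - 2 * v          # BPSK points for u and v
sq_euclid = float(np.sum((s_u - s_v) ** 2))

# Each disagreeing position contributes (+-2)^2 = 4 to the sum:
assert sq_euclid == 4 * hamming(u, v)
```
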
So, it becomes K · Q(√d / σ). I am happy with this formula, but not completely. Why? Well, I do not know d, but more to the point, for comparison I need to write P_e as a function of E_b/N_0. So I have to do that manipulation. What will you substitute? With unit-energy BPSK and rate R = k/n, we have E_b/N_0 = n/(2kσ²), so 1/σ can be written as √(2(k/n) · E_b/N_0). Adding the √d, P_e becomes approximately K · Q(√(2(kd/n) · E_b/N_0)). So, this is for the coded system: an (n, k, d) code with K minimum-weight codewords. Actually, I think it is good to write K_d, because d is the minimum weight, so I will use K_d in my notation. You do not know K_d, but you do not have to worry too much about it: when E_b/N_0 becomes large, the term that dominates is the Q function — it goes down exponentially, while K_d is a constant that clearly does not vary with σ. So at least that much we know, and you can happily ignore it. What is it for an uncoded system? For an uncoded system, P_e = Q(√(2E_b/N_0)). So what happens if I plot these two, with P_e on a log scale against E_b/N_0 in dB, roughly, for large E_b/N_0 — what is the difference? The coding gain. It is called the nominal coding gain, because so many factors are ignored, and it is kd/n in dB: you do 10 log₁₀(kd/n) dB. This is a good formula and it works in several cases. For instance, for the repetition code, what does it work out to? k is 1, d is n, and it works out to 0.
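The two probability-of-error expressions and the nominal coding gain can be sketched as small functions; Q is written via the complementary error function.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pe_coded(k, n, d, K_d, ebno):
    """Nominal estimate P_e ~ K_d * Q( sqrt(2*(k*d/n)*Eb/N0) ),
    with ebno = Eb/N0 as a linear (not dB) quantity."""
    return K_d * q_func(math.sqrt(2 * (k * d / n) * ebno))

def pe_uncoded(ebno):
    """Uncoded BPSK: P_e = Q( sqrt(2*Eb/N0) )."""
    return q_func(math.sqrt(2 * ebno))

def nominal_coding_gain_db(k, n, d):
    """Nominal coding gain 10*log10(k*d/n) in dB."""
    return 10 * math.log10(k * d / n)

# Repetition code (n, 1, n): the nominal gain is exactly 0 dB.
assert abs(nominal_coding_gain_db(1, 5, 5)) < 1e-12
```

At high E_b/N_0 the coded estimate falls well below the uncoded one whenever kd/n > 1, which is the trade-off discussed next.
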
So, at least for the repetition code it is an accurate estimate of the coding gain. It is a reasonably good estimate at high E_b/N_0, but it turns out not to be very good at very low E_b/N_0 — and today people work at very low E_b/N_0. The codes we have today can give you coding gains very close to capacity, operating at very, very low E_b/N_0; because of that, this bound is of slightly lesser importance now. But at least for a first course, to understand what coding gain is all about, it is a very good bound. You see there are two contradictory factors here: if you decrease k you can hope to increase d, but the gain actually depends on the product kd/n. So there is a trade-off: how much can you decrease k, how much can you increase d, and will d increase fast enough to give you a reasonable coding gain — that is the kind of game you have to play. The next thing I am going to do is give you the example of the (7, 4) Hamming code, and let us try to evaluate its nominal coding gain. I gave you a parity check matrix earlier; I am probably not going to reproduce the exact same one, sorry. [A student asks whether the log₁₀ should be there.] You are saying the log₁₀ should not be there — but then for the repetition code it would work out to something else. See, E_b/N_0 here is an actual number, not in dB; the factor kd/n inside the Q function translates to 10 log₁₀(kd/n) of gain on a dB axis. I think this is fine — I can check once again, but I am quite sure. Look at the definition here: E_b/N_0 = n/(2kσ²). If I had written it in dB and put an exponential inside, then you would be right.
I think this is fine — think about it anyway. So, let me see if I can reproduce the exact same matrix. What was the parity check matrix I had for the Hamming code? (There seem to be too many discussions going on — it is 10 log₁₀(kd/n); I am reasonably confident about that, so think it over.) It is a 3 × 7 matrix. What did I have for the first column? For the (7, 4) Hamming code: 110, then 101, 111 and 011, followed by the identity columns. The actual order of the columns does not matter — it makes no difference — but I wanted to reproduce the same thing. So, for this code, what is the minimum distance? It turns out to be 3, and it can be proven quite easily using the same technique as before. There is no all-zero column, so it cannot be 1; no two columns are identical, so it cannot be 2; and there are three columns which add to 0. You can quickly produce one: 1000110 — that gives syndrome 0, so it is a codeword of weight 3. How many such codewords will there be? Somebody count quickly — it turns out the answer is 7. You can think about it; maybe you can justify it quickly. And these are things you can compute: if you want, you can list out all 16 codewords; it is not too difficult. There will be the all-zero codeword, the all-one codeword, 7 codewords of weight 3 and 7 codewords of weight 4. So that is K_d. Now, if you do the computation, what is the approximate nominal coding gain?
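The claimed weight distribution (one word each of weights 0 and 7, seven each of weights 3 and 4) is easy to verify by brute force from the parity check matrix, using the column order stated in the lecture.

```python
import itertools
import numpy as np

# Parity check matrix of the (7, 4) Hamming code, H = [P^T | I_3],
# with parity columns 110, 101, 111, 011 as in the lecture.
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
n = H.shape[1]

# Enumerate all 2^7 binary vectors, keeping those with zero syndrome.
codewords = [v for v in itertools.product([0, 1], repeat=n)
             if not np.any((H @ np.array(v)) % 2)]
weights = sorted(sum(v) for v in codewords)

assert len(codewords) == 16                         # 2^4 codewords
assert weights.count(0) == 1 and weights.count(7) == 1
assert weights.count(3) == 7 and weights.count(4) == 7
```
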
K_d does not matter for the nominal coding gain; I just gave you that information. The gain is 10 log₁₀(4 × 3 / 7) = 10 log₁₀(12/7). What does that work out to, roughly? Let me see how good your dB arithmetic is. What is 12/7 in dB? It is not quite 3 dB — roughly 2.3 dB. It cannot be too far from 3 dB, because 2 × 7 is 14 and 12 is fairly close to that; so, about 2.3 dB, just to give you a rough number. You can expect that much coding gain from the Hamming code. And remember — I think you should be excited about this. It is just a (7, 4) code; in fact, you can even imagine implementing a soft ML decoder for it. And you are getting about 2.3 dB extra in your system. No other technique you ever use will give you anything like this. You can come up with any kind of equalization technique, and you will only pay a penalty — you will never get gains. Coding is the only thing that improves your trade-off between probability of error and SNR. In equalization, and in pretty much every other signal processing technique at the receiver, you are limiting the damage; coding is the only technique that gives you an advantage. That is a useful way of thinking about it. But to get this 2.3 dB, what should you do? The price you pay is that you have to compute 16 different distances — or, if you look at the distance formula closely, it works out to just 16 dot products, and maybe you can simplify even that computation. So it is not very difficult, but maybe your hardware is not capable of producing, say, 8 or 7 or 6 bits of data for each r_i. Maybe you can only produce 1 bit.
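The in-class dB estimate is easy to pin down exactly:

```python
import math

# Nominal coding gain of the (7, 4, 3) Hamming code: 10*log10(k*d/n).
gain_db = 10 * math.log10(4 * 3 / 7)   # 10*log10(12/7), about 2.34 dB
```
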
So, you are forced to quantize — to make a hard decision on each symbol at the receiver. If you are doing that, you are forced to do what is called hard decision decoding. That is what we will see next. It is already suboptimal, but we want to see it and study it because it is a nice alternative: maybe you do not want to spend all that time doing soft decoding; you want to decode quickly and figure out what the codeword could have been. So you take just 1 bit of information for each r_i. How does that system look? Hard ML decoding is what we will see next. The picture is the same as before: you have m, you run the encoder and get c, you do BPSK and get s, then noise adds and you get r. But r is not accessible to you: you do a symbol-by-symbol slicing, which is immediately suboptimal, and you get b, which is n bits again. You want to hard-ML decode this and produce ĉ or m̂ — ŝ is quite irrelevant here, so I will just say m̂. Is that clear? Since you are anyway finally doing hard decisions, you can simplify this model further. You do not have to go from bits to symbols, add noise to the symbols, and then take symbol-by-symbol decisions again; you can put a model directly from c to b. What will that model be? I can bypass all of these steps and develop a model for this hard decision system which jumps directly from c to b. If I transmit c_i, then b_i equals c_i with what probability? 1 − Q(1/σ). Do you see that? That is the probability of a correct decision in a symbol-by-symbol slicer. And b_i differs from c_i with the complementary probability — let me be careful: I want to write p for the probability of error.
So, p = Q(1/σ). You know this is what happens if you make symbol-by-symbol decisions: if you send +1, you go wrong and decode −1 with probability Q(1/σ), and the same going from c_i to b_i. So, an equivalent model is to have an error vector e = (e_1, e_2, …, e_n), where each e_i is 0 with probability 1 − p and 1 with probability p, i.i.d. So instead of going to the continuous domain, passing through Q and coming back, you build a simpler probabilistic model that goes from c to b directly: e is an i.i.d. Bernoulli(p) vector, and b = c ⊕ e. This kind of model is what is called a binary symmetric channel. You have m, you do the encoding, and then you pass through a binary symmetric channel with crossover probability p and produce b; then you decode and produce ĉ. What we had previously was a BPSK-over-AWGN model; in the hard decision case, BPSK over AWGN simplifies to a binary symmetric channel model. So, my hard ML decoder, while being suboptimal in the BPSK-over-AWGN model, is the optimal decoder for the BSC model. Once you know it is a BSC, there is nothing better you can do — you do not have access to r anymore. So the best thing you can do is this, and it is optimal only within this model. What is a good example of a binary symmetric channel in real life? Systems where the speeds are so high that you cannot get more than one bit out of your A-to-D converter at the front end. What kind of systems have speeds that high? Optical links — backhaul optical links and so on, going to 40 Gbit/s and beyond.
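The BSC model just described can be sketched directly: a crossover probability derived from the slicer, and an i.i.d. bit-flipping channel.

```python
import math
import random

def crossover_prob(sigma):
    """Hard-decision BPSK (+-1) over AWGN: p = Q(1/sigma),
    with Q written via the complementary error function."""
    return 0.5 * math.erfc(1 / (sigma * math.sqrt(2)))

def bsc(bits, p, rng=random):
    """Binary symmetric channel: flip each bit independently
    with probability p (this generates the error vector e)."""
    return [b ^ (rng.random() < p) for b in bits]

random.seed(0)
c = [0, 1, 1, 0, 1, 0, 0]
b = bsc(c, p=0.1)
assert len(b) == len(c) and all(bit in (0, 1) for bit in b)
```
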
So, it is difficult to get more than one bit in practice at those rates; you have to work with just one bit. Not only that: when you are sending at 40 Gbit/s, you have to do your decoding at a comparable speed. They do demultiplexing, and the actual decoders work at a much lower rate, but you still have to run very fast, and you cannot hope to do correlations and soft decision decoding at that speed. You can only do hard decision decoding. For those reasons, this is a pretty good model for very fast communications, where you just make symbol-by-symbol decisions. So, even in such a model — remember what this model is: I have a codeword c, which comes from k message bits but is n bits long, and my c belongs to a code C, so there are only 2^k possibilities. For the error vector, on the other hand, all 2^n possibilities can occur, and because of that, b also ranges over all 2^n possibilities. Those are things to keep in mind: b is still n bits, e is still n bits. All right. So, what you are transmitting is an n-bit vector. What is the probability mass function of this n-bit vector at the transmitting end? It is uniform over the 2^k codewords and 0 for everything else — that is the probability mass function. What is the probability mass function of the error vector? (For the error vector, forget about convolving — it is all discrete, do not keep invoking convolution.) The error vector is different: which is the most probable error vector, assuming p is less than half? You usually assume p < 1/2, which means your system is working to some degree. In fact, p > 1/2 does not make any sense — why? Because you could just flip every received bit and get an error probability less than half. So p > 1/2 makes no sense, and you always assume p < 1/2.
So, once you have p less than half, the most probable error vector is what? The all-zero vector. What is the next most probable? Vectors of weight 1, okay. In fact, the probability of an error vector E equals p^w times (1 - p)^(n - w), where w is the Hamming weight of E. So, you can write down the PMF of the error vector very clearly, okay. It is generated by a probabilistic process, and the probability of each E can be written down very precisely, right. And from this formula, it is very easy to show that if you increase the weight, the probability of the error vector goes down, okay. So, there is a clear ordering of these probabilities, while for C there is no real ordering, right. Among the 2 power k code words it is uniform, okay. Beyond the 2 power k, you know non-code words do not occur. But among the code words there is no real preference, right: you assume uniformly likely messages, so you have uniformly likely code words, okay.

So, now what about the decoder? That is the question, okay. The decoder is also very easy to write down, okay. You can write down the probabilities and simplify, etc. Finally, you will see that the ML estimate C hat is given by the argmin over code words c in C of the Hamming distance between B and c, okay. You can show this for this model very easily. It is also intuitive, right. If you receive a particular vector B, you look for the nearest code word in Hamming distance and pick that as your decoded code word. So, what do you do if there are two code words that are equally close? Can you pick any one by tossing a coin? Yes, that is okay. So, that is the tie-breaking rule: if you get two code words at the same distance, you toss a coin. That is very unlikely to happen in most cases.
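The minimum-Hamming-distance rule, brute-forced over all 2^k code words with the coin-toss tie break, can be sketched as follows. The (7, 4) generator matrix here is an assumed standard Hamming code, my own illustration rather than a code fixed in the lecture.

```python
import random
from itertools import product

# Assumed (7, 4) Hamming code generator matrix, G = [I_4 | P] over GF(2).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
k, n = 4, 7
codewords = [
    tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
    for m in product([0, 1], repeat=k)
]

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def ml_decode(b, rng=random.Random(0)):
    """c_hat = argmin over c in C of d_H(b, c); ties broken by a coin toss
    (uniform choice among the equally-near code words)."""
    d_min = min(hamming_distance(b, c) for c in codewords)
    nearest = [c for c in codewords if hamming_distance(b, c) == d_min]
    return rng.choice(nearest)

# With minimum distance 3, any single bit flip is corrected:
c = codewords[5]
b = list(c)
b[2] ^= 1                          # one channel flip
print(ml_decode(tuple(b)) == c)    # True
```

Note this is exactly the 2^k exhaustive search complained about below: every received vector costs a scan over all code words.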
But if it happens, you just toss a coin and pick either one, okay. So, this is the decoding rule, okay. I have not derived it, but you can write down the probabilities and quickly derive it; from those probabilities you will easily get down to Hamming distance, okay. A useful exercise is to write down a similar probability for B. For the vector B, what is the probability? You want to do convolution? Be very careful when you think of convolution here, okay. What are the underlying elements? They are vectors, B and C and so on, so I do not know what you mean by convolution, okay. I have not defined a random variable for you to go to a proper PMF and then do convolution. When I say PMF here, it is a loose usage: I am putting probabilities on each element of my sample space, which is a set of vectors. So, what would convolution even mean? Be very careful when you do this, okay. You have to think about how to write the formula for the probability of B; it is a useful exercise in probability, okay.

All right. So, this is the decoder, okay. But now, go back and think about what you have to do to implement it. The only way to implement this directly is to try all 2 power k possibilities, and that is clearly not a very nice thing to do, right. You have to try all 2 power k code words, and you have no preference for which one to try first, because all of them are equally likely, okay. A useful thing to do instead is to write the same decoder in terms of the error vector E. If you can write it in terms of E, then your search becomes easier. Why is the search easier over E than over C? Because E has a clear gradation in probability: you know which vector to try first, and which to try next, because those are the most probable ones, okay.
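The "search over E in order of probability" idea at the end can be sketched directly: try error vectors of weight 0, then weight 1, and so on, and stop at the first e for which b + e lands in the code. This is a preview of the next step, under my own assumed (7, 4) Hamming code; note it breaks ties by enumeration order rather than a coin toss, but the first hit is always at minimum Hamming distance, so it agrees with the ML rule whenever the nearest code word is unique.

```python
from itertools import combinations, product

# Assumed (7, 4) Hamming code generator matrix, G = [I_4 | P] over GF(2).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
k, n = 4, 7
codeword_set = {
    tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
    for m in product([0, 1], repeat=k)
}

def decode_by_error_search(b):
    """Try error vectors e in increasing weight order (most probable first,
    since p < 1/2); return the first b + e that is a code word, along with
    the weight of the e that was used."""
    for w in range(n + 1):
        for positions in combinations(range(n), w):
            candidate = list(b)
            for pos in positions:
                candidate[pos] ^= 1          # add e to b over GF(2)
            candidate = tuple(candidate)
            if candidate in codeword_set:
                return candidate, w

# A received vector one flip away from a code word decodes back with w = 1:
c = sorted(codeword_set)[3]
b = list(c)
b[0] ^= 1
print(decode_by_error_search(tuple(b)))
```

The attraction is that for a channel that rarely flips bits, the low-weight candidates tried first almost always succeed, so the expected work is far below the full 2^k scan, even though the worst case is still exponential.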
So, that is what we will do next, and I will probably do it in the next class, okay. So, I will pick up from here, simplify this, and eventually we will get to a decoder that is reasonable, though still not very easy to implement.