Okay, so the last thing we were doing was looking at the turbo code. I gave you a description of how the decoder works, and there was one minor point left open there: how do you incorporate the a priori information in the MAP decoder? So I am going to briefly mention that, not in great detail. What happens in the MAP decoder is that for each branch there is a branch metric. What is a branch? A branch is a transition from a state S1 to a state S2, so you can describe a branch as (S1, S2), and corresponding to every branch there is a branch metric in your MAP decoder. That branch metric contains a few other terms, but it also contains a term like the probability of the transition (S1, S2) together with the received value. So remember, this is the branch metric at stage i: for a branch at stage i, the metric will involve r_i, the received value corresponding to that stage, and the probability that that transition happened. What do I mean by the transition happening? Corresponding to each branch there would have been a certain output symbol vector: each branch corresponds to an input bit and some output bits, and after modulation each of those output bits becomes a symbol. So when I say (S1, S2), I mean it determines those symbols. And you can even write this in terms of the input bit if you want; the branch also corresponds to a particular input bit. I will call it m_i, or actually, I think u is the notation we used, right? So u_i is the input bit, and the symbols are the ones corresponding to the branch.
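Just to make this concrete, here is a minimal sketch of such a branch metric in the log domain. The function name, the BPSK mapping to +1/-1 and the AWGN noise model are my assumptions for illustration, not anything fixed by the lecture.

```python
import math

def branch_metric(r, symbols, u_bit, p_u1, noise_var):
    """Log-domain branch metric (gamma) for one trellis branch at stage i.

    r         : received values for this stage (list of floats)
    symbols   : BPSK symbols (+1/-1) labelling this branch
    u_bit     : the input bit u_i associated with the branch
    p_u1      : a priori probability P(u_i = 1), e.g. from the other decoder
    noise_var : AWGN noise variance
    """
    # a priori term: log P(u_i) -- this is where the other decoder's
    # information from the previous iteration enters
    p_u = p_u1 if u_bit == 1 else 1.0 - p_u1
    log_apriori = math.log(p_u)
    # channel term: log p(r_i | branch symbols), Gaussian, up to a constant
    log_channel = sum(-(ri - si) ** 2 / (2.0 * noise_var)
                      for ri, si in zip(r, symbols))
    return log_apriori + log_channel
```

A branch whose symbols agree with the received values, or whose input bit is favoured by the a priori probability, gets a larger metric; both effects involve only quantities of stage i, as stated above.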
So some such probability will be involved, and when you actually try to compute this probability you will have to use Bayes' rule. When you use Bayes' rule you will get some a priori and a posteriori type probabilities, and that is where you introduce your a priori probabilities. So there will be a term like this; it comes from what is called gamma, because the branch metric is usually called gamma. So gamma will have a term in which the a priori probabilities for u_i can naturally be used. The important thing is that this probability involves only things that correspond to stage i. So that is where you introduce the a priori probability: from the previous iteration, from the other decoder, you would have got some a priori probabilities, and you use them here. So that is just to make that very specific, and that kind of winds up our turbo decoder.

So there is one other thing in relation to convolutional codes which I have not talked about at all, and that is structural properties and distance properties. I will just quickly and briefly mention this with a simple example, just for completeness, because there are some problems in the homework which are based on it and I do not want you to be stuck on those problems. So let us see. This is a brief discussion of free distance and error events. Again I am going to do it with an example. My convolutional encoder is going to be a simple feedforward, non-recursive encoder, and I am going to draw a trellis for it, so I need a few stages of the trellis. If I draw an arbitrary stage, I know it has got four states, right?
So let us label them 00, 01, 10, 11 on the left and 00, 01, 10, 11 on the right. Then where do the transitions happen? If the input is 0 you go to 00, if the input is 1 you go to 10, and the same thing happens from 01 also; from 10 and 11 you go to these two states, 01 and 11. Now the output labels: from 00 it is 0/00 and 1/11; from 01 it would be 0/11 and 1/00, am I right? And I could be wrong here: from 10, with input 0 you should get 10, right, so 0/10 and 1/01; and then from 11 it is 0/01 and 1/10. So that is my trellis, and I am going to take multiple copies of it; hopefully I can fit that here, I do not know what is going to come out. So let us try it; oops, okay, let me try and put it very carefully on top. I think that is good enough, it is not too bad; you can see the trellis. So I am going to do one more copy, which means some erasing here and some drawing here. I know it is not a perfectly drawn trellis, but hopefully you can see how the trellis works. So that is how the trellis works.
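The trellis on the board can be captured in a few lines of code, which is handy for checking the branch labels. An assumption on my part: the labels match the standard generators g1 = 1 + D + D^2 and g2 = 1 + D^2, with the newest input bit in the highest register position.

```python
# Tap masks for g1 = 1 + D + D^2 and g2 = 1 + D^2; bit 2 of the register
# is the current input, bits 1 and 0 are the two previous inputs.
G = (0b111, 0b101)

def step(state, bit):
    """One trellis branch. The 2-bit state is (previous input, the one
    before that); returns (next_state, (v1, v2)) for input `bit`."""
    reg = (bit << 2) | state
    out = tuple(bin(reg & g).count('1') % 2 for g in G)
    return reg >> 1, out
```

For example, `step(0b00, 1)` returns `(0b10, (1, 1))`: from state 00 with input 1 you go to 10 and output 11, exactly the 1/11 branch drawn above.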
So basically, what is an error event and what is free distance? Think of a block code first. Suppose you transmit the all-zero codeword, and suppose somebody were to ask you: what is the most likely erroneous codeword that you will output? How will you answer that question? Suppose you do BPSK over AWGN, whatever. What is the most likely erroneous codeword you would have got? I think for block codes the answer is, in general, a codeword of least weight, right? Because that is the codeword closest to your original codeword: if you deviate a little bit, you are most likely to land on a codeword of least weight. So the same thing happens here. Suppose you transmitted the all-zero sequence. Every codeword corresponds to a path in the trellis, so you traverse the top path. What is the most likely erroneous path that you would output? In some sense it should be closest to the all-zero path, so if at all it deviates from the all-zero path, it should come back as quickly as possible. So an erroneous path which deviates from the all-zero path and comes back real quick, we can talk about that as an error event. I have to be careful here; I am just going to loosely say error events are the most likely erroneous paths. So this makes a lot of sense, and for instance, one thing that is very useful is to come up with an erroneous path of least weight. What do I mean by the weight of a path? Each path corresponds to a codeword, and the weight of that codeword is the weight of the path: if you take a path, you can take the sequence of outputs corresponding to that path, and the weight of that sequence is the weight of the path. So if I ask the question, what is the error event of least weight, naturally that becomes the most likely error event
because I am unlikely to make errors that are too far away. So in this case, if you want to find the error event of least weight, since there are very few states and very few transitions, one can easily just eyeball it. First of all you have to make an error and deviate from the all-zero state; if you do not deviate, there is no error. So you deviate, you get here, and then you have to come back quickly to the all-zero state. One way of doing it is this path. But the question is, is there any other path? So the weight of this path is what? 5, right: you have got a 11 here, a 10 here and a 11 here, so the weight of this path is 5. So the question is, can there be any other path of lesser weight? Here, just by looking at it carefully, the answer seems obvious. But if the number of states is 256 and you have a rate 4/5 code, there are a lot of transitions happening and it is not so easy to find. You need a method for doing it, and there are methods: one can use a modification of the Viterbi algorithm, or any other shortest-path algorithm on the graph, to find this; or you can have analytical methods based on things like Mason's formula, which analyze the state diagram very carefully. One can do all those things and accurately find the lowest weight of an error event. That lowest weight is also called the free distance: the weight of the lowest-weight, most likely error event is called the free distance. So those are the two definitions I want to write down. I should be careful here: there are rigorous ways of making these definitions, and I am giving you a loose one. So in this trellis, the free distance, denoted d_free, is what?
5, in the above example. Simple enough. Now, in reality today you do not have to bother so much, because all these issues of d_free have already been solved, in the following sense. Suppose you have to design a 32-state convolutional code; suppose you figure out that your decoder can handle a 32-state convolutional code. That is the complexity; that is what fixes how many states you can handle. Then the next thing is rate: what rate can you tolerate? Maybe there are other things which tell you how low or how high a rate you can go; usually it depends on things like bandwidth efficiency, spectral efficiency, all those things become important. So something might determine your rate, maybe rate one half. Suppose I want to find the best rate-1/2, 32-state convolutional encoder; suppose that is the problem. It is a simple search problem, right? You look at G(D) and consider all possible polynomials. How many are there? If I need 32 states, D^5 should be there, and then the remaining coefficients are 4 plus 4, 8 bits, which is a maximum of 256 possibilities; actually maybe fewer, maybe many of them are equivalent, but still, let us say 256. You search through all of them, find d_free for each, and pick the one with the best d_free. That is the logic, and people have done this already. So today you can look in the books and find the best rate-half convolutional code for a particular number of states. That is why these things are not so critical today; the structural properties are not very critical, and the search space is also quite small. It is very unlikely that you will ever design a 2^50-state convolutional encoder, because it is very unlikely that you will be able to decode it. In fact one can do it, but it is
all very difficult. So the search space will always be very, very small, and you can go through all of them, find the d_free, and finish it. That is why I said it is not so critical that you know this today, but you should know these terms, because people will talk about them often: people will say the free distance, the best free distance, and so on. So that is one such thing. But one more thing you should realize: because of the linear time-invariance property of convolutional codes, what will happen? If you have a feedforward convolutional code, how many minimum-weight codewords will I have? Several: every time I shift the same pattern I get another one, so as my k increases, this number increases linearly with k. So the number of minimum-weight error events will be fairly large in a convolutional code; those are observations we made even in the turbo-code scenario. Another thing you will see is that when your SNR is reasonably high, your convolutional code is very likely to make a string of errors. The reason is that once it starts making an error, it makes errors for a while and then rejoins the all-zero state and does not make errors anymore. One can say this based on experience or on intuition: at moderately high SNRs, errors occur in bursts, that is the technical way of putting it. You make an error, you go wrong for a while in a burst, then you come back and you are in the all-zero state, and maybe you make another burst and once again go back and rejoin. Something like that will typically happen at moderately high SNRs. At very high SNRs, of course, there will not be too many errors; at very low SNRs there will be all kinds of errors; but at moderate SNRs, where you really want to operate, something like this will happen. You will see there is a way of taking advantage of this property later on, so I will try to
remember this property at least till the end of this lecture; eventually I will call upon it. So I think with this we will say goodbye to convolutional codes and turbo codes; maybe we will come back and see them in the homework a little bit. All right, so, yes: what is catastrophic error propagation? It is actually not a property of the code; being catastrophic is a property of the encoder. Maybe I should give you an example to explain catastrophe. Take G(D) with, say, (1+D) times D and (1+D) times (1+D); maybe the example I have been citing is catastrophic, I do not know; say this is your encoder. What happens if I put u(D) = 1/(1+D)? What is the weight of u? Think of it as an infinite sequence: what is 1/(1+D)? The actual sequence is 1, 1, 1, 1, ... forever, right? So the weight of u is actually infinite. What about the output? The outputs are D and 1+D, so the weight of v is 3. A situation like this is said to be catastrophic: if your encoder is such that for an infinite-weight input you can get a finite-weight output, your encoder is said to be catastrophic. For the same convolutional code you can have a non-catastrophic encoder also: you just pull the 1+D out and use D and 1+D, and then this will not happen. So when will catastrophe happen? The g(D)'s, the impulse responses, should have a non-trivial gcd: when you factor them, this 1+D should be there, some common factor should be there. Then catastrophe can happen; otherwise it will not. But for the same code there can be catastrophic encoders as well as non-catastrophic encoders, and you had better
pick the non-catastrophic encoder. Something like this is probably not very nice: when you go from a block error rate to a bit error rate conversion, it can cause all kinds of confusion. Anything else? Fine. There is also a whole bunch of decoders called stack decoders for convolutional codes; you might be interested in those, and the claim is that at sufficiently high SNRs the complexity of the stack decoder is not even exponential in the number of states, it is less than that. So if you have a really large number of states, stack decoders can be used. That is another area on which I have not spent time. So that is pretty much the end for convolutional codes and turbo codes, though maybe not the end for trellises. The one last thing I want to point out before I leave trellises for good is the notion of a trellis for block codes. So far we have always thought about trellises for convolutional codes. What do I mean by a trellis? A trellis is a graph: you have nodes and a bunch of branches connecting them. And what was the connection between the trellis and the code? Every codeword was represented as a path in the trellis. Now, how did we define block codes? Using generator matrices and parity check matrices. So for a block code, can you come up with a trellis? That is a question one might ask. And what is the point of coming up with a trellis for a block code? What is the advantage? Yes: ML decoding and MAP decoding. So far, for block codes, we never really had any implementable scheme for ML or MAP decoding. If you come up with a trellis representation for a block code, maybe you can run Viterbi on it, or the MAP decoder on it, and do that decoding for your block code also. So there is some advantage in doing it.
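As a quick aside before block-code trellises: the d_free search described a little earlier, trying all generator pairs with D^5 and the constant term present and keeping the best, can be sketched as below. This is my own sketch: Dijkstra's algorithm is one concrete choice for the "shortest path on the graph" step, and the bit conventions (tap masks with the newest input bit highest) are assumptions.

```python
import heapq

def dfree(gens, memory):
    """Free distance of a rate-1/2 feedforward encoder: Dijkstra over
    accumulated output weight, from the first divergence out of state 0
    until the path rejoins state 0."""
    def step(state, bit):
        reg = (bit << memory) | state
        w = sum(bin(reg & g).count('1') % 2 for g in gens)
        return reg >> 1, w

    start, w0 = step(0, 1)          # must diverge: first input bit is 1
    dist = {start: w0}
    pq = [(w0, start)]
    while pq:
        w, s = heapq.heappop(pq)
        if s == 0:
            return w                # cheapest return to the all-zero state
        if w > dist.get(s, float('inf')):
            continue
        for bit in (0, 1):
            ns, dw = step(s, bit)
            if w + dw < dist.get(ns, float('inf')):
                dist[ns] = w + dw
                heapq.heappush(pq, (w + dw, ns))

def best_rate_half_32_state():
    """Search the 16 x 16 = 256 generator pairs with the D^5 term and the
    constant term forced to be present in both degree-5 polynomials."""
    best_d, best_pair = 0, None
    for a in range(16):
        for b in range(16):
            g1 = (1 << 5) | (a << 1) | 1
            g2 = (1 << 5) | (b << 1) | 1
            d = dfree((g1, g2), 5)
            if d > best_d:
                best_d, best_pair = d, (g1, g2)
    return best_d, best_pair
```

Under these assumptions the search reports a best free distance of 8 for 32 states, which agrees with the published tables of best rate-1/2 codes.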
The question is, can you come up with anything like a trellis for a block code? And the second thing is, how many states will it have? That is also important: if it ends up having a huge number of states, then maybe it is not very efficient either. So I am going to once again do this by example. I will take a code smaller than the Hamming code and show you how to construct the trellis for it; it is very simple actually. We will see one construction based on the parity check matrix; there are also several other constructions available. So here is the example: I will take a (6, 3, 3) code with a certain parity check matrix. Yes, this is rate half; let us see this example and then I will come back to the question. The question was whether you can have such trellises for all rates, because for convolutional codes we saw trellises only for rates 1/2 and 1/3 and so on. Actually, even for a rate 2/3 convolutional code you can have a trellis; so far I have been saying puncturing is the idea, but even if you do not want to puncture, you can have two input streams and four branches out of each state. It is possible. All right, so the parity check matrix I am going to take is very simple; its columns are 110, 011, 101, 100, 010, 001. You can check that all the parameters I have been quoting are true; (6, 3, 3) is true. Then I am going to draw a trellis for this. So what do I know about every codeword of my code? What does it satisfy? It satisfies H c^T = 0. Let me try to write this equation in a slightly different way, which will possibly give me the trellis. So remember which bits are associated with each column: where will I put the parity bits, and where will I put the message bits? Are the last three columns the message
bits? How many of you say the last three columns are the message bits? No: the last three columns are the parity bits, so the first three columns are the message bits. So I will say m1, m2, m3, p1, p2, p3. What can you pick? You can pick m1, m2, m3 to be arbitrary bits, and p1, p2, p3 get fixed by m1, m2, m3, right? Yes, this is my choice; the question was whether this is the only choice of message and parity positions. There are other choices also, but the way I wrote down my parity check matrix, this is the most obvious choice; anything else will involve some confusion. All right. In the convolutional case things worked differently: you could pick message bits for as long as you wanted, and after that, what did you do? You drove back to the all-zero state. Here also something similar should happen: you are allowed to pick only the first three bits, and the remaining three bits get picked so that ultimately you return to some all-zero state. Some such thing has to result, and we will see how to do that with this H c^T = 0. So what is happening? Let me write it down fully. My parity check matrix has columns 110, 011, 101, 100, 010, 001, multiplied with (m1, m2, m3, p1, p2, p3) transpose, equals (0, 0, 0); this is the condition I know has to hold. So I will write this product in a slightly different way: m1 times (1,1,0) plus m2 times (0,1,1) plus m3 times (1,0,1) plus p1 times (1,0,0) plus p2 times (0,1,0) plus p3 times (0,0,1) equals what?
(0, 0, 0). So I am going to think of a finite state machine which is in some state: it takes input m1 and goes to some state, then takes input m2 and goes to some other state, then takes input m3 and goes to some other state. After that, what should happen? After that, p1, p2, p3 should be input so that it comes back to the all-zero state. So initially it is in the all-zero state, and it comes back to the all-zero state. Since I want that, and if you stare at this equation for a while, a very natural way of assigning the state is to say: I start with (0,0,0) as my all-zero state. After I input m1, what should my state become? m1 times (1,1,0). After I input m2, what should be my state? m1 times (1,1,0) plus m2 times (0,1,1). After I input m3, what should be the state? m1 times (1,1,0) plus m2 times (0,1,1) plus m3 times (1,0,1). And then what happens? After that, p1, p2 and p3 are decided: you know what p1, p2 and p3 should be so that you go back to the all-zero state. But at every time, what is the actual state? After p1, it is the sum of all the terms from the left up to the p1 term. So that is a very natural way of defining states and transitions, and that is what I am going to do, and then draw a trellis. You will see the trellis is also very, very easy to come up with. So I am at the all-zero state initially, and my input is m1. If my m1 is 0, where do I go? I remain at the all-zero state. Maybe this is a clumsy way of drawing it, so I will do one thing: I will draw the state and write the label inside it, I think that is an easy way of doing it: (0,0,0). If my m1 is 0 I remain at (0,0,0); if my m1 is 1, where do I go? I go to (1,1,0). And then again there are two possibilities, right? If I am at 0, this will always happen; if I am at 1, what will happen?
If I get an input 1 for m2, what will happen? I will go to (1,0,1)... no, (0,1,1), right, (0,1,1), I am sorry. And here, at (1,1,0): if you get 0 as an input, what will happen? You remain at (1,1,0). If you get 1 as an input? You go to (1,0,1). So that is what happens. So you see, it is not as easy as drawing a trellis for a convolutional code, because the states differ and the transitions do not remain the same from stage to stage; things change. It is a little bit confusing to draw trellises for block codes, but the definition at least is simple enough. Then for the next stage the same idea is repeated: from (0,0,0), with 0 as input you go to (0,0,0), and with 1 as input you go to (1,0,1). I am going to have some clutter here, so let me draw it here: I put (0,1,1) here and (1,1,0) here, and this branch is 0, this is also 0, am I right? At any stage, when 0 is the input you continue to remain at the same state; that is the only constant thing you can say. What happens when 1 is the input, in each of these cases? You have to add (1,0,1). So from (0,1,1) you go to (1,1,0) if 1 is the input, am I right? What happens if 1 is the input here, at (1,1,0)? You go to (0,1,1), do you agree? And if 1 is the input at (1,0,1), what will happen? You go back to (0,0,0)... I am sorry, yes. So from here you now drive back to the all-zero state, and it is very clear what to do as well: you have to add (0,1,1) from here, (1,1,0) from here and (1,0,1) from here. So if you do that, what do you get?
This guy goes to (0,0,0), this guy goes to (0,0,0), and finally you end up at (0,0,0). So here, from (1,0,1), with 1 you go to (0,0,1), then with 0 you stay at (0,0,1), and then with 1 you go to (0,0,0). Make sure I am doing it correctly; go back and think about it. Similarly, from (0,1,1): with 0 you stay, with 1 you go to (0,0,1), and with 1 you go to (0,0,0). And from (1,1,0): with 1 you go to (0,1,0), with 1 you go to (0,0,0), and with 0 you stay there. So I have to do this carefully, but that is my trellis for the block code; that is all. Now one can make a lot of interesting observations. Based on my definition, since I had 3 bits for my state, what might one expect? An 8-state trellis. But do all 8 states actually occur? There is one state which does not occur: (1,1,1) never occurs. I will come back to that slowly. So you see (1,1,1) never occurs here, but do all the other states occur? (0,0,1) occurs, (0,1,0) occurs, (0,1,1) occurs, (1,0,0)... no, (1,0,0) also does not occur, right? You never go to the (1,0,0) state either. And at any given time, how many states do you have at maximum?
4. Which is a little bit surprising; that is the first comment I want to make. And all this happened because I chose my m's very carefully, I put the m's in the careful places. If I change my parity check matrix, this will change. You know every code has several parity check matrices: you pick any set of n minus k linearly independent vectors from the dual code and you get a parity check matrix. If you change the parity check matrix, the trellis will completely change. So for instance, one problem you could pose is: for a given code, find the best parity check matrix, the one which gives you the lowest number of states in your trellis. 2 power n minus k states one can always achieve; if you want a trellis with 2 power n minus k states, one can easily get it, that is what I showed you. But if you want fewer than that, maybe there is a possibility. These problems are not easy, though: in fact, in fairly recent work, people have shown that finding the least-complexity trellis for a linear block code is NP-hard, which means it is a very difficult problem to solve. So these things are tough to find, and that is why you will not find many implementations of soft decoders for block codes based on trellises out there in reality; it is not very practical. But it is useful to know that you can have a trellis like this, and in case you have a very short code and you want to do soft decoding, you can use it: for a (6,3,3) code, this is an eminently implementable thing. So that is one thing; what else?
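The whole construction, with the state at depth t being the partial sum of the parity-check columns selected so far, is easy to mechanize. Here is a sketch for this (6,3,3) example (the function names and bit conventions are mine), which also confirms the observations just made about which states occur.

```python
# Parity-check columns of the (6,3,3) example as 3-bit integers,
# in the order m1, m2, m3, p1, p2, p3 (top row = high bit).
H_cols = [0b110, 0b011, 0b101, 0b100, 0b010, 0b001]

def codewords():
    """All 8 codewords: pick m1, m2, m3 freely, solve for p1, p2, p3."""
    words = []
    for m in range(8):
        bits = [(m >> i) & 1 for i in range(3)]
        s = 0
        for b, h in zip(bits, H_cols[:3]):
            if b:
                s ^= h
        # with the identity in the last 3 columns, the parity bits
        # simply cancel the partial syndrome of the message part
        bits += [(s >> 2) & 1, (s >> 1) & 1, s & 1]
        words.append(bits)
    return words

def trellis_states():
    """State set at each depth t = partial syndromes after t bits."""
    states = [set() for _ in range(7)]
    for w in codewords():
        s = 0
        states[0].add(s)
        for t, (b, h) in enumerate(zip(w, H_cols), start=1):
            if b:
                s ^= h
            states[t].add(s)
    return states
```

Running it, the state-set sizes per depth come out as 1, 2, 4, 4, 4, 2, 1, so at most 4 states at any time; the states (1,1,1) and (1,0,0) indeed never occur, and every nonzero codeword has weight at least 3.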
I think that is pretty much all I wanted to say about trellises: one can construct different trellises, and one can do soft decoding for block codes this way also. Is it different? No, it is the same: what I have written is the output, so you can write 0/0, the input and the output being the same; you can imagine 0 being the input and 0 being the output. And yes, it will change, exactly; so this is like a straight diagonal; it is a slightly complicated notion. So, yes, I think that is all I wanted to say there.

So the next notion that one needs: in practice you will see people use several ideas which give some minor benefits in actual design and actual practice, and one such thing, which gives a lot of benefit in practice in terms of complexity, in terms of being able to implement things, and even in terms of performance, is the notion of concatenation. I have not spoken much about it, but concatenation is a very, very powerful idea which is used in many systems; most systems out there today use some form of concatenation. Basically, instead of doing one whole code, you split the coding duty into several smaller blocks. So I will make a few comments about concatenation, basically pointing out some important ways of concatenating things, and things to watch out for and be careful about. One method which is very popular, one of the original concatenation methods, is the following. I do not know how practical it is, but it is a useful way of coming up with large-minimum-distance codes from smaller-minimum-distance codes. So this is how the concatenation works; whenever you concatenate, you put one thing after the other. You take an (n1, k1) code over GF(2^k2), for some k2; actually, maybe I should include a d1 also, so let us put down
an (n1, k1, d1) code over GF(2^k2). Right, so you can imagine a Reed-Solomon code, a shortened Reed-Solomon code if you will, over GF(2^8); I can always do this. So I do an (n1, k1, d1) code. How many symbols will I have at the input to this block? k1 symbols over GF(2^k2). So how many bits is that? k1 times k2 bits, do you agree? That is what you would be encoding. Sorry, is that a question? So what will be the output? n1 symbols, which is n1 times k2 bits. So you take each symbol, which is actually k2 bits, and further encode it with an (n2, k2, d2) binary code; the k2 here is the same k2. So remember, you have n1 times k2 bits coming in here, and each set of k2 bits, which corresponds to one symbol over GF(2^k2), I am going to encode further with the (n2, k2, d2) code. So what do I get outside? n1 times n2 bits. This is one of the original ideas of concatenation; I believe this kind of concatenation was introduced by Forney, and he studied it in great detail. So overall, what code is this? It is an (n1 n2, k1 k2) code; can I say it is a linear code in terms of bits? Assuming all the component codes are linear, can I say the overall code is linear from k1 k2 bits to n1 n2 bits? Just shake your head and say yes. How do you prove something like this? Yes, maybe some such thing you can do; but instead of doing G1, G2 and all that, basically you take a codeword c1 and another codeword c2, assume systematic encoding, and one can easily show that when you add these two: when you add two symbols in GF(2^k2), what are you doing? You are actually doing binary XOR in the
binary representation. So the addition over GF(2^k2) becomes addition in binary also, and one can show that the overall (n1 n2, k1 k2) code is a linear binary code. What is surprising and interesting is that one can show it is in fact an (n1 n2, k1 k2, d1 d2) code. So how do you prove the d1 d2 property? Why do the minimum distances multiply? Once you know it is a linear code, the only thing you have to worry about is finding the codeword of minimum weight. So suppose you take any nonzero codeword at the output here. It corresponds to a nonzero codeword at the output of the first code itself, and that nonzero codeword has at least d1 nonzero symbols. Each of those nonzero symbols has to be encoded by the second code into a nonzero codeword again, and each of those codewords contains at least d2 nonzero bits. So at least d1 times d2 nonzero bits must be present in every nonzero codeword, and you get the d1 d2 multiplication. So it is a very interesting way of doing concatenation. Yes, greater than or equal to: to be very strict, the minimum distance is greater than or equal to d1 d2. In many cases when you concatenate it will be equal; in some cases it might actually be greater. So if you choose your codes carefully, you can concatenate well. This is a very good idea for getting good codes; when I say good, I mean d/n should not vanish when k/n is kept constant and you let n tend to a large number. So you can actually design good codes using this concatenation idea. Well, good codes are good only if you can decode them; let me come to that slowly. So good codes are possible by the above method. What advantage do you think this concatenation gives you? For instance, the question was: why can't I just construct a
Reed-Solomon code for the larger block length directly? Well, this way will be simpler, because you can get away with a smaller Galois field size, and maybe there are other simplifications too; but at least one can imagine a smaller field size. But how will you decode this kind of code? Is it good enough to first decode the inner code and then the outer code? Think carefully; it is not that trivial, because if you decode the inner code by itself you can only correct (d2 - 1)/2 errors. The d1 d2 distance does not automatically buy you anything just because you concatenated like this: by the time you pass the result on, too many things could have gone wrong, and you may not be able to correct the errors. So while you can encode in multiple stages, the only thing that recovers the full error-correcting capability is joint decoding. There are some multi-stage decoders possible, but decoding is not straightforward: if you are happy with something suboptimal it is fine, but if you want the full (d1 d2 - 1)/2 capability, joint decoding is necessary; you cannot do single-stage decoding. So the decoding problem is not really solved; decoding up to (d1 d2 - 1)/2 errors remains a problem, and you do not really achieve any simplification there, except maybe in some special cases. So this is one way of concatenating, but it is not the only way, and concatenation is done for many other reasons; in practice you will actually find very few systems that use this particular kind. Another concatenation, which was more popular in practice until a short while ago, before being replaced by yet another one (you see, concatenation methods keep changing over time), is called Reed-Solomon-Viterbi concatenation, RSV for short. Really I should say Reed-Solomon plus convolutional code, but in practice people say Reed-Solomon-Viterbi; when you say Viterbi it is understood that you always use a Viterbi decoder with a convolutional code. The idea here is very, very simple: you put an RS code outside, maybe over GF(256), then you put a convolutional code inside, that's all, and it goes through the channel. Then how do you decode? You put a Viterbi decoder here, and then your PGZ decoder; how many of you remember the expansion of PGZ? It is Peterson-Gorenstein-Zierler, the bounded-distance decoder that we saw in class. This is a very popular idea; in fact, in some of the distant-planet exploration satellites that were launched, this was the combination that was used. The idea, once again, is the burst-error property of the Viterbi decoder: with convolutional codes one can do ML decoding, so you get optimal decoding, and Reed-Solomon codes correct bursts of errors very easily. So you put a convolutional code inside, and the errors you get out of your Viterbi decoder will hopefully be bursty at moderate SNRs, and you can pick them up and correct them with your Reed-Solomon code. You'll see this actually gives
you a benefit: if you plot BER versus Eb/N0, you'll see the slope of the curve improves sharply when you do this concatenation; the curve goes down very fast. So this was a popular idea for satellite communications until it got replaced by the product concatenation, the turbo product code, which is now becoming more popular in many places. So what is the turbo product code idea? First, a question came up: is decoding like this optimal? Clearly it is not; if you take the total code, the total code is something else, and you are doing stage-wise decoding. But this decoding is practical, can be implemented, and can be done. These turbo product codes have replaced Reed-Solomon-Viterbi in several places; in fact, how many of you know we actually have a satellite communication system in this very building? In that system the Reed-Solomon-Viterbi was replaced by a turbo product code, and it performs much better. The word turbo is there, so obviously it should be better. But before we talk about the turbo part, we will talk about what the product construction is, again a very interesting construction; one of you is doing a project on turbo product codes, so treat this as a prelude: I will only talk about the product code, and maybe you can pick up the turbo part from that. So here is what you do: you think of your message as a matrix instead of a vector. You take a k1 by k2 matrix M, so your message length is k1 k2 bits, and you have two generator matrices: G1 is a k1 by n1 generator matrix and G2 is a k2 by n2 generator matrix. Each row of M you encode with G2; each row of M is k2 bits, so I can multiply by G2 on the right, which is a valid multiplication. So to form a codeword C, I take M and multiply by G2 on the right, and then each column of M G2 I encode with the code corresponding to G1; to operate on the columns I should multiply by the transpose on the left, so I write C = G1-transpose times M G2. So what will be the dimensions of C? n1 by n2. It is a very simple idea, so the overall code is an (n1 n2, k1 k2) code; that is very clear. It is also clearly linear, which is clear from the matrix multiplication: encode m1 plus m2 and the products just add. And from the associativity of matrix multiplication, it does not matter whether you encode the rows first or the columns first: you can do G1-transpose M first and then multiply by G2 on the right, or M G2 first and then G1-transpose on the left, and you get the exact same codeword matrix, which is also very nice to know; you do not have to worry about what changes if you do rows or columns first. Now suppose d1 is the minimum distance of the code corresponding to G1 and d2 is the minimum distance of the code corresponding to G2. The important observation is that in the final matrix C, which is an n1 by n2 matrix, every row is a codeword of the code of G2 and every column is a codeword of the code of G1; that is how the matrix multiplication worked out. So now think about it: can we bound the minimum distance of the product code in terms of d1 and d2?
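The construction just described can be checked numerically. Here is a minimal sketch; the toy generator matrices are my own choice for illustration (a (3, 1, 3) repetition code and a (3, 2, 2) single parity-check code, so the distance bound we are about to discuss would predict 3 times 2 = 6):

```python
import itertools
import numpy as np

# Toy generators, assumed for illustration only.
G1 = np.array([[1, 1, 1]])             # k1 x n1: (3,1,3) repetition code
G2 = np.array([[1, 0, 1],
               [0, 1, 1]])             # k2 x n2: (3,2,2) parity-check code
k1, k2 = G1.shape[0], G2.shape[0]

def encode(M):
    """Product-code encoding over GF(2): C = G1^T M G2."""
    return (G1.T @ M @ G2) % 2

# All codewords of the row code (the code generated by G2).
row_code = {tuple(np.dot(u, G2) % 2)
            for u in itertools.product([0, 1], repeat=k2)}

weights = []
for bits in itertools.product([0, 1], repeat=k1 * k2):
    M = np.array(bits).reshape(k1, k2)
    C = encode(M)
    # Every row of C really is a codeword of the row code.
    assert all(tuple(r) in row_code for r in C)
    if C.any():
        weights.append(int(C.sum()))

print(min(weights))   # -> 6, the minimum weight over nonzero codewords
```

For these two toy codes the minimum weight comes out to exactly 6, the product of the two component distances.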
the bound is d1 d2 once again. How do you get that d1 d2? If you have a nonzero codeword matrix, there must be at least one nonzero row, and that nonzero row, being a codeword of the row code, has at least d2 ones. Go to those d2 columns: all of them are nonzero, so each of those columns, being a codeword of the column code, has at least d1 ones. So overall, every nonzero codeword has weight at least d1 times d2. Is that clear? That is how you get the bound; it can be exactly d1 d2, but in general it is greater than or equal. (There was a question about splitting the message differently; note that in this construction I get only 2^(k1 k2) codewords, which is what I want; let us talk about other divisions after class.) So the overall code is (n1 n2, k1 k2) with minimum distance greater than or equal to d1 d2. This seems like a good construction, but once again, when you think about decoding there will be problems: if you want to achieve the full (d1 d2 - 1)/2 correcting capability, you cannot decode columns and rows independently; if you do that, you only get the capability of the component codes. But the main advantage in complexity is that you work with rows and columns, which are much smaller codes than the entire larger code, which you may have difficulty decoding directly. So the advantage of this construction is that you build a larger code with larger minimum distance out of smaller codes with smaller minimum distance. For instance, the standard choice is to pick C1 and C2 to be extended Hamming codes; this is a very, very popular choice. The choice that was made for the code in the satellite modem here in this building is the (64, 57, 4) extended Hamming code: (63, 57, 3) is the Hamming code, and (64, 57, 4) is the extended Hamming code. If you use it for both the rows and the columns, what is the overall code? 64 times 64 is 4096, and 57 times 57 is 3249, so a (4096, 3249) code with minimum distance greater than or equal to 16. So how many errors can potentially be corrected? Minimum distance 16, so 7 errors. But to achieve that full error-correcting capability you would have to decode the whole codeword as a whole, which might be difficult; the suboptimal decoders basically decode one row at a time and one column at a time and somehow put the results together into an overall decoding. That is where the turbo product code idea comes in, with the iterations between rows and columns. Now, is the turbo product code different from the first concatenation we had? I believe so; at least here we have only binary codes, we are not thinking of them as codes over extended alphabets. And in practice you will see turbo product codes are much better than the Reed-Solomon concatenation in terms of complexity, in terms of everything; it just works way better. So that is one idea and, oh my god, we are getting close to the ending time. There are several other concatenations one can do: you can just take, say, a (100, 50) binary code and concatenate it with a (200, 100) code; one can just do it, but you will not have any such nice distance properties. This d1 d2 multiplication came only for those two constructions; it is difficult to do any other
concatenation that will give you the d1 d2 distance multiplication property. But it doesn't matter; one can still do it, and there are many other advantages to these kinds of smaller divisions. For instance, if you go back and look at Reed-Solomon codes, one of the earliest systems to use them was CD players, and they really used heavily shortened codes in concatenation. You might ask why they wanted to concatenate at all; the motivation there is to simplify the decoding. So I will put this down as concatenation idea 4, though it is not really a major new idea; this is just an example. The first code is a (28, 24, 5) RS code over GF(256). This is obviously a heavily shortened code; shortened from where? What is the primitive code with the same error-correcting capability? Let me pose the question differently: suppose I want to design a two-error-correcting Reed-Solomon code over GF(256); what are its dimensions? What is the block length I gave in class for primitive Reed-Solomon codes over GF(256), in symbols? It is 255 symbols. And what is k for correcting two errors? k = n - 2t. So if you want minimum distance 5, you need four consecutive roots, and k = n - 2t = 251, so the (255, 251, 5) code was the original primitive code. From there you shorten down to (28, 24): you set a whole bunch of message symbols to zero, and that is how you get this (28, 24, 5) Reed-Solomon code over GF(256). So in one of the earlier CD standards, they do this concatenation; well, there are several other blocks in the middle. For instance, you take 28 times 24 symbols; symbols are bytes here, so you take bytes. You encode 24 bytes at a time into 28 bytes, so you get 28 bytes per codeword. Then you do a row-column interleave. What do I mean by a row-column interleave? You stack these codewords one row at a time and then you output the columns. So when the codewords go in, they go in one row at a time, and when they come out, the first column comes out first: you read across codewords first. And then you send it through a (32, 28, 5) RS code over GF(256). What was the original primitive code for this one? The same (255, 251, 5) code, but shortened to a slightly lesser degree: a few fewer message symbols were set to zero. If you do all this, what do you finally get? 32 times 28 bytes. So out of this 32 by 28 array of bytes, what property will actually hold, assuming you are doing systematic encoding? You initially had a 28 by 24 array, which was your message, and then each row was encoded to get a codeword of the (28, 24, 5) code, and then each column was
encoded to get a codeword of the (32, 28, 5) code. So this is how it will look. Now here, I am honestly not sure how to argue linearity and minimum distance; the minimum distance will not really grow as 5 times 5, you cannot expect that, so it does not work as nicely as the product construction we had before. It is difficult, at least for me off-hand, to argue the minimum distance here, because you have an interleaving in the middle and I do not know what will happen; maybe one can argue it, so think about how you would argue minimum distance in a construction like this. But that is not really relevant, because in the decoder these stages were decoded independently, stage by stage: you decode the outer code first, basically you decode the columns first and then you decode the rows. You get some benefit from that, but it is not a product construction, nor a particularly good joint construction; still, one can do this, and this was used in one of the early CD standards. The primary problem with CDs, if you think about it, is scratches: remember the codewords went out a column at a time, and you are worried about a long run of bits going wrong and then having to recover. This kind of construction is excellent for scratches. When you do this row-column interleaving, if you get a burst of errors, say an entire column gets erased, look at what happens: there is only one error in each row. Even if a couple of columns get erased, each row sees only a couple of errors, which it can still handle. That is the nicest thing about a scheme like this: row-column interleaving gives you a lot of protection against burst errors. In CDs this is a very common thing; interleaving is very, very common, even on hard drives where people use Reed-Solomon codes, and interleaving across codewords in a row-column fashion will always give you an improvement in burst-error-correcting capability. Someone asked whether the second code gives more protection against random errors; actually, what they did in the CD standard was very interesting. How do you know an entire column is getting erased; how do you detect a burst? You could have some electronics that monitors some other signal and figures out whether there is a scratch; maybe that is possible. But purely from a coding point of view, if you are not allowed to use any other signal, how can you find out whether there is a burst, whether there is a scratch? What they did was very simple: every time the outer code fails, and with a lot of errors that is very likely to happen, they declare that the whole column was erased, that there was a scratch. Some simple idea like that, even just to detect the burst. This first CD standard came out in the 80s; I guess CDs were not that popular before the 80s, so many of you may not know these things. So the outer code also plays the role of detecting when errors happen. All these things get used in engineering solutions; in the final product, these kinds of considerations play a much more important role than anything else. All right, so this is a very good place to stop, but there is one more minor thing which I
actually wanted to talk about, and we have about 10 minutes, so let me just finish off with that. The last bit I want to leave you with is this: so far we have only talked about BPSK. If you do QPSK, it is a very minor extension of BPSK in two coordinates; QPSK can basically be seen as BPSK, except that the spectral efficiency goes up a bit. So none of your coding ideas, none of the LDPC or turbo code designs, has to change when you go from BPSK to QPSK. But when you go to 16-QAM or 8-PSK, a genuinely different constellation, then how do you get LLRs? This is an important question. How did we get LLRs for BPSK and QPSK? It was 2r/sigma^2, a very simple formula. But with 16-QAM or, say, 4-PAM, there are several bits in one symbol, and you need to know how to compute the LLRs; that is one point. So the last bit I am going to leave you with is coded modulation: how do you do higher-order modulation to achieve higher spectral efficiency? Why would you want to do 16-QAM? You are being more spectrally efficient: in the same bandwidth you send 4 bits, while with BPSK and QPSK you send only 1 or 2 bits. But what do you need to be able to do 16-QAM? SNR; you obviously need higher SNR. Remember the capacity curves I plotted: to do anything with 16-QAM you need higher SNR. So if you have high SNR and limited bandwidth, you will have to go to larger constellation sizes, and when you do, you have to pay attention to coded modulation. There is coded modulation with convolutional codes, called trellis coded modulation; we will not see that, we will just see how to go about it with block codes. So one idea, maybe not the optimal way of doing coded modulation, but one that might become very practical, is the following. You have some block code here, imagine an LDPC code, because you want to do soft decoding. And then you have what I will call a mapper. What does the mapper do? In BPSK we had a mapper too: it takes 0 to +1 and 1 to -1. For 16-QAM the mapper is obviously more complex: it takes 4 bits to a 16-QAM symbol, which is actually a complex number a + jb, where a is in {-3, -1, +1, +3} and b is in {-3, -1, +1, +3}; that is the constellation, and the mapper does that in some way. One mapping which is very common in practice is called Gray mapping; you might be familiar with it: any two nearest neighbours differ in exactly one bit. So think of the mapper as a Gray mapper. And then what happens?
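Before going on, the mapper just described can be made concrete. Here is a minimal sketch of a Gray-mapped 16-QAM mapper; the particular bit-to-level assignment is my own assumed convention (standards fix different labellings), but it has the Gray property that adjacent amplitude levels differ in exactly one bit:

```python
# Gray map for one 4-PAM axis: adjacent levels differ in exactly one bit.
# This particular labelling is an assumed convention, not from a standard.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

def map_16qam(b):
    """Map 4 bits (b0, b1, b2, b3) to one 16-QAM symbol a + jb with
    a, b in {-3, -1, +1, +3}: the first two bits set the in-phase
    level, the last two the quadrature level."""
    return complex(GRAY_PAM4[(b[0], b[1])], GRAY_PAM4[(b[2], b[3])])

print(map_16qam((0, 0, 1, 1)))   # -> (-3+1j)
```

Because the in-phase and quadrature labels are independent Gray codes, any two nearest-neighbour constellation points differ in exactly one bit, so the most likely demapping errors cost only a single bit.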
It goes through, say, an AWGN channel. What is becoming much more common today is the fading-channel scenario: today all these things are being done for wireless, and wireless is the main non-trivial problem; if you have a channel that is static and stays the same, people know how to deal with it. So maybe this AWGN block needs to be replaced by some fading channel, and when you have fading and so on, people find it useful to put an interleaver here; I will just write a pi and put a box around it, so maybe you have an interleaver between your codeword and the mapper. That is also a popular idea, and you will see in the decoder that these things are quite easy to handle. Now at the decoding end, the first thing you do is demapping, but remember you cannot do a hard demapper; you cannot just decide on the bits, you have to produce LLRs. So the output of the demapper is LLRs, channel LLRs. And for 16-QAM and the like, finding LLRs is not as easy as for BPSK: if you take one bit position in 16-QAM, it is 0 for how many signal points? 8 of the 16 points will have that bit equal to 0 and 8 will have it equal to 1, and each of them corresponds to a different actual transmitted signal. So you have to do a lot of conditioning: you get a sum of many terms in the numerator and many in the denominator, a nasty function which will not simplify. There are approximations for it, there are variations possible, but in general this demapper will be more complicated for higher-order constellations. Still, the demapper outputs LLRs, and then you do what?
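That "nasty function" can at least be written down directly. Here is a sketch of the exact 16-QAM bit-LLR computation: for each bit you sum the channel likelihoods over the 8 points where the bit is 0 and the 8 where it is 1, then take the log-ratio. The Gray labelling and the noise parameterization n0 = E|n|^2 are my own assumptions for illustration:

```python
import math
import itertools

# Assumed Gray labelling: first two bits -> in-phase, last two -> quadrature.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}
CONST = {bits: complex(GRAY_PAM4[bits[:2]], GRAY_PAM4[bits[2:]])
         for bits in itertools.product([0, 1], repeat=4)}

def llrs_16qam(r, n0):
    """Return [LLR(b0), ..., LLR(b3)] for a received complex sample r and
    noise power n0; LLR > 0 means bit value 0 is the more likely one."""
    out = []
    for k in range(4):
        num = sum(math.exp(-abs(r - s) ** 2 / n0)
                  for bits, s in CONST.items() if bits[k] == 0)
        den = sum(math.exp(-abs(r - s) ** 2 / n0)
                  for bits, s in CONST.items() if bits[k] == 1)
        out.append(math.log(num / den))
    return out

# Near the corner point -3+3j (bits 0,0,1,0) the four LLR signs agree with
# those bits: positive where the bit is 0, negative where it is 1.
print([round(l, 1) for l in llrs_16qam(-3 + 3j, 1.0)])
```

In practice the sums are usually replaced by the max-log approximation, keeping only the nearest constellation point in each set, which turns the LLR into a difference of squared distances divided by n0.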
You do a pi-inverse, and then, say, a soft-in, soft-out decoder. This kind of structure is becoming very, very popular; a lot of implementations proposed for wireless systems today look like this. And in fact, what people suggest is: any time you have different blocks like this, each doing soft processing, what can you do? You can subtract out the intrinsic information, pass the extrinsic, and do turbo iterations. Remember what this demapper does: it takes soft values in and produces soft values out. What does the decoder do? It takes soft values in and produces soft values out. So there is nothing that stops me from iterating: put a pi in the middle, extract the extrinsics, do the demapping once again with a priori information, then go back and run your soft decoder, and keep doing iterations. These iterations are actually useful in some cases; for Gray mapping, though, the iterations are not that useful, which is one reason Gray mapping is preferred: it usually gives you very good performance without iteration. So people typically prefer Gray mapping and skip the iteration, but the iterations are possible. Now, in many situations this will not be a simple AWGN channel; there will be what is called ISI plus AWGN. So then what should you do here before you do demapping?
You have to do something called equalization. So today people are thinking about doing soft equalization, and then feeding these iterations back into the equalizer; all these things are being tried. For the ISI case, things get more complicated; for AWGN, at least, one can stop with this. So when you go to 16-QAM, 64-QAM, 256-QAM and other higher-order constellations, these simple decoders become more complex, but they can be done; they are implementable, very much doable, and in fact people love proposing these kinds of things. So I think with that we can put a big full stop to the whole course, and as I said,