Alright, so this is lecture 19 — don't count the tutorial classes. The last thing we were doing was BPSK over AWGN. We have been looking at examples of simple codes, with BPSK modulation over an AWGN channel, and asking what the optimal receiver looks like. We saw two different optimal receivers. One was ML, which you derive by starting from the goal of minimizing the block error probability. The other receiver is what I called bitwise MAP, which minimizes the bit error probability. Both of these are optimal, and in practice you will see very little difference between them; in any figure of merit they will be very similar.

We will begin by looking at one more example, just to drive home the whole point. Maximum likelihood became equivalent to what? Minimum distance — minimum Euclidean distance. In fact, for BPSK you can go one step further and say it is the same as maximum dot product, maximum correlation. For bitwise MAP we did a whole bunch of simplifications, and we saw that the best way to write it is using likelihood ratios; with likelihood ratios you can simplify the expression a little and write something that looks reasonable at least.

So what I am going to do is pick an example slightly more complicated than before and write down the actual expressions for both of these decoders — basically a description of what you would do if you had to do ML decoding or bitwise MAP decoding for that code. Here is the example. I will take n = 6 and k = 3, and a linear code generated by three linearly independent vectors; any three linearly independent vectors would do, but let me take something that is simple without being too simple: 100101, 010110 and 001011. You can see the first 3 bits are chosen to be systematic — if you write these as a generator matrix, you get the identity matrix on the left — so the rows are nice and linearly independent, there is no problem.

Now I want you to spend some time and write down a description of the maximum likelihood decoder and the bitwise MAP decoder for this code. Is this the code? No — these are just basis vectors, not the set of all codewords; the all-zero codeword, for instance, is not in this list. I know this is a bit of a drudgery, but there is a point to it. I want you to realize, at the end of the day, that when I eventually describe a decoder and it works extremely well, that is pure magic. If you do not go through this drudgery once — work through the maximum likelihood decoder and the bitwise MAP decoder and see how painful they are even at such a small block length — you will not appreciate how amazing it is when I finally give you a decoder that performs very close to these optimal decoders.
And that decoder is practical, implementable — in fact it is implemented today in real chips. That wonder will not come unless you go through this drudgery once. So even if it is a little painful: list out all eight codewords and try to write a description of the maximum likelihood decoder. How will that description go? Suppose you have a received vector r = (r1, r2, r3, r4, r5, r6). You should say: I will evaluate eight different correlations and pick the maximum — an argmax over all those quantities. It is not too difficult to write down; you start by listing all the codewords, and maybe the corresponding symbol vectors too, or you can read the signs off from the codewords directly.

I will do my own calculation here; you do not have to look at the screen, keep doing your own. The eight codewords are 000000, 110011, 100101, 101110, 010110, 011101, 001011 and 111000. So what are the correlations the ML decoder has to evaluate? The first, corresponding to the all-zero codeword, is r1 + r2 + r3 + r4 + r5 + r6. The codeword 110011 gives −r1 − r2 − r5 − r6 + r3 + r4 (I am writing all the minuses together and the pluses together; you can order them any way you want, I just like the looks of it). The codeword 100101 gives −r1 − r4 − r6 + r2 + r3 + r5. Then 101110 gives −r1 − r3 − r4 − r5 + r2 + r6; 010110 gives −r2 − r4 − r5 + r1 + r3 + r6; 011101 gives −r2 − r3 − r4 − r6 + r1 + r5; 001011 gives −r3 − r5 − r6 + r1 + r2 + r4; and finally 111000 gives −r1 − r2 − r3 + r4 + r5 + r6.

So any time you want to do ML decoding, what do you do? Take the six received values, evaluate each of these eight quantities, find the maximum, and output the corresponding codeword as the most likely transmitted codeword. These are the dot products you compute for ML decoding.
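(For anyone who wants to check these numbers by machine, here is a minimal brute-force sketch of this ML decoder — my own Python illustration, not part of the lecture material. It assumes the three generator rows above, the bit-0 → +1, bit-1 → −1 mapping we have been using, and a made-up received vector r.)

```python
import itertools
import numpy as np

# Generator rows of the (6,3) code above (systematic: identity on the left).
G = np.array([[1, 0, 0, 1, 0, 1],
              [0, 1, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 1]])

# All 8 codewords, obtained by encoding every 3-bit message.
codewords = [(np.array(m) @ G) % 2 for m in itertools.product([0, 1], repeat=3)]

def ml_decode(r):
    """Brute-force ML decoding: pick the codeword whose BPSK image
    (bit 0 -> +1, bit 1 -> -1) has maximum correlation with r."""
    signals = [1 - 2 * c for c in codewords]        # 0 -> +1, 1 -> -1
    correlations = [np.dot(r, s) for s in signals]  # the 8 dot products above
    return codewords[int(np.argmax(correlations))]

r = np.array([0.9, 1.1, -0.2, 0.8, -1.3, 1.0])      # made-up received vector
print(ml_decode(r))
```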
Now try to write down the bitwise MAP expression for, say, bit 1, bit 2 and bit 3. Those are the three message bits; maybe those are the only bits I care about, not the other three. To write down the expression for the first bit, I need to divide my code into two halves: the set of all codewords with 0 in the first position, and the set of all codewords with 1 in the first position. Let me set up some notation: small l_i = e^(2 r_i / σ²) is the intrinsic likelihood ratio for bit i, and the final, a posteriori likelihood ratio is the product of the intrinsic and the extrinsic likelihood ratios. How do you compute the extrinsic likelihood ratio? Use the formula we derived: sum, over all codewords with 0 in the i-th position, of the product of the l_j over the positions j where the codeword has zeros; divided by the same sum over all codewords with 1 in the i-th position.

So let me write down capital L1, the a posteriori likelihood ratio for the first bit. It is l1 times what? In the numerator, the first term corresponds to the all-zero codeword: l2 l3 l4 l5 l6. The second term corresponds to 010110: l3 l6. The third, from 001011, is l2 l4; and the fourth, from 011101, is l5. In the denominator you have the other four codewords: 100101 gives l2 l3 l5, 110011 gives l3 l4, 101110 gives l2 l6, and 111000 gives l4 l5 l6. So

L1 = l1 · (l2 l3 l4 l5 l6 + l3 l6 + l2 l4 + l5) / (l2 l3 l5 + l3 l4 + l2 l6 + l4 l5 l6).

A simple thing to write down at the end of the day — once you know the split, it is very easy. Please take some time and write down L2 and L3 the same way; there is a valuable lesson at the end of this. You may not believe me now, but there is one observation we will make after writing L2 and L3 — nothing very profound, just an observation. I will write them out; you can look at the screen to check whether I am doing it correctly, but go ahead and do it on your own, and let me know if I make a mistake. One useful checking tool: the same term should never show up in both the numerator and the denominator, so use that to eliminate errors. These look okay to me — I could have made a mistake or two, but it is not critical.

What I want you to appreciate is this: you have one set of expressions to evaluate for the maximum likelihood decoder and another set for the bitwise MAP decoder. The same question can be posed for either, but let me pose it for the bitwise MAP decoder; you can go back and work it out for ML. Suppose I give you l1 through l6, with the exponentiation already done. What is the minimum number of additions and multiplications you have to do to evaluate capital L1, L2 and L3 as ratios? Forget the divisions — there are three of those at the end; keep them aside. How would you answer that question? Do you think you can stare at these expressions long enough and come up with an answer? One easy way out is to evaluate every expression brute force, independently; if you do that, it is very easy to count the total number of additions and multiplications.
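(And here is the same brute-force idea for the bitwise MAP expressions, as a sketch — it reuses the codewords list from the ML snippet above, and sigma is again a made-up value. Note that including l_i itself in each numerator product is the same as pulling it out front, since every codeword in the numerator has a 0 in position i.)

```python
import numpy as np

def bitwise_map_ratios(r, codewords, sigma):
    """Brute-force bitwise MAP: L_i = [sum over codewords with 0 in position i
    of prod_{j: c_j = 0} l_j] / [same sum over codewords with 1 in position i],
    where l_j = exp(2 r_j / sigma^2) is the intrinsic likelihood ratio."""
    l = np.exp(2 * np.asarray(r) / sigma**2)
    ratios = []
    for i in range(len(r)):
        num = sum(np.prod(l[c == 0]) for c in codewords if c[i] == 0)
        den = sum(np.prod(l[c == 0]) for c in codewords if c[i] == 1)
        ratios.append(num / den)     # decide bit i = 0 iff ratios[i] > 1
    return ratios

print(bitwise_map_ratios(np.array([0.9, 1.1, -0.2, 0.8, -1.3, 1.0]),
                         codewords, sigma=1.0))
```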
But will that be the minimum number? Obviously not. Look at l4 l5 l6, for example: it shows up in each of L1, L2 and L3. So at the very least there are repeated terms that you do not have to re-evaluate for each of L1, L2 and L3. But there is something more hidden too — a smarter trick for reducing the number of multiplications. What is it? Yes: the distributive law. How do you use it? If one small l is common between two terms, you pull it out, and the number of multiplications goes down. It is a simple principle: AB + AC = A(B + C). There is nothing great about that identity, but count the computations: the left-hand side needs two multiplications and one addition, the right-hand side one multiplication and one addition. So you can use this to simplify further, and it is not at all obvious what the minimum number of additions and multiplications is — in fact, if you can figure it out for a general code you will become very famous; it is a complicated problem. But you can see that using the distributive law and evaluating all the L_i together, rather than individually, can substantially reduce the computation (there is a small numerical illustration of this at the end of this passage).

This will be crucial as we go along, and you can imagine why. Right now I have a (6,3) code, so there are four terms in the numerator and four in the denominator. If I have a (1000, 500) code, how many terms are in the numerator? 2^499 — and another 2^499 in the denominator. There is no way you can evaluate that brute force; it will not work. If you are to have any hope of evaluating these quantities, you have to be very, very smart about the computation. As we go along we will see one specific way of being smart about it, but for now just remember: stare at these expressions long enough and it is not at all clear how to proceed. There is something non-trivial here, and we will see it as we go along.

The same problem can be posed for the maximum likelihood decoder: given r1 through r6, what is the minimum number of additions needed to compute all of those correlations? Again it is not clear how to answer, because many terms show up in common; it takes a lot of effort and it is not easy. But as you will see, we will mostly be worried about bitwise-MAP-type decoders rather than ML, so we are more interested in the former. In any case, these two things should now be clear: for bitwise MAP and for ML, you know how to write the decoding expressions down in the general case.
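(Here is the small numerical illustration promised above — a check, with made-up likelihood values, that factoring the numerator of L1 preserves its value while cutting the multiplication count from six to four.)

```python
import numpy as np

rng = np.random.default_rng(0)
l2, l3, l4, l5, l6 = rng.uniform(0.5, 2.0, size=5)   # made-up likelihood ratios

# Numerator of L1 term by term: 6 multiplications, 3 additions.
brute = l2*l3*l4*l5*l6 + l3*l6 + l2*l4 + l5

# Pull out the shared factor l3*l6 and reuse l2*l4: 4 multiplications, 3 additions.
t24, t36 = l2 * l4, l3 * l6
smart = t36 * (t24 * l5 + 1) + t24 + l5

assert np.isclose(brute, smart)   # distributive law: same value, fewer operations
```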
It is also possible to do the following simplification: typically people work with the log of the likelihood ratio rather than the likelihood ratio itself, and you can see why that is useful — what is log l_i? Simply 2 r_i / σ², easy to compute. So what happens to these expressions if you take logs on both sides? Log of capital L1 becomes log of small l1 plus the log of some nasty sum which you will probably not be able to simplify. So it is possible to think of it that way — all your multiplications become additions — but that expression is still not easy to deal with either. That is one more thing I wanted to point out before we move along.

I think that is good enough as a general introduction to the bitwise MAP and ML decoders. Any questions? I am going to do one more example, the repetition code, but before that, if there are any questions, now is a good time, because this is an important idea and it is slightly different. It is also hard to illustrate: for a syndrome decoder I can give you an actual decoding example, and even for a Reed-Solomon or BCH decoder an example is not too difficult to imagine, but this is so painful — you have to evaluate all these expressions, it is just a calculator exercise, and the insight is very difficult to get.

Yes? The question is: why do we want to think of it as intrinsic and extrinsic at all? You are right, there is no fundamental reason for splitting it that way. It is just that when people wanted to simplify these expressions and evaluate them approximately, the intrinsic/extrinsic split turned out to be very useful. All of that is for approximate evaluation; for exact evaluation there may be better structure, and perhaps a simpler way to describe it.

One more thing I want you to think about before we move on: how do you go about analyzing these decoders? By analyzing I mean: how do you compute the probability of decoding error for these two decoders? Let us start with the ML decoder. How would one compute its probability of decoding error — what is the approach, how should you even start? It is confusing; it is not so clear. To answer that question better, we will work the same example for the repetition code, where the analysis is easy, and then come back to this. If you can already see how to analyze this one, great; otherwise wait for the repetition code.

So the next example is the repetition code — say the n = 3 repetition code; any repetition code is the same, and you will see the analysis is very similar. Write down the same things as before: expressions for the ML and the bitwise MAP decoders for the repetition code, assuming r = (r1, r2, r3) is received when a codeword is transmitted with BPSK modulation over AWGN, as before. ML, you will see, is very easy. The correlations to evaluate are r1 + r2 + r3 and −r1 − r2 − r3. If r1 + r2 + r3 is greater than −r1 − r2 − r3, you declare the decoded codeword to be 000; else you declare 111 — you can simply say 'else', no second comparison needed. And if you simplify the condition r1 + r2 + r3 > −r1 − r2 − r3, what do you get?
Move everything to one side, and you see ML decoding becomes: if r1 + r2 + r3 > 0, then ĉ = 000; else ĉ = 111. For the repetition code it becomes very, very simple.

Now spend some time and write down the bitwise MAP rule. It is enough to find capital L1 — any one bit is the message. What is capital L1? Just l1 · l2 · l3. Writing out the product, what do I get? e^((2/σ²)(r1 + r2 + r3)). And how do I make a decision based on the likelihood ratio? The likelihood ratio is the probability that the bit is 0 divided by the probability that the bit is 1, so if it is greater than 1, I decide the transmitted codeword was 000; if it is less than 1, I decide 111. So when is e^x greater than 1, and when is it less than 1? e^x > 1 exactly when x is positive, and e^x < 1 when x is negative. So the condition becomes r1 + r2 + r3 > 0 — exactly the same as ML. For the repetition code the two decoders coincide. This will not happen for other codes; it will not work out so nicely — the repetition code is very, very nice that way. In general the two will still be very close: plot the probability of block error for both, or the probability of bit error for both, and the curves will be very close, and there are simple relationships between block error and bit error anyway, so it is fine.

So let us talk about analyzing this. First question: are you familiar with the analysis of BPSK over AWGN with no coding? I hope you all are. What is the probability of error, in terms of some standard function? The Q function: Q(1/σ), with the constellation the way I picked it, +1 and −1. That is the uncoded case, and we will have to start from there: if your constellation is +1 and −1, the probability of error is Q(1/σ). What σ?
σ² is the variance of the additive white Gaussian noise. And how did I get 1/σ? The 1 is actually 2/2 — the distance between +1 and −1, divided by 2. How do you derive this formula? You find the conditional distributions and work it out, and one assumption inherent in it is that 0 and 1 are equally likely to be transmitted. Once you do all that, you get this simple expression, where Q(x) is the tail probability of the zero-mean, unit-variance Gaussian random variable. This is very standard digital communication material, so I will not spend much time on it.

Now you can make a very simple extension of this to get the probability of error for the repetition code. What do I have to do? Here the decision statistic is just r1 + r2 + r3. Recall how the uncoded case went: the probability of error is the probability that you decode −1 given that +1 was transmitted; you find the conditional distribution of r given that +1 was transmitted, and evaluate the probability that this random variable is less than 0. You can do the same thing here: find the conditional distribution of r1 + r2 + r3 given that 000 was transmitted, and compute the probability that it is less than 0.

So let us do that. Define the decision random variable D = r1 + r2 + r3, conditioned on c = 000. What is the PDF of D? Give me the exact specification. Normal with mean 3 and variance 3σ² — does everyone agree? Given that you transmitted (+1, +1, +1), the three observations r1, r2, r3 are independent, each Gaussian with mean 1 and variance σ²; adding three independent Gaussians gives mean 3 and variance 3σ². So what is the probability that D < 0? That is the probability of error: Q(3/√(3σ²)) = Q(√3/σ). Do you agree? Which is smaller, Q(1/σ) or Q(√3/σ)? Q(√3/σ) — you have moved further out into the tail. That is how simple the analysis is for the repetition code.

Can you extend this kind of analysis to ML decoding in the general case — say for the (6,3) code we saw before? Do you think it is possible? It is not easy: you end up with complicated joint distributions which do not work out so cleanly. But for repetition codes it extends to any n: for an (n,1) repetition code the probability of error is Q(√n/σ). So it sounds as if I can make the probability of error as low as I want just by increasing n, right? That is true to an extent, but there is a hidden catch which will give you some bad news soon enough — this is not the best code out there.
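(If you want to convince yourself of the Q(√3/σ) result numerically, here is a small Monte Carlo sketch — σ = 0.8 and the trial count are arbitrary choices of mine; norm.sf is scipy's Q function.)

```python
import numpy as np
from scipy.stats import norm

sigma, trials = 0.8, 1_000_000
rng = np.random.default_rng(1)

# Transmit 000 -> (+1, +1, +1), add AWGN; ML errs exactly when r1+r2+r3 < 0.
r = 1 + sigma * rng.standard_normal((trials, 3))
simulated = np.mean(r.sum(axis=1) < 0)

# Theory: D = r1+r2+r3 ~ N(3, 3*sigma^2), so P(D < 0) = Q(sqrt(3)/sigma).
theory = norm.sf(np.sqrt(3) / sigma)
print(simulated, theory)   # the two numbers should agree closely
```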
There is an obvious reason for that, and we will come to it as we go along; for now, at least this analysis can be done. There are two points to take from it. One: the analysis in the general case is very difficult, even for ML decoding. And if you move to bitwise MAP, how would you extend an analysis like this? You would have to find the PDF of capital L1 conditioned on a particular codeword, and then compute the probability of its being less than or greater than 1. At least, that seems like the brute-force approach. Do you think you can actually carry it out? It is much harder: it involves products, and it involves ratios of sums, and those distributions are not easy to handle. Leave alone a (1000, 500) code — even for the (6,3) code it is difficult enough; you cannot even imagine doing it for a (1000, 500) code. The reason I am dwelling on this is that later we will see a way of analyzing an approximate version of this bitwise MAP decoder. When most people see that analysis for the first time they say: this is a very rough way of analyzing it, maybe it does not even hold. But believe me, the fact that you have any analysis at all is highly significant, because these exact expressions are so intractable that you cannot even say what they look like. That at the end of the day you have some way of saying "my probability of error will behave like this if I run this decoder on such a large code" — that in itself is amazing. Again, I want you to feel that wonder when we get there; that is why I am giving you a glimpse of how difficult these soft decoders are to analyze. Soft decoders are usually very hard to analyze, and these two ideal decoders especially so.

The other point of the repetition-code analysis: in both cases, the probability of error was a Q function of — forget the Q part — a function of what variable? σ, to start with. But I can be a little more clever and say it is a function of 1/σ; that is true in both cases. In fact, you may be used to the definition of SNR, the signal-to-noise ratio. What is the SNR for BPSK over AWGN? 1/σ². And I would not be wrong to say that in both cases the error rate is a function of 1/σ² — the bit error rate is a function of 1/σ², which is exactly the SNR in the BPSK-over-AWGN situation. From here one might generalize and say: the probability of error can be studied as a function of SNR = 1/σ².

There is a lot of practical merit in doing this. When you go from coding theory back to the world of actually building communication systems, you need a common language with the engineers who build the complicated circuits — power amplifiers and all that — and SNR turns out to be a wonderful common language. It is really valuable that in our model the SNR controls the bit error rate, because it means the SNR you describe in practice carries over.
For practical systems you think of SNR as signal power divided by noise power, and all of that has meaning in our model too, which is very nice. So you will typically see people plot the probability of error as a function of SNR. What kind of function do you expect it to be? Hopefully a monotonically decreasing function of SNR: as you increase the SNR, the probability of error should decrease. One hopes that happens — with a proper decoder it does, and you expect it to.

It is also typical to quote SNR on the decibel scale. What is the dB definition? 10 log₁₀(1/σ²). So, in the uncoded case, suppose I want a bit error rate of 10⁻⁶. What SNR is typically required? You may know this number in dB for BPSK over AWGN — when does Q(1/σ) reach 10⁻⁶? Maybe you should know it by heart; it is around 13 dB, 13.5 dB or so. Q(1/σ), plotted against SNR in dB, comes down to 10⁻⁶ somewhere around there.

So let us make that plot for reference: probability of error versus SNR in dB. Whenever you see books or papers dealing with coding, they always show the uncoded curve first. The uncoded error rate falls off like a waterfall, and most plots stop at 10⁻⁶: 10⁻⁶ is considered zero — you have achieved everything you want if you get down to 10⁻⁶ — and the uncoded curve reaches it at around 13 dB; let us say 13 dB to have a firm number. Note that this uncoded curve is a plot of Q(√SNR), since Q(1/σ) = Q(√SNR).

Now, using all your experience with Q functions and dB, plot on the same graph the probability of error for the n = 3 repetition code. That is Q of the square root of what? Q(√(3 · SNR)). So if you do the dB conversion carefully, where does this curve sit relative to the uncoded one? Shifted left or right? Shifted left, hopefully. And why is shifting left better? Because you get the same probability of error at a lower SNR.
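(These numbers are easy to compute exactly — a short sketch using scipy's inverse Q function:)

```python
import numpy as np
from scipy.stats import norm

# Uncoded BPSK: solve Q(1/sigma) = 1e-6, i.e. 1/sigma = Q^{-1}(1e-6).
x = norm.isf(1e-6)                     # Q^{-1}(1e-6), about 4.75
uncoded_db = 10 * np.log10(x**2)       # SNR = 1/sigma^2 = x^2, in dB
print(uncoded_db)                      # ~13.5 dB

# (3,1) repetition code: Q(sqrt(3*SNR)) = 1e-6, so the whole curve shifts
# left by 10*log10(3), about 4.8 dB.
print(uncoded_db - 10 * np.log10(3))   # ~8.8 dB
```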
Yes — a lower SNR for the same probability of error; that is what you want, and that is what you hope coding will give you. How many dB does the curve shift to the left? A factor of 3 is about 4.8 dB. So you get a curve shifted to the left, and the n = 3 repetition code reaches 10⁻⁶ at something around 9 or 10 dB, say. That gap you might want to call coding gain — but careful, this is not coding gain; I will come back and define coding gain properly, so do not worry about it yet. You do not usually define coding gain with respect to SNR, because there is a problem there, which we will come to soon enough. Still, that difference you can construe as something you have gained.

So that is one utility of SNR: you can plot probability of error versus SNR for BPSK over AWGN, and it has a good interpretation both in practice, for building communication systems, and in theory, for seeing how much you gain by coding. But there is a deeper truth behind using SNR for BPSK over AWGN, which I will state quickly. Do not worry too much about fully understanding it at this point; when you read more about communications and information theory, it will make more sense. The deeper justification is what is called capacity.

This is a key idea: capacity, and the information-theoretic arguments around it, are what justify coding in the first place. Why do you want to do coding at all? Maybe you could do something else to improve your bit error rate. Why repeat things, why add redundant information? Those are questions you should ask first — maybe questions you should have asked in the very first class, even before entering this course. Why is coding supposed to be so good? Capacity is the culminating argument; it is the justification for the whole enterprise.

So how does this work? I will write down a brief description; I want you to think about it. We will use some notions of capacity, but only very briefly — I will not go into great detail here. Suppose I have a channel; capacity is defined for channels. Start with the simplest channel we saw: the binary symmetric channel with transition probability p — binary input, binary output. What capacity tells you is the maximum rate at which you can transmit across the binary symmetric channel — or across any channel — with arbitrarily low probability of error. I have to be careful defining these things, so let me repeat it and then qualify each piece: capacity is the maximum rate at which you can communicate across a noisy channel with arbitrarily low probability of error. The first question you should ask is: why "arbitrarily low"? Why can't you say zero?
Ideally you want zero probability of error, but look at the binary symmetric channel: if p is not zero, can you ever have exactly zero probability of error? No. Whatever you do, if you want to send any nonzero amount of information, you will have a nonzero probability of error. (If you send no information at all, then yes, you can have zero probability of error — but you achieve only zero rate.) That should be clear just from looking at the channel. So you can never have zero probability of error, and the way around it is to allow an arbitrarily low probability of error.

What does arbitrarily low mean? Here is the notion: given any ε > 0 — you give me any positive number, and think of it as small; by training, the moment you see an ε you think of something tiny, because I want to make the probability of error less than it — given any ε > 0, one can transmit at any rate less than capacity with probability of error less than ε. (I have not yet told you what "rate" means; bear with me.) That is the meaning of arbitrarily low probability of error, interpreted in terms of capacity. In fact, capacity means something stronger: if your rate is greater than capacity, what can you not do?
You cannot transmit with arbitrarily low probability of error: there is a finite probability of error below which you cannot go if your rate is greater than capacity. So capacity is the best you can do — below it you can, above it you cannot do any better. That is the operational meaning of capacity and of the phrase "arbitrarily low probability of error". If the ε business confuses you and you do not want any ε in your notes, simply keep 10⁻⁶ as your ε: capacity is then the maximum rate at which I can transmit and still achieve probability of error 10⁻⁶. Theoretically, the notion says: given any ε, I can do it.

Next, what do I mean by rate? For the binary symmetric channel, rate is measured in information bits per channel use: in one channel use, how many information bits do I convey? For the BSC, each channel use carries one binary symbol — I can put one bit into the channel per use. So where does the operational subtlety come in? Suppose every channel bit really is an information bit, so I am sending one information bit per channel use. Can I achieve arbitrarily low probability of error? No — not if p is not 0 or 1 (and p < 1/2 is the case we consider). If p is not 0, you can never achieve arbitrarily low probability of error at rate 1: keep sending one information bit every channel use and you will get errors; there is nothing you can do to avoid them, and the error probability in fact stays bounded away from zero — that too can be shown. If I do coding, I get rates less than 1: I take k information bits and convert them into n bits to transmit on the channel, so I am sending k/n information bits per channel use, and that is my rate. That is the connection with rate.

And I said "one can transmit" — what does that mean? I used the word loosely and everybody was happy, but I have to tell you how to encode and how to decode: how do I map my information bits into channel bits? It turns out coding suffices. One can show there exists a code of sufficiently large block length which achieves arbitrarily low probability of error under some kind of decoding — say maximum likelihood decoding; in fact there are other kinds of decoding which also achieve it. That is where coding comes in, and that is why coding is so vital for a communication system. Let me now rephrase my definition of capacity.
Capacity tells you: as long as your rate is less than capacity, there exists a code — an error-correcting code, maybe not a linear code, but some code — with which you can encode, transmit across the BSC, take the received vector, decode by maximum likelihood according to that code, and achieve arbitrarily low probability of error. That is why coding becomes so vital for communication: it tells you how to transmit.

Let me write this down carefully. Given any ε > 0, there exists an (n, k) code for sufficiently large n — and in most cases n is a function of ε — with k/n approaching capacity, such that the probability of error, say under ML decoding (or bitwise MAP decoding, or even certain suboptimal decoders, whichever you choose), is less than ε. Think about what this means. And the converse statement is also true: if k/n is greater than capacity, there is no code for which the probability of error can be made arbitrarily low. This is why coding is considered so important — the final tool for communication.

Hopefully the meaning of "sufficiently large" is clear. For instance, if ε is 10⁻³, maybe there is an n of, say, 100,000 for which a code exists achieving probability of error below 10⁻³. If you want a smaller ε, say 10⁻¹⁵, then my n may have to go up — but that does not scare me too much; I know the code exists as long as k/n is less than capacity. If k/n is greater than capacity, I have no hope of arbitrarily low probability of error.

So what is this capacity — can it be calculated? Yes, and for the binary symmetric channel there is even a very nice closed-form expression, if you accept the logarithm as an elementary function. For the BSC,

C = 1 + p log₂ p + (1 − p) log₂(1 − p).

Once again, let me stress what this capacity means. It is defined in the context of achieving arbitrarily low probabilities of error: given any target probability of error — 10⁻³, 10⁻⁶ — you should be able to achieve it; capacity characterizes exactly those scenarios. And capacity is the maximal rate at which this is possible, rate being the number of information bits per channel use. Ultimately you may be able to convert that into megabits per second or kilobits per second — that just depends on the clock rate, and that conversion is fine.
But in theoretical terms it is the number of information bits per channel use, and that is what capacity is measured in. What it means: as long as k/n is less than capacity, there is a code, for sufficiently large n, achieving probability of error less than ε, for any ε you give me; and if k/n is greater than capacity, then even as n goes to infinity the probability of error stays finite — it never becomes arbitrarily low. So capacity is a nice demarcation point.

Can it be calculated? Yes — with information-theoretic tools; at least for the binary symmetric channel it is easy, and it is a function of p: a very simple function taking values between 0 and 1. Plot it for p from 0 to 1/2: the capacity is 1 at p = 0, and at p = 1/2 it becomes 0 — put p = 1/2 into the formula and you will see it vanish. It is of course not a linear function: the halfway point, capacity 1/2, is reached around p = 0.11, which is a good number to keep in mind. The capacity falls to one half by the time p goes from 0 to 0.11, and after that it decreases slowly to 0 — the fall is steep at the beginning.

So suppose I have a binary symmetric channel with p = 0.11. What is the maximum rate at which I can transmit? One half. What kind of codes should I look at for that channel? (1000, 500), maybe (10000, 5000). There is no point looking at any higher rate — lower rates may be acceptable if, say, you cannot find a practical decoding algorithm otherwise — but you can never go to a rate greater than capacity, because then you cannot get low probabilities of error. That is the operational meaning of capacity.
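(A quick numerical check of the formula — a sketch; the function name is my own.)

```python
import numpy as np

def bsc_capacity(p):
    """C = 1 + p*log2(p) + (1-p)*log2(1-p) bits per channel use,
    i.e. 1 minus the binary entropy of p."""
    if p in (0.0, 1.0):
        return 1.0               # noiseless channel (or a deterministic inverter)
    return 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)

print(bsc_capacity(0.11))        # ~0.5: at p = 0.11, rate-1/2 codes are the limit
```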
So the natural question: are there codes which achieve all the points on this curve — which achieve capacity? For the BSC you can get close, but apparently not very, very close: there are some strange results saying that LDPC codes, currently what one would call the most powerful codes we know, do not take you arbitrarily close to the capacity of the binary symmetric channel. I think this is not very well understood; it needs more work.

Let us move on to the other channel of interest to us: BPSK over AWGN. Here too there is a capacity — any channel has one, no problem — but it is more difficult to calculate: difficult in the sense that there is no closed-form expression; you need a numerical integration routine, that is all. Numerical integration is not hard, and it is a very well-behaved function, so one can do it. The most interesting thing is what the capacity is a function of — can you guess? SNR, which is 1/σ². And what kind of function do you expect? Monotonically increasing in SNR: higher SNR gives higher capacity. All of this can be proven — in an information theory class you write down the expression for the capacity of BPSK over AWGN and prove it is a monotonically increasing function of SNR.

There is a plot I hope I can pull up, from a paper by Forney and Ungerboeck that appeared in the IEEE Transactions on Information Theory. It shows the capacity of M-PAM — not just BPSK, which is 2-PAM, but 4-PAM, 8-PAM, 16-PAM — as a function of signal-to-noise ratio; one can show that for M-PAM, too, the capacity is purely a function of SNR. You can see the 2-PAM curve clearly: it starts at 0 and saturates at 1. Does it make sense that the 2-PAM capacity saturates at 1? Yes — it is binary, so the most you can send is 1 bit per channel use. The 4-PAM curve saturates at 2, 8-PAM at 3, and so on.

There is also a curve for Gaussian inputs. Suppose you have an AWGN channel with no restriction on the constellation — use whatever constellation you want. It turns out the ideal thing is a Gaussian input: the channel input should have a Gaussian distribution; that too is proven using information theory. With Gaussian inputs the capacity has a very nice closed form: (1/2) log₂(1 + SNR). That is the dotted line in the plot. If you take a course in information theory, you will see how to generate this entire plot — one could half-jokingly define a course in information theory as a course that teaches you to generate this plot; of course you do many other things there, but this is one of the things you would learn in a course in information theory.
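(For the curious, here is roughly how those two curves can be generated — a sketch of my own under the stated model, not the paper's code. The BPSK capacity is I(X;Y) = h(Y) − h(Y|X) with equiprobable ±1 inputs, computed by numerical integration.)

```python
import numpy as np
from scipy.integrate import quad

def bpsk_awgn_capacity(snr):
    """Capacity of BPSK over AWGN in bits per channel use (snr = 1/sigma^2)."""
    sigma = 1 / np.sqrt(snr)
    phi = lambda y, m: np.exp(-(y - m)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    p = lambda y: 0.5 * (phi(y, 1.0) + phi(y, -1.0))          # output density
    h_y = quad(lambda y: -p(y) * np.log2(p(y)), -30, 30)[0]   # output entropy
    h_y_given_x = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)  # Gaussian noise entropy
    return h_y - h_y_given_x

def gaussian_input_capacity(snr):
    """Unconstrained-input AWGN capacity, 0.5*log2(1 + SNR)."""
    return 0.5 * np.log2(1 + snr)

# At rate 1/2, Eb/N0 = 1/(2*R*sigma^2) = 1/sigma^2 = SNR, so the rate-1/2
# limit of roughly 0.2 dB quoted shortly can be checked directly:
print(bpsk_awgn_capacity(10 ** (0.187 / 10)))   # ~0.5 bit per channel use
```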
Anyway, let us just look at the capacity curve for BPSK. There is an interesting point marked on the plot at probability of error 10⁻⁶ for the uncoded case: if I do BPSK with no coding, at what SNR do I reach a 10⁻⁶ probability of error? Roughly around 13 dB. Rate-1, uncoded transmission: about 13 dB for 10⁻⁶.

Now suppose I say I can live with rate one half. Look at the plot and tell me the SNR I need. It is at about 0 dB. Where is 13 dB, and where is 0 dB! If you are not used to these dB numbers, talk to your friends, or to faculty who work on circuits, about building power amplifiers: tell them you need something that boosts your signal by 13 dB, versus something that need not boost at all, and see how they respond. 13 dB is a huge number — in cost it is probably lakhs and crores of rupees; building power amplifiers with such gains is not the easiest thing in the world. So look at the difference between no coding and coding: about 13 dB. You might object: but you get rate 1 there and only rate 1/2 here. In fact it is possible to normalize with respect to rate and still get a coding gain of close to 10 dB — say 9.5 dB. I will tell you how to do that normalization shortly; there is a simple way to put rate 1/2 and rate 1 on the same energy footing, and we will do it. But already this shows how important coding is for practical communication systems: coding saves transmit power, and hence cost, and that is a huge saving in almost all communication systems. Any questions on this plot — anything worrying you, any number you want explained?

So why did I bring this up? Several reasons, but here is one: all these new codes — LDPC codes and so on — can operate very, very close to capacity. In fact, for rate one half, people have found — you will hardly believe it — rate-1/2 LDPC codes that are within 0.0045 dB of capacity. What do I mean? From the plot the rate-1/2 limit looks like roughly 0 dB; the actual number is around 0.2 dB. People have found rate-1/2 LDPC codes which give a 10⁻⁶ bit error rate (or block error rate) with only a 0.0045 dB penalty — that is, 10⁻⁶ at about 0.2045 dB. You cannot get much closer to capacity than that. So you can get very, very close to capacity with LDPC codes; that is highly non-trivial, and it is the power of these modern methods. That is why it is important to see this capacity curve and to know that the whole point of these modern codes is to get very close to capacity.
Now let me briefly discuss the rate normalization. How do you compare an uncoded system transmitting at rate 1 with a coded system transmitting at rate 1/2? There seems to be a penalty that SNR is not accounting for: whether you use rate 1/2, rate 1/10 or rate 1/100, the SNR is still 1/σ², and that is a little unfair. The normalization is done using the notion of Eb/N0. (There is another way of doing it, but we will see only Eb/N0.) The way to think of Eb/N0 is as a rate-normalized SNR.

First, the noise. So far the only thing that mattered about the noise was σ². The relationship between N0 and σ² is simple: N0/2 = σ², that is, N0 = 2σ². So there is no deep reason to go to N0 — you could have lived with Eb/σ², and indeed some papers still plot against Eb/σ² — it is just convention that everyone uses Eb/N0, so we shift to it too. That part is not really a normalization, only a scaling by 2.

The real change is in Eb. Here, Eb = 1/R. What is this 1? It is the energy of BPSK: my constellation is +1 and −1, so the average energy is (+1)² with probability one half plus (−1)² with probability one half, which is 1. In general the right way to write it is Eb = Es/R, where Es is the average energy per modulated channel symbol — for instance, with a constellation of +A and −A, Es would be A². Since we have already fixed the constellation at ±1, I can simply write Eb = 1/R.

What is Eb, then? It is the energy per information bit — everything here is an average; I will not keep saying it, but it is clearly all averages. R is k/n, so Eb = Es · n/k: to send k information bits I use n modulated symbols with total average energy n·Es, so the average energy per information bit is n·Es/k. If I compare a coded system and an uncoded system with respect to Eb/N0, it is a fair comparison; comparing via SNR is not, because SNR does not account for the energy spent per information bit. As long as I account for energy per information bit, whether I use rate 1/2 or rate 1, the comparison is fair.

There is also another way of thinking about it. You might say: just because I am doing rate 1/2, I will have to transmit more slowly — if my channel clock is fixed, at rate 1/2 I get only half the information through.
But there is another way: double your channel clock and use a rate-1/2 code. What does that mean in practice in an AWGN system? When you double the channel clock, the bandwidth doubles, so more noise is let in at the receiver, and the signal-to-noise ratio goes down — down in proportion to the rate, by the factor R. So this division by R takes care of two things at once. One view is energy per information bit at the transmitter; the other view is that if you keep the energy per bit the same but speed up the clock by 1/R, you let in proportionally more noise, and the Eb/N0 accounting represents that too. Either way, the normalization makes the comparison between any two rates — a rate below 1 versus rate 1, or any pair — a fair one. Even if this is not fully clear now, if you are taking a digital communication course, think more about what it means and about the various ways of putting a rate below 1 on the same footing as rate 1; it is possible to sort all this out.

So now let us go back and plot probability of error versus Eb/N0 in dB for the repetition code. First, what is Eb/N0 in terms of σ? Putting the pieces together — Eb = 1/R and N0 = 2σ² — you get Eb/N0 = 1/(2Rσ²). In dB, Eb/N0 is 10 log₁₀(1/(2Rσ²)).

Plot the uncoded system first. For uncoded, R = 1, so Eb/N0 = 1/(2σ²) as opposed to SNR = 1/σ²: the curve moves by a factor of 2, which is roughly 3 dB. You had 10⁻⁶ at around 13.5 dB of SNR, so you get 10⁻⁶ at around 10.5 dB of Eb/N0. That is the uncoded curve.

On top of this, plot the probability of error for the (3,1) repetition code versus Eb/N0, remembering that for the repetition code R = 1/3; you have to account for that. Just to remind you how this works: the channel is the same BPSK over AWGN with the same noise variance, but when you change the rate, the SNR does not change — what changes is Eb/N0. For the same σ, the uncoded system has one Eb/N0 and a rate-1/3 transmission has a different one; remember that. So, for the uncoded case: Pe = Q(√SNR), and SNR here is 2·Eb/N0, so Pe = Q(√(2·Eb/N0)).
What about the repetition code, say the (3,1) code? Its probability of error is Q(√(3·SNR)); and if you go over to Eb/N0, with R = 1/3 you have SNR = 2R·Eb/N0 = (2/3)·Eb/N0, so 3·SNR = 2·Eb/N0 and you get Q(√(2·Eb/N0)) again — do you agree? So if you plot probability of error versus Eb/N0 for the repetition code, you get the exact same curve as the uncoded case. Once you account for rate and then ask for the coding gain, the repetition code gives zero coding gain. In fact this works for the (n,1) repetition code for any n: the probability of error is the same Q(√(2·Eb/N0)), not different in any way. I can draw it in another color, and it lies exactly on top of the uncoded curve.

Think about what this means. A lot is going on here, and if you are seeing Eb/N0 for the first time in your life, you may be confused: what does it mean that, for the same σ, merely changing the rate changes Eb/N0? It means that because I went to a lower rate, to be fair I should be able to tolerate a higher noise variance: a lower rate means the clock speed on the channel goes up, which means more noise is let in — my noise variance automatically goes up — and my code should be able to tolerate that. And it turns out the repetition code does no better than that: it tolerates just enough to match the uncoded case, nothing more. In terms of Eb/N0, the repetition code gives no coding gain, and that is why you will see people typically do not use repetition codes in practice.

Obviously, there are codes which do give coding gain — we will see examples; I will show you how some codes behave on this plot, and then we will go back and look at those other codes. And of course there are codes which give the maximum possible coding gain guaranteed by capacity. All of this we will see in the next class. I am almost done, so I will stop here — this is a good point to stop, and we will pick up from here. Think more about the repetition code in terms of Eb/N0, and convince yourself that you have understood this plot correctly.
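(One way to convince yourself is numerically — a minimal sketch along the lines of the earlier snippets; the 10.5 dB test point is an arbitrary choice.)

```python
import numpy as np
from scipy.stats import norm

def pe_uncoded(ebn0_db):
    """Uncoded BPSK: P_e = Q(sqrt(SNR)) with SNR = 2*Eb/N0 when R = 1."""
    ebn0 = 10 ** (ebn0_db / 10)
    return norm.sf(np.sqrt(2 * ebn0))

def pe_repetition(ebn0_db, n=3):
    """(n,1) repetition code: P_e = Q(sqrt(n*SNR)) with SNR = (2/n)*Eb/N0,
    so again Q(sqrt(2*Eb/N0)) -- independent of n."""
    ebn0 = 10 ** (ebn0_db / 10)
    snr = (2 / n) * ebn0
    return norm.sf(np.sqrt(n * snr))

print(pe_uncoded(10.5), pe_repetition(10.5))   # identical: no coding gain
```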