It is good to check the audio before we start, just to make sure it is being recorded; once I recorded a whole lecture without the audio. Okay, so this is lecture 25. The last thing we were looking at was this notion of irregular, or more general, LDPC codes, and I defined a whole bunch of parameters to characterize them. The block length is important, but since these codes are sparse you can conveniently let the block length grow large and construct at whatever block length suits you, so we do not worry about the block length as much; we worry about the degree distributions. There were two ways of defining them: with respect to the nodes and with respect to the edges, the node-perspective and the edge-perspective degree distributions. My notation was L(x) and R(x) for the node-perspective degree distributions and lambda(x) and rho(x) for the edge-perspective degree distributions, and there were ways of converting from one to the other. I left all of that as an exercise; hopefully you went back and figured it out. It is just basic counting, but I urge you to make sure you can do it, because if you do not understand that very basic thing, the later material will be confusing. Remember the convention: we wrote

L(x) = sum_{i=1}^{d_l} L_i x^i,

and similarly, using j for the check side,

R(x) = sum_{j=1}^{d_r} R_j x^j.

The important parameter is the rate of the code, and we saw that the rate (the design rate at least, which may be very close to the actual rate) is just a function of the L_i and R_j, and there was a nice, simple way of writing it. What did the design rate work out to be?
Yes: the design rate is 1 - L'(1)/R'(1). That is a convenient way of writing it down. If you want it purely in terms of the L_i and R_j, note that L'(1) = sum_i i L_i and R'(1) = sum_j j R_j, so the design rate is 1 - (sum_i i L_i)/(sum_j j R_j); it is a simple calculation. The edge-perspective distributions we wrote as

lambda(x) = sum_i lambda_i x^{i-1},

with that convenience factor of i - 1 in the exponent, and similarly

rho(x) = sum_{j=1}^{d_r} rho_j x^{j-1}.

There is also a relationship between lambda(x) and L(x), and I am not sure enough of you realized it. One can show (I am quite sure this is correct, but you might want to check it) that

lambda(x) = L'(x) / L'(1).

That is a very convenient, short way of writing the relationship: if you have L(x), you can find lambda(x) in one shot with this formula. If you find the other conversion formulas complicated to remember, this is the one to remember. Similarly, rho(x) = R'(x)/R'(1). You can also go back the other way: integrate lambda from 0 to x and normalize by the integral from 0 to 1, and you recover L(x); it has to work out consistently. Make sure you understand all these simple conversions. It is basic enumeration and counting, but one needs to pay attention, because when you write programs these are exactly the things that go wrong and cause all kinds of confusion. For instance, one very simple issue: the L_i are usually fractions, perhaps specified to 10 decimal places, but your block length is only something like 4000, so when you want the number of bit nodes of a particular degree you can never hit that fraction exactly; you only get close to it. In practice, when you construct a code you live with the closest possible approximation. You do not have to hit the L_i and R_j exactly; up to some small error is fine, and the performance of the code will not change, it is not that sensitive to the exact values. But it should be close; you cannot pick some completely different values and expect the same performance. Hopefully this part is clear and you have thought enough about how to characterize these things.
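Since these conversions are exactly what tends to go wrong in programs, here is a small sketch, in Python, of one way to go from the node-perspective distributions to the edge-perspective ones and to compute the design rate. The dictionary representation and the function names are my own choices for illustration, not anything from a standard library.

```python
# Degree distributions as {degree: fraction} dictionaries (node perspective).
# Example: the (3,6)-regular ensemble has L = {3: 1.0}, R = {6: 1.0}.

def node_to_edge(node_dist):
    """Convert a node-perspective distribution {i: L_i} into edge-perspective
    coefficients {i: i*L_i / sum_k k*L_k}, i.e. lambda(x) = L'(x) / L'(1)."""
    total = sum(i * f for i, f in node_dist.items())      # L'(1): average degree
    return {i: i * f / total for i, f in node_dist.items()}

def design_rate(L, R):
    """Design rate 1 - L'(1)/R'(1) from the node-perspective distributions."""
    return 1.0 - sum(i * f for i, f in L.items()) / sum(j * f for j, f in R.items())

# A made-up irregular example, just to show the mechanics:
L = {2: 0.5, 3: 0.3, 6: 0.2}     # half the bit nodes have degree 2, and so on
R = {6: 0.6, 7: 0.4}
lam = node_to_edge(L)            # edge-perspective lambda_i
rho = node_to_edge(R)            # edge-perspective rho_j
print(lam, rho, design_rate(L, R))
```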
The next thing I am going to do is take Gallager A. Remember, we did the Gallager A analysis for regular codes; I am going to extend that analysis, point out the places where you have to do some extra work, and then work out how it is done. Basically, how do you find the threshold for irregular LDPC codes under Gallager A decoding? That is what we will see next: how to characterize thresholds, whether density evolution is even possible, and so on. Any questions on these things, the lambda_i and rho_j, or anything that worries you about an actual construction? Some of you might take that up as your project, so it is a good time to bring it up so that others also benefit.

All right, so let us look at Gallager A on irregular codes. There are quite a few problems, but let me first talk about a non-problem: can we even run Gallager A on an irregular Tanner graph? Go back and think about what you had to do. Iteration 1, step A: each bit node sends what it received from the channel on every edge. Different nodes have different degrees, but that is no problem; you send on as many edges as you have. Step B of iteration 1, where the check nodes reply to the bit nodes, is also possible. And iteration 2, step A: you have to modify it slightly, because different nodes have different degrees and will do different things, but you can do it: if all the incoming check messages agree, send that value; if there is any disagreement, send the value that was received from the channel. So with minor modifications you can run Gallager A on irregular graphs; you can go back, check it, and write a program that does it. Running the algorithm is not the problem; the problems come next.
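Before those problems, here is a minimal sketch of the two message-update rules as I would write them for an irregular graph, with hard-decision messages in {0, 1}. This is illustrative Python of my own, not code from any particular library.

```python
def check_to_bit(incoming):
    """Gallager A check-node rule: the reply on an edge is the parity (XOR)
    of the messages arriving on the other d_c - 1 edges of that check node."""
    out = 0
    for m in incoming:            # 'incoming' excludes the edge we are replying on
        out ^= m
    return out

def bit_to_check(channel_bit, incoming):
    """Gallager A bit-node rule: if all d_v - 1 incoming check messages agree,
    forward that common value; otherwise fall back to the channel value.
    (Equivalently: flip the channel value only when every incoming message
    disagrees with it.)  Works for any degree, so irregularity is no obstacle."""
    if incoming and all(m == incoming[0] for m in incoming):
        return incoming[0]
    return channel_bit
```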
So remember the way we analyzed it. First there was the all-zero codeword assumption, and that is also going to be okay. Let me write down what carries over. The running of the algorithm is okay; it requires minor modifications, but there is no problem there. The all-zero codeword assumption that we used in the analysis is also still valid: if it is true for regular codes, it is true for irregular codes as well. I do not want to spend time on it again; I did not give a full-fledged proof last time either, only a rough argument, and the same argument holds here, because the basic reason is that the channel is not changing. It is still a binary symmetric channel: whether you transmit 0 or 1, and whether the code is regular or irregular, the channel does exactly the same thing, so for the same reason the all-zero codeword assumption is fine.

The thing that differs significantly is the neighborhoods. In a particular irregular Tanner graph, the neighborhood of one bit node need not look like the neighborhood of any other bit node; it depends on the degree. It can differ starting from the very first stage: a bit node can have a different degree, which means the number of immediate neighbors is already different. So within one irregular Tanner graph you cannot expect all the bits to behave the same way, and the error rates of the individual bits will not necessarily be the same as you iterate. In the regular case we never had to worry about this; I just said we can look at any one bit node, it is the same as any other bit node, and left it at that. So one significant problem is that different bit nodes have different neighborhoods, and one needs to think about this carefully, because there are various ways of handling it, and how you handle it changes both the ease of the analysis and the results you can get. One might say that the ideal, proper way is to fix one Tanner graph, list out all the different neighborhoods, and do the analysis for each.

Before that, let me mention one more assumption that still works: the neighborhoods will be cycle-free. Even though they differ, you can still say that as n tends to infinity the neighborhoods are cycle-free (have no repeated nodes) with high probability. That comes about purely because the graph is sparse; regularity plays no role in the cycle-free nature. So that also carries over, and pretty much the only problem is that different bits have different neighborhoods; everything else carries over nicely. The only trick you need in order to analyze Gallager A on irregular LDPC matrices, or irregular Tanner graphs, is how to deal with these different neighborhoods.

Like I said, one might insist that the only way is to fix a block length, take a particular graph, and enumerate all the possible neighborhoods. In fact even that is problematic, because I am not defining one code for you, I am defining an ensemble of codes; depending on which code you pick, the neighborhoods change, and you can never make a statement about the performance of a single code purely from its degree distribution. The whole thing gets complicated, and at the end of the day the only tool you would be left with is simulation: generate a code with a particular degree distribution, simulate it, see how well it performs, change the degree distribution, generate another code, simulate again. One can do that today, it is not unthinkable, but it is essentially impossible to optimize quickly and obtain good thresholds that way. Remember what our problem was: we already had a threshold of about 0.04 while the optimum was 0.11, and we want to move towards that, but without an analytical tool you have no way of knowing beforehand what change to make.

So there is a way to deal with these different neighborhoods, and it is common to a lot of arguments in coding and information theory, in fact to many places where a combinatorial argument is involved. The idea is to average, because I cannot calculate anything for one particular graph.
For one particular graph I am stuck: the neighborhoods are tree-like, they are cycle-free, I know that, but they are still different, the degrees differ, I cannot calculate anything for a specific case, and the specific case may not mean much anyway. So what you do is average over all possible neighborhoods: you find the probability of error averaged over all possible neighborhoods. What does that mean? Take one neighborhood, find its probability of error; take another neighborhood, find its probability of error; do this for every neighborhood and then average. Of course, done that way the average is itself tough to compute. The idea is that you can compute the average without doing the individual cases: you use a probabilistic argument very cleverly and compute the average directly, without the individual probability-of-error calculations. If you manage that, you have achieved something, though perhaps not exactly the problem you originally wanted to solve.

So that is the first idea. Idea one is: average the probability of error, which is what we are trying to calculate, over all neighborhoods and over everything you can receive. So far we have been averaging only over the channel outputs: the input codeword was fixed to all-zeros, and when we computed the probability of error, what we averaged over was all possible channel outputs. Now I am saying that in addition you must average over all possible neighborhoods. First of all, is the first statement clear, that we were already averaging over the channel outputs? We did it smartly: we said each bit can be in error with probability p, and we used an independence assumption in the middle, and that independence assumption is crucial in making the average calculation easy. Once you have it, you can do the averaging without enumerating every possible channel output. Go back and notice how crucial the independence assumption was; without it, you cannot do that. In the same way here: once some independence holds, you can average over all neighborhoods without enumerating each and every case and without finding the individual probabilities of error. So at least one part of the problem gets solved: you can compute something.

The next step is to show that this something is meaningful, that the average probability of error actually tells you about real codes. For that, idea two is to show what is called a concentration result. This is a standard trick in a lot of combinatorial problems, including channel coding itself; the whole area is built on these kinds of arguments. You average over many realizations, and then you show a concentration result, or at least an existence result.
In several cases concentration is tough to show while existence is easy. Once you have an average, it is like saying the average mark in a class is 50: there is at least one person who got 50 or more and at least one who got 50 or less. So the average is meaningful to some extent; it gives you existence on either side. Concentration is much stronger: it says every single instance is very close to the average. It is like saying everybody in the class got 100 out of 100, everything tightly concentrated around the average. That never happens with marks, but here you can actually show it: in this problem, the average probability of error is a very good indicator of a random instance, because every random instance concentrates tightly around the average. Think of a Gaussian PDF with sigma tending to zero: everything sits at the mean, and the tail probability outside the mean plus or minus sigma vanishes. If you can show these two things, you have some consolation. You have not found the probability of error for one particular graph, and you cannot even make a statement of that form, but you can compute the average case using the independence assumptions, and the concentration result justifies the whole argument. I have not done anything concrete yet; I am just giving you a flavor of what is coming.

So here is what we will do in this course. I will expand on idea one: I will actually show you how to do the computation, and we will derive the exact density evolution equations, at least for Gallager A. Idea two I will only outline briefly. If you are interested, I urge you to learn more about martingales: the concentration result comes from a specific construction called a Doob martingale. You can look at Modern Coding Theory, the book I mentioned; they explain it in great detail. The moment you hear the word martingale you think it is some fancy thing, but it is not very difficult; it just requires some background we will not cover in this class, so I encourage you to read the proof yourself, it is not hard to follow.

So let us get to finding the average performance over all neighborhoods; I will call it density evolution for the irregular case. We know how to do this for the regular case, and we will borrow from there; we will also make the same independence assumptions, which are justified by the cycle-free nature of the neighborhoods, and compute the average case. All right, let us start at the very beginning, with iteration 1, step A. At least for the first few steps I will go slowly, and then I will state the general result reasonably quickly.
So what is iteration 1, step A? You receive something from the channel, and each bit node simply sends exactly what it received on every one of its edges. What is the probability that a particular edge carries an erroneous message? It is p. In the first iteration you can find the probability of error for a particular edge exactly; you do not have to average, there is no confusion. And even if you do average over all possible instances, you still get p, because it is constant: there is complete concentration, everything is p. So that is fine; call it p_0 = p for now (I will adjust this notation shortly).

Now, for step B you need some work; you will see immediately that you have to average, and without averaging you cannot say much. But let me go slowly. What is step B? I have a check node; say the degree of that check node is j. I have to specify that now. In the regular case it was always w_r and I could take any old check node, but now I have to tell you which check node I am looking at. So remember what I am trying to find: q_0, the probability that a message from a check node to a bit node is in error. That will depend on which edge I choose. So I am going to pick an edge that I know is connected to a check node of degree j on the right-hand side, and compute q_0 conditioned on that event. That is what I compute first, and then I will average over all possible check-node degrees, with the appropriate probabilities, to get the overall average.

So what does the conditional computation look like? That check node had j - 1 other incoming messages from step A, and by my independence assumption they are all independent and each is in error with probability p; step A was just p, so that I know. Making this i.i.d. assumption, what is the probability that the outgoing message is in error? The same as before: the outgoing message is wrong exactly when an odd number of those j - 1 incoming messages are wrong, and that works out to

[1 - (1 - 2p)^{j-1}] / 2.

You can write it out in the complicated way, as a sum over odd numbers of errors, but it is equivalent to this closed form. Now, this is not enough for me: I have to find the average performance over all possible Tanner graphs and all possible received values, and so far I am averaging only over the received bits, for an edge that I already know is connected to a check node of degree j.
So now I have to pick a random edge and let my Tanner graph range over all possible Tanner graphs, and ask a question; the next question is crucial. Suppose I select an edge and then vary my Tanner graph over all possibilities: what is the probability that my edge is connected to a check node of degree j? One way to compute it: fix an edge, enumerate all the Tanner graphs, with all the various check nodes and bit nodes that could be attached to that edge, count the cases where the attached check node has degree j, and divide by the total number of graphs. That gives you one number, and that is the number I am after.

There is a quicker way of getting the same answer, based on the way we construct the graph from sockets. Ask: how many edges are connected to degree-j check nodes? Take that number and divide by the total number of edges; that gives the same answer. These are slightly hand-waving arguments, but hopefully you have done enough counting problems to convince yourself. In a given graph, the number of edges connected to degree-j check nodes divided by the total number of edges is the probability that a randomly chosen edge of that graph is connected to a degree-j check node on the right; and even when you average over all possible graphs, you get the same number. Think about it a little; this is the easiest way to convince yourself. And what is that number? The number of edges connected to degree-j check nodes divided by the total number of edges is, by definition, rho_j. So the probability that a random edge is connected to a check node of degree j equals rho_j, in a particular graph; and, extrapolating a little, the fraction of graphs in which a fixed edge sees a degree-j check node on the right is also rho_j. That is not a difficult step from here.

Once I know that probability is rho_j, I can average: I can find the average probability of error over all graphs, all realizations. How? Multiply rho_j by the conditional probability of error and sum over j. So my average is

q = sum_{j=1}^{d_r} rho_j [1 - (1 - 2p)^{j-1}] / 2.
(A question from the class: is this not just the same as picking an edge at random and averaging over the edges of one graph?) Yes, it comes to the same thing, but I wanted to bring out that distinction a little so that you think more carefully about the averaging process. If you are convinced by the formula, you are happy, and you can convince yourself in various ways; but if you want to worry about whether I am averaging over all graphs, or fixing one graph and averaging over its edges, think it through. Both give the same answer.

So that is the check-to-bit error probability: if I pick some edge, the probability that it carries an erroneous message in step B of the first iteration, averaged over all graphs and all channel outputs, is this quantity. Actually, let me fix the notation. Should the index be 0 or 1? We used p_0 for the channel before, but since this is step B of iteration 1, I do not like calling it q_0; I will call it q_1, and correspondingly I will write p_1 = p for step A. Sorry about that; it is q_1 from now on.

Now let me jump ahead to iteration l, step A, and assume that everything up to iteration l - 1 has been done. What is known to me then? q_{l-1}: after I finish iteration 1 I know q_1, so when I am ready to do iteration l, I am done with iteration l - 1 and I know q_{l-1}, the probability that any given edge carries an erroneous check-to-bit message. Now go back to the bit node and ask the same question, and you get the answer easily. Suppose my bit node has degree i. Then there are i - 1 messages that were received in iteration l - 1 from i - 1 different check nodes, and I will again make the i.i.d. assumption: any one of those edges carries an erroneous message with probability q_{l-1}, and the event that one edge is in error is independent of the event that another is. That assumption needs to be justified, and it can be justified by looking at the cycle-free neighborhood, as before. Once you grant it, it is the same computation again, and I can easily find p_l. How will I write p_l? You sum over the degrees, sum_{i=1}^{d_l} lambda_i times the error probability given that the edge is attached to a degree-i bit node, and that conditional probability is the same formula as before. What was that formula?
It is (1 - p) q_{l-1}^{i-1} + p [1 - (1 - q_{l-1})^{i-1}]: if the channel bit was correct, the outgoing message is wrong only when all i - 1 incoming messages are wrong, and if the channel bit was in error, it is wrong unless all i - 1 incoming messages are correct. So

p_l = sum_{i=1}^{d_l} lambda_i [ (1 - p) q_{l-1}^{i-1} + p (1 - (1 - q_{l-1})^{i-1}) ].

(Another question: when I average over all possible Tanner graphs, will the different cases not get mixed up, which check nodes a given bit node could be connected to, and so on?) They will be mixed up, properly and nicely mixed up, and that is exactly the point. You cannot do this for an individual case, because there are simply too many possibilities; a bit node can be connected to so many different check nodes, many of them with the same degree, so the averaging gets rid of all those specific cases. I do not have to worry about the actual connections; everything gets mixed together, and the fact that such a mixed-up number still has meaning is precisely what the concentration result provides. Without the concentration result there would not be much point.

Now how do I get q_l (not q_{l+1}, sorry, q_l)? The same way as before; I already have the formula written down for q_1, and I only need minor changes: put p_l in place of p and keep the exponent j - 1, so

q_l = sum_{j=1}^{d_r} rho_j [1 - (1 - 2 p_l)^{j-1}] / 2.

That is why I wanted to rename the starting point: I will say p_1 = p, so the q_1 formula is just the l = 1 case of this. So that is the density evolution for irregular LDPC codes under Gallager A. I do not know how convinced you are about all these averaging arguments; go back and think about them. Enough people have looked at this, and it is true, so start from the assumption that it is true and work yourself into being convinced; that is the best way to proceed.

Let me also give you an interpretation of these equations from the neighborhood point of view. For the regular case we could draw a neighborhood of depth l, one particular neighborhood, because all neighborhoods look the same, and, given the i.i.d. assumption, we could say the deepest edges have error probability p, the next level q_1, then p_1, and so on, building it up level by level. For irregular codes such a picture is harder to draw, because there is no single neighborhood; at every depth there is an averaging to do. But the picture still holds, and it is what justifies the i.i.d. assumptions. It is called the tree ensemble: in the Modern Coding Theory book they define all possible trees of depth l, with a degree distribution involved at each level, and they talk about averaging over those trees. It is a little confusing to see at first, but one can indeed have a neighborhood picture for irregular codes as well, and that is important to know.
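Just to pin the recursion down, here is a small sketch of how one might iterate these two equations numerically. The representation of lambda and rho as degree-to-coefficient dictionaries and the function names are my own illustrative choices, and the sanity-check values in the comments are only what I would expect, not results quoted in the lecture.

```python
def de_check(p_l, rho):
    """Check-node step: q_l = sum_j rho_j * [1 - (1 - 2*p_l)^(j-1)] / 2."""
    return sum(rj * (1 - (1 - 2 * p_l) ** (j - 1)) / 2 for j, rj in rho.items())

def de_bit(p, q, lam):
    """Bit-node step (Gallager A):
    p_l = sum_i lambda_i * [ (1-p)*q^(i-1) + p*(1 - (1-q)^(i-1)) ]."""
    return sum(li * ((1 - p) * q ** (i - 1) + p * (1 - (1 - q) ** (i - 1)))
               for i, li in lam.items())

def density_evolution(p, lam, rho, iters=200):
    """Run the recursion starting from p_1 = p and return the final p_l."""
    p_l = p
    for _ in range(iters):
        q_l = de_check(p_l, rho)     # step B of the current iteration
        p_l = de_bit(p, q_l, lam)    # step A of the next iteration
    return p_l

# Regular (3,6) sanity check: lambda(x) = x^2, rho(x) = x^5.
print(density_evolution(0.03, {3: 1.0}, {6: 1.0}))   # below threshold: should head towards 0
print(density_evolution(0.05, {3: 1.0}, {6: 1.0}))   # above threshold: should stay away from 0
```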
All right, so that is idea one. Idea two, like I said, we will not spend much time on. What the concentration result means is that this average behaviour is as good as what you would get for a random instance: if you generate a sufficiently random LDPC matrix obeying a particular degree distribution, its performance and the average performance will be very, very similar. And I agree that, strictly speaking, what I should do is go to the tree ensemble: define it, start from the deepest nodes, argue level by level what happens at each step, and present p_l as what happens at the root. That is the proper way of doing it, and in the Modern Coding Theory book they do it that way, so it can be done properly. But if I wrote down that neighborhood and argued through it, more people would get lost; the argument I gave, that a random edge is attached to a degree-j check node with probability rho_j and to a degree-i bit node with probability lambda_i, is easier to see than anything else. Would you get the same expression from the neighborhood picture? Yes, exactly the same; there too you end up keeping track of edges at each depth.

So what has happened after all of this? Even if you do not follow every individual step of the evolution, at the end of the day you have a formula for p_l as a function of p_{l-1} and p, parameterized by rho and lambda. That is your density evolution iteration: p_l = f(p_{l-1}, p) for some function f of the form we wrote down. Then you go ahead and do the same analysis as before. What do you have to show? Some monotonicity properties: that p_l behaves monotonically over the iterations, that it is monotone in p, and so on. And then what was the next thing we worried about? What happens as l tends to infinity. The sequence p_l is bounded, so it converges; the question is whether it converges to 0 or to some non-zero value. So you run this iteration for various values of p and find the largest p for which p_l converges to 0, and that is the threshold:

p* = sup { p : p_l tends to 0 as l tends to infinity }.

Strictly I should say supremum, but you can replace it by maximum in your head if you are not comfortable with the word.
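Numerically, locating p* is just a one-dimensional search on top of the recursion; here is a self-contained sketch using bisection. The iteration count, tolerance, and bracketing interval are arbitrary choices of mine, and the expected value in the final comment is only the approximate figure I would expect for the (3,6) case, so treat it as a sanity check rather than a quoted result.

```python
def gallager_a_threshold(lam, rho, lo=0.0, hi=0.5, steps=40, iters=2000, tol=1e-10):
    """Bisection estimate of p* = sup{p : p_l -> 0} for Gallager A density evolution.
    lam and rho are edge-perspective distributions given as {degree: coefficient}."""
    def converges(p):
        p_l = p
        for _ in range(iters):
            q_l = sum(rj * (1 - (1 - 2 * p_l) ** (j - 1)) / 2 for j, rj in rho.items())
            p_l = sum(li * ((1 - p) * q_l ** (i - 1) + p * (1 - (1 - q_l) ** (i - 1)))
                      for i, li in lam.items())
            if p_l < tol:
                return True        # error probability has effectively hit zero
        return False

    for _ in range(steps):
        mid = (lo + hi) / 2
        if converges(mid):
            lo = mid               # mid is below the threshold, move the bracket up
        else:
            hi = mid               # mid is above the threshold, move the bracket down
    return lo

# (3,6)-regular ensemble: this should come out somewhere around 0.039.
print(gallager_a_threshold({3: 1.0}, {6: 1.0}))
```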
Again, the ultimate way to convince yourself that this threshold means something is the same as before: simulate. Take the same degree distribution, find its threshold p* from density evolution, and then increase the block length and look at the behaviour close to the threshold; remember what I showed you for the regular case. You can do the same for the irregular case, and you will see the codes behave according to the threshold; in fact they behave according to it even when the graph has cycles of length 6 and so on, even when your neighborhoods are not cycle-free. Like I said, why these algorithms behave so well even when the assumptions of the analysis fail is still one of the unsolved problems in LDPC codes; presumably there is some other analysis out there waiting to be found, and that is the honest answer to that question. So all of this is true for irregular codes as well.

But there is one advantage with irregular codes that we did not have for regular codes: the extra degrees of freedom in rho and lambda. You can play around with them, change them, and optimize your threshold, so an optimization over rho and lambda is now possible: get the best possible threshold for a given rate, or the best possible rate for a given threshold, whichever way you want to pose it. Previously we did not have that. Well, one can say that previously you could have optimized over w_c and w_r for a particular rate, but for a fixed rate that space is not varied enough: say the rate is one-half, then the pairs are (3, 6), (4, 8), (5, 10), and it quickly starts to degenerate. But fix the rate at one-half and look at all the possible rho's and lambda's allowed by the rate equation, and you will see there are very many of them, and there is bound to be something that gives a better threshold than what we had. That is another thing to remember whenever you optimize: you want the space you are searching over to be large; if your space has only 10 or 20 points there is no point in optimizing, and it is good to have some variation in the landscape, so to speak.

So how is this optimization done? I will outline it briefly, not in great detail, and you will see how it works. The first thing is to look at the design rate; written explicitly in terms of the rho_j and lambda_i it is

R = 1 - (sum_j rho_j / j) / (sum_i lambda_i / i).

Now, there is a well-known conjecture in LDPC codes that right-regular graphs, in the sense of almost right-regular, give you capacity-achieving thresholds; it is a conjecture, nobody has proven it. Remember the example I gave, where I asked you to assume that the right degrees were only w and w + 1 and to work something out? That is the first assumption made here to simplify the search space. You want the search space to be large, but not so large that you are totally lost, so to curtail it a little, people usually make a right-regular, or rather close-to-right-regular, assumption: you say rho_w and rho_{w+1} are non-zero, and rho_j = 0 for every other j. Now, as far as rho is concerned, you only have to search over w, which simplifies that part of the dependence. Why do this? Simply to tame the optimization problem; otherwise there are too many variables. If rho_j ranged over all j from 1 to d_r, that would be far more freedom than you need; you need some freedom, but not that much. Let me come back to this point in a moment.
For instance, if I say d_l is 100 and d_r is 100, how many variables do I have? 200, which is probably far too many. But if d_l is 100 and the rho side has essentially been reduced to the single parameter w, then the problem has only about 100 variables, and that is fine; 100 variables is plenty to optimize over, and you would not gain much with more.

So what do you do next? You have to characterize all the lambda_i that give a particular rate. Usually you fix a design rate, say one-half, and you have almost fixed rho; then you find all possible lambda satisfying the rate equation, and then you have to find the threshold for each of those lambda and pick the one with the best threshold. That is one way of viewing the problem, and if it were the only way of solving it, this approach would not have been very successful, because finding the threshold each time is painful and there are far too many cases. But it is possible to phrase this as a linear optimization problem. The way you do it is slightly different from what I just described. Fixing the design rate and maximizing the threshold is a fine approach; it is not wrong, one can do it. But a better approach, which leads to a linear program, is the reverse: fix the threshold and maximize the rate. If you do that, you get a linear problem, and you can see why. With rho fixed, maximizing the rate is the same as minimizing the fraction in the rate expression, which is the same as maximizing the denominator sum_i lambda_i / i, and that is a linear function of lambda. So maximizing R with the threshold fixed can give you a linear objective. What about fixing the threshold: do the constraints now become non-linear? Go back and look at them: the density evolution expression at the bit node is a linear function of lambda, and it is possible to arrange the constraints so that all of them are linear in lambda. Both of those are possible. If you choose the project or term paper on optimization of degree distributions, you will have to look at exactly how the problem is converted into a linear program. There is no deep concept there; it is just a matter of viewing the problem properly so that the objective and all the constraints become linear functions of lambda, and then you run a linear program. By now you should know that a problem with a linear objective and linear constraints can be solved; there are standard algorithms for it, and one can do this reasonably fast.
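To give a feel for that reformulation, here is a rough sketch of how one might set it up with scipy.optimize.linprog: fix rho and a target channel parameter p, and maximize sum_i lambda_i / i subject to the bit-node density evolution update staying strictly below the diagonal on a grid of points in (0, p], which, for this monotone recursion, is a sufficient condition for p_l to go to 0. The grid size, the margin, the maximum bit degree, and the target p = 0.035 are all my own illustrative choices, not values from the lecture, and a serious design would need more care than this.

```python
import numpy as np
from scipy.optimize import linprog

def optimize_lambda(rho, p, max_bit_degree=20, grid_points=100, margin=1e-6):
    """Rough LP: with rho and target p fixed, maximize sum_i lambda_i / i
    subject to  sum_i lambda_i * h_i(x) <= x - margin  for x on a grid in (0, p],
    where h_i(x) is the Gallager A bit update given the check output g(x)."""
    degrees = list(range(2, max_bit_degree + 1))          # allowed bit-node degrees
    xs = np.linspace(p / grid_points, p, grid_points)     # grid over (0, p]

    def g(x):   # check-node update for the fixed rho
        return sum(rj * (1 - (1 - 2 * x) ** (j - 1)) / 2 for j, rj in rho.items())

    A_ub, b_ub = [], []
    for x in xs:                                          # one constraint per grid point
        q = g(x)
        A_ub.append([(1 - p) * q ** (i - 1) + p * (1 - (1 - q) ** (i - 1))
                     for i in degrees])
        b_ub.append(x - margin)

    c = [-1.0 / i for i in degrees]                       # maximize sum_i lambda_i / i
    A_eq, b_eq = [[1.0] * len(degrees)], [1.0]            # the lambda_i sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * len(degrees))
    if not res.success:
        return None
    lam = {i: v for i, v in zip(degrees, res.x) if v > 1e-9}
    rate = 1 - sum(rj / j for j, rj in rho.items()) / sum(v / i for i, v in lam.items())
    return lam, rate

# Illustration with rho(x) = x^5 and a target p just below the (3,6) threshold,
# so that the regular lambda(x) = x^2 is at least one feasible point.
print(optimize_lambda({6: 1.0}, 0.035))
```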
I know this has not been a very comprehensive discussion of irregular codes. Part of that is intentional, and part of it is that there really is not much more one needs to do: it is just a question of going in and running these large linear programs with a hundred variables, and I cannot run that for you here in class, it would not make much sense; I can only show you how the pieces fit together. But that is the high point of irregular codes: it turns out that if you do this optimization, you can get very, very close to capacity. That is pretty much all you need; it gives you thresholds very close to capacity, and you do not have to worry about much else. Is that clear? Any questions? Is anything I wrote down bothering you?

Question: how close can we get to 0.11? For the BSC it is actually not very clear; the jury is still out on how close one can get for the binary symmetric channel. If I change the channel, the story changes: for the AWGN channel, for instance, you can get very, very close. For the BSC I am not sure; I think you can get quite close, but I am not certain how close. So that is, roughly, the reason and motivation for irregular codes, and why they are very useful in practice. In practice, for instance, the WiMAX codes that are actually in the standards are irregular, and they are right-regular.

Another question: can the binary symmetric channel not be seen as an approximation of the AWGN channel? It can, but it turns out the kind of information you get from the AWGN channel is much richer, and that variety is not there in the BSC; with the BSC you do not get much information per observation, so the AWGN channel gives you better capacity. That is a loose way of saying it, and I do not know how else to quantify it, but in my opinion it should be possible to get close to BSC capacity as well; I might be wrong, people have shown some odd results. The BSC is not a particularly difficult channel, but it is not a generous one either. With BPSK over AWGN, transmitting -1 or +1, is receiving 0.5 different from receiving 0.7? The BSC says no, it does not distinguish them, but if you actually use those numbers they can lead to different answers, so the kind of information you get is different, and maybe that plays a role; I am not sure. The capacity comparison is clear, though: 1 - h(p) is in many cases significantly less than the capacity you can get with BPSK over AWGN.

Is that fine? Is everybody happy? We have about twenty-five to thirty minutes left; should we do the project discussion now, or continue and do it in the last ten minutes? Okay, we will do the project discussion now. I also want to assign projects to people, and that will take some time, so we will stop here for today, talk about the projects, and pick up from here next time. The next topic is soft decoding: BPSK over AWGN, and what you do with LDPC codes in that case.