Welcome to this lecture on digital communication using GNU Radio. My name is Kumar Appaiah. In this lecture we are going to take a brief look at channel capacity. Channel capacity is related to the maximum rate of error-free communication that you can achieve over a channel. We will largely focus on the binary symmetric channel, but briefly mention how these concepts carry over to other channel types as well. As an outline, we will talk about rate and error-free communication, briefly mention typical sets and their probability, then roughly derive the capacity of a binary symmetric channel, state the capacity of a Gaussian channel, which is more practical and more closely related to the simulations you have seen, and finally summarize the implications of capacity for practical communication systems.

When we look at rate for error-free communication, consider an (N, K) block code as we have seen in the past several lectures; an (N, K) block code has a rate of K/N. On a binary symmetric channel, a block error occurs if enough bits get flipped: the channel is noisy, so if there are no bit flips there is no problem, and if there are bit flips you can potentially correct for them. For example, we have seen the Hamming code and the repetition code as error correction mechanisms. These codes are robust to a certain number of bit flips and can still say, "most likely this particular code word was sent, and therefore your K-bit message is this."

Now suppose, hypothetically, we demand error-free communication. What does that mean? Error-free communication is where the channel introduces errors, but you have added enough redundancy that all of those errors can be handled. As an example, take a binary symmetric channel with p = 0.2, so the probability of each bit getting flipped is 0.2. Suppose you reduce your rate by using a repetition code that repeats everything 11 times; then the effective error rate can be reduced significantly, because even though the channel flips bits, the error control code handles those flips. The price you pay is that the rate is very poor: for the 11-fold repetition code it is 1/11.

The question we are asking is: what is the maximum K/N that is achievable as N goes to infinity? One key idea here is that the block length is allowed to be very large; you are allowed to send a block of several bits, and I ask you for the maximum possible rate. The advantage is that when you use the channel many times, a certain pattern emerges in the types of errors that the channel introduces, and getting a handle on these patterns allows you to design codes that beat the errors; that is essentially how you reach capacity. In practical terms: if the block length is very high, what is the maximum K/N that we can get with an arbitrarily small error rate?
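As a quick illustration of this example, here is a small Python sketch (plain numpy, not a GNU Radio flowgraph from the course) that simulates a binary symmetric channel with p = 0.2 and protects the bits with an 11-fold repetition code decoded by majority vote. The values p = 0.2 and 11 repetitions follow the example above; the number of message bits and the random seed are arbitrary simulation choices.

```python
# Minimal Monte Carlo sketch (assumed setup, not a GNU Radio flowgraph):
# send bits over a BSC with p = 0.2, protect them with an 11x repetition
# code, and compare the raw flip probability with the decoded error rate.
import numpy as np

rng = np.random.default_rng(0)
p, reps, n_bits = 0.2, 11, 100_000

bits = rng.integers(0, 2, n_bits)                 # K message bits
coded = np.repeat(bits, reps)                     # rate K/N = 1/11
flips = rng.random(coded.size) < p                # BSC: each bit flips w.p. p
received = coded ^ flips
# Majority-vote decoding: more than reps/2 ones in a block -> decide 1
decoded = (received.reshape(n_bits, reps).sum(axis=1) > reps // 2).astype(int)

print("raw channel flip probability:", p)
print("decoded error rate          :", np.mean(decoded != bits))
print("rate K/N                    :", 1 / reps)
```

With these values the decoded error rate comes out on the order of one percent, far below the raw flip probability of 0.2, but the rate you pay for it is only 1/11.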
Now, as students of probability you know that for a binary symmetric channel with any non-zero error probability and any finite N, there is always some probability that a large number of errors occurs, so irrespective of the code you use you will sometimes get a decoding error. So let us soften the demand a little: suppose, to get a practical idea, I require that the error probability be below 10^-9 or 10^-12. Then you may be able to design a code; the easiest way is to just use a repetition code. The repetition code, however, is very poor, because to get an arbitrarily low error probability you have seen that you need to increase N significantly, and the rate 1/N goes to 0 as N tends to infinity. What I want is the highest possible K/N that gives me effectively error-free communication over a channel with errors.

So let us look at a binary symmetric channel with error probability p and ask: what are the most typical error sequences in N channel uses? The word "typical" has a special connotation in the study of capacity. It does not mean the single most probable error sequence; if p is less than half, the most probable error sequence on a binary symmetric channel is the all-zeros sequence. What we are really asking is: roughly, what is the most probable number of errors when bits are sent over the channel N times?

Let us look at this in more detail. A binary symmetric channel with error probability p looks like this: a 0 is received as a 0 with probability 1 - p, a 1 is received as a 1 with probability 1 - p, and either gets flipped with probability p. Now consider the binary symmetric channel with multiple uses, meaning the transmission of multiple bits. Take 3 bits and deliberately group the error patterns by their Hamming weight, that is, by the number of errors: 000 has 0 errors; 001, 010 and 100 have 1 error each; 011, 101 and 110 have 2 errors each; and 111 has 3 errors. What are their probabilities? The all-zeros pattern has probability (1-p)^3; remember, if p is less than half, this is the largest among all the individual patterns. Each 1-error pattern has probability p(1-p)^2, each 2-error pattern has probability p^2(1-p), and the 3-error pattern has probability p^3. If p = 0.1, the most likely individual pattern has probability 0.9^3 and the least likely has probability 0.1^3, and of course all of these probabilities add up to 1; note that there are 3 patterns with 1 error and 3 patterns with 2 errors. Now let me ask a question about n-bit error patterns, and rather than looking at individual patterns, let us also write the cumulative probability of each number of errors, taking into account all possible sequences, just for our reference.
So for 3 channel uses: 0 errors has total probability (1-p)^3, 1 error has total probability 3p(1-p)^2, 2 errors has 3p^2(1-p), and 3 errors has p^3. Now, rather than targeting individual error patterns, suppose we target the number of errors that the channel introduces over multiple channel uses. In such a situation, which of these is the most likely? This is where typicality starts to appear. In other words, for large block lengths the most typical sequences are the ones that carry essentially all of the probability; for example, (1-p)^3 may be the largest individual value, but 3p(1-p)^2 may or may not exceed it.

For example, looking at n-bit error patterns: the total probability of 0 errors is (1-p)^n, and of 1 error it is n p (1-p)^(n-1); to be very formal, that is C(n,1) p^1 (1-p)^(n-1). Notice that this factor of n need not be small: you are comparing (1-p)^n with n p (1-p)^(n-1), so a large n can make the one-error patterns, taken as a group, more likely than the all-zero pattern. The all-zero error pattern is certainly the most probable individual pattern, but what is the probability of getting 0 errors as n goes to infinity? (1-p)^n goes to 0 for any non-zero p. So when the block length becomes large, I am almost 100 percent sure that I will not get the all-zero pattern; what I am looking for is the most likely group of error patterns.

So let me consider l errors (I will use l rather than k, since k is already used for the message length). The probability of getting l errors, not of one particular l-error sequence but of the event that the sequence has l errors, is obtained by adding up the probabilities of all C(n,l) such mutually exclusive sequences: it is C(n,l) p^l (1-p)^(n-l). Now C(n,l) p^l (1-p)^(n-l) may be much larger than (1-p)^n, in which case this is what you should target, because the question is: as you make the block length larger and larger, what are the most likely error sequences?

To give some intuition as to how this works out, consider a biased coin with probability of heads equal to 1 - p. Then the probability of the sequence HHH...H in n tosses is (1-p)^n, while the probability of getting l tails and n - l heads is C(n,l) p^l (1-p)^(n-l).
If I ask you: when the coin is tossed n times, where n is a really large number like a million, what is the most likely number of tails? This is a binomial distribution, and I am asking for the peak of its PMF over l, the most likely number of tails (or heads), and that is going to give you an indication. Because if you know this pattern, then these are the error patterns you design your code for: you can choose code words such that flipping that many bits still decodes uniquely back to the correct code word. That was the motivation behind the Hamming code: a one-bit error pattern resulted in zero errors after decoding. Similarly, if you know the typical number of errors, you can design codes that handle that many errors, and that is where the capacity aspect comes in.

Now, the most probable individual sequence is the all-zero sequence (all heads, or no errors), but the most probable number of errors is obtained by maximizing C(n,l) p^l (1-p)^(n-l) over l, and it turns out that the maximizing l is approximately np. Why? If you plot the binomial PMF you will always find that the peak is close to np. For a more intuitive answer, think of the binomial as a sum of independent Bernoulli random variables; by the central limit theorem, for large n its PMF looks like sticks tracing out a Gaussian shape, and the peak always occurs close to the mean np. I say "approximately" because l is an integer and np may not be. This tells you that the most likely number of errors is going to be np.

So the most probable group of error patterns produces about np bit flips, but the problem is that you do not know where those np bit flips are; if you knew where they were, you would be done. Your code, which consists of 2^K code words, has to be such that if you flip np bits of a code word, it is still at minimum distance to the same code word and does not get mapped to another one (handling more than np is fine, but np is what we are going for). While this is the intuition, the actual decoding process used in proving capacity is not that simple, but let us go further: what if we choose n-length code words that are separated sufficiently so that np bit flips do not hamper decodability? In the case of the Hamming code we separated the code words to such an extent that one bit flip does not hamper decodability; now the number of flips to handle is np.
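To make the claim about the peak concrete, here is a small Python sketch that evaluates the probability of exactly l errors, C(n,l) p^l (1-p)^(n-l), over all l and reports where it is maximized; the values n = 1000 and p = 0.1 are illustrative choices, not from the lecture.

```python
# Sketch: the probability of exactly l errors in n BSC uses is
# C(n, l) p^l (1-p)^(n-l); its peak over l sits near l = n*p.
from math import comb

n, p = 1000, 0.1
pmf = [comb(n, l) * p**l * (1 - p)**(n - l) for l in range(n + 1)]
l_star = max(range(n + 1), key=lambda l: pmf[l])

print("most likely number of errors:", l_star)        # close to n*p = 100
print("P(exactly 0 errors) = (1-p)^n =", (1 - p)**n)  # essentially 0
```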
Here is where we make a packing-based argument. The total number of n-bit sequences is 2^n; remember n is a large number, so you have 2^n sequences in total. We said: let us choose n-bit code words in such a way that flipping np of their bits still results in unique decodability. Starting from a particular n-length code word, how many different n-length vectors can you reach by flipping np of its bits? You do not have to take my word for it; it is simply C(n, np), and more generally the number of distinct l-bit-flip patterns is C(n, l). So the number of code words I can accommodate in this space of 2^n sequences is roughly 2^n divided by the size of each ball, which is C(n, np).

To give some intuition: picture the space of 2^n n-bit sequences. Take a code word x1; any np bit flips move it to one of several other n-bit sequences. We want another code word x2, with its own np-flip neighbourhood, that does not get confused with x1, and similarly a third code word x3, and so on. It is almost like packing spheres into this space of 2^n sequences so that they cause no confusion. How many can you pack? About 2^n / C(n, np), because each sphere contains C(n, np) elements; since np errors are the most likely, we just make sure that any np errors cause no confusion. So approximately 2^n / C(n, np) is the number of n-length code words you can support.

Now use Stirling's approximation, which approximates n! for large n; up to lower-order factors, n! is proportional to n^n. (You may argue that np is not an integer, but bear with me and assume p is chosen so that np is an integer.) Then C(n, np) is approximately n^n / ((np)^(np) (n - np)^(n - np)), and the number of code words is approximately 2^n (np)^(np) (n - np)^(n - np) / n^n. If you expand (np)^(np) = n^(np) p^(np) and (n - np)^(n - np) = n^(n - np) (1-p)^(n - np), the factors n^(np) and n^(n - np) combine to cancel the n^n, and you end up with 2^n p^(np) (1-p)^(n - np) code words.

Therefore, with a block length of n, that is n uses of the binary symmetric channel, the number of code words you can support is about 2^n p^(np) (1-p)^(n(1-p)). What rate does that give? If you have 4 code words you can send 2 bits, with 8 code words 3 bits, and with M code words log2 M bits; that is why we take log to the base 2. We also divide by n because we want a rate per channel use, that is, per use of the binary symmetric channel. So the rate is (1/n) log2( 2^n p^(np) (1-p)^(n(1-p)) ) = 1 + p log2 p + (1-p) log2(1-p). This, it turns out, is the rate achievable over a binary symmetric channel.
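As a quick numerical sanity check of this sphere-packing count, the following sketch computes (1/n) log2( 2^n / C(n, np) ) for a few block lengths and compares it with 1 + p log2 p + (1-p) log2(1-p); the block lengths and p = 0.1 are illustrative choices.

```python
# Sketch checking the sphere-packing count numerically: for large n,
# (1/n) * log2( 2^n / C(n, n*p) ) approaches 1 + p*log2(p) + (1-p)*log2(1-p).
from math import comb, log2

p = 0.1
target = 1 + p * log2(p) + (1 - p) * log2(1 - p)   # = 1 - H(p)
for n in (100, 1000, 10000):
    k = round(n * p)                               # nearest integer to n*p
    rate = (n - log2(comb(n, k))) / n              # (1/n) log2(2^n / C(n,k))
    print(f"n = {n:6d}  rate = {rate:.4f}  (limit {target:.4f})")
```

The computed rate slowly approaches the limiting value as n grows, which is consistent with the Stirling-based approximation above.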
Let us do some intuitive calculation with this expression. The quantity p log2 p + (1-p) log2(1-p) is a number you can confirm lies between -1 and 0, which is why a negative sign is usually pulled out in front of it. What happens when p = 0? The tricky part is the term p log2 p, which looks like 0 times log 0, that is, 0 times minus infinity; the standard convention is to take p log2 p to be 0 there. Calling the rate R, we get R = 1. Now take p = 1/2: since 1 - p is also 1/2, R = 1 + (1/2) log2(1/2) + (1/2) log2(1/2) = 1 + log2(1/2) = 1 - 1 = 0. So the intuition says that p = 1/2 implies the achievable rate is 0.

What does this mean? Here is the thing: p = 1/2 corresponds to the case of a fair coin, so the most typical noise sequence contains an equal number of flips and non-flips, that is, about n/2 errors. With n/2 errors you are in trouble even in the worst-case fallback of a repetition code: take an n-length repetition code word of all ones and flip half of its bits, and you have an equal number of zeros and ones and cannot decide; similarly if you take n zeros and flip half of them. So intuitively, when p = 1/2 the rate is 0 because even a repetition code will hardly work. And p = 0 means no bit flips, so in one binary symmetric channel use I can get one bit through, because the channel delivers zeros as zeros and ones as ones.

If you go a little further, there is the notion of the so-called typical set; this just formalizes the statements we made earlier, and you can understand it in terms of probability. If X1, X2, ..., Xn are i.i.d. random variables drawn from a finite alphabet X, the typical set A_epsilon^(n) contains the sequences that satisfy H(X) - epsilon <= -(1/n) log2 P(x1, x2, ..., xn) <= H(X) + epsilon, where H(X) is the entropy of X, defined as H(X) = - sum over x of P(x) log2 P(x); in our case the finite alphabet is {0, 1}, so this is the binary entropy. What does this mean? Take the noise sequences, that is, the bit-flip sequences, produced by a binary symmetric channel. It turns out that for large enough n, and for any epsilon you give me (say 0.01 or 0.001), the most likely noise sequences satisfy this bound; in other words, the probability of a typical sequence is always at most 2^(-n(H(X) - epsilon)) and at least 2^(-n(H(X) + epsilon)).
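Here is a small Monte Carlo sketch of this typicality statement for the binary symmetric channel: as n grows, the fraction of flipped bits concentrates around p, which is the weak law of large numbers at work, so nearly all the probability sits on sequences whose individual probability is about 2^(-n H(p)). The values of p, epsilon, the block lengths and the number of trials below are illustrative choices.

```python
# Monte Carlo sketch of typicality: the number of bit flips a BSC introduces
# in n uses is Binomial(n, p), and as n grows the fraction of flips
# concentrates around p, so almost all the probability sits on sequences
# of probability roughly 2^(-n*H(p)).
import numpy as np

rng = np.random.default_rng(1)
p, eps, trials = 0.1, 0.01, 10_000
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # binary entropy H(p)
print(f"H(p) = {H:.3f}, so each typical sequence has prob ~ 2^(-n*{H:.3f})")

for n in (100, 1000, 10000):
    errors = rng.binomial(n, p, size=trials)        # flips per n-bit block
    frac = np.mean(np.abs(errors / n - p) < eps)
    print(f"n = {n:6d}  fraction of blocks with |#errors/n - p| < {eps}: {frac:.3f}")
```

The printed fraction climbs towards 1 as n increases, which is exactly the statement that the typical set eventually carries all the probability.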
Coming back to the binary symmetric channel: if you take any noise sequence with about np ones, it will essentially satisfy this bound, where H(X) is the binary entropy defined above. This is not very different from what we saw earlier; there we saw that C(n,l) p^l (1-p)^(n-l) is maximized at l = np, and the same thing is being expressed here. In fact, if you go through some references on information theory, you will find that the proof of this basically rests on the weak law of large numbers. The noise sequences in a binary symmetric channel are typical in this sense: as you make n larger and larger, essentially only the sequences with about np ones start appearing, each with probability approximately 2^(-n H(X)), and therefore they are the only sequences you have to handle. So when you design your code words, they have to be able to handle these typical error patterns.

As n tends to infinity, the typical sequences amass all the probability. In other words, the all-zero error sequence has probability tending to 0, the all-ones error sequence has probability tending to 0, but the sequences with close to np errors, each of probability about 2^(-n H(X)), together carry essentially all the probability. All you need to do is account for these. It is therefore almost like this: you have 2^n sequences, and you need to design your code, that is, partition this space, so that any error pattern of weight about np is handled without ambiguity, in other words so that the code words are robust to the typical error sequences.

If you want a more formal picture, write Y = X + N (modulo 2), where N is the binary symmetric channel noise. If you choose X to be equally likely 0 and 1, you will find that H(Y) = 1, so there are about 2^(n H(Y)) typical output sequences; and around each code word the noise spreads the output over about 2^(n H(N)) typical sequences (this is where the conditional entropy H(Y|X) comes in, evaluated from the distribution of Y given X = 0 or X = 1). The number of code words you can distinguish is therefore about 2^(n H(Y)) / 2^(n H(N)), and this works out to a rate of 1 - H(p). It turns out that the capacity of the binary symmetric channel is indeed C = 1 - H(p), which is the same as what we obtained earlier, namely 1 + p log2 p + (1-p) log2(1-p). So this is exactly the capacity of a binary symmetric channel. Formally speaking, you need to prove that this rate is achievable and also that any rate above it is not achievable; for that you can refer to a good textbook such as Elements of Information Theory by Cover and Thomas. But the intuition you should carry is this: among all n-length bit sequences, you choose 2^K code words such that any np flips result in no ambiguity, so you only have to handle the typical error sequences, and that partitioning intuitively gives you this capacity.
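To connect this back to the intuitive calculation done earlier, here is a short sketch that evaluates C(p) = 1 + p log2 p + (1-p) log2(1-p) = 1 - H(p) at a few values of p, using the convention 0 log2 0 = 0, confirming C = 1 at p = 0 and C = 0 at p = 1/2; the intermediate values of p are illustrative.

```python
# Sketch evaluating the BSC capacity C(p) = 1 + p*log2(p) + (1-p)*log2(1-p)
# = 1 - H(p), with the convention 0*log2(0) = 0.
from math import log2

def plogp(x):
    """x * log2(x), taking the limiting value 0 at x = 0."""
    return 0.0 if x == 0 else x * log2(x)

for p in (0.0, 0.1, 0.2, 0.5):
    C = 1 + plogp(p) + plogp(1 - p)
    print(f"p = {p:.1f}  ->  C = 1 - H(p) = {C:.4f} bits/channel use")
```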
A quick aside on the AWGN channel. Consider a real additive white Gaussian noise channel with the power constraint E[X^2] <= P; this constraint is needed because otherwise X could be given arbitrarily high power. The noise is Gaussian with mean 0 and variance sigma^2. It turns out that the capacity of this channel is C = (1/2) log2(1 + P/sigma^2). Intuitively this makes sense: if you allow P to be higher, you can choose the X values to be more widely separated and therefore more robust, and you get a higher rate; the number of possible X values you can choose is related to C. Of course, much like in the binary symmetric channel case, you cannot just use single X values; you need to build code books over multiple channel uses, and with a large number of channel uses you can achieve this capacity using typicality once more.

As an intuition, use the channel n times, just as with the binary symmetric channel: you send x1, x2, ..., xn and receive x1 + n1, x2 + n2, and so on, where ni is the noise in the i-th use (apologies for the notation; this n is the block length, not the noise). The received vectors lie, roughly, in a sphere of radius sqrt(n(P + sigma^2)), whose volume is proportional to (sqrt(n(P + sigma^2)))^n, while the noise around each code word fills a sphere of radius sqrt(n sigma^2), with volume proportional to (sqrt(n sigma^2))^n; this is because if you keep generating n-length noise sequences and plot them in n dimensions, they concentrate in a sphere of radius sqrt(n sigma^2). I am ignoring the constants in these volumes. Taking the ratio of the two volumes and converting to a rate, R = (1/n) log2( (sqrt(n(P + sigma^2)) / sqrt(n sigma^2))^n ) = (1/2) log2( (P + sigma^2)/sigma^2 ) = (1/2) log2(1 + P/sigma^2). So again, by taking random Gaussian code books, one can show that the rate achievable over this channel is (1/2) log2(1 + P/sigma^2); for more on this, refer to a good information theory textbook.

To summarize: the bit error rate in practical channels is of course quite high, and coding reduces it. But suppose I demand an arbitrarily low bit error rate: what is the highest rate I can get? The channel capacity is the ultimate limit, and as the block length goes to infinity you can get close to it. In fact, modern codes such as LDPC and turbo codes are, in several situations, able to come very close to the channel capacity, with some error margins taken into account. For more information, check good references, in particular the Shannon-Hartley theorem, which extends this to channels with bandwidth: for a band-limited channel you can state the achievable rate per hertz, and that is the capacity. You can look at several information theory courses and modern coding references for more on information theory, coding, and their connection. Thank you.
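As a closing numerical illustration, here is a small sketch evaluating the AWGN capacity C = (1/2) log2(1 + P/sigma^2) discussed above for a few signal-to-noise ratios; the SNR values in dB are illustrative choices.

```python
# Sketch evaluating the real AWGN capacity C = (1/2)*log2(1 + P/sigma^2)
# for a few signal-to-noise ratios (SNR = P/sigma^2).
from math import log2

for snr_db in (0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)          # P / sigma^2 on a linear scale
    C = 0.5 * log2(1 + snr)            # bits per (real) channel use
    print(f"SNR = {snr_db:2d} dB  ->  C = {C:.3f} bits/channel use")
```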