Welcome everyone. This is the last lecture of our course, Stochastic Control and Communication, and we have come to a very interesting juncture. We were talking about data compression problems and wondering whether something like data compression is even feasible. Data compression involved taking a source that can take 2^n possible values and storing it in a storage that has space for only 2^k possible strings. So the problem for us was: are these 2^k strings in fact sufficient, or is it simply not possible to recover the source meaningfully with a set of size 2^k?

For that, we decided we were looking for a subset of the strings to be stored, a subset called A_n^ε, whose probability is high, so that this subset becomes our recovery target, and yet whose size, as a fraction of the total number of strings, is small. What we did was look at what is called the asymptotic equipartition property. There we looked at what is called a typical set: a set of sequences whose probabilities satisfy a certain pair of inequalities. These are the strings you would typically see, with the numbers of zeros and ones you would typically see, when the strings are generated in an i.i.d. fashion. What we found was that this set satisfies the following properties. First, its probability is large: for n sufficiently large (finite but large), P(A_n^ε) > 1 - ε. Second, the cardinality of A_n^ε is upper bounded by 2 raised to n times a constant, in fact by 2^{n(h(p)+ε)}, where h(p) is the entropy function we will come back to shortly; and there is a corresponding lower bound on the size as well.

Now, the first question we had was: how do we make use of this? The way to make use of it is the following. Our source produces one of 2^n possible strings. We use a function f to assign each of them a label, where the set of labels runs from 1 up to roughly 2^{n h(p)}; so the number of labels is about 2^{n h(p)}. These labels are then what is used to decode, that is, to recover the string that was sent. Technically, because every one of the 2^n strings has to be mapped to a label, what one can do is pad the label with one extra bit. So we take f to be a function from {0,1}^n to {0,1} × {1, ..., 2^{n h(p)}}, where the first bit acts as a flag: if (x_1, ..., x_n) belongs to A_n^ε, we pad with a 1 and then attach the label.
If (x_1, ..., x_n) is not in A_n^ε, we pad with a 0, and once it is padded with a 0 we do not care what follows; we simply start the codeword with a 0. That is what the function f does; this defines f. What g does is check the first symbol: if it is a 0, it declares an error, saying in effect "I do not know what this corresponds to"; if it is a 1, it proceeds to read the label and from the label reconstructs x_1, ..., x_n, the string that was sent.

So in this situation, what is the probability of making an error, that is, the probability that (x̂_1, ..., x̂_n) ≠ (x_1, ..., x_n)? You make no error when the string is from A_n^ε; you make an error only when the string is not from A_n^ε. So the probability of making an error is at most the probability that the string is not in A_n^ε, and that probability, remember, is less than ε. In other words, for n sufficiently large, the probability of error is less than ε. That is essentially the scheme.

What this shows is that it is possible to work with only about 2^{n h(p)} strings and still recover the source with high probability. I was saying that the channel has a resource constraint, that it can accommodate only 2^k possible strings; well, if k is at least n h(p), then it is still possible to take the source, assign it labels ranging from 1 to 2^{n h(p)}, and recover the source from those labels with high probability. So if k ≥ n h(p), we can recover the source.

We can interpret this result as a compression result. It is effectively saying that binary strings of length n, generated i.i.d. with distribution (p, 1 - p), can be compressed to n h(p) bits: a string of length n has been compressed to a string of length n h(p), with vanishing probability of error.
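To make the scheme concrete, here is a minimal Python sketch of the idea just described, for a small Bernoulli(p) source. This is an illustration under my own assumptions rather than anything from the lecture: the names typical_set, encode and decode and the tiny block length n = 12 are choices made for readability, and at such a small n the guarantee P(A_n^ε) > 1 - ε has not yet kicked in, so the printed probability of the typical set is still modest.

```python
import itertools
import math
import random

def h(p):
    """Binary entropy of a Bernoulli(p) source, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def prob(x, p):
    """Probability of the binary string x under i.i.d. Bernoulli(p)."""
    ones = sum(x)
    return (p ** ones) * ((1 - p) ** (len(x) - ones))

def typical_set(n, p, eps):
    """A_n^eps: strings whose per-symbol surprisal is within eps of h(p)."""
    return [x for x in itertools.product((0, 1), repeat=n)
            if abs(-math.log2(prob(x, p)) / n - h(p)) <= eps]

n, p, eps = 12, 0.2, 0.15
A = typical_set(n, p, eps)
label_of = {x: i for i, x in enumerate(A)}       # labels 0 ... |A| - 1
string_of = {i: x for x, i in label_of.items()}

def encode(x):
    # flag bit 1 plus the label if x is typical; flag bit 0 (payload ignored) otherwise
    return (1, label_of[x]) if x in label_of else (0, 0)

def decode(code):
    flag, label = code
    return string_of[label] if flag == 1 else None   # None means "declare an error"

# AEP-style checks: the typical set has high probability but small cardinality
print("P(A)         =", sum(prob(x, p) for x in A))  # exceeds 1 - eps once n is large
print("|A|          =", len(A), "out of 2^n =", 2 ** n)
print("2^(n(h+eps)) =", 2 ** (n * (h(p) + eps)))     # upper bound on |A|
print("label cost   ~ n*h(p) =", n * h(p), "bits instead of", n)

# an error occurs exactly when the source string falls outside A
x = tuple(int(random.random() < p) for _ in range(n))
print("decoded correctly:", decode(encode(x)) == x)
```

With these particular numbers the typical set consists of the strings with two or three ones, a few hundred strings out of the 4096 possible ones, which is exactly the point: far fewer labels than 2^n suffice, at the price of an error whenever the source string falls outside the set.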
Now, why am I calling this compression? Because of the shape of the function h(p). If you plot p on the horizontal axis from 0 to 1 and h(p) on the vertical axis, the curve is symmetric around the line p = 1/2; at p = 1/2 it takes the value 1, and everywhere else its value is strictly less than 1. So n h(p) is in fact strictly less than n, which means that what we have achieved is genuinely compression: we have taken strings of length n and compressed them down to strings of length n h(p), and yet we have been able to ensure that the probability of error is small. To complete the argument, all one needs to do is take a sequence of epsilons and a corresponding sequence of codes such that the probability of error goes down to 0.

This is a remarkable theorem, because it tells us that something seemingly implausible is actually possible: it is possible to find a subset of strings whose probability is large but whose cardinality is small, and that is why we are able to recover the source with high probability while using only a very small fraction of the strings.

Now, this is only the achievability direction. It tells you that you can compress the strings down to length n h(p), but it does not tell you that you cannot go lower. There is a separate theorem, called the converse theorem, which tells you that you cannot go lower. The converse theorem says that if k = θn and θ is strictly less than h(p), then the probability of error goes to 1. What this means is that you can compress your strings down to about length n h(p), but if you try to compress them even a little bit further, if you look for even slightly more compression than n h(p), then the probability of error goes to 1. No matter what scheme you use, no matter what kind of subset you look for, the probability of error goes to 1. So there is no scheme that gives you compression below length n h(p).
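The quantity h(p) that keeps appearing here is the binary entropy function. The lecture describes its shape but never writes the formula out, so for reference here is the standard definition, stated in LaTeX:

```latex
h(p) \;=\; -\,p \log_2 p \;-\; (1 - p)\log_2 (1 - p), \qquad 0 \le p \le 1,
```

with the usual convention 0 log 0 = 0, so that h(0) = h(1) = 0. From this one can read off the properties used above: h(p) = h(1 - p), so the curve is symmetric about p = 1/2; its maximum value is 1, attained only at p = 1/2; and h(p) < 1 for every other p, which is exactly why n h(p) < n and the scheme really does compress.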
So, if you recall, we had posed this as a decision problem in which the goal was to minimize this particular probability of error. What these results have done is answer the following question: as k varies, how does this probability of error behave as n goes to infinity? Let me write k as θ times n and plot θ on the x-axis, with θ = 0 at one end, θ = 1 at the other, and h(p) somewhere in between. What the theorems tell us is that we can study the asymptotic probability of error as a function of this parameter θ, and that there is a hard threshold. If your θ is strictly less than h(p), the probability of error is 1, whereas if your θ is greater than h(p), the probability of error is 0. That is the shape the probability of error takes asymptotically: as n goes to infinity, if θ is even slightly below the threshold the probability of error shoots up to 1, and if θ is above the threshold the probability of error equals 0. Which means this entropy is something serious, something very significant. It marks a phase change between the rates down to which you can compress a source and the rates below which you cannot. You can compress the source down to h(p) bits per symbol, or as close to h(p) as you like, but not lower. That is what these results say.
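Putting the two directions together, the threshold behaviour just described can be summarized in a single display; this is only a compact restatement, with P_e^(n)(θ) denoting the smallest achievable probability of error at block length n when the storage can hold 2^{θn} strings (k = θn):

```latex
\lim_{n \to \infty} P_e^{(n)}(\theta) \;=\;
\begin{cases}
0, & \theta > h(p), \\
1, & \theta < h(p),
\end{cases}
\qquad k = \theta n .
```

As a rough numerical illustration: for p = 0.11 one has h(p) ≈ 0.5, so source strings of length n = 1000 can be represented in a little over 500 bits with vanishing error probability, while any family of schemes that tries to squeeze them into, say, 0.45n bits has error probability tending to 1.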
So let us now take a step back and try to think of where we are. We looked at communication problems because we were interested in the communication aspect of the Witsenhausen problem. The Witsenhausen problem had in it, implicitly, a communication angle, and that was because of the dual effect hiding in it: the first controller's actions affect the information of the second controller, so the first controller is implicitly communicating something to the second. Because of that, we said let us look at the two extreme cases. The first extreme case is what we saw in the first part of the course, where we studied MDPs and POMDPs. The second extreme case was the case of communication, and what I showed you was that communication problems can be thought of as decision problems, in which one is trying to minimize a certain error cost, some kind of probability of error or expected distortion, over the choice of controllers, which in this language are called encoders and decoders.

Now, although this is one extreme of the Witsenhausen setting, there is a sense in which it is not exactly compatible with the setting we had in the Witsenhausen problem. In the Witsenhausen problem the source was a scalar, in fact a scalar Gaussian random variable, which was mapped by a function gamma_1 whose action then affected the information of the second controller; the noise in the medium was also a scalar Gaussian, and the second controller's action was also a scalar. So that setting had a scalar source, scalar actions, and scalar information. The communication setting, although it is in spirit an extreme case of this problem, and indeed of any non-classical information structure problem, is not really this setting, because the problem that has actually been studied and solved in communication involves not a scalar but an entire block of symbols.

Remember, in the communication setting the encoders and decoders did not just look at one symbol from the source and map it onward; rather, they looked at an entire block, and that entire block was mapped en masse to a block of channel inputs, which went through the channel, produced a block of channel outputs, and was then mapped back to a block of reconstructions or recovered symbols. This block coding is essential within the information theory paradigm, because that is where the asymptotic properties of the source kick in. Notice that the way we solved the compression problem was by using the asymptotic equipartition property, and the asymptotic equipartition property kicks in only when n becomes large: it tells you, when you take long enough samples of the source, what the typical numbers of zeros and ones are. With a small, finite number of samples you would not be able to say anything meaningful of that sort. Fundamentally, the problem at finite block length is essentially a combinatorial problem, because, as I mentioned, finding the exact combination of strings that gives you the smallest probability of error is not easy.

So yes, the communication problem is an extreme case of the Witsenhausen problem, but the spirit in which we can say it is a solved problem is the large-block-length limit. It is solved when you give yourself enough samples to figure out the patterns within those samples and then use those patterns to communicate. That is essentially the secret sauce of communication, and it is the bedrock on which communication theory rests.

Now let us come back and think a little bit about the control problem once again. If one wants to use this sort of paradigm within control, effectively it would mean that we would have to wait for a large number of samples, or else the samples we get would be some kind of discretization of an earlier sample and would not have the kind of distributions assumed in information theory. And when you have to wait for a large number of samples, what it precludes is the possibility of doing any kind of automatic decision making. So there is a tension between what is conventionally pursued in the communication setting and what is actually required in the control setting. That is not to say this is not being worked on; there have been some very nice attempts, and some very nice theories have been developed. But broadly speaking, we still do not have full clarity about the role of information in a typical control setting. We do not really know what notions of information should actually arise in a control setting.
Another important thing to note here is the way entropy appeared in our calculations. Entropy made an appearance because we were looking for a set with a certain pair of characteristics, high probability but small cardinality, and that came about through the asymptotic equipartition property, which then gave us the entropy function. If one were looking at another objective, not the particular objective that appears in the data compression problem, it is quite possible that one would end up with a different set of sequences, a different underlying property, and therefore a different quantity altogether, which would then be the appropriate notion or measure of information in a control problem.

All of these issues point us to the question of what the role of information in control is. Stochastic control is essentially about passing information from one time step to the next. The simplest case is the one where information is passed free of cost: the information is nested and always available for us to use in the future. But anything practical, or more general, would involve some loss of information, and the right notions of what information is to be used, what information is to be retained, what exactly we are signaling, and so on, all remain poorly understood. Some specific things have been studied; for instance, there is a notion of anytime capacity, through which one knows what kind of channel resources are needed for stabilization of a plant. These kinds of things are known, but a lot more is unknown than is known.

So I invite you to be a part of this fascinating journey. We are really setting up today the theory for the future of stochastic systems, of which very little seems to be understood despite their prevalence all around us. I hope this course has shed some light on some of these issues. I also hope you have taken back the view that I started the course with, where I said that communication, control, problems of organization structure, and economics are all, in some sense, brethren of each other; yet they are not exactly the same, and that is where the difficulty lies: in merging their ideas and working on the interfaces, which will then lead to new insights and new developments in the field. If you would like to talk to me about anything, my contact information is on the web; feel free to get in touch with me. Thank you, everybody, for your attention in this course, and I wish you all the best in your studies.