Welcome everyone. In the previous lecture we were talking about problems from communication theory, and our distinctive view was that these problems should be thought of as decision problems, in the same way as all the other decision problems in this course. These are stochastic control or stochastic decision problems, and moreover they are problems with a non-classical information structure.

If you recall, the setup of a communication problem is that we have a source, which we denoted S, that is to be sent over a medium which we called the channel, and the goal is to reproduce the source at the other end, at a place called the destination. Because the channel has noise, and because its inputs are potentially incompatible with the source and its output is incompatible with what we want at the destination, the way to do this is by introducing two additional elements into the problem: the encoder and the decoder. The encoder maps the source to a feasible channel input. If you recall, I called it a kind of adapter: it takes a source of one type, or one alphabet, and converts it into a compatible channel input X, which is then sent into the channel. The channel then produces a Y, which is seen by the decoder. The decoder also acts almost like an adapter: it looks at Y and converts it to the form that we want at the destination, which we call S hat.

The goal of the communication problem is to minimize a certain cost function known as the distortion. The distortion measures how different S is from S hat, and the goal is to minimize the expected distortion over the choice of the two functions f and g. As particular cases, the distortion could be the indicator of S not equal to S hat, in which case the expected distortion is just the probability that S is not equal to S hat; or the distortion could be the squared difference between S and S hat, in which case the expected distortion is simply the mean squared error between S and S hat.

Now, the information structure of this problem, by its very nature, is that S is not available at the decoder. In fact, that is precisely why there is a need to communicate: the need to communicate arises because what is known at the source is not known at the destination. Consequently, of the two decision makers, or controllers, that we have here, namely the encoder and the decoder, the first controller's information is S and the second controller's information is Y. But Y is affected by the action of the first controller: Y comes out of the channel based on what X is sent in, and X is itself determined by f. So Y is affected by the first controller.
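Before moving on, here is a small sketch of this decision problem in code. This is purely illustrative and not from the lecture: the binary source, the bit-flipping channel, the identity encoder and decoder, and all function names are my own choices; the point is only to show where f, g, and the two distortion measures sit in the pipeline.

```python
import random

def source():                  # S: a fair binary source (toy choice)
    return random.randint(0, 1)

def channel(x, p=0.1):         # noisy channel: flips the input with prob. p
    return x ^ (random.random() < p)

def encoder(s):                # f: maps the source to a channel input X
    return s

def decoder(y):                # g: maps the channel output Y to S hat
    return y

def expected_distortion(d, trials=100_000):
    """Monte Carlo estimate of E[d(S, S_hat)] for the given encoder/decoder."""
    total = 0.0
    for _ in range(trials):
        s = source()
        s_hat = decoder(channel(encoder(s)))
        total += d(s, s_hat)
    return total / trials

# The two distortion measures mentioned above:
hamming = lambda s, s_hat: 1.0 if s != s_hat else 0.0   # gives Pr[S != S_hat]
squared = lambda s, s_hat: (s - s_hat) ** 2             # gives mean squared error

print("Pr[S != S_hat] ~", expected_distortion(hamming))
print("MSE            ~", expected_distortion(squared))
```

Minimizing the expected distortion over f and g would mean searching over the `encoder` and `decoder` functions in this sketch.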
So the first controller's action determines the information of the second controller, but the second controller does not have access to the information of the first controller: the decoder does not have access to S. In this way we can look at a communication problem as simply a stochastic control problem with a non-classical information pattern.

Now, we have already seen that whenever there is a non-classical information pattern of this form, there is an element of what is called the dual effect. The dual effect refers to the situation where the action of a controller impacts not just the cost, but also the information of the controller that acts next. Usually, in a control problem, it is called the dual effect because each controller has a genuine action element: the first controller has a specific action which actually appears in the cost function, and so it does not merely play the role of a signal sender conveying information to the second controller. In a communication problem, however, the situation is different. The action of the first controller is not really an action at all, because it plays no role in the cost. The cost here depends only on S and S hat: the distortion is a function of S and S hat, and X makes no appearance in it. The way X enters the problem is through the information of the decoder: Y comes from X through the channel, and X = f(S). That is how the function f appears in the problem. As a result, a rather peculiar thing is happening here: f affects the information of g, but f does not affect the cost directly. It has only an indirect effect, no direct effect.

If you recall our discussion of the Witsenhausen problem, we said that there were two extreme cases involved. The first extreme case was a classical information structure, in which all the information about the past is passed on to the future. The other extreme case is pure communication, in which, as I said, control has no relevance and only communication matters. The statement that control has no relevance is to be understood in this particular context: the control action of the first controller, what we call X or the channel input, has no effect on the cost function. At least this is the case in the standard formulation of a communication problem. To be fair, there are also other formulations in which there is a cost on X as well; we are not discussing those. The main point to be made here is that there is a meaningful, interesting communication problem formulation in which f has only that indirect effect and no direct effect.

With this background we can now reflect once again on these two extreme cases. The first extreme case has been solved, in the sense that we already know how to approach it: if it is an MDP we can approach it through dynamic programming, and if it is a POMDP we can approach it through the belief state. At least the logic of how to approach the problem is already well understood.
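To pin down this indirect effect in symbols (my notation, but consistent with what was just said, with $P_{Y \mid X}$ denoting the channel law):

$$
J(f,g) \;=\; \mathbb{E}\big[d(S,\hat{S})\big],
\qquad X = f(S), \quad Y \sim P_{Y \mid X}(\,\cdot \mid X), \quad \hat{S} = g(Y).
$$

The map $f$ never appears inside the distortion $d$; it enters only through the distribution of $Y$, which is exactly the indirect effect described above.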
What is the case for the pure communication problem? When we enter this domain, we are entering the domain of what is called information theory. Information theory refers to a broad theory developed by a scientist called Claude Shannon, and it is one of the most profound and beautiful theories ever developed, in my opinion. What it concerns itself with is this: how much can you communicate when you have a certain infrastructure available? That is, when you are given a source with a certain characteristic, say a probability distribution, and a channel with certain noise characteristics, what is the amount of information that you can get across from the source to the destination with these resources?

Prior to Shannon, the usual thinking was that if you had noise in the channel, you would use the channel multiple times in such a way as to cancel out, or let us say average out, the effect of the noise. You send the same input over the channel multiple times; eventually, on average, the additive effect of the noise starts behaving like its mean, so you subtract out the mean of the noise and recover the original message. Shannon had the radical idea that if one is anyway using the channel multiple times, then one could make much better use of the channel resources than this kind of repetitive activity. Instead of sending the same message multiple times in order to defeat the noise in the channel, one could use the underlying structure present in the channel to send messages that are sufficiently distinguishable from each other: sufficiently distinguishable meaning that, in spite of the noise in the channel, what comes out of the channel for distinct messages is sufficiently distinct. Consequently, one should be able to recover the original message without much difficulty. This, it turns out, is a far more efficient way of using the channel than sending the same message multiple times and averaging out the error.

This was Shannon's main idea, and it is what led to the idea of block coding. Block coding is the idea that if one has to use the channel multiple times anyway, one should not try to send the same message multiple times through the channel, but rather use distinct strings to encode distinct messages, and come up with a cunning choice of these input strings so that they can be used at the other end of the channel to decode what was being sent. The whole of Shannon's idea rested on coming up with ways by which such a scheme could be realized, and Shannon's main insight and main results lay in establishing fundamental limits to what one can achieve in a regime like this.
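To make the pre-Shannon repetition idea concrete, here is a toy simulation (entirely my own construction: the binary symmetric channel, the parameter values, and the function names are illustrative assumptions). It sends one bit k times over a channel that flips each transmission with probability p, and decodes by majority vote:

```python
import random

def bsc(bit, p):
    """Binary symmetric channel: flips the input bit with probability p."""
    return bit ^ (random.random() < p)

def send_with_repetition(bit, k, p):
    """Send the same bit k times; decode by majority vote."""
    received = [bsc(bit, p) for _ in range(k)]
    return 1 if sum(received) > k / 2 else 0

def error_rate(k, p, trials=100_000):
    """Monte Carlo estimate of the decoding error probability."""
    errors = sum(send_with_repetition(0, k, p) != 0 for _ in range(trials))
    return errors / trials

# The error probability falls as k grows, but the rate -- one message
# bit per k channel uses -- goes to zero. Shannon's block coding avoids
# exactly this inefficiency by encoding distinct messages as distinct,
# well-separated input strings.
for k in (1, 3, 5, 9):
    print(f"k = {k}: estimated error = {error_rate(k, p=0.1):.4f}")
```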
So this is what is called the block paradigm, or the large-block-length analysis, of communication or information theory. What I will do for the remaining part of this course is introduce you to what this block analysis is and tell you what has actually been achieved in this sort of setting.

When we are talking of a block coding setting, we must look at the following setup. The source comprises n samples; let us denote these by s1 to sn. This is what is called a block of n samples. This block of n samples is mapped by an encoder f to an input to the channel of length, let us say, k, and that input is itself called a block. That is the block that is sent into the channel, and it is denoted x1 to xk. When this block of symbols x1 to xk goes into the channel, what comes out at the other end is denoted y1 to yk: the channel, according to its probability law, takes x1 to xk as input and produces an output y1 to yk. The decoder then observes y1 to yk and maps it to an n-vector again, a string of symbols s1 hat to sn hat. This is basically the same as what we had earlier, except that now I have been more explicit about what I mean by the source, the channel input, the channel output, and the destination. These are now, for us, strings of various lengths: the source is of length n, what goes into the channel is of length k, what comes out of the channel is therefore also of length k, and whatever has come out is mapped by the decoder back to something of length n.

Now, because we will be studying this with n as a parameter, varying n, we will also be varying k as a parameter. As a result, it makes sense to index the encoders and decoders by n as well, so I will put a subscript n on them, to denote that these encoders and decoders are for a particular block length. As the block length grows, you would have to choose a fresh set of functions fn and gn.

Now let us also recall what we mean by our cost function here. Let us say the probability of error is the cost function. What we mean by the probability of error is the probability that the entire string s1 to sn is not equal to the entire string s1 hat to sn hat. As strings these are not equal, which means that there is at least one component where the s component is not equal to the s hat component. Notice that this is a very strong requirement. Although we are sending an entire block, we are not asking for just one out of those n to be correct; that would be a very weak requirement. We are asking for all n of them to be correct, so the probability that any one of them is wrong is what is encoded in this expression.
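Written out in symbols (my notation for what was just described): with source, channel input, and channel output alphabets $\mathcal{S}$, $\mathcal{X}$, $\mathcal{Y}$, the block encoder and decoder are maps

$$
f_n : \mathcal{S}^n \to \mathcal{X}^k, \qquad g_n : \mathcal{Y}^k \to \mathcal{S}^n,
$$

and the block probability of error is

$$
P_e^{(n)} \;=\; \Pr\big[(S_1,\dots,S_n) \neq (\hat{S}_1,\dots,\hat{S}_n)\big]
\;=\; \Pr\Big[\textstyle\bigcup_{i=1}^{n}\{S_i \neq \hat{S}_i\}\Big].
$$

The union over components makes the "all n must be correct" requirement explicit.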
Now let us write out this probability a little. When will there be an error? There will be an error if the string $(s_1,\dots,s_n)$ is not equal to $(\hat{s}_1,\dots,\hat{s}_n)$. And what is $(\hat{s}_1,\dots,\hat{s}_n)$? It is simply $g_n(y_1,\dots,y_k)$, where $(y_1,\dots,y_k)$ is produced by the channel. So let us introduce a notation for the channel: we will denote the probability that the channel produces the output $(y_1,\dots,y_k)$ when it is provided the input $(x_1,\dots,x_k)$ by

$$
P_{Y^k \mid X^k}\big(y_1,\dots,y_k \mid x_1,\dots,x_k\big).
$$

Now let us make use of this notation to write out the cost function, the probability of error. I am going to denote the source vector $(s_1,\dots,s_n)$ by $m$, condition on the event that $(S_1,\dots,S_n) = m$, multiply by the probability of that event, and sum over all values $m$ that the source vector can take:

$$
P_e \;=\; \sum_{m} \Pr\big[(\hat{S}_1,\dots,\hat{S}_n) \neq m \,\big|\, (S_1,\dots,S_n) = m\big]\,\Pr\big[(S_1,\dots,S_n) = m\big].
$$

Now look at the conditional probability in this expression. We can write it in a slightly different way:

$$
\Pr\big[(\hat{S}_1,\dots,\hat{S}_n) \neq m \,\big|\, (S_1,\dots,S_n) = m\big] \;=\; 1 - \Pr\big[(\hat{S}_1,\dots,\hat{S}_n) = m \,\big|\, (S_1,\dots,S_n) = m\big].
$$

And what is this latter probability? Remember, if $m$ has to be equal to $(\hat{S}_1,\dots,\hat{S}_n)$, it just means that whatever came out of the channel, the string $(y_1,\dots,y_k)$, has to be mapped by $g_n$ to $m$. That is what is being asked for here. Which means the probability that this happens is equal to the probability that the output of the channel lands in the inverse image $g_n^{-1}(m)$, the set of all outputs that $g_n$ maps to $m$, given that $m$ was sent. But when the source block $m$ is sent, what effectively goes into the channel is $f_n(m)$. Therefore I can write this as follows:

$$
\Pr\big[(\hat{S}_1,\dots,\hat{S}_n) = m \,\big|\, (S_1,\dots,S_n) = m\big] \;=\; P_{Y^k \mid X^k}\big(g_n^{-1}(m) \,\big|\, f_n(m)\big).
$$

Putting it together, the probability of error is

$$
P_e \;=\; \sum_{m} \Big(1 - P_{Y^k \mid X^k}\big(g_n^{-1}(m) \,\big|\, f_n(m)\big)\Big)\,\Pr\big[(S_1,\dots,S_n) = m\big],
$$

and the goal is then to come up with functions $f_n$ and $g_n$ that minimize this error. Now, you can see that this is a rather complicated and non-trivial expression. To begin with, we do not know what kind of structure the channel probability has, and even if it had a structure, the way $f_n$ and $g_n$ appear in this entire expression is rather complicated.
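To see the final formula in action, here is a small sketch that computes $P_e$ exactly by enumeration for a toy code. Everything here is my own illustrative choice, not from the lecture: a single source bit ($n = 1$), a three-use repetition encoder ($k = 3$), a memoryless binary symmetric channel, and a majority-vote decoder.

```python
from itertools import product

p = 0.1                       # BSC crossover probability (assumed)
P_S = {0: 0.5, 1: 0.5}        # source distribution Pr[S = m] (assumed)

def f(m):                     # encoder f_n: repeat the source bit 3 times
    return (m, m, m)

def g(y):                     # decoder g_n: majority vote on the outputs
    return 1 if sum(y) >= 2 else 0

def channel_prob(y, x):       # P(y_1..y_k | x_1..x_k) for a memoryless BSC
    prob = 1.0
    for yi, xi in zip(y, x):
        prob *= p if yi != xi else 1 - p
    return prob

# P_e = sum over m of (1 - P(g^{-1}(m) | f(m))) * Pr[S = m]
P_e = 0.0
for m, pm in P_S.items():
    # probability that the channel output lands in the inverse image g^{-1}(m)
    p_correct = sum(channel_prob(y, f(m))
                    for y in product((0, 1), repeat=3)
                    if g(y) == m)
    P_e += (1 - p_correct) * pm

print(f"exact block error probability: {P_e:.6f}")
# For this repetition code, P_e = 3 p^2 (1 - p) + p^3 = 0.028 when p = 0.1.
```

Real block coding replaces this brute-force enumeration, which blows up exponentially in the block length, with cleverly structured codes; the sketch is only meant to instantiate the formula.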
Indeed, $f_n$ appears inside the channel probability through $f_n(m)$, and $g_n$ appears through its inverse image $g_n^{-1}(m)$; we take one minus that probability and then weight it by the probability that the source block equals $m$. All of this looks rather intimidating and challenging. That is why it is a remarkable achievement that something can be said about a problem like this at all, and it is all the more amazing that what is said is said with a great amount of generality, in the regime where n goes to infinity. So this is the setting, and that is the state of the art in communication theory. What communication theory has done is set up a specific type of stochastic control problem and solve it in its own regime: a regime where, remember, the action of the sender has no direct effect on the cost, and where the cost has a very specific structure, namely the error between what is known at the source and what is reproduced at the destination. What we will do in the coming lectures is look at specific cases of this problem: to begin with, problems of data compression, and then, time permitting, problems of communication over noisy channels, and see what one can say about this particular problem.