Welcome back to this lecture on digital communication using GNU Radio. My name is Kumar Appaiah, and I belong to the Department of Electrical Engineering at IIT Bombay. In this lecture, we are going to continue our discussion on demodulation and signal space.

If you remember, in the last lecture we saw a detailed calculation of the basis vectors for a set of signals using Gram-Schmidt orthogonalization, and I also made some remarks about how you can sometimes infer some of the orthonormal basis vectors just by inspection. Once we have a handle on the orthonormal basis vectors for the signals, the key idea that we are going to see is that the projection of the received signal onto these basis vectors is a sufficient statistic. In other words, the numbers you get by integrating the received signal against the psi's are enough to determine, optimally, which symbol was sent.

So let us look at projecting the signal and noise onto the signal space. We have y(t) = s_i(t) + n(t), with basis signals psi_1, psi_2, ..., psi_n. If we now consider the vector

y = [<y, psi_1>, <y, psi_2>, ..., <y, psi_n>]^T,

what this gives you is a set of numbers obtained by projecting y(t) onto psi_1, projecting y(t) onto psi_2, and so on. (Here I have not added the subscript i to s, but you can just take these s's to be s_i's.) Similarly, you obtain another set of n numbers by projecting s(t) onto psi_1, s(t) onto psi_2, and so on. This is fine, because your s_i(t) was designed to lie within the span of psi_1 through psi_n; this is what we saw in the previous class, where you constructed your orthonormal basis vectors psi_1 through psi_n to capture all the s_i's.

But here is the trouble: n(t) is a completely different beast. It is a signal that arises from noise, it is not predictable, and you can most definitely say that you cannot write n(t) as a linear combination of psi_1, psi_2, ..., psi_n. That is, n(t) has components and variations that cannot be captured by expressing it as a linear combination of psi_1 through psi_n. This leads to the question: by projecting n onto psi_1, ..., psi_n, which is what you are doing indirectly by projecting y onto psi_1, ..., psi_n, am I losing some information? Is some information lost in this operation that prevents you from making the optimal decision on which s_i was sent?

So the key idea is that we are converting this to a vector problem, as we mentioned: instead of a waveform problem, we now have a vector problem. What can we say about n and its contribution to the vector decision problem? The first thing we are going to look at is the distribution of n. If you remember, we did this particular exercise a couple of lectures ago. If you now study the correlation between <n, psi_k> and <n, psi_l>, it is actually sigma^2 times <psi_k, psi_l>, and <psi_k, psi_l> is 0 if k is not equal to l and 1 if k equals l. So this n is a Gaussian random vector consisting of iid entries, and the variance of each component is sigma^2 = N_0/2.
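To make this concrete, here is a small numerical sketch, not from the lecture: the basis functions, the value of N_0, and the discretization are assumptions chosen purely for illustration. It checks empirically that projecting white Gaussian noise onto two orthonormal basis functions gives iid components with variance N_0/2 each.

```python
import numpy as np

# Sketch: projections of white Gaussian noise onto an orthonormal basis
# should be iid Gaussian with variance N0/2 per component.
rng = np.random.default_rng(0)
N0 = 2.0              # assumed noise PSD, so each component has variance N0/2 = 1
T, dt = 1.0, 1e-2
t = np.arange(0, T, dt)

# Two orthonormal basis functions on [0, T): scaled rectangles on each half.
psi1 = np.where(t < T / 2, np.sqrt(2 / T), 0.0)
psi2 = np.where(t >= T / 2, np.sqrt(2 / T), 0.0)

trials = 20000
# Discrete-time white noise: per-sample variance N0/(2*dt) approximates
# continuous-time white noise with PSD N0/2.
n = rng.normal(0.0, np.sqrt(N0 / (2 * dt)), size=(trials, t.size))
n1 = n @ psi1 * dt    # <n, psi_1> for each trial
n2 = n @ psi2 * dt    # <n, psi_2> for each trial

print(np.cov(n1, n2))  # approximately [[N0/2, 0], [0, N0/2]]
```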
So, in other words, you are going to get this vector n = [n_1, n_2, ..., n_n]^T. Pardon me for the bad notation: n is used both for the dimension and for the noise, but all of these entries are iid. In other words, the covariance matrix is sigma^2 I, so these are iid entries. The first thing to notice is that, when you make a decision on which signal was sent, these psi_1, ..., psi_n capture that part of n(t) in n_1, n_2, and so on, and these n_k's are uncorrelated because your psi's are orthogonal; for jointly Gaussian random variables, this uncorrelatedness translates to independence. So, in other words, the entries of n are iid.

The next question we are going to ask is: is the restriction to the signal space optimal? That is what I was mentioning: by keeping only those parts of n(t) that are along psi_1, ..., psi_n, are we losing something? Are we losing our ability to make an optimal decision on which s_i was sent? The key concept is that the answer is no: we are not losing any information. The part of n(t) that is not along psi_1, ..., psi_n is actually irrelevant; it has no bearing on your optimal decision about which s_i was sent. This concept is also known as the theorem of irrelevance (you will also see the terms irrelevant statistics and so on), and it is not very difficult to see why. What you are doing is this: you have y(t); you do not have n(t) separately, you have only y(t). You take y(t), project it onto psi_1, ..., psi_n, and then look at the residual signal

y_perp(t) = y(t) - Σ_{j=1}^{n} <y, psi_j> psi_j(t).

That is, I am taking away the part of y that is along the psi_j's to get the residual part. And you know that your y is actually s + n, so if you substitute, you get

y_perp(t) = s(t) - Σ_{j=1}^{n} <s, psi_j> psi_j(t) + n(t) - Σ_{j=1}^{n} <n, psi_j> psi_j(t).

The first part is the component of y contributed by s, and it cancels directly, because s(t) is fully captured by the basis. The second part, contributed by n, is not fully captured. So what remains is

y_perp(t) = n(t) - Σ_{j=1}^{n} <n, psi_j> psi_j(t),

and let us call this n_perp(t): the part of the noise that is not along the part you observe, which is along psi_1, ..., psi_n. So we are saying that this n_perp, the part that is lost because of your projections, is irrelevant. How?
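Here is a minimal sketch of the residual computation, again with an assumed two-function basis and an arbitrary in-span signal of my own choosing: because s(t) lies in the span of the psi_j's, the residual y_perp(t) coincides exactly with n_perp(t).

```python
import numpy as np

# Sketch: the residual after projecting y onto the basis is purely noise,
# since the signal part is fully captured by the basis.
rng = np.random.default_rng(1)
T, dt = 1.0, 1e-3
t = np.arange(0, T, dt)

psi = np.stack([np.where(t < T / 2, np.sqrt(2 / T), 0.0),
                np.where(t >= T / 2, np.sqrt(2 / T), 0.0)])  # orthonormal basis
s = 3.0 * psi[0] - 1.5 * psi[1]          # a signal lying in the span
n = rng.normal(0.0, 5.0, size=t.size)    # noise (scale chosen arbitrarily)
y = s + n

coeffs = psi @ y * dt                    # <y, psi_j> for each j
y_perp = y - coeffs @ psi                # residual after removing projections
n_perp = n - (psi @ n * dt) @ psi        # same construction applied to n alone

print(np.max(np.abs(y_perp - n_perp)))   # ~0: the signal part cancels exactly
print(psi @ y_perp * dt)                 # ~[0, 0]: residual orthogonal to basis
```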
The irrelevance of n_perp(t) can be ascertained by looking at the covariance of n_perp(t) and each of the <n, psi_k>'s, because these are the parts that are going to determine which s_i was sent. So let us look at that. If you evaluate the covariance of n_perp(t) and the component along psi_k, you will find that it is 0. In other words, n_perp is uncorrelated with each of the components of the vector n that you get by projecting n along psi_1, ..., psi_n, and this uncorrelatedness translates to independence, which means that this n_perp is irrelevant. We are not looking at the detailed proof here, but you can look at the references and find it. The key idea is that the part of the noise not along psi_1, ..., psi_n is independent of the parts that actually matter when you compute your metric to decide what was sent, so this part is irrelevant.

One remark that we want to make: whenever we perform this inner product <s, psi_i>, if the signals are real it is just an integral of the product; if they are complex, you have a conjugate on one of them; it does not matter. This is called a correlation operation. A correlation operation is one where you multiply two signals and evaluate the integral, possibly with a shift; in this case we are just not adding a shift, and that is completely fine. But because we are dealing with communication systems, which are also signal processing systems, with many internal operations being signal processing operations, it is convenient to express these in the form of a convolution. For example, from an implementation perspective, if you implement algorithms on a DSP, convolution is often implemented very efficiently: DSPs are designed to perform convolution in hardware, and they are very fast at it. Given this, it is often the case that the matched filtering is implemented using a convolution.

To do this, let us define the signal r_i(t) obtained by performing

r_i(t) = ∫ y(tau) psi_i*(tau - t) d tau.

Why is it psi_i*(tau - t)? The reason is that, to perform correlation via convolution, you are essentially flipping the signal: flipping and then convolving is the same as correlating. If I rewrite it in a proper way, this is

r_i(t) = y(t) ⊛ psi_i*(-t),

that is, y convolved with the flipped, conjugated basis function. So correlation can be obtained by performing a flipped convolution, and if you want to evaluate the inner product itself, you need to just substitute t = 0, because r_i(0) = <y, psi_i>. So what you typically do is perform the convolution with the flipped version of psi_i and sample at the 0th point. And what you typically have is a bank of what are called matched filters.
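As a quick sanity check of this flipped-convolution identity in discrete time (the random complex vectors here are an assumption for illustration): the correlation <y, psi> equals the 'full' convolution of y with the flipped conjugate of psi, sampled at index len(psi) - 1, which plays the role of t = 0.

```python
import numpy as np

# Sketch: correlation at zero shift == matched-filter (flipped, conjugated)
# convolution sampled at the zero-lag index.
rng = np.random.default_rng(2)
N = 8
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
y = rng.normal(size=N) + 1j * rng.normal(size=N)

direct = np.sum(y * np.conj(psi))             # <y, psi>: correlation, no shift
matched = np.convolve(y, np.conj(psi[::-1]))  # matched filter output r(t)
print(np.allclose(direct, matched[N - 1]))    # True: sample at "t = 0"
```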
So what are these matched filters? Essentially, you write a filter matched to psi_0 (or psi_0 conjugate, depending on whether the signals are real or complex), then psi_1, psi_2, and so on up to psi_{n-1}; sometimes I index from psi_1 to psi_n, but in this case I use psi_0 to psi_{n-1}. From these you get several numbers, r_0, r_1, ..., r_{n-1}, which are essentially the inner products with each of the basis elements. This gives you a set of numbers that you can collate into a vector [r_0, ..., r_{n-1}], and this vector is all that you are going to need to make an optimal decision. Why? Because even though the noise part of y(t) is not fully captured along psi_0 to psi_{n-1}, the missing part is irrelevant to your decision making.

There is one alternative interpretation of all this matched filtering and irrelevance: these so-called matched filters collect, from the noisy signal, everything needed to form the sufficient statistic, and throw away what is not needed. Let me give you a quick intuition as to why that is the case. Suppose the signal you send is, say, a rectangular pulse; because of noise, you end up receiving a jagged version of it. Intuitively, the optimal thing to do is to average these samples out, because if you just average them, you can make a good decision on whether you sent the pulse or sent nothing. What does the matched filter do? Exactly this: you are multiplying and integrating, which is the same as averaging. So the matched filter is designed in a way to collect all the necessary information from the noisy signal to make sure that you get the sufficient statistic. That is the key intuition that you have to take away from this. So this is the concept of matched filters; it can be presented from different perspectives, such as optimality for decision making, but if you look at it from the correlation perspective of recovering your original symbol, the result is the same.

Now, let us briefly look at optimal reception in the context of AWGN channels. We are now performing M-ary signaling in an AWGN channel, so you have y = s_i + n, and n, as we just discussed, is a random vector with zero mean and covariance sigma^2 times the identity. Now, there are two ways to find the optimal symbol sent, and both are very related. The first is the maximum likelihood (ML) decision rule: we want to find the i that maximizes the likelihood, that is, the i under which this y was the most likely observation. The second is the minimum probability of error (MPE) rule, in which case you also take the prior into account: suppose the probability of sending 0 is higher and the probability of sending 1 is lower; the minimum probability of error rule performs a modification for the priors. I will just try to give you an intuition as to why these rules come about. If you remember the joint distribution: n is Gaussian with mean 0 and covariance sigma^2 I, which means your received vector y is s_i + n, and hence n = y - s_i.
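In this vector view, the bank of matched filters is nothing more than a stack of inner products, one per basis function. A minimal sketch, with an assumed two-function basis and a noiseless test signal:

```python
import numpy as np

# Sketch: a matched-filter bank produces r = [<y, psi_0>, ..., <y, psi_{n-1}>].
def matched_filter_bank(y, basis, dt):
    """Return the vector of inner products of y with each basis function."""
    return np.conj(basis) @ y * dt

T, dt = 1.0, 1e-3
t = np.arange(0, T, dt)
basis = np.stack([np.where(t < T / 2, np.sqrt(2 / T), 0.0),
                  np.where(t >= T / 2, np.sqrt(2 / T), 0.0)])
y = 2.0 * basis[0] - basis[1]               # noiseless: coefficients (2, -1)
print(matched_filter_bank(y, basis, dt))    # ~[2, -1]
```

In practice, each of these inner products would be realized as the sampled output of the corresponding flipped-convolution filter described above.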
So let us write the joint distribution of n, keeping in mind that n = y - s_i:

f_N(n) = 1/(2 pi sigma^2)^{n/2} · exp( -n^T n / (2 sigma^2) ).

(Write a Hermitian instead of a transpose for complex signals; for now let me write the transpose. The inverse covariance C^{-1} is just I/sigma^2.) Substituting n = y - s_i gives the likelihood function

f(y | i) = 1/(2 pi sigma^2)^{n/2} · exp( -(y - s_i)^T (y - s_i) / (2 sigma^2) ),

and I want to maximize this. The leading factor is a constant that does not depend on i, so what I am going to do is maximize, over i = 0, 1, ..., M-1, the quantity exp( -||y - s_i||^2 / (2 sigma^2) ); here (y - s_i)^T (y - s_i) = ||y - s_i||^2 because the covariance is just a scaled identity, and the 1/(2 sigma^2) factor does not depend on i either. If I take the log, I get: maximize over i the quantity -||y - s_i||^2, which is the same as saying: minimize over i the quantity ||y - s_i||^2. This translates to minimum distance decoding. Why? Because you are saying: let us just treat my s_i's as vectors in some space and my y as a vector; find me that s_i that is closest to this y in terms of squared distance. This means the maximum likelihood decoder is just the minimum distance decoder; this is for ML.

Similarly, for MPE it is not very difficult: all you need to do is take into account the prior probability of what was sent. Without going into too much detail, I am just going to write the quantity to maximize as

(constant) · exp( -||y - s_i||^2 / (2 sigma^2) ) · pi_i,

where pi_i is the prior probability. That means, if your messages are 0 and 1, and the probability of sending 0 is 0.8 while the probability of sending 1 is 0.2, that is captured in pi_i. I want to maximize this over i, which, taking the natural log and ignoring the constants, is the same as maximizing

-||y - s_i||^2 / (2 sigma^2) + log pi_i,

which in turn is the same as minimizing over i (multiplying through by 2 sigma^2)

||y - s_i||^2 - 2 sigma^2 log pi_i.

So this is the MPE rule, the minimum probability of error rule. Let me give you an intuition for why it comes about: suppose you have a very high prior probability of 0 being sent. Then, even if you receive something closer to s_1, it could be that 0 was actually sent but the noise event was large, and the minimum probability of error rule takes that into account. I am skipping over the complete derivation, but the key intuition is that if you write the likelihood function and maximize it, you get exactly this result; and if you go back to our slides, it is very evident that this delta_ML involves finding the argmin, which just means "find me that i".
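The two rules are straightforward to state in code. A minimal sketch, where the two-symbol constellation, the priors, and the noise variance are hypothetical values chosen for illustration:

```python
import numpy as np

# Sketch: ML decoding is minimum-distance decoding; MPE adds a
# prior-dependent bias term -2*sigma^2*log(pi_i) to each distance.
def ml_decode(y, S):
    """argmin_i ||y - s_i||^2 over the rows of S."""
    return np.argmin(np.sum(np.abs(y - S) ** 2, axis=1))

def mpe_decode(y, S, priors, sigma2):
    """argmin_i ||y - s_i||^2 - 2*sigma^2*log(pi_i)."""
    metric = np.sum(np.abs(y - S) ** 2, axis=1) - 2 * sigma2 * np.log(priors)
    return np.argmin(metric)

S = np.array([[0.0, 0.0], [1.0, 1.0]])   # two symbols in a 2-D signal space
y = np.array([0.55, 0.55])               # received vector, just past midpoint
print(ml_decode(y, S))                                # 1: nearest symbol
print(mpe_decode(y, S, np.array([0.8, 0.2]), 0.25))   # 0: strong prior wins
```

Note how the strong prior on symbol 0 flips the decision relative to plain minimum-distance decoding, which is exactly the intuition above.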
So it finds the i that minimizes ||y - s_i||^2, which is the same as minimum distance decoding. Another way to interpret this is to expand the norm. Maybe I will just do this once for you:

||y - s_i||^2 = (y - s_i)^T (y - s_i) = y^T y + s_i^T s_i - 2 <y, s_i>

(replace the transpose with a Hermitian for complex signals; this can be done either using the vectors or using the function integrals). Now, y^T y does not have i in it, so you take that away. Minimizing over i is then the same as maximizing, after putting a negative sign and dividing by 2,

<y, s_i> - ||s_i||^2 / 2,

which is exactly what is being written over here. There are some advantages here: if you have constant-envelope signaling, meaning all the ||s_i||'s are the same, you can just compute the inner products <y, s_i> and find out which one was sent, and you can do a similar modification here to get the minimum probability of error decision. The only difference is that the minimum probability of error rule takes the priors into account: if all pi_i's are equal, for example if you have M symbols and each of them has probability 1/M, then the prior term has no dependence on i, and the MPE rule reduces to the ML rule. So MPE and ML are the same when all the symbols are equally probable, which is the case in the most common situations.

If you want to think of a situation where you do not have equal priors: typically, coding is performed so that all symbols are equally likely. For example, if you have data with many, many zeros and few ones, it is typically compressed so that zeros and ones occur in equal numbers while still capturing the original data. But suppose you have a situation where a 1 occurs only when it rains and a 0 when it does not; then it may be much more likely to have days when it does not rain, so you have many more zeros than ones. Or you are capturing a signal from a sensor that detects earthquakes; earthquakes are rare, so you have mostly zeros. Things like that may happen, but typically the data is encoded so that the symbols are equiprobable, so the equal-prior case is more common; still, the MPE rule does have its uses whenever unequal priors arise.

If you want a formal proof, it is just exactly what we did right now: y is a Gaussian random vector whose mean is s_i and whose covariance matrix is sigma^2 I, so if you write the distribution of y under hypothesis i, it is

p(y | i) = 1/(2 pi sigma^2)^{n/2} · exp( -||y - s_i||^2 / (2 sigma^2) );

I just expanded that exponent by applying the C^{-1} formula for you. The ML rule wants you to maximize this quantity with respect to i, and since log is monotonic, you can maximize the log of this function as well, so you can clearly see that only ||y - s_i||^2 matters. In the MPE case, you just multiply by pi_i over here (add log pi_i after taking the log), which accounts for the prior, and you are set. That is basically the proof that the minimum distance decoder is optimal in the case of additive white Gaussian noise.
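To see this equivalence numerically, here is a small sketch with a randomly generated constellation (an assumption, not from the lecture): the minimum-distance decision and the correlation decision <y, s_i> - ||s_i||^2 / 2 always agree, since y^T y is common to all hypotheses.

```python
import numpy as np

# Sketch: argmin_i ||y - s_i||^2  ==  argmax_i <y, s_i> - ||s_i||^2 / 2.
rng = np.random.default_rng(3)
S = rng.normal(size=(4, 3))             # four symbols in a 3-D signal space
y = S[2] + 0.3 * rng.normal(size=3)     # noisy observation of symbol 2

i_dist = np.argmin(np.sum((y - S) ** 2, axis=1))
i_corr = np.argmax(S @ y - 0.5 * np.sum(S ** 2, axis=1))
print(i_dist, i_corr, i_dist == i_corr)  # same decision either way
```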
This is an important result, and it allows you to get a fair idea of what the so-called decision regions are: to find out which s_i was sent, you look at where y falls.

Let us look at a very quick example. If we consider on-off signaling, then under hypothesis 1 your received signal is y(t) = s(t) + n(t), and under hypothesis 0, when nothing was sent, y(t) = n(t). So hypothesis 1 is that the signal was sent, and hypothesis 0 is that nothing was sent; in this particular case of binary signaling, we are considering on-off signaling, so there is either something sent or nothing sent. Let us compute a statistic z = <y, s>. This makes sense, because in the first case you expect z to be about ||s||^2, while in the second case you expect z to be around 0: in the first case, say you send a square wave, the correlation with that square wave should be near the full signal energy; in the second case, it should be around 0. So the decision rule is to compute this statistic and see whether it is above or below ||s||^2 / 2. That is the intuition. We do not know yet whether this is correct, but we can use the ML rule to verify it, and if you perform the ML analysis in this case it is very easy, because you have only a vector y and you have to decide between the two signal points, 0 and s.

If you now write out the ML probabilities, you can easily find the probability of making an error given that 1 was sent: you sent s(t), but for some reason you decided that 0 was sent; and similarly, you sent 0, but decided that s(t) was sent. For example, if you send the pulse and, because of noise, the statistic falls below the threshold, you make a mistake; or if you send nothing and the noise pushes the statistic above the threshold, you make a mistake. In hypothesis testing, these are the so-called type 1 and type 2 errors; it is a very similar setting: what is the probability that you actually sent s(t) but decided that 0 was sent, and what is the probability that you sent 0 and decided that s(t) was sent? These are things that we will have to see for various modulation schemes as well, and we will continue this in the forthcoming lectures. Thank you.
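As a closing illustration of the on-off signaling example above, here is a minimal simulation sketch; the pulse shape, noise level N_0, and trial count are assumptions chosen for illustration. The statistic z = <y, s> is compared against the threshold ||s||^2 / 2, and the empirical error rate is printed.

```python
import numpy as np

# Sketch: on-off signaling detected by thresholding z = <y, s> at ||s||^2 / 2.
rng = np.random.default_rng(4)
T, dt = 1.0, 1e-2
t = np.arange(0, T, dt)
s = np.ones_like(t)                       # "on" waveform: a unit square pulse
Es = np.sum(s ** 2) * dt                  # ||s||^2, the signal energy
N0 = 0.5                                  # assumed noise PSD

trials, errors = 50000, 0
bits = rng.integers(0, 2, trials)
for b in bits:
    n = rng.normal(0.0, np.sqrt(N0 / (2 * dt)), size=t.size)
    y = b * s + n
    z = np.sum(y * s) * dt                # statistic z = <y, s>
    errors += (z > Es / 2) != bool(b)     # threshold at ||s||^2 / 2
print(errors / trials)                    # roughly 0.16 here: Q(1) for these numbers
```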