Welcome back to this lecture on digital communication using GNU Radio. In this lecture we continue our discussion on the preliminaries of demodulation, talk a little more about jointly Gaussian random vectors, and then briefly discuss hypothesis testing. Let us go back to where we left off in the last lecture. We were dealing with collections of random variables, particularly Gaussian random vectors: a random vector X is a column vector of random variables X1, X2, up to Xn which are jointly Gaussian. One important point is that I have not yet provided that definition. A jointly Gaussian random vector has a special definition: it is one where any linear combination of the entries results in a Gaussian random variable. There is a classic example of a pair of random variables that are each Gaussian but not jointly Gaussian. Let X be distributed as N(0, 1), that is, Gaussian with mean 0 and variance 1. Let alpha be independent of X, taking the values +1 and -1 with equal probability — think of tossing a fair coin: heads gives +1, tails gives -1, independent of X. Now define Y = alpha X. What does Y look like? Y is just a coin flip multiplied by a Gaussian. Since X is N(0, 1), you can check the distribution of Y: if alpha is +1, Y is the same as X, hence Gaussian; if alpha is -1, Y is the flipped version -X, but -X is also normally distributed with mean 0 and variance 1.
So in both cases Y is also Gaussian with mean 0 and variance 1. But this is the classic example where X and Y are individually Gaussian yet not jointly Gaussian. Why? Because X + Y is not a Gaussian random variable: X + Y takes the value 0 with probability one half, since whenever alpha is -1 we have Y = -X, and X + (-X) = 0. So with probability half you get exactly 0 — a point mass on a single number — and, as you know, a continuous random variable like a Gaussian cannot take any single real number with non-zero probability. The definition, then, is this: X1, X2, ..., Xn are said to be jointly Gaussian if for any real numbers a1, a2, ..., an, the combination a1 X1 + a2 X2 + ... + an Xn is a Gaussian random variable. Only such random vectors have the density f_X(x) = 1 / ((2 pi)^(n/2) |Cx|^(1/2)) * exp( -(1/2) (x - mx)^T Cx^(-1) (x - mx) ), where |Cx| denotes the determinant of the covariance matrix Cx and mx is the mean vector; in the complex case the transpose becomes a Hermitian transpose. Now, as we discussed at the end of the last lecture, independent and uncorrelated are the same for jointly Gaussian random vectors. This is also very key. Remember the example we just discussed, where Y = alpha X is basically +X or -X with equal probability.
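The counterexample above is easy to check numerically. Here is a small simulation sketch (assuming NumPy is available; the sample size and seed are arbitrary choices) showing that Y = alpha X is marginally N(0, 1) and uncorrelated with X, yet X + Y has a point mass at zero, so the pair cannot be jointly Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# x ~ N(0, 1); alpha = +/-1 with equal probability, independent of x
x = rng.standard_normal(n)
alpha = rng.choice([-1.0, 1.0], size=n)
y = alpha * x

# y is marginally N(0, 1): sample mean ~ 0, sample variance ~ 1
print(y.mean(), y.var())

# x and y are uncorrelated: E[xy] = E[alpha] E[x^2] = 0
print(np.mean(x * y))

# ...but (x, y) is NOT jointly Gaussian: x + y is exactly 0 whenever
# alpha = -1, so the sum has a point mass of probability 1/2 at zero
print(np.mean(x + y == 0))
```

The last printed fraction is close to 0.5, which a genuinely Gaussian random variable could never produce.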
You can verify that those two are uncorrelated, but they are definitely not independent: if you know the value of X, then with probability half Y is definitely X and with the other probability half Y is definitely -X — not independent at all — yet if you evaluate the expectation of XY it turns out to be 0, so they are uncorrelated. So in that sense, "independent equals uncorrelated" holds only for jointly Gaussian random vectors. How? In the case of uncorrelated jointly Gaussian random variables, Cx has only diagonal entries, with zeros everywhere else. Why? The off-diagonal entries of the covariance matrix are the correlations between pairs of variables, scaled by the individual standard deviations. If those random variables are uncorrelated, their covariances are 0, which means Cx is a diagonal matrix. If Cx is diagonal, you can easily verify that Cx inverse is also diagonal, and then the quadratic form (x - mx)^T Cx^(-1) (x - mx) reduces to a sum of terms: some constant times (x1 - mx1)^2, plus some constant times (x2 - mx2)^2, and so on. The density then appears as e to the minus (something times (x1 - mx1)^2) times e to the minus (something times (x2 - mx2)^2) and so on — it splits into a product of one-dimensional Gaussian factors.
Now, whenever a joint density splits into the product of its marginals, the random variables in that joint distribution are independent. So the uncorrelatedness, which reflects in a diagonal Cx, results in the pdf appearing as the product of the marginals. That is something special: for Gaussians, uncorrelated definitely implies independent, and this is definitely not the case for most other random variables, as you can check. Finally, joint Gaussianity is preserved under affine transformations: if you take a jointly Gaussian random vector and apply an affine transformation to it, the resulting random vector is also jointly Gaussian. These facts will come in handy whenever you deal with vectors, for example when jointly detecting multiple symbols. Keep them in mind; we will refer back to them as and when the situation arises. The next thing we have to consider is Gaussian random processes. Whenever we deal with practical communication systems, you have aspects like noise and the channel, and what typically happens is that a random process is like a varying waveform. There are two pictures of a random process: one particular realization, another realization, then a third realization, and so on.
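The "diagonal covariance means the joint density factors" argument can be checked directly. A minimal sketch, assuming SciPy is available; the particular means, variances, and evaluation point are arbitrary illustrative values:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Uncorrelated jointly Gaussian => diagonal covariance matrix Cx
mx = np.array([1.0, -2.0])       # illustrative mean vector
C = np.diag([0.5, 2.0])          # diagonal Cx: off-diagonal covariances are 0

point = np.array([0.3, -1.1])    # arbitrary evaluation point

# Joint pdf with diagonal covariance...
joint = multivariate_normal(mean=mx, cov=C).pdf(point)

# ...equals the product of the one-dimensional marginal pdfs
product = (norm(mx[0], np.sqrt(0.5)).pdf(point[0])
           * norm(mx[1], np.sqrt(2.0)).pdf(point[1]))

print(np.isclose(joint, product))   # → True
```

The factorization holds at every point, which is exactly the statement that the components are independent.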
These are sample-path realizations: each is one particular path, and each path is chosen randomly. This is the picture when you fix the realization and view the process as a function of time. The other picture is when you fix time: at a fixed t the process can take this red value, this blue value, this green value, and so on — that is, X(t) is a random variable. So there are two pictures of any random process (Gaussian or otherwise): as a function of time for a fixed realization, and as a random variable for a fixed t. Typically we say X(t) is a random variable for every t belonging to R; while we are silent about the exact interpretation of t, you can assume t is time. So for every time t you have a random variable. Now, why do we have to look at the structure of a random process? Suppose you make an observation; one question we may ask is whether this observation is useful at the next time instant. For example, if it is sunny today, it is likely to be sunny tomorrow as well — the weather is correlated across time. But suppose you are in an environment where, even if today is sunny, tomorrow all bets are off: it may be sunny, cloudy, rainy, or whatever. Then you can say that for this particular random process there is no real correlation among the weather on successive days.
So you have these characterizations of how closely related the random variables are, and that is something you will try to exploit even when designing a communication system. Now, for real numbers t1, t2, ..., tn and coefficients a1, a2, ..., an — and hold on, I will tell you what the t's are — suppose we observe the random process at times t1, t2, ..., tn and take a linear combination of those random variables, a1 X(t1) + a2 X(t2) + ... + an X(tn). If this is always a Gaussian random variable, the process is a Gaussian random process. Recall what I just said about jointly Gaussian random vectors: if you take t1, t2, ..., tn and measure the process at those times, then (X(t1), X(t2), ..., X(tn)) forms a random vector, and the claim is that it is a jointly Gaussian random vector, because every linear combination a1 X(t1) + a2 X(t2) + ... is Gaussian. That is the definition: take the random variables that constitute the process at any set of times; if every linear combination of them results in a Gaussian random variable, the process is said to be a Gaussian random process. Moreover, a Gaussian random process is completely characterized by its mean and its autocorrelation function. Recall that a jointly Gaussian random vector is determined once you know its mean and its covariance matrix; in the process setting, the role of the covariance matrix is taken over by the autocorrelation function. The covariance matrix was adequate in the random-vector case because you had a finite set of random variables; here, the autocorrelation function tells you the relationship between the process values for every pair of times.
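The defining property — samples of the process at any times t1, ..., tn are jointly Gaussian, so any linear combination is Gaussian with variance a^T C a — can be illustrated by direct sampling. This is a sketch under an assumed stationary autocorrelation R(tau) = exp(-|tau|) (an illustrative choice, not from the lecture); the times and coefficients are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Observation times t1..tn and the covariance C[i, j] = R(t_i - t_j)
t = np.array([0.0, 0.5, 1.3, 2.0])
C = np.exp(-np.abs(t[:, None] - t[None, :]))

# Draw many realizations of (X(t1), ..., X(tn)): jointly Gaussian, zero mean
X = rng.multivariate_normal(np.zeros(len(t)), C, size=200_000)

# Any linear combination a1 X(t1) + ... + an X(tn) is Gaussian
a = np.array([1.0, -2.0, 0.5, 1.0])
Z = X @ a

# Its variance must equal a^T C a
print(Z.var(), a @ C @ a)
```

The two printed numbers agree to within sampling error, consistent with Z being Gaussian with variance a^T C a.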
Now, one more aspect: X(t1) and X(t2) have some correlation, but what about X(t1 + Δt) and X(t2 + Δt) — do they have the same correlation? That is, if you look elsewhere on the time axis, do you find the same correlation properties? That particular aspect is called stationarity. Recall that a wide-sense stationary process is one where the mean does not vary with time and the autocorrelation depends only on the time gap: the correlation between X(t1) and X(t2) depends only on t2 - t1. If this holds, the process is called wide-sense stationary. (We may write autocorrelation or autocovariance interchangeably here, because we typically assume zero mean or subtract out the mean, so it does not matter.) Since the autocorrelation depends only on the gap t2 - t1, we can characterize it by calling this gap tau. It is very typical to assume that random processes are wide-sense stationary when analyzing communication systems. In the Gaussian case there is a special bonus: wide-sense stationarity implies strict-sense stationarity. This is because once the mean and the autocovariance are fixed — just as knowing Cx fixed the distribution of a Gaussian random vector — the covariance of every pair of process values is fixed, and therefore the entire distribution is fully characterized.
You can look this up in the references, but the most important point is that Gaussian random processes have this very nice characterization: any linear combination of X(t1), X(t2), ..., X(tn) is Gaussian. Now that Gaussian random processes are in our toolkit, let us look at n(t). We define n(t) as the noise process: it has zero mean, and we define its power spectral density to be N0/2, which we also write as sigma squared. You have probably seen the power spectral density in the context of random processes, but as a refresher: the power spectral density is the Fourier transform of the autocorrelation. As discussed, we restrict our attention to wide-sense stationary processes, so the autocorrelation depends only on the lag t2 - t1; taking its Fourier transform gives the power spectral density. Here the power spectral density Sn(f) is flat at the value N0/2 = sigma squared for all f. Whenever a Fourier transform is flat like this, the corresponding time-domain function is a delta — I am using the fact that delta(t) has Fourier transform 1. So, since Sn(f) = N0/2 is flat, the autocorrelation (equivalently the autocovariance, since the mean is zero) of the noise is Rn(tau) = (N0/2) delta(tau), which we can also write as sigma squared times delta(tau).
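In discrete time, the white-noise model corresponds to i.i.d. N(0, sigma^2) samples, so the sample autocorrelation is approximately sigma^2 at lag zero and approximately zero at every other lag — the discrete analogue of (N0/2) delta(tau). A quick sketch with an illustrative sigma^2 and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(2)

# i.i.d. Gaussian samples as a discrete-time stand-in for white noise
sigma2 = 2.0
n = rng.normal(0.0, np.sqrt(sigma2), size=500_000)

R0 = np.mean(n * n)            # lag 0: should be close to sigma^2
R5 = np.mean(n[:-5] * n[5:])   # lag 5: should be close to 0 (uncorrelated)

print(R0, R5)
```

A flat spectrum and a delta-like autocorrelation are two views of the same idealization.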
Now, one question you may ask is: why this choice of N0/2? It is made for various reasons — you can go back and look at the derivation based on the Boltzmann constant. But more importantly, we want N0/2 because we will eventually move to complex baseband signaling: if we take N0/2 as the noise along the real axis and N0/2 as the noise along the imaginary axis, and treat those as independent, then we end up with N0 as the total complex noise. That is where we are heading; for now, n(t) is a real noise process with zero mean and power spectral density N0/2. Here N0 = k T0, where k is the Boltzmann constant and T0 is the operating temperature — remember it must be in kelvin; you can take something like 300 K — and the Boltzmann constant is as you find in your list of physical constants. Now, there is a problem with this definition of noise: the spectrum is flat and stretches across all frequencies (this is the frequency axis, because we took a Fourier transform), which means the noise has an infinite amount of power. That does not make physical sense — otherwise you could start extracting energy from the noise and build a perpetual motion machine, something you may have heard about in physics, getting free energy from the noise. If you ask whether that is possible, the answer is no: this is a noise model, and what matters is the noise within the frequency range in which you employ the communication system, because wherever you send the signal, that is where the noise affects you.
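Plugging in numbers: with the standard value of the Boltzmann constant and an assumed operating temperature of 300 K (the value suggested in the lecture), the thermal-noise PSD works out as follows:

```python
# Thermal noise PSD: N0 = k * T0, with T0 in kelvin
k = 1.380649e-23      # Boltzmann constant, J/K
T0 = 300.0            # assumed operating temperature, K

N0 = k * T0           # W/Hz
print(N0)             # ~4.14e-21 W/Hz, i.e. roughly -174 dBm/Hz
```

This is the familiar "-174 dBm/Hz" noise floor figure used in link-budget calculations.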
So, within the band of interest — the frequency range of interest — the noise exists and can be treated as flat; that is the assumption we are making. Let us not get into pathological questions of whether the noise actually has infinite power: we simply assume that over the range of frequencies you are using (remember your passband signaling), this is the amount of noise that affects your signaling. That is the summary; now we will take a very brief look at hypothesis testing. Hypothesis testing can be looked at as a framework for deciding which of M possible hypotheses best explains an observation Y, based on a statistical model. Assume there is an observation Y, and something called i that actually causes it. For a very simple example, say you send a message i — messages can be 1, 2, 3, 4 — and you receive Y. The question is: from Y, how do we decide which message was sent? So we have hypotheses Hi: "message i is sent." We now have to decide how these hypotheses are tested and which of them is likely to be true. That is, we have H1, H2, ..., HM — say we send messages from 1 to M — and the question is which message was sent. The probabilities P(Hi) are the Bayesian priors, which are known; in this case the simple interpretation is that P(Hi) is the probability that message i is sent. For example, if you live in a city where most days are sunny, and every day you send a message about the weather, then the probability that you send the "sunny" message is much higher than the probability that you send the "rainy" or "snowy" or "cold" message.
So, in that sense, the P(Hi) are the Bayesian priors. But in digital communication, when you send 0s and 1s (or 1, 2, 3, 4), these messages are typically equally likely, and then the P(Hi) are all 1/M — every hypothesis has the same prior probability. Keep this in mind: a Bayesian prior indicates the probability of that particular message being sent; if all messages are equally likely, the prior does not help you make a decision. Intuitively, if it is sunny most of the time, then even if you receive a "rainy" answer you have to be really convinced that it is actually rainy, because most of the time the sender sends a sunny message — these are the kinds of questions this framework is meant to handle. Before we go to the Gaussian example, here is another simple test. Suppose you look at a car windshield and count the number of drops on it — some count Y. You have to decide whether it is raining or not; for a minute, assume you can only count the drops and cannot check the clouds. So there are two hypotheses — it is raining, it is not raining — and based on the observation Y, the number of drops, how will you conclude whether it is raining?
In other words, we must come up with a rule: for example, if the number of drops exceeds some threshold, decide it is raining; if it is less than or equal to the threshold, decide it is not — perhaps someone just sprayed water to clean the windshield. In this manner we need a metric to decide which hypothesis to pick. Hypothesis testing can have errors: maybe you are under a tree, so you do not count enough drops even though it is raining — that leads to an error. Or someone came along with a bucket and poured water on the windshield; you count many drops, exceed the threshold, and decide it is raining when it is not — again an error. Under various situations errors are possible; your aim is to take all these situations into account and minimize the probability of error. That was a basic example to motivate hypothesis testing; in a more practical scenario, let us restrict our consideration to the Gaussian case. Consider a very basic Gaussian system with two symbols, 0 and 1, but what you see is Y: if 0 is sent, Y is Gaussian with mean 0 and variance sigma squared; if 1 is sent, Y is Gaussian with mean m and variance sigma squared. So if you send 0, the observation follows a bell curve centered at 0, and if you send 1, it follows a bell curve centered at m.
Now let us look at this intuitively. If you receive a value near m or to its right, it is most likely that message 1 was sent and the Gaussian noise carried it to the right — it is more likely that a value close to m was pushed far right than a value close to 0. But if you receive something near 0 or to its left, it is more likely that a 0 was carried there rather than an m, assuming m is positive. So what is the decision rule? Just like the windshield example, we have to draw a line: to the left of the line we say message 0 was sent; to the right, message 1. An intuitive rule, by symmetry — since the variances are equal — is to draw the line at m/2: to the left we decide H0, to the right we decide H1. That looks like a reasonable way to make the decision. In deciding such a rule, we are partitioning the observation space: everything in one region is H0, everything in the other is H1. But what the mathematically correct statement is — what exactly this rule optimizes — is something we are yet to see. So let us briefly look at the conditional error probabilities. Errors can happen in hypothesis testing: even though message 1 was sent, the noise could carry the observation to the left side, so instead of a value near m you get something near 0 because of the behavior of the Gaussian.
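The m/2 threshold rule described above is easy to simulate. A sketch assuming equal priors, with illustrative values m = 2 and sigma = 1 (so the threshold sits at 1):

```python
import numpy as np

rng = np.random.default_rng(3)

# Binary Gaussian hypothesis test: under H0, y ~ N(0, sigma^2);
# under H1, y ~ N(m, sigma^2). Equal priors => threshold at m/2.
m, sigma = 2.0, 1.0
n = 100_000

bits = rng.integers(0, 2, size=n)                 # equally likely messages
y = bits * m + rng.normal(0.0, sigma, size=n)     # noisy observation

decisions = (y > m / 2).astype(int)               # decide H1 iff y > m/2

error_rate = np.mean(decisions != bits)
print(error_rate)   # close to Q(m / (2*sigma)) = Q(1) ≈ 0.159
```

The measured error rate matches the Q-function expression derived below for these parameters.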
So, what is the probability that you decide Hj, for some j not equal to i, when Hi is true? That is: what is the probability that you decide message 1 was sent even though message 0 was actually sent, and vice versa? It is the sum over j ≠ i of the probability that Y falls in the decision region Gamma_j given Hi — if you have multiple symbols, this is the probability of deciding one of the wrong symbols given that Hi was sent — and this is the same as 1 minus the probability that Y falls in the correct region Gamma_i. This is a conditional error probability. With priors it gets a little more tricky: you have to weight by the prior — the probability that message i is sent versus message j — which matters when the messages are not equally likely. We will skip that for a minute and focus on the conditional probabilities. This brings us to maximum likelihood, or ML: the maximum likelihood rule maximizes a likelihood function, picking the hypothesis under which the observation is most probable. In our example, what is the probability of error given 0 is sent? Looking back at the plot: when you send 0, you make an error if the observation crosses m/2 to the right. Similarly, if you send 1, you make an error if the observation crosses m/2 to the left. So let us compute these error probabilities.
You make an error, given 0 was sent, if the Gaussian crosses m/2 — that is, the integral from m/2 to infinity of (1 / (sigma sqrt(2 pi))) e^(-x^2 / (2 sigma^2)) dx. If you do the standard normalization (substitute z = x / sigma), you can easily find that this equals Q(m / (2 sigma)). Similarly, given 1 was sent, you make an error if the observation lands to the left of m/2, and you can verify that this also turns out to be Q(m / (2 sigma)). This is because Y is N(0, sigma^2) under hypothesis H0 and N(m, sigma^2) under hypothesis H1, so by symmetry the two conditional error probabilities are equal. Since sending 0 and sending 1 are equiprobable, Q(m / (2 sigma)) is indeed the overall probability of error. Now, the maximum likelihood decision rule is: find arg max over i of p(y | i), or equivalently arg max of log p(y | i) — write the pdf p(y | i) and find the particular i that maximizes it. In the Gaussian case this turns out to be the minimum-distance rule, something we will see. The other rule is the minimum-probability-of-error rule, which is the one with Bayesian priors: delta_MPE(y) = arg max over i of pi_i p(y | i). The only difference is the factor pi_i, which includes the prior probability — if, say, you send 1 more often than 0, you account for that through pi_0 and pi_1; other than that, the rule is similar.
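The Q function is the tail probability of a standard Gaussian, available in SciPy as `norm.sf`. For the illustrative values m = 2 and sigma = 1 used earlier, the conditional error probability evaluates as:

```python
from scipy.stats import norm

# P(error | 0 sent) = P(error | 1 sent) = Q(m / (2*sigma)),
# where Q(x) = P(N(0,1) > x) is the Gaussian tail function (norm.sf)
m, sigma = 2.0, 1.0

p_err = norm.sf(m / (2 * sigma))   # Q(1)
print(p_err)                       # → ~0.1587
```

Since both conditional error probabilities are equal and the priors are equal, this is also the overall error probability.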
If all the pi_i are the same — in our case one half each — that factor can be removed, because the same multiplier appears in front of every p(y | i). So maximum likelihood is the same as the minimum-probability-of-error rule, also known as MAP (maximum a posteriori), under the condition of equal priors — that is, when all symbols are equally likely. The minimum-probability-of-error rule is the same as the MAP rule because it maximizes the probability that Hi occurred given that y is observed; that is something you can verify using Bayes' rule. The final thing we will briefly dwell upon is the notion of irrelevant statistics. Sometimes Y is complicated to process because it contains much unnecessary or extraneous information. The question is: can we decompose the statistic into (Y1, Y2), where only Y1 is relevant and Y2 is not? Take a simple example: say our symbols form a PAM constellation, so everything is real, but the noise is complex — there is a real noise component and an imaginary noise component. Can we not decompose Y into two parts? In this example, take the real part and the imaginary part: the imaginary part of the noise is irrelevant because it does not affect your decision on what was sent along the real axis. Technically speaking, for M-ary hypothesis testing, where you want to check which one of M hypotheses is true, you can decompose Y into (Y1, Y2) in various ways — break it into a vector, subtract, add, and so on.
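The difference between the ML and MAP rules shows up only when the priors are unequal. A sketch with hypothetical priors (0.9, 0.1) — chosen just for illustration — and the same illustrative m = 2, sigma = 1 from before:

```python
import numpy as np
from scipy.stats import norm

m, sigma = 2.0, 1.0
pi = np.array([0.9, 0.1])        # hypothetical unequal priors
means = np.array([0.0, m])       # conditional means under H0 and H1

def ml_decide(y):
    # ML rule: argmax_i p(y | i)
    return int(np.argmax(norm.pdf(y, loc=means, scale=sigma)))

def map_decide(y):
    # MAP / minimum-probability-of-error rule: argmax_i pi_i * p(y | i);
    # reduces to ML when all priors are equal
    return int(np.argmax(pi * norm.pdf(y, loc=means, scale=sigma)))

y = 1.2   # observation slightly above the ML threshold m/2 = 1
print(ml_decide(y), map_decide(y))   # ML picks 1; the strong prior on H0 makes MAP pick 0
```

With equal priors the two rules coincide; the strong prior effectively shifts the decision threshold toward the less likely hypothesis.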
If the conditional distribution of Y2 given Y1 does not depend on i — that is, p(y2 | y1, Hi) = p(y2 | y1) for all i — then Y2 is irrelevant: it will not aid you in deciding which hypothesis is true. Intuitively speaking, extraneous information that is not relevant can be ignored. We will look at this more technically when we deal with projecting the noise onto the signal space, and we will show that the part of the noise that is not projected is irrelevant — that is something we will see in the next few lectures. To summarize what we have learned over the past few lectures: practical communication systems have random effects such as noise, and these necessitate optimal detection techniques. Sometimes inference — meaning concluding what was sent from the received signal — may be complicated, and it may be easier to move the observations to a different space; in this situation we will use the relevant space with sufficient statistics, which is something we will see. Hypothesis testing, to obtain the maximum likelihood or minimum-probability-of-error decision on transmitted symbols, is a tool we will use to recover our data and also to find out how much error there is while recovering it. These are some aspects we will see in the next lecture, followed by implementation on GNU Radio as well. Thank you.