A warm welcome to the seventeenth lecture on the subject of wavelets and multirate digital signal processing. In this lecture we build a very important principle; in fact, in some sense the principle that lies at the heart of the subject of wavelets and time-frequency methods, namely the uncertainty principle. Today we shall devote the whole lecture to a discussion of the uncertainty principle, first laying the foundation of what uncertainty means and then proceeding to obtain certain numerical bounds on confinement in two domains simultaneously. Let me first give an informal, non-formal introduction to the idea of containment. We did a little of that in the previous lecture, but what I intend to do now is to say a little more in terms of formality and then proceed to write down the mathematical relationships or definitions. Recall that we said there is, of course, a very tight or very strong notion of confinement, one that would ask for compact support in both domains, time and frequency: the function must be non-zero strictly over a finite interval of the real axis in the time domain, and non-zero strictly over a finite interval of the real axis in the frequency domain as well. This is a very strong demand, and in the previous lecture we mentioned that it can never be met; in fact, I also hinted at the idea behind the proof. It relates to the fact that if the function is compactly supported on the real axis in one domain, then in the other domain it acquires certain properties, specifically the existence of an infinite number of derivatives (indeed, analyticity), and an analytic function cannot vanish on an interval without vanishing everywhere; this makes it impossible for the function to be compactly supported, non-zero only on a finite interval of the independent variable, in the natural domain as well.
Natural domain can mean time, can mean space, whatever. Anyway, this was what we called the strong version of containment, and we said that this of course was not possible, but we had asked whether a weaker notion of containment could be admitted. Namely, we do not insist that the function be strictly non-zero only over a finite interval, but that most of its energy, most of its content so to speak, be on a finite interval of the independent variable which indexes it; and simultaneously, in the transform domain, the frequency domain, we insist that most of the content be in a finite interval of the frequency axis. This seems like a more reasonable requirement, and to a certain extent it can be met. As I said, to give an informal preview of how it can be met, I shall begin this whole discussion by saying that we are finally going to come out with certain bounds on how much you can contain in the two domains simultaneously. There are several steps to reach this destination. The first step is to put down, in a formal rather than a diffuse way, what you mean by containment, what you mean by most of the content being in a certain finite range; we had also hinted at the approach briefly in the previous lecture. We had said that there are two ways of looking at it. You could think of the magnitude squared of the function and the magnitude squared of the Fourier transform as one-dimensional objects; then you could talk about the centre of that object, the centre of mass if you like, and about the spread of the object around the centre of mass by using the notion of radius of gyration. Or, if you prefer to speak in the language of probability densities, you could employ the idea of a density built from the squared magnitude of the function and another density built from the squared magnitude of the Fourier transform.
You could then look at the mean of these densities and their variances; the variances are indicative of the spread. So, this was a non-formal introduction. Now we need to formalize it, and that is what we shall do to begin with: put down a formal definition, a formal explanation of the idea of spread. Of course, we first have to define the domain in which we are going to work. We are going to work in L2(R); we have agreed to that. It is always going to be the space of square integrable functions. In fact, I must mention that sometimes we are actually going to work in the intersection of the space of square integrable functions and the space of absolutely integrable functions. To be on the safe side, let us put down that requirement right now, and let us put down the tightest of the requirements, namely that the function belong to the intersection of these two. So, for the context, consider a function, say x(t), which belongs to the intersection of L2(R) and L1(R), which means it is both square integrable and absolutely integrable. Now, because the function belongs to L2(R), we are assured that its Fourier transform also belongs to L2(R). So let x(t) have the Fourier transform x-cap(omega), and we know that x-cap(omega) belongs to L2(R) as well. We first define a density, or a one-dimensional mass if you would like to call it that. We know that both x(t) and x-cap(omega) are square integrable, and therefore, if we take the magnitude squared of x(t) and the magnitude squared of x-cap(omega), each would enclose a finite area under it. In fact, the two areas would be essentially the same, but for a factor of 2 pi. Again, if we chose to do away with angular frequency and used Hertz frequency, that 2 pi factor would also go away. Anyway, what we are saying is that |x(t)|^2 integrated from minus to plus infinity is finite.
Let us in fact use the standard notation for this: the norm of x in L2(R), the whole squared, ||x||^2. Therefore, define a density p_x given by p_x(t) = |x(t)|^2 / ||x||^2. A few remarks, which we should write down one by one. p_x(t) as we have defined it, namely |x(t)|^2 divided by the squared L2(R) norm of x, is a probability density. Why do we say this? Because of the following reasons; let us list them one by one. Number one, p_x(t) >= 0 for all t; it is a density in t, of course, so you may think of t as a random variable and this as the density on it. Number two, the integral over all t of p_x(t) is easily seen to be 1 from the definition: the integral of |x(t)|^2 over all t, in the numerator, is again the squared L2 norm of the function x, and the denominator is the same squared L2 norm of x, and therefore they cancel out to give 1. Similarly, let us define a density in the Fourier domain, the angular frequency domain, and there we shall write p_x-cap as a function of omega, equal to |x-cap(omega)|^2 divided by ||x-cap||^2. Here again we are assured of the denominator being finite because x-cap belongs to L2(R). So, again, for completeness and formalism, we note that p_x-cap(omega) is also a probability density. Indeed, p_x-cap(omega) >= 0 for all omega; it is a density in omega; and the integral over all omega from minus to plus infinity of p_x-cap is 1, which is also easy to see from the very definition. Now, we have taken the probability density perspective, but we could as well take the so-called one-dimensional mass perspective; let me note that we could think of p_x(t) as a one-dimensional mass.
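The definition above is easy to check numerically. Here is a minimal sketch in Python (the grid, the sample count, and the helper name `density` are my own illustrative choices, not from the lecture) that builds p_x(t) = |x(t)|^2 / ||x||^2 on a uniform grid and confirms that it is non-negative and integrates to 1:

```python
import numpy as np

def density(x, t):
    """Discrete sketch of p_x(t) = |x(t)|^2 / ||x||^2 on a uniform grid t."""
    dt = t[1] - t[0]
    mass = np.abs(x) ** 2
    norm_sq = np.sum(mass) * dt      # Riemann-sum approximation of ||x||^2
    return mass / norm_sq            # integrates (approximately) to 1

# Example: the Haar scaling function, 1 on [0, 1) and 0 elsewhere
t = np.linspace(-1.0, 2.0, 3001)
x = ((t >= 0.0) & (t < 1.0)).astype(float)
p = density(x, t)
area = np.sum(p) * (t[1] - t[0])     # should be very close to 1
```

The normalization is self-consistent, so the computed area equals 1 up to floating-point rounding, regardless of the grid.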
So, we could think of it as a mass distributed along the t axis, and similarly we could think of p_x-cap(omega) as a one-dimensional mass in omega. What I am saying is that all the objects around us are masses in three-dimensional space; here we have a simplified situation, a mass in one-dimensional space, and that one-dimensional space can be the space of t or the space of omega. Now, once you take the mass perspective, you immediately have the notion of centre of mass, centre of gravity if you like to call it that, and when you take the probability density perspective you have the notion of mean; the two are equivalent. So let us make a note of that. If we choose the mass perspective, consider the centre of mass and the spread around the centre; incidentally, the spread around the centre in mechanics is often measured by a quantity called the radius of gyration. If we take the density perspective, consider the mean and the variance. Now, we must of course assume that these quantities can be calculated, and we shall do that. It is possible that the variance be infinite; that is a subtle point. So we are not always guaranteed a finite variance, and in fact that is not a contradiction to what we have been saying so far. We are trying to find a lower limit on how small these quantities can go in the two domains simultaneously. So, of course, if the variance happens to be infinite, which it actually will in some situations, we shall simply say that is the worst possible case we can encounter. Anyway, given this function x(t), we will prefer to take the probability density perspective. So we will think of p_x(t) and p_x-cap(omega) as probability densities, and we will then write down their means. Indeed, let p_x(t) have the mean t_0. What would that mean? t_0 would then be the integral of t times p_x(t) dt from minus to plus infinity, the simple definition of the mean.
You will of course recognize the same definition to hold good for the centre of mass here. Essentially, you are calculating the moment by choosing the fulcrum to be 0, and thereby finding the point at which the moments are all balanced. Similarly, let p_x-cap(omega) have the mean omega_0, whereupon omega_0 would be the integral from minus to plus infinity of omega times p_x-cap(omega) d omega, again the centre of mass, if you like to look at it that way, in the frequency domain. Now, once we have the means, and as a tongue-in-cheek statement we assume the means are finite; normally they will be, though in some pathological situations we may have a problem, and we are not looking at those pathological situations. So, assuming these means are finite, let us look at the variances. The variance in t would then be given by, and we define it to be, sigma_t^2: by definition this is the integral over all t of (t - t_0)^2 times p_x(t). Similarly, we could talk about the variance in frequency. So the variance in angular frequency, sigma_omega^2, is the integral from minus to plus infinity of (omega - omega_0)^2 times p_x-cap(omega) d omega. Once again, tongue in cheek, we are assuming these variances to be finite; in any case, here we do not have such a problem, because even if the variances are infinite we will accept it, and say that is the extreme and worst case. So, whatever they be, finite or infinite, we accept them. Now, it is very clear, if you look at a probability density, or perhaps if you choose to think of these as one-dimensional masses, that the variance is an indication of the spread.
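These two moments are straightforward to compute numerically. Below is a small Python sketch (the helper name, the grid, and the Gaussian test case are my own illustrative choices) that evaluates t_0 and sigma_t^2 for any sampled density:

```python
import numpy as np

def mean_and_variance(p, t):
    """t0 = integral of t*p(t) dt; var = integral of (t - t0)^2 * p(t) dt,
    approximated by Riemann sums on a uniform grid t."""
    dt = t[1] - t[0]
    t0 = np.sum(t * p) * dt
    var = np.sum((t - t0) ** 2 * p) * dt
    return t0, var

# Sanity check on a density whose moments we know: a unit-variance
# Gaussian centred at 2 should give mean 2 and variance 1.
t = np.linspace(-10.0, 14.0, 240001)
p = np.exp(-(t - 2.0) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
t0, var = mean_and_variance(p, t)   # expect t0 ~ 2, var ~ 1
```

The same function applies unchanged to a frequency-domain density p_x-cap(omega), with omega in place of t.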
So, the larger the variance, the more the density is said to have spread around the mean; the smaller the variance, the more that density or that mass is said to be concentrated. So now we have a formal way to define containment; in fact, we shall now make a very simple definition. We will say containment in a particular domain refers to the variance, or if you like, the positive square root of the variance. Let us put down this statement formally. Containment in a given domain refers to the variance in that domain: containment in time is essentially the quantity sigma_t^2, and containment in angular frequency is essentially the quantity sigma_omega^2. Now we ask ourselves how small we can make any one of these quantities for a valid function, and in a few minutes we will be convinced that there is really no limit. In fact, let us take the Haar scaling function as an example and calculate its variance. The Haar scaling function phi(t) is 1 between 0 and 1 and 0 elsewhere, and it is then very easy to write down the density. It is easy to see that the squared L2(R) norm of phi is 1; it is essentially the integral of |phi(t)|^2 dt over all t, easily seen to be 1, and therefore, very conveniently, p_phi(t) looks exactly like phi(t) itself. Our job is easy. Let us find the mean. In fact, even before formally setting out to find the mean, I can estimate it graphically: the mean is going to be at the centre, at one half. That is obvious, but let us do it formally. So, t_0 would be the integral of t times p_phi(t) dt over all t, and this essentially amounts to the integral of t dt from 0 to 1; I have replaced p_phi(t) by 1 and the limits by 0 to 1. This is obviously t^2/2 evaluated from 0 to 1, which indeed is nothing but 1/2, as we expected. So the mean is indeed 1/2. Now we need to calculate the variance.
That is a little more work, but not too much. Indeed, the variance would be given as the integral over all t of (t - 1/2)^2 times p_phi(t) dt. Once again, noting that p_phi(t) is 1 between 0 and 1 and 0 elsewhere, we can rewrite this as the integral of (t - 1/2)^2 dt from 0 to 1, and if I care to replace t - 1/2 by another variable, lambda, then when t is 0 lambda takes the value -1/2, when t is 1 lambda takes the value +1/2, and we integrate lambda^2 d lambda between these limits. That gives an easy expression: (1/2)^3 / 3 minus (-1/2)^3 / 3, and therefore we have 2 times (1/2)^3 divided by 3, which is (2/3) times (1/8), or 1/12. So this is the variance: sigma_t^2 is 1/12, and therefore we may take sigma_t to be the positive square root of 1/12, which is 1/(2 sqrt 3). As you can see, sigma_t is less than 1/2. So, in a certain sense, we do not really use the number 1/2 to denote the spread of phi(t) around its mean; the variance does not say the spread goes all the way to 1/2, it says the spread is a number slightly less than 1/2. Most of the energy is contained in that region around the mean captured by the variance. In fact, if you wish to be very specific, the fraction of the energy contained here would essentially be the integral of the density from t_0 - sigma_t to t_0 + sigma_t. Now, I do not really intend to dwell on this quantity; it is a very simple calculation, of course, integrated with respect to t, but what I am trying to emphasize is that we are not asking for 100 percent here. We are not saying: tell me the region over which 100 percent of the energy lies; that region could very well be the whole real axis. We are saying: at least a significant part of it. In this particular case, maybe it is a good idea to actually calculate how much this fraction really is.
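The 1/12 result is easy to verify numerically. A quick Python check (the midpoint grid and its resolution are my own choices):

```python
import numpy as np

# p_phi(t) = 1 on [0, 1]: compute its mean and variance by a midpoint rule
n = 100_000
dt = 1.0 / n
t = (np.arange(n) + 0.5) * dt            # midpoints of a grid on [0, 1]
p = np.ones(n)                           # the Haar density is 1 there

t0 = np.sum(t * p) * dt                  # expect 1/2
var = np.sum((t - t0) ** 2 * p) * dt     # expect 1/12
sigma = np.sqrt(var)                     # 1/(2*sqrt(3)), about 0.2887 < 1/2
```

The midpoint rule is exact up to O(dt^2) for this quadratic integrand, so the computed variance agrees with 1/12 to many digits.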
So, the fraction is the integral of the density, which is 1 on [0, 1], from 1/2 - 1/(2 sqrt 3) to 1/2 + 1/(2 sqrt 3); it is simply t evaluated between these limits, which is easily seen to be twice the standard deviation, 2 times 1/(2 sqrt 3), which is 1/sqrt(3). Certainly not a very large fraction like 90 percent, but it is about 1/1.7, more than 50 percent anyway. Incidentally, this fraction is not going to be the same for different functions; it depends on the density. But what we are trying to say is that the variance is one accepted measure of spread, and very often the variance actually tells us where most of the function is concentrated. Even in the case of this function, if you look at it carefully, what we are saying is that quite a bit of the function is contained between 1/2 - 1/(2 sqrt 3) and 1/2 + 1/(2 sqrt 3). So it is not an unreasonable range that we choose. Now we ask about the variance in frequency of the same function, and there we are going to have a very unpleasant surprise. So let us look at phi-cap(omega). In fact, we are not so interested in phi-cap(omega) itself as in |phi-cap(omega)|^2, and that has the form (sin(omega/2) / (omega/2))^2. Of course, you could integrate this: indeed, as you know, the integral of |phi-cap(omega)|^2 d omega divided by 2 pi is essentially the squared L2(R) norm of phi, which is easily seen to be 1, and therefore the integral of |phi-cap(omega)|^2 d omega over all omega is essentially 2 pi. Therefore, we look at the quantity p_phi-cap(omega), which is of the form |phi-cap(omega)|^2 divided by 2 pi, and let me sketch it. In fact, we are familiar with its appearance: a main lobe around 0, with zeros at 2 pi, 4 pi and so on; we have sketched this more than once.
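The fraction 2 sigma_t = 1/sqrt(3), about 0.577, can also be read off numerically from the density itself. A short Python sketch (the grid resolution is my own choice):

```python
import numpy as np

n = 1_000_000
dt = 1.0 / n
t = (np.arange(n) + 0.5) * dt                   # fine midpoint grid on [0, 1]
p = np.ones(n)                                  # Haar density p_phi(t) = 1 there

t0 = 0.5
sigma = 1.0 / (2.0 * np.sqrt(3.0))
inside = (t >= t0 - sigma) & (t <= t0 + sigma)  # one standard deviation each way
fraction = np.sum(p[inside]) * dt               # expect 1/sqrt(3), about 0.577
```

As the lecture notes, the fraction exceeds one half but is far from 90 percent, and it would differ for other densities.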
Now it is very easy to see what the mean of this density is: the function is symmetric around omega = 0, and therefore the mean is 0. By the way, this is not a surprise; in fact, for all real functions the mean of the density on the frequency axis is going to be 0. The Fourier transform of a real function is magnitude symmetric, and therefore it is not surprising that for a real function the mean, as understood in this sense, is always 0 on the frequency axis. Let us make a note of this very important conclusion: for real functions, x-cap(omega) is magnitude symmetric, and therefore the mean is 0. Now comes the variance, and here a very unpleasant surprise is waiting for us; I say unpleasant because maybe we should have had something better. The variance of phi-cap would be calculated as follows: the integral over all omega of (omega - omega_0)^2 times p_phi-cap(omega) d omega. If we make the required substitutions, this is essentially the integral of omega^2 times (sin(omega/2) / (omega/2))^2 divided by 2 pi, d omega, and here we are in serious trouble: this becomes the integral from minus to plus infinity of (1/(2 pi)) times 4 sin^2(omega/2) d omega, because the omega^2 cancels and the 4 goes up there. The constant is not important, but this is trouble; we are in serious trouble here. In fact, let me sketch what we are trying to integrate: sin^2(omega/2), a periodic function with a period of 2 pi. Serious trouble, as I said: we are trying to integrate a periodic non-negative function from minus to plus infinity, and obviously that integral is going to diverge. So, the fear that we had when we started our discussion of variances comes true right in the very simplest case of a scaling function that we know: the variance of phi-cap is infinite. In other words, phi(t) is not at all confined in the frequency domain, at least in this sense.
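The divergence can be seen numerically as well: after the cancellation, the integrand is (2/pi) sin^2(omega/2), so the truncated integral over [-W, W] has the closed form (2/pi)(W - sin W) and grows essentially linearly with W. A Python sketch (the function name and the truncation limits are my own):

```python
import numpy as np

def truncated_freq_variance(w_max, n=1_000_000):
    """Integral of omega^2 * p_phicap(omega) over [-w_max, w_max] for the Haar phi.
    After cancelling omega^2, the integrand is (2/pi) * sin^2(omega/2)."""
    dw = 2.0 * w_max / n
    w = -w_max + (np.arange(n) + 0.5) * dw      # midpoint grid
    return np.sum((2.0 / np.pi) * np.sin(w / 2.0) ** 2) * dw

v1 = truncated_freq_variance(100.0)
v2 = truncated_freq_variance(200.0)   # roughly doubles: the integral diverges
```

Doubling the truncation limit roughly doubles the result, which is the numerical signature of a divergent variance.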
Now, all this while in our discussion, when we talked about time and frequency together in the previous lecture, we had been worried about these side lobes, as we call them. We said it is all right to look at the main lobe and talk about presence in the main lobe, but then we have these side lobes, and the side lobes fall off only by the factor 1/omega in magnitude; as you can see, the side lobes have created a problem. After multiplication by omega^2 in the calculation of the variance, the side lobes create a periodic non-negative function to be integrated, and we are in trouble. So, this tells us again why we have to go much beyond the Haar. We have been asking again and again why we cannot be content with the Haar multiresolution analysis; now we have one more formal answer. If you look at the scaling function of the Haar multiresolution analysis, its variance in the frequency domain is infinite: it is not at all confined in the frequency domain in this sense. Now, it is natural to ask: what is it that made this variance infinite? Why did we have a divergent variance here? In fact, we can answer that question if we only care to make a slight adjustment to the expression for the variance. The variance of phi-cap is, as you can see, given by the integral of omega^2 times p_phi-cap(omega) d omega, and this can be written as the integral of omega^2 times |phi-cap(omega)|^2 d omega, divided by the squared L2(R) norm of phi-cap. Now, this norm is a number; it can be brought out of the integral. So I can rewrite the whole thing as 1 over ||phi-cap||^2 times an integral, and I will also do a little rearrangement in the integral: I will write it as the integral of |j omega phi-cap(omega)|^2 d omega.
Notice that if I take the modulus squared of j omega phi-cap(omega), it is essentially |j omega|^2 times |phi-cap(omega)|^2, and |j omega|^2 and omega^2 are the same for omega real. But when we write it like this, it has a meaning: j omega phi-cap(omega) is essentially the Fourier transform of d phi(t)/dt, the derivative of phi. So, essentially, what we are saying is that this variance is actually the energy in the derivative divided by the energy in the function; of course, remember you would have a factor of 2 pi in each place, since the integral of the squared magnitude of a Fourier transform is 2 pi times the energy, so this is 2 pi times the energy in the derivative divided by 2 pi times the energy in the function, and the factors of 2 pi cancel. Please note that this inference is independent of which function we consider: as long as the function is real, the variance in frequency is going to be this ratio, the energy in the derivative divided by the energy in the function. Let us make that remark: for real x(t), the frequency variance sigma_omega^2 is essentially the energy in dx(t)/dt divided by the energy in x, that is, the squared L2 norm of the derivative of x divided by the squared L2 norm of x. And now we have the answer to why we ran into a problem for phi(t). As you can see, phi(t) is discontinuous, so when its derivative is considered there are impulses in the derivative, and an impulse is not square integrable; therefore, the numerator of this quantity diverges, if you look at it from that perspective. The moment we have a discontinuous function, we have an infinite frequency variance. With this note, we realize that if we want to get some meaningful uncertainty, some meaningful bound, we must at least consider continuous functions, and we shall proceed to build on this concept further in the next lecture. Thank you.
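Since the Haar phi itself has a divergent numerator (its derivative contains impulses), the identity is best checked numerically on a smooth function. Here is a sketch in Python using a Gaussian, x(t) = exp(-t^2/2), for which the energy ratio, and hence the frequency variance, equals 1/2 exactly; the grid and the use of np.gradient for the derivative are my own choices:

```python
import numpy as np

# For real x(t): sigma_omega^2 = ||dx/dt||^2 / ||x||^2.
# Check on x(t) = exp(-t^2/2), where ||x'||^2 = sqrt(pi)/2 and
# ||x||^2 = sqrt(pi), so the ratio should come out to 1/2.
t = np.linspace(-12.0, 12.0, 480001)
dt = t[1] - t[0]
x = np.exp(-t ** 2 / 2.0)

dx = np.gradient(x, dt)                                  # numerical derivative
ratio = np.sum(dx ** 2) * dt / (np.sum(x ** 2) * dt)     # expect 1/2
```

A discontinuous function like the Haar phi, by contrast, would send the numerator of this ratio to infinity as the grid is refined, which is exactly the divergence seen above.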