Today we begin the second lecture on the subject of wavelets and multirate digital signal processing, in which our objective is to introduce the Haar multiresolution analysis, about which we talked very briefly in the previous lecture. Before I go on to the analytical and mathematical details of the Haar multiresolution analysis, or MRA as it is called for short, let me once again review the idea behind the Haar way of analyzing functions. Recall that Haar was a mathematician, and the very radical idea he gave was that one could think of continuous functions in terms of discontinuous ones, and do so to any degree of closeness that you desire. What I mean is: start from a very discontinuous function and then make it smoother and smoother, all the while adding discontinuous functions, until you come arbitrarily close to the continuous function that you are trying to approximate. This is the central idea in the Haar way of representing functions. We also briefly discussed why this is important. It seems like something silly to do at first glance, but it is actually very important, and the reason, as we mentioned, is that if you think about digitally communicating, say, an audio piece, you are doing exactly that. The beautiful smooth audio pattern is converted into a highly discontinuous stream of bits. What I mean by discontinuous is that when you transmit that stream of bits on a communication channel, you are in fact introducing discontinuities every time a bit changes. After every bit interval there is a change of waveform, and therefore a discontinuity at some level: if not in the function itself, then in its derivative, or in its second derivative, and so on.
The idea of representing continuous functions in terms of discontinuous ones thus has its place in practical communication, and therefore what Haar did is something very useful to us today. What we are going to do today is build up the idea of wavelets, more specifically what are called dyadic wavelets, starting from the Haar wavelet. To do that, let us first consider how we represent a picture on a screen, and I am going to show that schematically in the drawing here. Assume this is the picture boundary, and I am trying to represent this picture on a screen, whatever that picture might be. Just for the sake of drawing, let me draw some kind of pattern there: say a tree and a person standing there (forgive my drawing), and perhaps some grass here. Now, this is inherently a continuous picture. How do I represent it on the computer? I divide this entire area into very small sub-areas: I visualize it being divided into tiny picture elements, or pixels. Each small area here is a pixel, a picture element, so to speak. Suppose, for example, I make 512 divisions on the vertical and 512 divisions on the horizontal; I say that I have a 512 × 512 image, that many pixels, and in each pixel region I represent the image by a constant. So the first thing to understand is that this is a piecewise constant representation. Let us write that down: a piecewise constant representation of the image, one constant for each piece, and that piece is the pixel, the picture element. Now suppose I increase the resolution. I take the same picture, and where in one case I make a division of 512 × 512, in the other I make a division of 1024 × 1024. Say the pixel area in the first case is p1 and in the second case is p2.
It is very easy to see that p2 is one fourth of p1, so I have reduced the pixel area by a factor of 4. Naturally, if I use a constant to represent the intensity of the picture on each pixel in both cases, what you see in the 1024 × 1024 picture is going to be closer, in some sense, to the original picture than what you see in the 512 × 512 one. In other words, we can capture this by saying: the smaller the pixel area, the larger the resolution. Now, this is the beginning of the Haar multiresolution analysis. The more we reduce the pixel area, the closer we go to the original image. Even though this captures the idea that we are trying to build, it is not quite the idea of the Haar MRA; the Haar MRA does something deeper, and that is what I am now going to explain mathematically in some depth. Here I gave the example of a two-dimensional situation, which is apparently more difficult than the one-dimensional one, but it is easier for us to understand physically: we can more easily relate to the idea of a piecewise constant representation in the context of images or pictures, but the same thing is true of audio, for example. You could visualize a situation, though seemingly more unnatural, where you record an audio piece by dividing the time over which the audio is recorded into small segments. Let me show that pictorially; it will be easier to understand. Suppose I have this waveform here, the one-dimensional version: this is the time axis, and this waveform is the audio voltage recording. Without any loss of generality, assume that this is the zero point in time, and let time be represented by t. Now let me divide this time axis into small intervals of size T: this point is T, this point is 2T, and so on, and I make a piecewise constant approximation.
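To make the pixel picture concrete, here is a small Python sketch of a piecewise constant image representation by block averaging; the synthetic 1024 × 1024 "scene", the block sizes and the error measure are my own illustrative choices, not anything from the lecture itself:

```python
import numpy as np

def piecewise_constant(image, block):
    """Replace each block-by-block tile of `image` by its average value."""
    h, w = image.shape
    tiles = image.reshape(h // block, block, w // block, block)
    averages = tiles.mean(axis=(1, 3))
    # spread each average back over its tile: a piecewise constant image
    return np.kron(averages, np.ones((block, block)))

# A finely sampled smooth "scene" standing in for the continuous picture
u = np.linspace(0, 2 * np.pi, 1024)
scene = np.sin(u)[:, None] * np.cos(u)[None, :]

coarse = piecewise_constant(scene, 4)  # fewer, larger pixels
fine = piecewise_constant(scene, 2)    # four times smaller pixel area

# smaller pixel area -> closer to the original (smaller mean squared error)
err_coarse = np.mean((scene - coarse) ** 2)
err_fine = np.mean((scene - fine) ** 2)
```

Halving the pixel side quarters the pixel area, exactly the p2 = p1/4 relation described above, and the finer grid indeed gives the smaller squared error.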
That means I represent the audio voltage in each of these intervals of size T by one number. Now, what is the most obvious number to use to represent the waveform in each of these time intervals? In any given interval, it makes sense to take the area under the curve and divide by the length of the interval, that is, to take the average of the waveform over that interval, and use that as the number representing the function. Here, for example, you can visualize that the average lies somewhere here; I am showing it dotted. So intuitively it makes sense to represent the voltage waveform in each of these intervals of size T by its average over that interval. Let us write that down mathematically. If you have a function x(t), a good piecewise constant representation is the following. Over the interval from 0 to T (strictly, the open interval between 0 and T), the representation would be the average: (1/T) ∫ from 0 to T of x(t) dt. Of course, the same holds on any particular interval of size T: the average is (1/T) times the integral of x(t) over that interval, and this is a piecewise constant representation of the function on that interval of size T. The same thing can be done for an interval of size T/2: over an interval of size T/2, you would similarly have 1/(T/2) times the integral of x(t) dt over that interval of length T/2. Now we are getting closer to the idea of wavelets. Take a particular interval of size T.
In fact, again without any loss of generality, let us choose the interval from 0 to T and divide it into two subintervals of size T/2. So take this interval from 0 to T; I am expanding it, so you have the function here over that interval. Divide it into two subintervals of size T/2. First, take the piecewise constant approximation on the entire interval, which I show with a dot-and-dash line; you can visualize the average being somewhere here. This is the average on the entire interval from 0 to T. Now I take the two subintervals of size T/2 and use a dash-and-cross line to mark the average on each. You can visualize that in the first subinterval the average is somewhere here, and similarly in the second subinterval somewhere like this. Let us give these names: call the average on the whole interval a(T), the average on the first subinterval a1(T/2), and the average on the second subinterval a2(T/2), and let us write down the expressions for each. a(T) is obviously (1/T) ∫ from 0 to T of x(t) dt. a1(T/2) is 1/(T/2) ∫ from 0 to T/2 of x(t) dt. Similarly, a2(T/2) is 1/(T/2) ∫ from T/2 to T of x(t) dt. For convenience, let me flash all three expressions before you once again: a(T), the average over the entire interval of size T; a1(T/2), the average over the first subinterval of size T/2; and a2(T/2), the average over the second subinterval, from T/2 to T. And just to get our ideas straight, here again is the picture. Now, the key idea in the Haar multiresolution analysis is to relate these three terms, a(T), a1(T/2) and a2(T/2), and it is in that relationship that the Haar wavelet is hidden.
So what is the relationship? It is very simple: all we need to do is notice that we have divided the integral from 0 to T into two integrals, over 0 to T/2 and over T/2 to T, and then remember the slight difference in the constants involved: a constant 1/T in a(T), and a constant 1/(T/2) in each of a1(T/2) and a2(T/2). Whereupon we have this very simple relationship between the three quantities, which I leave to you to verify: a(T) = (1/2) [a1(T/2) + a2(T/2)]. How do we interpret this? Let us focus just on these three constants and make a drawing. I have a(T) there, a1(T/2) here and a2(T/2) there, and we are saying that half the sum of the latter two gives the first. In other words, one fine average is as much above a(T) as the other is below it; the two heights are the same. That is what this relationship implies. Another way of saying it is this: if I were to make a piecewise constant approximation on intervals of size T, how would it look? Let me sketch it. I take the function once again, divide the axis into intervals of size T, and show just two intervals for the moment. This is how the function looks with a piecewise constant approximation on intervals of size T, and when you do it on intervals of size T/2, it looks like this. This is in its own right a function, a piecewise constant function, the one I have darkened here, and this darkened function is in its own right an approximation to the original function.
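The relation just stated, a(T) = (1/2)[a1(T/2) + a2(T/2)], is easy to verify numerically; here is a short Python sketch, where the choice of x(t) = sin t and T = 1 is purely illustrative:

```python
import numpy as np

T = 1.0
n = 1_000_000                       # even number of samples on a fine grid
t = (np.arange(n) + 0.5) * (T / n)  # midpoints of a uniform grid on [0, T]
x = np.sin(t)                       # illustrative choice of x(t)

a = x.mean()             # a(T):    average of x over [0, T]
a1 = x[: n // 2].mean()  # a1(T/2): average of x over [0, T/2]
a2 = x[n // 2:].mean()   # a2(T/2): average of x over [T/2, T]

# the Haar relation: the coarse average is the mean of the two fine averages
assert abs(a - 0.5 * (a1 + a2)) < 1e-12
```

The relation holds exactly (up to rounding) because the two subintervals carry equally many samples; it is precisely the cancellation of the 1/T against the two 1/(T/2) factors.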
Similarly, let me now darken this other one and put some other mark on it; let us keep the crosses, so I will darken it but put crosses on it. This darkened-and-crossed curve is another function, also in its own right an approximation to the original. Let us give them names: call the plain dark one f1(t) and the dark-and-crossed one f2(t). Now, f2(t) − f1(t) is like additional information. What we are saying is that when, instead of a piecewise constant approximation on intervals of size T, we make a piecewise constant approximation on intervals of size T/2, we are bringing in something more. Go back to the original case of the picture: underlying everything is a continuous two-dimensional scene. When we make an approximation with a 512 × 512 resolution, we have brought in one level of detail. When we go to a 1024 × 1024 representation, the level of detail is 4 times more. What is the additional detail gained in going from 512 × 512 to 1024 × 1024? In effect, when we take the difference f2(t) − f1(t), we are answering that question. So now let us see how f2(t) − f1(t) would look; it is very easy to see that it has an appearance like this. Let me flash f2(t) and f1(t) before you for a second, so that you get a feel: this is f2 and this is f1, and visualize subtracting the latter from the former. What would you get? A function that looks something like this. I have the time axis here, with intervals of size T marked; say this piece has height h1 and this piece has height h2, and let me mark h1 and h2 on this diagram too. Simple enough. Now, if you look carefully, we can construct all of this by using just one function, and what is that function?
Suppose I visualize a function like this: +1 over the interval from 0 to one half, and −1 over the next half interval; this is the point 1/2, this is the point 1, the value is +1 here and −1 there. Let us give this function a name: call it ψ(t). This is indeed what is called the Haar wavelet, Haar again being the name of the mathematician. It is very easy to see that using this function I can construct any such difference f2(t) − f1(t). Indeed, if I take this function and stretch or compress it, whichever the case may be depending on the value of T (dilate is the more general word), so that the dilated ψ occupies a particular interval of length T, and then multiply it by the constant h1, I get one segment of the difference. Of course, h1 is an algebraic constant and should be given a sign; here, for example, h1 should be given a negative value, because ψ(t) starts positive while this first segment of the difference starts negative. Similarly, h2 here has a positive value. In other words, over these two intervals, f2(t) − f1(t) is of the following form: h1 ψ(t/T) + h2 ψ((t − T)/T), where the second term is both dilated and translated. In general, when we start from the function ψ(t), we are constructing functions of the form ψ((t − τ)/s), where s is a positive real number and τ is real. This is the general building block, for different values of τ and different values of s. Of course, at a particular resolution, at a particular level of detail, only one value of s is used: when we represent the function on intervals of size T, we take s = T; if we were to represent the function on intervals of size T/2, then s would become T/2, and so on.
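Here is a small Python sketch of the Haar wavelet and of building one stretch of detail from its dilates and translates; the values of T and of the signed heights h1, h2 are arbitrary illustrative choices:

```python
import numpy as np

def psi(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), and 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((0 <= t) & (t < 0.5), 1.0,
                    np.where((0.5 <= t) & (t < 1.0), -1.0, 0.0))

# Two consecutive intervals of size T: the detail f2 - f1 there has the form
#   h1 * psi(t / T) + h2 * psi((t - T) / T)
# i.e. dilates (by T) and translates (by 0 and T) of the one function psi.
T, h1, h2 = 1.0, -0.3, 0.5          # illustrative signed heights
t = np.arange(0, 2 * T, 0.002)
detail = h1 * psi(t / T) + h2 * psi((t - T) / T)
```

Every piece of the detail waveform, at every scale, is obtained this way from the single prototype ψ.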
What we are doing, in effect, is dilating and translating. Let us now introduce those terms: τ is called the translation index or translation variable, and s is called the dilation index or dilation variable; we are dilating and translating, or constructing dilates and translates of a basic function. Dilates and translates of size T capture the additional information in f2(t) − f1(t). Let us spend a minute reflecting on why this is so important. What have we done so far? It just looks like very simple functional analysis, a very simple transformation or algebra of functions. What is so striking in what we have just said? What is striking is that what we have done to go from T to T/2 can also be done to go from T/2 to T/4. Not only that: what we have done to go from intervals of length T to intervals of length T/2, all over the time axis, can be done all over the time axis to go from intervals of size T/2 to intervals of size T/4, and then from T/4 to T/8, T/16, T/32, T/64, and so on, to as small an interval as you desire. Each time, what you add in terms of information is captured by dilates and translates of the single function ψ(t). This is a very serious statement if we think about it deeply enough: one single function ψ(t) allows you to bring in resolution, step by step, to any level of detail. In fact, in the formal language of functional analysis we would put it something like this. In mathematics, in arguments about limits and continuity, or in proofs related to convergence, there is this notion of the adversary and the defendant. Here the defendant, the one who makes the proposition, is trying to show that by this process you can go arbitrarily close to a continuous function, as close as you desire.
Now, close in what sense? Well, it could be in terms of what is called the mean squared error, or the squared error. So let us formulate that adversary-proponent kind of argument here. The proponent says: we can go arbitrarily close to a continuous function x(t) by this mechanism. Arbitrarily close in what sense? In this sense: if x_a(t) is the approximation at a particular resolution and x(t) is the original function, take the error x_e(t) = x(t) − x_a(t) and integrate |x_e(t)|² over all t; call this the squared error E. The adversary, or opponent, says: bring E below some small value, say E0. And the proponent says: certainly, here is an m such that the piecewise constant approximation on intervals of size T/2^m does it. That is the idea of proponent and opponent. The opponent gives you a target, saying I want the squared error to be less than this number E0, and the proponent says, here you are: make the interval of size T/2^m, and lo and behold, your error is going to be less than or equal to E0. What is striking in this whole discussion is that no matter how small we make E0, the proponent is always able to come out with an m such that the piecewise constant approximation on intervals of size T/2^m gives an approximation close enough for that small E0. We need to spend a minute or two reflecting on this; it is a serious thing we are saying. In fact, let us for a moment think about how this is dual to the idea of representing a function in terms of its Fourier series, for example.
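The proponent-opponent game can be played numerically; in this sketch the waveform, the grid of 2^14 samples on [0, 1] and the target E0 are my own illustrative assumptions, and the squared error is approximated by the mean squared error of the samples:

```python
import numpy as np

def haar_error(x, m):
    """Squared error (per unit time) of the piecewise constant
    approximation of the samples x on 2**m equal intervals."""
    n = len(x)
    pieces = x.reshape(2**m, n // 2**m)
    averages = pieces.mean(axis=1, keepdims=True)
    return np.mean((pieces - averages) ** 2)

# A smooth target x(t), finely sampled on [0, 1]
t = np.linspace(0, 1, 2**14, endpoint=False)
x = np.sin(2 * np.pi * t) + 0.5 * np.sin(6 * np.pi * t)

# halving the interval (m -> m + 1) keeps shrinking the error
errors = [haar_error(x, m) for m in range(8)]

# the proponent's reply to a target E0: exhibit an m that beats it
E0 = 1e-4
m_star = next(m for m in range(14) if haar_error(x, m) <= E0)
```

For any E0 the opponent names, the search in the last line produces a suitable m, which is exactly the proponent's move in the argument above.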
In the Fourier series representation, what do we do? We say: give me a periodic function, or for that matter give me a function on a certain interval of time, say an interval of size T. If I simply periodically extend that function, meaning that I take the basic function on the interval of size T and repeat it on every such interval translated from the original one (if the original interval is 0 to T, then I repeat whatever is between 0 and T on the interval between T and 2T, between −T and 0, between −2T and −T, between 2T and 3T, and so on), then I have a periodic function, and I can decompose that periodic function into its Fourier series representation. What am I doing in effect? I have a sum of sinusoids, sine waves, all of whose frequencies are multiples of the fundamental frequency. What is that fundamental frequency? In angular frequency terms it is 2π/T; in Hertz terms it is 1/T. So in Hertz terms you have sine waves with frequencies which are all multiples of 1/T, and an appropriate set of amplitudes and phases assigned to these different sinusoidal components; when added together, they go arbitrarily close to the original periodic function on the entire real axis, or specifically on the interval from 0 to T if you restrict yourself to the function you started from. So the Fourier series allows you to represent a function using the tool of continuous, analytic functions. Remember, we talked about sine waves in the previous lecture: sine waves are in some sense the smoothest functions that you can think of. The derivative of a sine wave is a sine wave, the integral of a sine wave is a sine wave, and when you add two sine waves of the same frequency, they give you back a sine wave of the same frequency.
So sine waves are the smoothest functions you could deal with, and even if you had a somewhat discontinuous function on the interval from 0 to T, by using this mechanism of Fourier series decomposition you would end up expressing a discontinuous function in terms of extremely smooth, analytic functions. What would you be doing in the Haar approach that we discussed a few minutes ago? Exactly the dual: even if you had a continuous audio pattern, you would decompose it into highly discontinuous functions, piecewise constant on intervals of size T at resolution T, on intervals of size T/2 at resolution T/2, and so on and so forth. Now, just as in the Fourier series representation, you have this proponent-opponent kind of argument for a reasonably wide class of functions, even if they are discontinuous, even if they have a lot of non-analytic points. Remember, in the Fourier series that wide class of functions is captured by what are called the Dirichlet conditions. I will not go into those details here, but these are certain very mild conditions which a function needs to obey before it can be decomposed into a Fourier series, in other words before the Fourier series can do this job of representing a discontinuous function in terms of highly continuous, analytic, smooth functions. A similar set of conditions does exist for the Haar case. If one really wishes to be finicky, one does need to restrict oneself to a certain subclass of functions, but that restriction is not really serious in most physical situations, and for the time being in this course we may even ignore it. All that we ask for, and that is not too unreasonable, is that the function has finite energy.
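The duality can be seen in a short Python sketch: a discontinuous square wave is approximated, in the squared-error sense, by partial sums of its classical Fourier series, (4/π) times the sum over odd k of sin(2πkt/T)/k. The period T = 1 and the numbers of terms used are illustrative choices:

```python
import numpy as np

T = 1.0
t = np.linspace(0, T, 10000, endpoint=False)
square = np.sign(np.sin(2 * np.pi * t / T))   # discontinuous target

def partial_sum(n_terms):
    """Sum of the first n_terms odd harmonics of the square wave's series."""
    s = np.zeros_like(t)
    for k in range(1, 2 * n_terms, 2):        # odd multiples of 1/T
        s += (4 / np.pi) * np.sin(2 * np.pi * k * t / T) / k
    return s

# more smooth sinusoids -> smaller squared error against the rough target
e5 = np.mean((square - partial_sum(5)) ** 2)
e50 = np.mean((square - partial_sum(50)) ** 2)
```

Smooth sinusoids chase a discontinuous target here, while in the Haar picture discontinuous piecewise constants chase a smooth target; the convergence argument is the same in both directions.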
So let us at least put that down mathematically. What we are saying is that we shall focus on functions with finite energy. And what does energy mean? Energy is essentially the integral of the modulus squared: if I have a function x(t), the energy in x(t) is the integral of |x(t)|² over all t, and this needs to be finite. Incidentally, this quantity has a name in the mathematical literature, and for that matter in the literature on wavelets: the energy, as we call it in signal processing, corresponds to what mathematicians call the L2 norm. It helps to introduce terminology little by little from the beginning, because if one happens to pick up the literature on wavelets, these terms will be used; so let us introduce the notation slowly. We say the L2 norm of x is essentially the integral of |x(t)|² dt over all t, and to be very precise, this raised to the power one half. Similarly, one can talk of an Lp norm: the Lp norm of x is correspondingly the integral of |x(t)|^p dt over all time, raised to the power 1/p, where p is a real, positive number (in fact p ≥ 1 if one wants a true norm). So you could talk about an L1 norm, an L2 norm, an L-infinity norm. Let us take some examples. What would an L1 norm be? Essentially the integral of |x(t)|. The L2 norm we already know. What would the L-infinity norm be? That is interesting. In principle it would be the limit of this expression as p tends to infinity, but what on earth does that mean? You see, as p becomes larger and larger, what are we doing? We are emphasizing those values of |x(t)| which are larger.
For a larger value of p, we emphasize those values of |x(t)| which are larger, and as p tends to infinity we are, in some sense, highlighting that part of x(t) which is the largest. In other words, the L-infinity norm of x essentially corresponds to the supremum, the very largest value that |x(t)| attains over the real axis. So it has a meaning even as p tends to infinity. Anyway, this was just to introduce some notation which we are going to find useful, and what we are saying in this language is that we are going to focus on functions which belong to a space; here we start talking about functions that belong to a space. We say the space L2(R), over the real axis: what is it? It is a space of functions whose L2 norm is finite; simple. Similarly, you could have the space Lp: the set of all functions whose Lp norm is finite. Now, the word space is used with intent. Space really means that if I take a linear combination of functions in that set, I get back a function in that set: if I take any finite linear combination of functions in the space Lp, the result is also in Lp, and that is why we call it a space. All the Lp's, for any particular p, are linear spaces; they are closed under the operation of linear combination. So, in other words, let us focus our attention on the space L2. Now, what we have said in the Haar analysis that we talked about a few minutes ago is this: if your adversary picks up any function in the space L2 and puts before you a value E0, saying please give me an m such that when I make a piecewise constant approximation on intervals of size T/2^m my squared error is less than E0, the proponent is able to do so.
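These norms are easy to explore numerically; here is a sketch with the arbitrary illustrative choice x(t) = 3t(1 − t) on [0, 1], whose largest value is 3/4, attained at t = 1/2:

```python
import numpy as np

def lp_norm(x, dt, p):
    """Numerical Lp norm of uniformly sampled x(t):
    (integral of |x(t)|**p dt) ** (1/p)."""
    return (np.sum(np.abs(x) ** p) * dt) ** (1.0 / p)

n = 100_000
dt = 1.0 / n
t = (np.arange(n) + 0.5) * dt   # midpoints of a uniform grid on [0, 1]
x = 3 * t * (1 - t)             # peaks at 3/4 when t = 1/2

l1 = lp_norm(x, dt, 1)      # integral of |x|; exactly 1/2 for this x
l2 = lp_norm(x, dt, 2)      # the "energy" norm of the lecture
l100 = lp_norm(x, dt, 100)  # already close to the supremum 3/4
```

As p grows, the norm is dominated by the largest values of |x(t)|, which is exactly the L-infinity intuition just described.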
So the proponent is able to come up with an m which achieves this, and this can be done no matter how small E0 is; the proponent will always come out with a suitable m. That is the idea of what is called closure. What we are saying is that when we do an analysis using the Haar wavelet, in other words when we start from a certain piecewise constant approximation on intervals of size 1, say, and then bring it to intervals of size one half, one fourth, one eighth, one sixteenth, as small as you desire, you can in principle go as close as you wish in the sense of the L2 norm: the L2 norm of the error between the function and its approximation can be brought down as much as you desire. And in that sense, this is just what the Fourier series was doing. After all, what does the Fourier series do? It allows you to bring the L2 norm of the error between the function and its Fourier series as small as you desire, for a reasonably wide class of functions: give me the epsilon, give me the E0, and I will give you a certain number of terms that you must include in the Fourier series. The adversary says, here is an E0 for you; the proponent says, include so many terms in the Fourier series and you can bring your error down as low as you desire. The same kind of thing is happening here, the proponent-adversary principle. Now, it is a deep issue that one function ψ(t) is able to take you as close as you desire to the function that you want to approximate, and by the way, this is only one ψ(t) which can do it; the whole subject of wavelets allows you to build up many such ψ(t)'s. Here we had a very simple physical explanation: we started from a piecewise constant approximation, we said that when you want to refine it you can do so using the Haar wavelet, and this you can do to go from any resolution to the next.
Please remember, here we are increasing the resolution, improving the amount of information contained, by factors of 2 each time, and that is why we use the term dyadic; let me write down that term. What we introduced in this lecture is the notion of a dyadic wavelet, where dyadic refers to powers of 2, steps of 2 each time. The Haar wavelet is an example of a dyadic wavelet, and in fact, for quite some time in this course we are going to focus on dyadic wavelets. Dyadic wavelets are the best studied; they are the most easily designed, the most easily implemented, and, I dare say, the best understood. So for quite some time in this course we shall be focusing on the dyadic wavelet, and the Haar wavelet is the beginning. I mentioned in the previous lecture that if one understands the Haar wavelet and the way in which the Haar multiresolution analysis is constructed, many concepts of multiresolution analysis become clear. What we intend to do in subsequent lectures is to bring this out explicitly. So let me give you a brief exposition of what we intend to do in subsequent lectures, and then we shall get down to doing it mathematically, step by step. We brought out the idea of the Haar wavelet explicitly here: we know what function it is, and we know that dilates and translates of this function can capture the information in going from one resolution to the next, in steps of two each time. Now, how is this expressed in the language of spaces? After all, we talked about the space L2(R), the space of square integrable functions. How can we express this in terms of approximation of that whole space?
Can we express this in terms of going from one subspace of L2(R) to the next, and in that case, can we express the functions constructed from the Haar wavelet and its translates, and perhaps also dilates, in terms of adding more and more to the subspaces, going from a coarser subspace all the way up to L2(R) on one side and all the way down to a trivial subspace on the other? We are going to introduce this idea of formalizing the notion of multiresolution analysis. We need to think of what is called a ladder of subspaces: going from a coarse subspace to finer and finer subspaces until you reach L2(R) at one end, and to coarser and coarser subspaces until you reach the trivial subspace at the other end. Further, we are going to see that the Haar wavelet and its translates at a particular resolution, at a particular power of two so to speak, actually relate to the bases of these subspaces. We are going to bring out the idea of a basis for each of these subspaces, and how the Haar wavelet captures what is called the difference subspace, the orthogonal complement to be more formal and precise. Simple, but beautiful; and what we do for the Haar wavelet will also apply to many other such wavelets. Let us then carry out this discussion in more detail in the next lecture, where we shall formalize whatever we have studied today for the Haar wavelet by putting down the subspaces that lead us towards L2(R) at one end and towards the trivial subspace at the other. Thank you.