 A warm welcome to the 38th lecture on the subject of wavelets and multirate digital signal processing. As promised in the previous lecture, we shall use this session to discuss two applications of wavelets and time-frequency methods in some depth. In fact, the two students on whose application assignments this lecture is built are going to discuss what they only introduced very briefly in the previous lecture. They had given a kind of trailer of their presentations in the previous session, in which they explained just the essence of the applications they had worked on. In this lecture, they shall explain the details of those applications and also point to some of the results they have obtained. I must mention that these are two students who have actually used these lectures over the semester to learn the subject and have undertaken to explore two applications of wavelets over the semester, out of a batch of about 15 to 18 students. The assignments that are going to be presented today have been found to be among the best in the class, and therefore, in a sense, it is also in appreciation of the excellent work that these two groups have done that they have been invited to record their presentations today. In a broader sense, this is also to encourage, whenever these lectures are used elsewhere, that students be involved in exploring applications, and we hope that the hard work and the very intelligent efforts put in by these two groups of students will inspire many other students who listen to this lecture to explore several other applications of wavelets, numerous as they are. Anyway, with that little introduction, let me put before you the two applications that are going to be presented in depth today. You have had a trailer of them, but I would like to place them in the broader perspective of the subject. So, in the lectures today, we are first going to look at the application to data mining. Kunal Shah is going to present the application to data mining on behalf of his group of two students, namely himself, Kunal Shah, and Orko Choudhury. Now, data mining is, in a sense, a generalization of the problem of representation. So, in data mining, Kunal is going to show us how one can use the properties of wavelets in efficient representation to advantage in retrieval and other such operations on a database. The second application which is going to be discussed today is face recognition. Now, there are certain differences between detection and recognition and so on, and, as I said, I do not want to take away the thunder from the person making the presentation. But face recognition is an important and increasingly relevant application in many security systems and other image processing or vision systems. So, both of these applications are of great importance in the modern world, and we shall now, without much ado, invite these two young student speakers to present the work that they have done over the semester and to put before you both the concepts and the results of what they have done. I shall first invite Kunal Shah to make the presentation based on the work done by his group of two students, Kunal Shah and Orko Choudhury. Thank you.

Hello friends, today I would like to talk on the various applications of wavelets in data mining. As Sir said, data mining relies on efficient representation of data.
So, today I would like to speak on that. The problem statement is: given time series data, which could be huge, our aim is to improve the efficiency of multilevel surprise and trend queries on that time series data. Now, first of all, what is the meaning of surprise and trend queries? When we have a long time series, we generally do not encounter point queries. For example, if we record the temperature of a particular city for a year, we never ask what the temperature was on this date of this month. We always ask about the trend: how did the temperature change during the month? Such queries are called trend queries. One more type of query we encounter is the surprise query: was there any sudden change in the temperature of a particular city in this month? So, these are the types of queries we have to handle, and we shall see that wavelets handle such queries efficiently. Now, what is the meaning of multilevel? Such queries are generally encountered at various levels of abstraction. It could be a month, it could be a year, it could be a decade. So, if someone asks what the change in temperature was during a particular month, we should be able to answer that easily, and if someone asks about a sudden change or an average over a decade, we should be able to answer that as well. So, decade, year and month represent various levels of abstraction, and we have to improve the efficiency of such queries. So, first of all, how is such huge data represented? Suppose x tilde is my whole data set. I represent that data in the form of a matrix, say of size m cross n. Suppose I store the stock prices of a particular company, or say many companies. Let n be the total number of stock prices that I store for a particular company; if I store one year of prices, then n will be 365. Let m be the number of companies whose data I am storing. So, each row of this matrix represents the stock prices of one particular company. This is how my whole data is represented. Now, what do I need to do? I need to do three things. First of all, I need to store this data efficiently. How can I efficiently store this huge amount of data? For example, if I have data for one decade, then n will correspond to one decade of days and m could be, say, 100 or 200 companies, so efficient storage is very important. Secondly, how can I retrieve the data efficiently? And thirdly, if I want to add or modify certain things in the data, how can I modify it easily? These are the three things. If these three things can be done efficiently using wavelets, then wavelets will be a very good tool to represent this data. So, first of all, let us look at one of the methods already used in data mining, namely the singular value decomposition method. What is done in this method is the following: I have the data, say x tilde, and I represent it as a product of three matrices: U, which is a column-orthogonal matrix of size m cross r; a diagonal matrix of size r cross r; and a row-orthogonal matrix of size r cross n, where r is the rank of the matrix x tilde. So, I am representing this data in terms of three matrices, and instead of storing the whole data, I am storing these three matrices. So, if r is small, I am already saving on storage: instead of storing an m cross n matrix, I am storing just three matrices of sizes m cross r, r cross r and r cross n, one of which is diagonal. Now, I have U, the diagonal matrix, and V.
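To make the storage and row-retrieval argument concrete, here is a minimal Python sketch of this SVD representation. The matrix sizes, the variable names and the rank threshold are illustrative assumptions, not details taken from the presentation.

```python
import numpy as np

# Illustrative sizes only: 100 companies, roughly one decade of daily prices.
X = np.random.rand(100, 3650)

# Thin SVD: U is m x k, s holds the singular values, Vt is k x n.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))                    # numerical rank (assumed threshold)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]         # store only these three factors

# Reconstructing one company's series (one row of X) still touches the whole
# r x n matrix Vt, which is why row retrieval is costly when n is large.
row_3 = (U[3, :] * s) @ Vt
```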
Suppose I want to extract the data of a particular company, say a particular row of this matrix x tilde; let that row be denoted x. To extract a particular row, that is, to extract the data of a particular company, the complexity required is of the order of the size of V, because, the middle matrix being diagonal, it is the multiplication involving V that recovers a row of the original matrix. So, to extract a particular row, the complexity required is of the order of the size of V, but the size of V is r cross n, and n is huge in our application, say a decade of data. So, the complexity is still huge; the complexity of reconstruction in the singular value decomposition is huge. Now, one can argue here: let us reverse the order and, instead of storing the matrix as m cross n, store it as n cross m, because m is not that huge. But the problem is that now, if you want to extract a particular company's data, you need to extract a whole column rather than a row. The complexity of extracting a row is of the order of the size of V, not of a column, so again, if you want to extract a column, the complexity is equivalent to the size of V, and you are not improving your efficiency by reversing the order. Further, one more disadvantage of this method is that if I want to modify the data, I need to recompute all three matrices again, which is not the case with wavelets, as we will see. I would just like to caution the audience here that this does not imply that the singular value decomposition is not a good method. In fact, it is used extensively; but in this case, since our n is huge and since we require frequent updating, wavelets have an upper hand over this method. Now, what about wavelets? How are we going to store the data using wavelets? Suppose the signal is one row of that whole matrix. I decompose it into two subspaces, an approximation subspace and a detail subspace. The approximation subspace is again decomposed into two, a second level of decomposition, giving another approximation subspace and another detail subspace. This structure is called a TSA tree, a trend-surprise-abstraction tree. The approximation side stores all the trend data and the detail side stores all the surprise data. So, wavelets naturally decompose the data into two forms, trend and surprise. You do not need to extract anything specially, because wavelets naturally decompose any data into trend and surprise form. That is the biggest attraction in this type of application. Now, how do this splitting and merging take place? The split operation is very easy. If I have the data AX_i, I just pass it through a low pass filter and a high pass filter, downsample by 2, and I get the next level of subspaces. That is the split operation. This has been done many times in the course, so I will not emphasize it. Similarly, the merge operation is just the reverse: upsample by 2, pass through the corresponding synthesis filters and add, and you go one level higher. So, that is the merge operation. The split and merge operations are easy, but the properties that this tree holds are very important. First of all, we get perfect reconstruction. So, what I am doing now is this: rather than storing the signal itself, I store the approximation subspace and the detail subspace, and I claim that I can perfectly reconstruct the original signal from this data. This is very important; perfect reconstruction is very important.
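As a minimal illustration of the split and merge operations just described, here is a sketch using the Haar filter pair. The presentation itself uses Daubechies filters, so treat the coefficients, the even-length assumption and the function names as illustrative assumptions only.

```python
import numpy as np

def split(ax):
    """One TSA-tree split: AX_i -> (AX_{i+1}, DX_{i+1}) using the Haar pair.
    Low-pass + downsample gives the trend, high-pass + downsample the surprise.
    Assumes the input has even length."""
    ax = np.asarray(ax, dtype=float)
    a = (ax[0::2] + ax[1::2]) / np.sqrt(2.0)
    d = (ax[0::2] - ax[1::2]) / np.sqrt(2.0)
    return a, d

def merge(a, d):
    """Inverse of split: upsample, filter with the synthesis pair and add,
    going one level up with perfect reconstruction."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def tsa_leaves(x, levels=3):
    """Leaf nodes of the TSA tree: [DX_1, DX_2, ..., DX_L, AX_L]."""
    leaves, ax = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        ax, dx = split(ax)
        leaves.append(dx)
    leaves.append(ax)
    return leaves
```

With this pair, applying merge to the two outputs of split recovers the original block exactly, which is the perfect reconstruction property emphasized above.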
If this were not so, then in case I have a point query, say what the temperature was on a particular day, how would I answer it using the approximation and detail subspaces? So, perfect reconstruction is very important, and that is the case with wavelets; this we have already studied. Again, the power complementarity property is also very important, because on decomposition we are still preserving the power, and that too holds here. The third property is very important: as we move down the tree, the size of each node (these are called nodes) keeps decreasing. So, if the size of the signal is n, the next level nodes will be n by 2 and n by 2, then n by 4 and n by 4, then n by 8 and n by 8, and so on. So, the size is continuously decreasing, and we will see why that is so important. Secondly, the nodes at the bottom are called leaf nodes, and we will see that instead of storing all the nodes, it is appropriate to store only the leaf nodes. Now, why is this decrease in size so important? Because here we are assuming that the split and merge operations incur negligible cost. Whatever cost is incurred is the cost of extracting the data, that is, the cost due to disk I/O operations. So, the amount of data that you extract is what actually incurs cost; the cost is directly proportional to the size of the data, and, happily, our size is decreasing. If I want the eighth level of decomposition, what I need is just the approximation and detail subspaces of the eighth level, and their size is very small compared to n, so the cost incurred is very small; it just requires a small amount of post-processing. For example, if I want the third level, I just need to extract data of size n by 8 and n by 8; from these I can perfectly reconstruct the level above, and from there work my way up towards x. So, I incur only about one-eighth of the cost of extracting the whole signal. Now, as I pointed out, the leaf nodes are sufficient to give us both the trend and the surprise details. How? Suppose I require the trend at the third level. What I do is extract only the approximation leaf and, without using the detail, I do not perform perfect reconstruction: I reach the next level up by upsampling and passing through a low pass filter; again, without using the detail, I upsample and pass through a low pass filter, and in this way I reach the original length. So, using data of size n by 8, I am getting the trend at the AX3 level of decomposition without perfect reconstruction. If I want the surprise data, I go from the detail leaf up to AX2, then AX1 and then AX; only at the first step do I need to pass through a high pass filter instead of a low pass filter on the synthesis branch of the merging operation. Now, if I want the surprise at an intermediate level, then I first need to extract both leaves below it, perform perfect reconstruction up to that level, and then go up through the approximation branches as before. So, this is how, with a small amount of post-processing using the split and merge operations, I can answer the trend and surprise queries. Now, the optimal TSA tree is the tree which stores only the leaf nodes. When I store only the leaf nodes, it is very easy to see that the total size of the leaf nodes is equal to the size of the original signal. So, I am not increasing the storage by keeping only the leaf nodes; I am not storing the whole tree, and I am still retaining all the information by using only the leaf nodes.
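Continuing the Haar-based sketch above (same assumptions: illustrative filters, hypothetical function names, even-length nodes), this is how a trend or surprise leaf can be brought back to full length through synthesis filtering alone, without perfect reconstruction.

```python
import numpy as np

def merge(a, d):
    """Haar synthesis step, repeated here so the sketch is self-contained."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def trend_at_level(ax_leaf, level):
    """Level-`level` trend stretched to full length: merge repeatedly with a
    zero detail branch, i.e. go up through the low-pass synthesis filter only."""
    t = np.asarray(ax_leaf, dtype=float)
    for _ in range(level):
        t = merge(t, np.zeros_like(t))
    return t

def surprise_at_level(dx_leaf, level):
    """Level-`level` surprise at full length: the first synthesis step goes
    through the high-pass branch, the remaining ones through the low-pass branch."""
    d = np.asarray(dx_leaf, dtype=float)
    s = merge(np.zeros_like(d), d)
    for _ in range(level - 1):
        s = merge(s, np.zeros_like(s))
    return s
```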
So, my optimal TSA tree is the tree which incurs minimum cost and minimum storage, and by storing only the leaf nodes I incur the minimum cost as well as the minimum storage. Further, if I additionally store a node which is not a leaf node, I will improve the retrieval performance, but it is not difficult to prove that the performance increase is only marginal. So, there is no point in storing any node other than the leaf nodes. Once we have seen that both retrieval and storage are cost efficient, can we improve further on storage by reducing the leaf nodes themselves? If I remove some of the leaf-node data, can I still get almost accurate results? Yes; wavelets are very good at compression. You compress your leaf nodes and you can still store the data efficiently without compromising much on accuracy. So, one of the methods is node dropping, and in node dropping we exploit a very important property of wavelets, namely the orthogonality property. What does that say? Suppose I drop one of the leaf nodes, say DX3, and I reconstruct the whole signal using the other leaf nodes; let that reconstructed signal be x cap. Now, if I compute the norm square of the error between x and x cap, where x is the original signal and x cap is the signal reconstructed after removing some nodes, then because of the orthogonality property, this norm square of the error is equal to the total norm square of the nodes we have removed. So, the error is exactly the energy in the coefficients we have dropped; here S denotes the set of nodes which we have removed. This is true only because of the orthogonality property of wavelets, and it does not hold in general. If this is true, then we can use a greedy algorithm to determine which nodes are significant in the data. For each leaf node, I calculate the norm square of that node divided by the size of the node; the node for which this is maximum is the most significant node. So, I store that node, then the second best node, then the third best node, and so on, and when I find that I have exhausted my disk space, I drop the remaining nodes. This method is called node dropping, but there is a slight disadvantage: if a node which I have dropped contains some outliers or some important information, then that information is lost, which is not the case with coefficient dropping, which we shall see next. In coefficient dropping, I view all the leaf nodes as one long sequence of coefficients and I store whichever individual coefficients are significant, rather than whole nodes. Since I am storing individual coefficients, I also need to store the index of each coefficient, indicating which node it belongs to. Since every stored coefficient now uses double the memory, the coefficient value as well as its index, I have to check how many coefficients of a particular node I am keeping. If they use more memory than the node itself would, then it is better to store the whole node rather than the individual coefficients; if I am not keeping any coefficient of a node, I drop the whole node; and if the memory used is less than the size of that node, then I store just those coefficients. This is called the coefficient dropping, or hybrid coefficient dropping, method. Now, let us see the results we have obtained.
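Here is a small sketch of the two pruning strategies just described, under the orthogonality assumption that the squared error between the original and the reconstructed signal equals the total energy of whatever is dropped. The budget accounting (one unit per stored sample, two units per individually stored coefficient) follows the description above; everything else, names included, is an illustrative assumption.

```python
import numpy as np

def node_dropping(leaves, budget):
    """Greedy node dropping: keep the leaf nodes with the largest energy per
    stored sample (||node||^2 / size) until the storage budget is used up.
    Dropped nodes are replaced by zeros before reconstruction."""
    order = sorted(range(len(leaves)),
                   key=lambda i: np.sum(leaves[i] ** 2) / leaves[i].size,
                   reverse=True)
    kept, used = set(), 0
    for i in order:
        if used + leaves[i].size <= budget:
            kept.add(i)
            used += leaves[i].size
    return [leaves[i] if i in kept else np.zeros_like(leaves[i])
            for i in range(len(leaves))]

def coefficient_dropping(leaves, budget):
    """Simplified coefficient dropping: keep the largest-magnitude coefficients
    across all leaves, charging two units each (value plus index).  The hybrid
    refinement of reverting to whole-node storage is omitted for brevity."""
    ranked = sorted(((abs(v), n, k, v)
                     for n, node in enumerate(leaves)
                     for k, v in enumerate(node)), reverse=True)
    kept = [np.zeros_like(node) for node in leaves]
    used = 0
    for _, n, k, v in ranked:
        if used + 2 > budget:
            break
        kept[n][k] = v
        used += 2
    return kept
```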
So, I will just switch over to the slides to see the results. These are the group members: Kunal Shah, myself, and Orko Choudhury. These are the details of the trend analysis. The first graph is the original data; it is taken from the Yahoo stock market site and it is the data of SBI, the State Bank of India, over two years, so it has around 700 samples. The remaining graphs are the trends, that is, the decompositions at various levels: the second graph is a two-day decomposition, the third graph the four-day, the fourth graph the eight-day, and so on. As you can see, as you go down, the averaging increases. So, this is the trend analysis: if you look at the last level and ask what the average is over that coarsest interval, that is the average you are seeing there. It is even more instructive to interpret the surprise analysis. If you look at the surprise analysis, the first graph shows that there is a small outlier in the original data, which is more prominent in the two-day surprise, as you can see. But as you go on increasing the decomposition level, you can interpret the data this way: the surprise which looked prominent at the two-day level does not look as prominent as you increase the number of days; it averages out. In fact, that particular surprise you could very well observe by inspection of the original data itself, but as you go down the levels it is averaged out. On the other hand, if you look at the last level, there are surprises visible which you could not spot by inspection in the original data; only through the decomposition do you discover that there was a surprise component there as well. Now, this is the node dropping method. The upper signal is the original signal and the lower one is the signal recovered after node dropping, and we can see that it is almost the same. But the disadvantage here is that we have missed that outlier, that impulse-like feature, in node dropping. We have removed 300 coefficients out of the 700, that is, removed almost 40 percent of the coefficients and retained only 60 percent, and this is the result. But in the coefficient dropping method, as you can see, that outlier is retained even though we have again removed 300 coefficients. In the trend analysis we have used Daubechies wavelets, and we see that as we move up the family, from db2 to db4 to db8, the only thing that changes is that the averaging increases as the filter length increases. That was all about the application of wavelets to data mining. The reference paper is the one shown on this slide. I have read this paper and interpreted it as a student, and my observations and interpretations from this paper are what I have shown in this application assignment; and this is the reference from which I have taken the stock prices of SBI, from Yahoo stocks. I hope you enjoyed the application assignment.

So, that was a very interesting presentation from Kunal Shah, on behalf of Kunal Shah and Orko Choudhury, the group which worked on the application of wavelets, specifically the Daubechies family, in representing databases efficiently. You will notice that the idea of coarse representation and incremental information was given a different meaning in this context.
So, the idea of coarse representation gave what is called the approximation information, which he called AX, and the so-called incremental information gave you the surprises, the novel information at every scale. Now, one must also take note of the difference between the node dropping approach and the coefficient dropping approach. In the node dropping approach, one noticed that with the same level of compression some important information was lost: a sudden spike was not seen so clearly. When the coefficient dropping approach was used, one could see that it was retained rather well. So, in addition to the transform that is used, how one uses the data obtained from the transform is equally important in wavelet-based representation; that is what Kunal's presentation has amply demonstrated. Now, taking the student presentations further, we have the second presentation, which was very briefly introduced in the previous lecture, namely the one on face recognition. Without taking away from his presentation, I shall now invite Ronak, Ronak Shah, to make his presentation here, based on the application of wavelets in face recognition. So, Ronak, over to you now.

As Professor Gadre mentioned, the second application that we are going to look into today is face recognition through wave packet analysis. Kunal presented an application in data mining, and the data he had was in one dimension. When we move from data mining to face recognition, the data that we have is in two dimensions. As I mentioned in the last lecture, there are two keywords here: one is face recognition and the other is wave packet analysis. Before going into details, I would like to briefly summarize what these keywords mean. Let us start with wave packet analysis: why do we need wave packet analysis for the task of face recognition? As I mentioned in the last lecture, the task of classification can be better accomplished if we can achieve some kind of decorrelation in the spatial as well as the frequency domain. In that sense, wavelet analysis naturally fits into the scheme of things. But then why do we go for wave packet analysis? Again, as I mentioned in the last lecture, in wave packet analysis we decompose not only the approximation subspaces but also the detail subspaces, as we have seen in the lectures. When we do the task of classification, we need a richer representation of the underlying signal; that is, we should not miss anything in the underlying signal, and the underlying signal here is the face. So, face images will be our signals. The task of face recognition can be accomplished by wavelet analysis as well as by wave packet analysis, but as I mentioned, we use wave packet analysis for the richer representation, and wave packet analysis also provides decorrelation in the spatial as well as the frequency domain, so it is better suited. Now, before going into the nitty-gritty and details, I would like to first provide references. The work that I am going to present here is based on the work done by Garcia, Zikos and Tziritas; it was a research paper presented at a European conference on computer vision, and it nicely utilizes a wavelet-based framework for face recognition. The work that I am going to present has been motivated by their work; it is an extension as well as a fresh description of the work done by them.
So, here we utilize a different database, namely the Yale database, whereas the work by Garcia, Zikos and Tziritas was based on other databases, such as the FERET and FACES databases. So, we do not merely extend the work to a different database; the understanding here is purely from the point of view of a student: how a student can look into this application, how this particular wavelet-based framework fits into the overall framework of face recognition, what approaches exist in the literature, and how this approach fits among them. There are some other references: the second reference here is based on eigenphases, which is quite a popular approach. There are also references on the approach based on PCA, principal component analysis; this framework has also been popular. The first reference that I mentioned compares the PCA-based approach and the eigenphases-based approach with the wavelet-based approach. The fifth reference, by Zhao, Chellappa, Phillips and Rosenfeld, provides a nice framework, a nice literature survey, of the overall work that has been done on the task of face recognition. So much for the references. Now, we shall move on to why face recognition is required. As I mentioned in the last class, face recognition can be used for biometric authentication, but the signal that is obtained from a face, namely a 2D image, can easily be altered: someone can grow a moustache or a beard over a given period of time, or someone can simply turn up wearing sunglasses. There are other signals available, like the retina or, for that matter, fingerprints, which cannot easily be altered and which can readily be utilized for biometric authentication. So, the use of face recognition in a biometric authentication framework is limited. However, we do need face recognition for the task of surveillance. In surveillance, some area or region needs to be monitored, and generally only a limited number of persons are allowed access to such a region. One might also be interested in what a subject is doing inside that particular region. We may also have some learned prototypes of which kinds of activities are allowed: if someone is stealing inside that region, or doing some other kind of abnormal activity, we want to detect that as well, and generally the cue for this kind of framework starts with face recognition. So, under the task of surveillance, we may do activity tracking as well as recognition, and if we can do this faithfully, then we can also flag abnormalities. Not only in this task, but also in tasks such as automatic character recognition in, let us say, movie clips: if some broken movie clips, or parts of clips, are available on, for example, YouTube, then we can perform automatic actor recognition, identifying which actors are present in which scenes, and we can also do effective archival based on actors. So, on YouTube, where a large database of videos and images is available, we can perform this archival task faithfully; we can classify soaps and serials based on the characters that are present in those scenes.
So, that is why we require face recognition; we see that the task of face recognition is crucial, and therefore accuracy is also crucial. Now, the approach that I am going to describe falls under the category of feature-based approaches. There are basically two families of approaches used for the task of face recognition: one is the geometric approach and the other is the feature-based approach. In the geometric approach, one goes about detecting basic facial features, say the nose, eyes, cheeks, chin, etcetera, and generates features based on these extracted facial parts. But the problem is that these features are difficult to extract, because what we have in effect is just a 2D image. So, why not look at the face image simply as a 2D image and represent it by some other features, rather than explicitly extracting eyes, nose and so on? That is the feature-based approach, and the approach that I am going to present falls under this category. The block diagram of the approach looks like this; this part is for prototype learning. Given an image or video, one may first detect faces. This is a face detection algorithm, but it is not part of this discussion because, as I mentioned in the last lecture, these two tasks can be decoupled easily: in face detection one looks for features which are very similar across all faces, so that we can faithfully detect which faces are present in the image or, say, the video; whereas in face recognition we look at features which are distinctive from face to face. So, face detection can be decoupled, and hence we are not going to talk about it here; what we are going to talk about is face recognition. So, after extracting a region of interest in which a face is present, we perform subband decomposition, and here we are going to perform subband decomposition using wave packet analysis. After performing subband decomposition, we go for feature extraction. Now, this feature extraction is a crucial job, because we cannot put each and every thing into the feature vector; I am going to discuss this next. We can also perform feature normalization, because we do not want one feature vector for each image; rather, we want one feature vector per class. So, if we have, let us say, 6 to 8 images per subject, then we want one compact feature vector for each subject and not one for each image. So, feature normalization can also be performed. This basically gives us learned prototypes, and these learned prototypes can be stored in memory and accessed in the future when a query image comes to us. This second block diagram is for matching. If we have the learned prototypes with us, in which the feature vectors of the different classes are stored, then we can perform matching. Given a query image like this, we first do face detection and then apply the same algorithm; we end up with a feature vector, and we have the learned prototypes that we can access from memory, and we can use some distance metric or similarity measure to come up with the matches. So, depending on the application, we can give one image or multiple images as output. Now, there can be two kinds of applications for face recognition. One may be based on content-based image retrieval.
When one image is given as input in a content-based image retrieval framework, we want to extract all the images in the database that match the query image. In the other application, we perform simple traditional classification, in which an image is given and we just want to know which class it belongs to; in that case, only one match is required. Now, how do we perform the decomposition? This slide shows how we can perform the decomposition of a 2D image. This differs slightly from the 1D case, because the image is a 2D signal, so we need to perform filtering along the rows as well as along the columns, and downsampling is also performed along the rows as well as along the columns. First we perform filtering across the rows and downsample across the rows; after obtaining the intermediate images by filtering and downsampling across the rows, we move on to the columns, perform high pass and low pass filtering and downsampling across the columns, and obtain 4 sub-images. This LL image is the approximation subspace, because it has passed through two low pass filters; the other sub-images have passed through at least one high pass filter, so they contain high frequencies. Here is where wavelet analysis differs from wave packet analysis: in wavelet analysis we move from the first level to the second level of decomposition by decomposing only the approximation subspace, whereas in wave packet decomposition we decompose the detail subspaces as well. So, if this is the first level of decomposition, with one approximation subspace and the rest detail subspaces, we decompose not only the approximation subspace but also the detail subspaces, and we end up with 16 subspaces, of which one is the approximation subspace and the other 15 are detail subspaces. The approximation subspace contains the low frequencies, as desired, and the detail subspaces contain progressively higher frequencies in this order, so the last detail subspace contains the highest frequencies. So, we get a decomposition in the spatial as well as the frequency domain, and this also provides decorrelation. Now we can generate feature vectors, but before going into that, we need to look at which kind of filters we have used for this purpose. An important advantage of this wave packet application is that we can use any filters here: when we go for face recognition, we do not need synthesis or perfect reconstruction to be performed; we only want our analysis filters to be good, so that they provide decorrelation in the spatial as well as the frequency domain. So, without worrying about the orthogonality property or any other property, we can choose those filters which provide very good feature vectors for our classification task. In practice, one can search empirically for filters which are best suited to this application of face recognition, and here are the filters which were found empirically to be suited to the task; these are the impulse responses of the low pass filter and the high pass filter used for face recognition. These filters are discrete in nature, but for ease of visualization adjacent samples have been connected by straight lines. So, here we can see the impulse responses of the low pass and high pass filters.
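To make the rows-then-columns procedure concrete, here is a small sketch of one level of separable 2D analysis. The normalized Haar pair is used only as a placeholder, since the empirically chosen filter coefficients from the presentation are not reproduced here.

```python
import numpy as np

def filter_downsample_rows(img, filt):
    """Convolve every row with `filt` and downsample that axis by 2."""
    return np.array([np.convolve(row, filt)[::2] for row in img])

def analysis_2d_one_level(img, h, g):
    """One separable 2-D analysis step: rows first, then columns,
    producing the LL, LH, HL and HH sub-images."""
    lo = filter_downsample_rows(img, h)        # low-pass along rows
    hi = filter_downsample_rows(img, g)        # high-pass along rows
    LL = filter_downsample_rows(lo.T, h).T     # then along columns
    LH = filter_downsample_rows(lo.T, g).T
    HL = filter_downsample_rows(hi.T, h).T
    HH = filter_downsample_rows(hi.T, g).T
    return LL, LH, HL, HH

# Placeholder filters and a random "image" purely for illustration.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)
LL, LH, HL, HH = analysis_2d_one_level(np.random.rand(64, 64), h, g)
```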
One might also wonder how they look in the frequency domain, and this slide shows their magnitude responses. This one corresponds, crudely, to a low pass filter and this one to a high pass filter. So, the filters that we have used are not so different from what we have seen in the course; they are basically low pass and high pass filters, but these particular magnitude responses were found to be better suited to the face recognition task. These low pass and high pass filters apply to the first level of decomposition, but we decompose up to the second level, and only up to the second level, because the images used for face recognition are generally quite small to begin with, and as we go on decomposing, the sub-images become smaller and smaller. It becomes really tedious beyond the second level of decomposition to handle such small images, and the localization we obtain in the spatial domain also becomes very crude, so further levels are not of much importance; they do not yield much information. So, we go up to the second level of decomposition. When we look at this wave packet analysis, at the second level of decomposition we obtain four filters, and I am going to present the magnitude responses of these four filters. When we decompose the approximation subspace, we obtain the magnitude responses of this low pass filter and this band pass filter: here we can see that this one is again crudely a low pass filter, but the cutoff is now more stringent, and this one is a band pass filter which mostly emphasizes the lower frequencies, because we are decomposing the approximation subspace. When we decompose a detail subspace, as is required in wave packet analysis, we end up with these two filters: again one is a high pass filter and one is a band pass filter, but this band pass filter emphasizes the higher frequencies rather than the lower ones, unlike what we saw in the previous slide. Now, these filters have been described for a one-dimensional signal, but here we are dealing with a two-dimensional signal. The important characteristic of these filters is that they are separable in nature: as you apply them in one dimension, you can apply them in the second dimension, and we end up with 16 filters rather than the four filters of the one-dimensional case. The ease of visualization in one dimension is why I have discussed the one-dimensional case; I will not show the two-dimensional impulse responses and magnitude responses because they are just natural extensions and are harder to visualize than in the 1D case. So, given the Lena image, for example, we end up with this kind of subspaces, where this is the approximation subspace. We can easily recognize it because it contains the low frequencies, and the information here is more than in the other detail subspaces. These are seven of the detail subspaces; an important thing to note is that these detail subspaces and the approximation subspace are actually quite small, but just for ease of visualization they have been zoomed in. Also, the detail subspaces are bipolar in nature, containing negative as well as positive values, so just for visualization their absolute values have been taken here. The remaining sub-images are also detail subspaces, and these are the other eight detail subspaces. So, these are all 16 subspaces that we obtain.
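A quick way to reproduce this 16-subband, two-level wave packet decomposition is sketched below with the PyWavelets package. The 'db2' wavelet, the image size and the variable names are assumptions made for illustration; the presentation uses its own empirically chosen analysis filters.

```python
import numpy as np
import pywt  # PyWavelets

# Placeholder face image; in practice this would be the detected face region.
img = np.random.rand(128, 128)

# Full 2-level 2-D wave packet tree: approximation AND detail subbands are
# both decomposed, giving 4 x 4 = 16 level-2 sub-images.
wp = pywt.WaveletPacket2D(data=img, wavelet='db2', mode='symmetric', maxlevel=2)
subbands = {node.path: node.data for node in wp.get_level(2)}

print(len(subbands))         # 16 subbands
print(subbands['aa'].shape)  # 'aa' is the level-2 approximation subspace
```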
Here we can see that some features, like the eyes, are very clear in the high frequency subspaces, while other subspaces contain the overall shape of the face image. Now, after getting these 16 subspaces, we can go on to feature vector extraction. The important thing in feature extraction is to extract those features which are relevant to face images. One might say that all the pixel values in the 16 sub-images are important to us and simply concatenate all of them, but the problem with doing this is that the feature vector will be very long, and for a classification task that becomes increasingly difficult: if you have a long feature vector, you had better have a large number of training images before you can do good classification. So, rather than doing that, we can provide a compact representation of the faces, and one way of doing this is to use moments: the first order and second order moments, that is, the mean and the variance basically. We have 16 sub-images, and we can compute the mean and the variance of each sub-image, which gives a 32-dimensional feature vector. But if we look at the detail subspaces, they have zero mean, so we do not need to capture that, and we can reduce this to a 17-dimensional feature vector for each face image. However, the approximation and detail subspaces are different in nature: the approximation subspace carries more information, so it makes sense to handle the approximation subspace differently from the detail subspaces. If you look at this approximation subspace, what we can do is bound it by two boxes. One bounding box covers the overall shape of the face and the fringe information around it, and we compute the mean and variance over that box; we take another bounding box covering the region of the eyes and nose, and we compute the mean and variance there as well. So, we extract four features from the approximation subspace, and we extract the variance of each detail sub-image, giving a 15-dimensional vector from the detail subspaces. With the 4-dimensional vector from the approximation subspace and the 15-dimensional vector from the detail subspaces, we extract a 19-dimensional feature vector for each image. Now, given the task of classification, we extract a 19-dimensional feature vector for each face image, and we can in addition do some feature normalization so that we end up with one feature vector for each class; after that, we can compare the feature vectors in the database with the feature vector of the query image using some distance metric. But the important thing to note is that we have used the mean as well as the variance, so we have information related to the probability density functions underlying the feature vectors. Now, if we use the Euclidean distance as the distance metric, the trouble is that the Euclidean distance treats all the dimensions alike; it does not take into account the probability density functions that we have. Instead, we use the Bhattacharyya distance, which does take the underlying probability density functions into account.
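The following sketch illustrates the kind of 19-dimensional feature vector just described, together with the Bhattacharyya distance between two univariate Gaussians that is discussed next. The dictionary keys, the fixed bounding boxes inside the approximation sub-image, and the idea of summing per-subband distances are assumptions made for illustration, not details taken from the presentation.

```python
import numpy as np

def feature_vector(subbands):
    """19-D feature vector: mean and variance over two boxes of the
    approximation sub-image ('aa'), plus the variance of each of the 15
    detail sub-images (their means are essentially zero)."""
    approx = subbands['aa']
    h, w = approx.shape
    outer = approx                                            # whole-face box (assumed)
    inner = approx[h // 4: 3 * h // 4, w // 4: 3 * w // 4]    # eyes/nose box (assumed)
    feats = [outer.mean(), outer.var(), inner.mean(), inner.var()]
    feats += [band.var() for path, band in sorted(subbands.items()) if path != 'aa']
    return np.array(feats)

def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians; it is zero
    when both the means and the variances coincide."""
    return (0.25 * np.log(0.25 * (var1 / var2 + var2 / var1 + 2.0))
            + 0.25 * (mu1 - mu2) ** 2 / (var1 + var2))
```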
The Bhattacharyya distance involves the mean as well as the variance, so we have two terms here, one involving the means and one involving the variances. If two feature vectors come from the same probability density function, the means will be similar and the variances will be similar, so both terms will vanish and we will get zero distance for an exact match. Also, the Bhattacharyya distance satisfies all the requirements needed for it to be a proper distance metric. So, the Bhattacharyya distance can be used, but there is nothing magical about it: any other distance metric which takes the underlying PDFs of the feature vectors into account can be used for this purpose. Now, I will quickly move to the results that I have obtained using the Yale database. The Yale database is available for free download for non-commercial use from this link, and these are typical faces available in the Yale database for one subject. The Yale database contains 165 images: 15 classes, with 11 images per class. This slide just shows typical images available in the database. I will first show a typical archival result obtained from the CBIR interface for the face recognition task. Given an input like this, a query image with a surprised expression on the face of the subject, we could retrieve images of the same subject: the subject is the same in all the retrieved images, but the expressions are different; one is winking, for example, another has a happy look on the face, and so on. So, this is the CBIR interface, but in order to evaluate the algorithm quantitatively, we need to provide quantitative results, and this is done through the traditional classification task. These are the experimental results. Three experiments were performed, based on different numbers of images used per class for learning: in the first experiment, four images per class were used for learning; in the second experiment, six images; and in the third experiment, eight images per class. So, in total we had 40 images for learning in the first experiment, 60 in the second and 80 in the third, and 80 images were used as query images. The fifth column shows the accuracy obtained: in the first experiment, 66 images matched to their native class; in the second experiment, 64 images; and in the third experiment, again 64 images matched to their native class. So, the accuracy is around 80 to 82.5 percent, to within a resolution of 1.25 percent. Before dwelling too much on the results, I would like to remark that these experimental results can be enhanced: this 80 to 82 percent accuracy can be pushed to 90 to 95 percent if one goes for machine learning techniques such as, for example, support vector machines, or if one uses more moments, like third order moments or kurtosis. With this remark, I would like to conclude this application. Thank you very much.

That was a very beautiful presentation on the application of wavelets, specifically wave packets, in face recognition. I would like to emphasize a few of the points that were made in this presentation. One was the distinction between face detection and face recognition. In fact, in a sense, wavelets, and for that matter wave packets, are suited both to the detection and to the recognition problem. In detection, one looks for commonality.
What constitutes a general face? In recognition, one looks for specificity: what is the incremental information in that particular face? So, the separation into common and incremental information is again critical in the context of both face detection and face recognition. We looked at the beautiful decompositions which Ronak showed using wave packet analysis, and we noticed that different kinds of features came out in different subbands. In fact, one can study that in greater depth; what has been presented here is an indicative study of what the different subbands show. Ronak also presented the frequency responses of the filters at the second iteration in wave packet analysis, and you noticed again a confirmation of the theoretical discussion that we had when we carried out wave packet analysis in one of the previous lectures. We saw that when we decompose the high frequency band, that is, when a high pass filter and a downsampler are followed by the analysis filter bank again, there is a band inversion: high followed by low actually gives you the higher frequencies, and high followed by high gives you the lower frequencies. This should be noted. As you noticed, and as expected, there were two filters aspiring to become band pass filters, one between pi by 4 and pi by 2 on the normalized omega axis and the other between pi by 2 and 3 pi by 4; and of course, there was a filter aspiring to become a low pass filter with cutoff pi by 4 and another aspiring to become a high pass filter with cutoff 3 pi by 4. Now, we have seen two very interesting applications of wavelets and filter banks today. There are several others; I shall just mention a couple before we conclude the lecture. One of the other important areas of application of wavelets and time-frequency methods is biomedical signals. These could be evoked potentials; they could be magnetic resonance imaging signals, MRI signals; they could be signals obtained from CAT scanning, computerized axial tomography scanning, and many other imaging modalities that are useful in medical image processing today. That is one area in which wavelets, time-frequency methods and filter banks have been heavily employed, and we hope to have another presentation of an application of this kind in a subsequent lecture. Wavelets have also been used, or proposed for use, in digital communication: people have talked about building modulation and demodulation systems on wavelets because they are nice time-frequency atoms. Another area, for the mathematician, in which wavelets have found great application is the solution of differential equations. At this point, I shall not say more, save to mention that these are just some of the myriad areas in which wavelets are used. We shall hopefully be able to see some of these applications in greater depth in subsequent lectures. With that then, we come to the end of this lecture, and we shall proceed to discuss something different in the next one. Thank you.