Hello friends. Today we will be looking at the GEMINI approach for two-dimensional color images in multimedia information retrieval. The learning outcome for this session is that students will be able to retrieve images based on color features using the GEMINI approach.

In the previous lecture we have already seen what we mean by the GEMINI approach and what its steps are. First, we have to decide the distance function. Second, we have to find one or more features for a quick-and-dirty test. Then we have to prove that the features we have selected are correct using the lower-bounding lemma. Then we use any spatial access method to store and retrieve the f-dimensional feature vectors, and finally we compute the actual distances for the qualifying objects and return the result.

For color images, we have a large image database and we use the image content itself as the basis of the query. So what features can be used? Color, shape, position, texture and edges are the different features that can be used. And what are the potential applications? In the medical field, in art, in fashion, in industry: everywhere we can use this kind of image retrieval.

Consider that we have a database of still images with two main types of object. One is the whole image, which is called a scene, and the other is an item. So a scene is an image and an item is a part of that scene. Now consider this image. Here the number of images in the database is N, and this is our query image. We want to find the images that are similar to this particular image. How will we find them? One method is to compute a k-element color histogram for each item or scene, where k can be either 64 or 256. For each image in the database we have to compute this k-element histogram, and we also need to compute it for the query object. Since there are N images, there will be N histograms. This is the histogram representation of this particular image.
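To make the histogram step concrete, here is a minimal sketch of a 64-bin color histogram. The lecture only says that k is 64 or 256; the binning rule used here (4 levels per RGB channel, giving 4 x 4 x 4 = 64 bins), the normalization to fractions, and the names `color_histogram` and `image` are assumptions made for illustration.

```python
def color_histogram(pixels, levels_per_channel=4):
    """Return a normalized histogram with levels_per_channel**3 bins.

    pixels: list of (r, g, b) tuples, each channel in 0..255.
    Assumption: bins come from uniform quantization of each channel.
    """
    k = levels_per_channel ** 3
    counts = [0] * k
    for r, g, b in pixels:
        # Quantize each channel to 0..levels_per_channel-1.
        rq = r * levels_per_channel // 256
        gq = g * levels_per_channel // 256
        bq = b * levels_per_channel // 256
        # Combine the three quantized channels into one bin index.
        idx = (rq * levels_per_channel + gq) * levels_per_channel + bq
        counts[idx] += 1
    n = len(pixels)
    if n == 0:
        raise ValueError("image has no pixels")
    # Store fractions so that images of different sizes are comparable.
    return [c / n for c in counts]


# A tiny hypothetical 2x2 "image": three reddish pixels and one blue pixel.
image = [(250, 10, 10), (240, 20, 5), (255, 0, 0), (0, 0, 255)]
hist = color_histogram(image)
print(len(hist), hist[48], hist[3])
```

In this toy image the three reddish pixels fall into one bin and the blue pixel into another, so the 64-element vector has only two non-zero entries. The same function would be applied once to the query image and once to each of the N database images.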
These are the pixel values of the image, and based on them this is the histogram. Once we have calculated the histograms, we have to find the distance between two images to see which are similar and which are dissimilar. The distance function suggested by the experts uses a color-to-color similarity matrix to measure the distance between two histograms x and y: d_hist^2(x, y) = (x - y)^t A (x - y), which is the sum over all i and j of a_ij (x_i - y_i)(x_j - y_j). So this is a quadratic function for finding the distance between two histograms. Here the superscript t is nothing but matrix transposition, and A is the color-to-color similarity matrix, whose entry a_ij describes the similarity between color i and color j. This is how we find the distances.

Now, k can be 64 or 256. If we use a 256-bin histogram, each vector will have 256 values. Then we compute the distance between the histogram of the query object and the histogram of every image present in the database. All the images whose distance is within the tolerance, epsilon, form the result. This is the sequential approach: compute the distance between every object and the query object. And again, this is going to be time consuming.

There are two issues: one is the dimensionality curse and the second is the quadratic nature of the distance function. As we already discussed in the previous lecture, the computation is expensive because the function is quadratic, and with a 256-element color representation it becomes even more complex and costly. The dimensionality curse means that the number of dimensions we are working with, 64 or 256, is too high. The distance function in feature space also involves the crosstalk problem. Now, what is this crosstalk problem? It appears when we are finding the distance between two histograms.
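The quadratic distance and the sequential scan can be sketched as follows. This is only an illustration: the distance is written as the square root of (x - y)^T A (x - y), and the 3-color similarity matrix `A`, the toy histograms and the function names are hypothetical, not values from the lecture.

```python
import math


def d_hist(x, y, A):
    """Quadratic-form distance sqrt((x - y)^T A (x - y)) between histograms."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    # Double loop over all bin pairs (i, j): this is the O(k^2) cost,
    # and the i != j terms are the "crosstalk" between different colors.
    for i, di in enumerate(diff):
        for j, dj in enumerate(diff):
            total += A[i][j] * di * dj
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding


def sequential_scan(query_hist, db_hists, A, eps):
    """Return indices of all database histograms within eps of the query."""
    return [i for i, h in enumerate(db_hists)
            if d_hist(query_hist, h, A) <= eps]


# Toy setting with k = 3 colors: red, orange, blue (hypothetical values).
# Red and orange are declared half-similar; blue is unrelated to both.
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]       # an all-red image
db = [[0.0, 1.0, 0.0],        # all orange
      [0.0, 0.0, 1.0],        # all blue
      [1.0, 0.0, 0.0]]        # all red

print([round(d_hist(query, h, A), 3) for h in db])
print(sequential_scan(query, db, A, eps=1.2))
```

With this matrix the all-orange image ends up closer to the all-red query than the all-blue image does, which is the effect the similarity matrix is meant to capture. The price is the nested loop over all k x k bin pairs for every one of the N images.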
Now consider that this is the histogram of a particular image in the database and this is the histogram of the query image. How are we finding the distance? We take the difference in every bin and combine it with the difference in every other bin. So this is the bright red color. Bright red is compared with bright red again, then with pink, with orange, and so on up to blue, with whatever colors are present in the image. For two colors such as red and blue the difference will be very high; they are very different. This is called the crosstalk problem. Ideally only the colors that are near to red should be compared with red, blue with blue, and green with green. But because of the quadratic nature of this function, for every value of i the term is combined with all the values of j. So red is compared with all the colors, then pink is compared with all the colors, and so on. This is the crosstalk problem.

To resolve the crosstalk problem and the dimensionality curse, we have to use the GEMINI approach. So again, what is the first step? The distance function. We have already taken the d_hist function as defined earlier. Second, find one or more numerical features to quickly discard the non-qualifying objects. Assume that there are a thousand images; not all of them will match. Which images are likely to match is what we decide in this second step, by finding one or more numerical features. Now, every pixel is described by its red, green and blue components. So here we find the average amount of red, green and blue in every color image, and the image is represented by this average color vector. Assume that the images are of size 512 by 512. Without such a feature, the comparison would involve all 512 by 512 pixel values of every image.
Now, instead of 512 by 512 values, we represent the image by only three values: the average of the red component, the average of the green component and the average of the blue component. So R_avg is (1/P) times the sum of R(p) over all pixels p, and similarly for G_avg and B_avg. Here P is the number of pixels in the item, so for a 512 by 512 image it is that many pixels; R(p) is the red component, G(p) the green component and B(p) the blue component of pixel p. As we know, every pixel color is defined by a combination of R, G and B, with values typically between 0 and 255. So every image is represented by the vector (R_avg, G_avg, B_avg). Look at this image again. These are the pixel values; from every pixel we take the components, and then, instead of all 256 histogram values, the image is represented by only three values.

So the second step, finding the features, is done: we have found three features. Now what is the next step? Prove that the simplified distance, the distance between the features, lower-bounds the actual distance. Here the distance is the Euclidean distance d_avg between two average color vectors. So this is the distance function, and using the quadratic distance bounding theorem it has already been proved that the distance between these vectors is always less than or equal to the distance between the histograms, that is, the distance calculated with the d_hist function. That is why these features are correct. Once the features are chosen, we compute these three features for every image. If there are a thousand images there will be a thousand vectors, each with three values.
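The average color feature and its Euclidean distance can be sketched in a few lines. The function names and the two tiny 4-pixel images are hypothetical; the lower-bounding property itself is taken from the lecture and is not checked by this code.

```python
import math


def average_color(pixels):
    """Return (R_avg, G_avg, B_avg) for a list of (r, g, b) pixels."""
    p = len(pixels)  # P, the number of pixels in the item
    if p == 0:
        raise ValueError("image has no pixels")
    r_avg = sum(px[0] for px in pixels) / p
    g_avg = sum(px[1] for px in pixels) / p
    b_avg = sum(px[2] for px in pixels) / p
    return (r_avg, g_avg, b_avg)


def d_avg(u, v):
    """Euclidean distance between two average color vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two hypothetical 4-pixel images: half red / half blue, and all red.
image_a = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
image_b = [(255, 0, 0)] * 4

avg_a = average_color(image_a)
avg_b = average_color(image_b)
print(avg_a, avg_b, d_avg(avg_a, avg_b))
```

Each image, whatever its size, is reduced to three numbers, and d_avg needs only three subtractions and three multiplications per comparison. Red is subtracted only from red, green from green and blue from blue, so there are no cross terms.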
Now the thousand vectors are compared with the vector of the query image, and we find which of the thousand are similar. This is nothing but filtering the set of images based on their average RGB values. Assume that out of the thousand, some twenty images are near or similar to the given query image. Only for those twenty images is the actual distance computed with the d_hist formula we have already seen. So twenty distances are computed, and the images whose distance is less than or equal to the tolerance are added to the result.

So look at what we have done. First we found the average RGB vector for every image and also for the query image. Here we take the difference between red and red, green and green, and blue and blue, which also solves the crosstalk problem. Then the thousand expensive computations are reduced to the number of qualifying images, for example only twenty. We compute the actual distance with the d_hist formula only for those twenty images. This is how the GEMINI approach makes searching and indexing faster. Thank you.
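The filter-and-refine idea described in this lecture can be sketched generically. To keep the example short and to guarantee the lower-bounding property, it uses stand-in data: 4-dimensional vectors play the role of histograms, Euclidean distance on the full vector plays the role of d_hist, and Euclidean distance on the first two coordinates plays the role of d_avg. All names and numbers are hypothetical, and the filter is a plain linear scan rather than a spatial access method.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def filter_and_refine(query, database, feature, cheap_dist, exact_dist, eps):
    """Range query in two steps; returns (result indices, candidate count).

    Correct (no false dismissals) only if cheap_dist on the features
    never exceeds exact_dist on the full objects.
    """
    q_feat = feature(query)
    # Filter: cheap test on the low-dimensional features.
    candidates = [i for i, obj in enumerate(database)
                  if cheap_dist(q_feat, feature(obj)) <= eps]
    # Refine: expensive distance only for the surviving candidates.
    results = [i for i in candidates
               if exact_dist(query, database[i]) <= eps]
    return results, len(candidates)


def first_two(v):
    """Stand-in feature: dropping coordinates can only shrink the distance."""
    return v[:2]


database = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.9],  # passes the filter, fails the exact test
    [5.0, 5.0, 0.0, 0.0],  # discarded by the filter
    [0.2, 0.1, 0.1, 0.0],
]
query = [0.0, 0.0, 0.0, 0.0]

results, n_candidates = filter_and_refine(
    query, database, first_two, euclidean, euclidean, eps=0.5)
print(results, n_candidates)
```

The filter may let through false alarms, such as the second vector here, which the refine step then removes. Because the cheap distance never exceeds the exact one, no object that truly qualifies is ever discarded, so the answer matches what a full sequential scan would return.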