I shall be talking about visualization and aggregation. Visualization means you look at the data and somehow draw conclusions from it; aggregation means you sum up the whole data set and then say a few words about it. As an example, look at this diagram: five points a, b, c, d, e are plotted. From the look of these five points, the distance between a and b is probably the least among all the distances; then probably comes the distance between b and c, then between c and e, and then probably between b and d. So just by looking at this data set of five points you know that a is closer to b than to anything else, and that for b, a is the closest and then c. These are the sort of conclusions we draw about a data set just by looking at it. Now look at this second diagram, where more points are plotted. From the look of it we aggregate and say there are two clusters here: this set of points forms one cluster, and that set of points forms another. Just by looking at the data set we are making statements about the data. What is the meaning of "looking"? We are able to plot the points, and we make judgments about the data on the basis of the plot. Unfortunately, you can plot the points only when the data is in R2; the moment the data is in R3 or any higher dimension, you cannot plot it, so you can neither visualize nor aggregate in this way. So the question boils down to: how do you map points in a higher dimension down to a lower dimension, preferably dimension 2? This problem has several connotations, and people have been working on it for a long, long time.
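The "just by looking" judgments above amount to comparing pairwise distances. Here is a small sketch; the coordinates are made up for illustration, since the actual positions in the figure are not given:

```python
from itertools import combinations
from math import dist

# Hypothetical coordinates for the five plotted points a..e; the actual
# positions in the lecture's figure are not given, so these are illustrative.
points = {"a": (0.0, 0.0), "b": (1.0, 0.2), "c": (2.5, 0.5),
          "d": (4.2, 2.6), "e": (3.2, 1.6)}

# All 5C2 = 10 pairwise Euclidean distances, sorted from the closest pair
# to the farthest, the same ranking we formed visually.
pairs = sorted(combinations(points, 2),
               key=lambda p: dist(points[p[0]], points[p[1]]))
for u, v in pairs:
    print(u, v, round(dist(points[u], points[v]), 2))
```

With these coordinates, the pair (a, b) comes out as the closest, matching the conclusion drawn from the plot.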
I am using the phrase dimensionality reduction intentionally: your original data is in a very high dimension, and you would like to bring it down to a lower dimension so that you can plot and visualize it. You can do this dimensionality reduction in many ways. We have already read about PCA, principal component analysis, which performs one kind of dimensionality reduction, and we have already discussed several feature selection criterion functions and feature selection algorithms that reduce the dimension. Here, however, I am going to talk about it in a slightly different way. Come back to the first diagram, where the distance between a and b is less than the distance between b and c. Suppose you want to reduce the data from a higher dimension to a lower one. One property you may want this reduction to have is probably the following. Suppose your data x1, x2, ..., xn lies in some M-dimensional space, and the reduction changes x1 to y1, x2 to y2, ..., xn to yn in some m-dimensional space, with m less than M. You would like the reduction to take place in such a way that, for every three points xi, xj, xk with d(xi, xj) < d(xi, xk) < d(xj, xk) (three distances: d(xi, xj) the least, d(xj, xk) the largest, and d(xi, xk) in the middle), the corresponding yi, yj, yk satisfy the same ordering: d(yi, yj) < d(yi, yk) < d(yj, yk).
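The condition above can be stated as a check. The following illustrative sketch tests whether a map xi to yi preserves the ordering of the three distances for every triple; the function name and the example data are my own, not from the lecture:

```python
from itertools import combinations
from math import dist

def order_preserved(X, Y):
    """Return True if the map X[i] -> Y[i] preserves, for every triple
    (i, j, k), the ordering among the three pairwise distances.
    Illustrative check only; X and Y are lists of coordinate tuples."""
    for i, j, k in combinations(range(len(X)), 3):
        sides = [(i, j), (i, k), (j, k)]
        for (a, b), (c, d) in combinations(sides, 2):
            dx = dist(X[a], X[b]) - dist(X[c], X[d])
            dy = dist(Y[a], Y[b]) - dist(Y[c], Y[d])
            # the two differences must agree in sign (or both be zero)
            if (dx > 0) != (dy > 0) or (dx < 0) != (dy < 0):
                return False
    return True

# Dropping a zero third coordinate preserves every distance exactly ...
X = [(0, 0, 0), (1, 0, 0), (0, 3, 0)]
print(order_preserved(X, [(0, 0), (1, 0), (0, 3)]))   # True
# ... but an arbitrary placement in R^2 need not preserve the ordering.
print(order_preserved(X, [(0, 0), (5, 0), (0, 1)]))   # False
```

The second call fails because the map swaps which of the two shorter distances is smaller, exactly the kind of violation the condition forbids.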
So you want the dimensionality reduction to take place in such a way that these less-than and greater-than relationships among the distances are preserved. This is the condition you would like to impose, and it is fine as far as it goes. But the moment you impose it, note that it must be satisfied for every three points, not just for some particular three points. How many such triples are there? There are nC3 of them, and for every triple there is some such relation. Let me also include equality, so that we do not have trouble distinguishing strict inequality from equality. So if you take three points, you have three distances, and they satisfy one of these orderings; once the points are transformed to the y's, the corresponding distances should satisfy the same ordering, and this should happen for every triple. Now, what has been shown in topology is that such a thing is not possible for every data set; in fact there are many, many examples where it cannot happen. For example, if you have time, try this yourself: take the unit cube. How many vertices does it have? Eight. Look only at the eight vertices, not the edges; that gives you eight points and the corresponding 8C2 distances. Now try to place eight points in R2 so that all the equalities and inequalities among these distances are satisfied. You will see on your own that you cannot do it, and if you increase the number of points it becomes that much more difficult.
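You can verify the unit-cube setup mentioned above programmatically: the eight vertices give 8C2 = 28 pairwise distances, but these take only three distinct values, which is why so many exact equalities would have to survive in R2. A small sketch:

```python
from itertools import combinations, product
from math import dist
from collections import Counter

# The eight vertices of the unit cube in R^3.
cube = list(product((0, 1), repeat=3))

# The 8C2 = 28 pairwise distances take only three values:
# 1 for the 12 edges, sqrt(2) for the 12 face diagonals,
# and sqrt(3) for the 4 space diagonals.
counts = Counter(round(dist(u, v), 6) for u, v in combinations(cube, 2))
for d, n in sorted(counts.items()):
    print(d, n)   # 1.0 12 / 1.414214 12 / 1.732051 4
```

Reproducing twelve equal edge lengths, twelve equal face diagonals, and four equal space diagonals with eight points in the plane is what turns out to be impossible.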
In fact, there is a theorem in topology showing that such a transformation is possible only in very rare cases; in general it is not possible. For infinitely many points it is mostly impossible, and if you take disks and so on, it is simply impossible. There is a phrase generally used here: topology-preserving maps. I do not know whether you have come across this phrase or not, but topology-preserving maps from a higher dimension to a lower dimension are generally not possible; you cannot have them. So the problem is that we want some such thing to happen, while the theoretical results, where they exist, say precisely that it is not possible. Not included in our course of lectures is something called Kohonen's self-organizing map, which tries to bring data from higher dimensions down to two or three dimensions (people generally take two), approximating the higher-dimensional structure to the extent possible. But note that this is only an approximation; an exact topology-preserving map is not always possible. This is one thing I wanted to say, and it partially comes under visualization.

There are some other results concerning what are known as Chernoff's faces. If you go to any search engine and try "Chernoff faces", you will get many faces. Basically the idea is this. Chernoff was a very well-known name in statistics; there are, for example, the Chernoff bounds on errors. Chernoff observed that we human beings immediately notice even a small change in a face. So he thought that faces are perhaps a good way of representing data. What is it that we want? When you represent data figuratively, by some plot or otherwise, you want even small changes in the data to be observable. One of the places where we always notice small changes is faces. So he asked: why not represent each observation of a data set as a face? Note that a drawn face is two-dimensional. He then mapped features to facial measurements: the length of the ear lobes may correspond to one feature (please forgive my very bad drawing, there are ears as well), the distance between the eyes to another, the radius of the face to yet another, and so on; in this way the features of an observation are represented as a face. With the data sets he presented, something like 32 or 33 features can be taken care of, but if you are looking at 100 or 200 features, then at least his scheme cannot handle it, and I do not know whether it can be done at all. That is one issue. The other is this: he writes every observation as a face, but in our plots we were also finding relationships between observations, and with faces you understand the differences between observations only when you see all the faces together. If you have to look at 200 or 300 faces, you have a problem in aggregating: by the time you go from the first face to the 50th face, the information in the first face you will probably not
remember. Whereas if you see all 50 or 200 points plotted together, you know that this forms one cluster and that forms another; whether you can draw such a conclusion after looking at 200 or 300 faces is not quite clear. Still, this is one of the best-known papers on visualization of data sets, Chernoff's faces; it is available on the internet even now. The paper was written sometime in the 1960s or 70s. After it, probably the best-known paper is Kohonen's on self-organizing maps for dimensionality reduction. In between, many other people wrote articles representing data by trees, by graphs, and in various other ways, but they were not exactly successful. So the problem is still open: how best can you represent a higher-dimensional data set in two dimensions so that, by looking at the two-dimensional data, we understand many facets of the original data set well? This particular problem still does not always have satisfactory solutions.

I will now give a brief talk, of around 10 minutes, on ensemble classifiers. Let me tell you the meaning of ensemble classifiers. We have been discussing which classifier is better on which data sets, and how to resolve that question. Instead of asking which classifier is better, can we take a combination of these classifiers to get a new classifier, so that this new classifier works better than all the individual classifiers? This is what is known as an ensemble classifier: as you can understand from the word, an ensemble is a collection of classifiers, and from that collection you are going to get one classifier; that is why it is called an ensemble classifier.

There are many ways in which this can be done. One is known as bagging. What is done here is the following. You have a training data set and a classifier. You do simple random sampling on this data set with replacement. What is the meaning of that? You pick one observation at random, note the observation and its class label, put it back, and then draw the next observation, so the next draw could give you the first observation again. If you have n points in the training set, you draw n points randomly with replacement. It may happen that you get the original data set itself, but most of the time you get n points with some points repeated, so what you get is a multiset. On every one of these multisets you learn the parameters of the classifier, that is, you do the training. So for every one of these multisets you have a trained classifier: if you draw ten such multisets, you have ten classifiers. Now an unknown observation comes, and you apply all ten classifiers. Maybe six of them say it goes to class one and four say it goes to class two; then wherever the majority sends it, that is your decision. This procedure, with majority voting, is known as bagging. Let me repeat: you do simple random sampling with replacement, and on each sample you learn the parameters of the classifier, that is, you train it; when you sample with replacement you may get multisets, not always, but many times. You can do the same thing even with two or more classifiers: if you have three classifiers, then for each multiset and each
classifier you learn the parameters, that is, you train the classifier. So if you have ten such multisets and three classifiers, you have thirty different classifiers in total; when you get an observation from the test set, apply these thirty classifiers to it and take the majority vote. That is bagging. Then there are several other techniques. There is boosting, and something called AdaBoost, which is a very, very nice method for ensemble classification, and there are many other methods. In most of these methods (let me not say all), after you obtain the classifiers, and there may be many ways of obtaining them, when you apply them to a test set, usually majority voting is taken. There are some papers where, instead of majority voting, people have used other means of combining: one way is majority voting, and in some situations people have taken something like a linear combination, or some other combination, of the classifiers. Many papers are being published on this now, and I expect to see many more on ensemble classification. Thank you.
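To make the bagging procedure described above concrete, here is a minimal sketch: bootstrap samples drawn with replacement, one base classifier trained per sample, and majority voting at prediction time. The toy one-dimensional data and the threshold "stump" base classifier are my own illustrations, not from the lecture:

```python
import random
from collections import Counter

random.seed(0)

# Toy 1-D training data: class 0 clustered near 0, class 1 near 5.
train = [(x / 10, 0) for x in range(10)] + [(5 + x / 10, 1) for x in range(10)]

def fit_stump(data):
    """A trivial base classifier: threshold at the midpoint of the two
    class means.  It stands in for any classifier you might bag."""
    mean = {}
    for c in (0, 1):
        xs = [x for x, y in data if y == c] or [0.0]
        mean[c] = sum(xs) / len(xs)
    t = (mean[0] + mean[1]) / 2
    return lambda x: 0 if x < t else 1

def bagging(data, n_models=10):
    """Bagging: draw bootstrap samples (simple random sampling WITH
    replacement, same size as the training set, so typically a multiset),
    train one classifier per sample, and predict by majority vote."""
    models = []
    for _ in range(n_models):
        sample = [random.choice(data) for _ in range(len(data))]
        models.append(fit_stump(sample))
    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict

clf = bagging(train)
print(clf(0.3), clf(5.2))
```

Replacing the single base learner with several different classifiers, each trained on each multiset, gives the thirty-classifier variant described above; only the voting pool grows.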