Okay, so this lecture will be about the notion of stability, which was already mentioned by Blum in the first lecture and is also slightly related to that lecture. It is based on joint work with Nati Linial. We'll start with a rather informal definition of stability and a motivation for this notion.

Here is the informal definition: an instance of an optimization problem is stable if the point where the optimum is attained does not change under small perturbations. In other words, if we move the instance a bit, the configuration at which the optimum is attained does not change at all. And our main question of interest is the following: are there efficient algorithms for NP-hard optimization problems that correctly solve all sufficiently stable instances?

There are many reasons to study stable instances, and maybe the most convincing one comes from clustering. We will consider clustering problems of the following kind. The input is a set of objects, which we denote by X, and a dissimilarity function d, where d(x, y) is the distance, or dissimilarity, between the objects x and y. We do not assume that d is a metric, or place any other restriction on d. The output is a clustering that is optimal in some sense, where a clustering is considered good if similar, or close, objects are classified into the same cluster. For example, we may wish to find a cut, that is, a partition of X into a set S and its complement, that maximizes the sum of distances between points classified into opposite sides of the cut.

For these problems we can now define precisely what we mean by stability: an instance is stable if the point where the optimum is attained does not change under a small perturbation. So the first step is to define what a perturbation is. We say that an instance (X, d') is a γ-perturbation of an instance (X, d) if for every pair of points x, y, d'(x, y) lies between d(x, y)/√γ and √γ·d(x, y). Equivalently, up to rescaling, to generate a γ-perturbation of (X, d) we are allowed to multiply each entry d(x, y) by a factor between 1 and γ. Now we can define a stable instance: an instance (X, d) is γ-stable if its optimal clustering is the same as the optimal clustering of every one of its γ-perturbations.

So why are we so interested in stable instances? Our main claim, maybe slightly exaggerated, is that in clustering problems we are only interested in stable instances, and I'll try to justify this claim. The first reason is that in clustering we are interested in instances where there is a clustering that is evidently correct, one that really partitions the data into sets whose members have something in common, and this evidence is captured, among other things, by stability: if we move the points a bit, the optimal clustering should not change. This is the main reason why we think that for clustering problems we are only interested in stable instances. Another reason is that we are not given the actual dissimilarity function, because there might not be an exact dissimilarity function, or it may be hard to measure; the real dissimilarity function is a perturbation of the one we obtain, and we are interested in cases where the solution of the instance we obtain is the same as that of the real, perturbed instance.
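To make the definitions of a γ-perturbation and a γ-stable instance concrete, here is a minimal Python sketch, not part of the talk, that spot-checks stability of a small max-cut instance by brute force: it samples random γ-perturbations and recomputes the optimal cut each time. A failed check certifies that the instance is not γ-stable; passing all trials is only evidence of stability, not a proof. The dict-of-dicts encoding of d and the number of trials are assumptions made for illustration.

```python
import itertools
import random

def cut_value(d, points, S):
    # Value of the cut (S, X \ S): total dissimilarity across the cut.
    return sum(d[x][y] for x, y in itertools.combinations(points, 2)
               if (x in S) != (y in S))

def optimal_cut(d, points):
    # Brute-force max cut (exponential in n; illustration only).
    # Fixing points[0] on one side avoids counting each cut twice.
    first, rest = points[0], points[1:]
    candidates = (frozenset(sub) | {first}
                  for r in range(len(rest) + 1)
                  for sub in itertools.combinations(rest, r))
    return max(candidates, key=lambda S: cut_value(d, points, S))

def random_gamma_perturbation(d, points, gamma, rng):
    # Multiply each entry d(x, y) by an independent factor
    # drawn from [1/sqrt(gamma), sqrt(gamma)], as in the definition.
    g = gamma ** 0.5
    d2 = {x: {} for x in points}
    for x, y in itertools.combinations(points, 2):
        f = rng.uniform(1.0 / g, g)
        d2[x][y] = d2[y][x] = f * d[x][y]
    return d2

def spot_check_gamma_stability(d, points, gamma, trials=200, seed=0):
    # False certifies instability; True only means no violation was found.
    rng = random.Random(seed)
    base = optimal_cut(d, points)
    return all(optimal_cut(random_gamma_perturbation(d, points, gamma, rng),
                           points) == base
               for _ in range(trials))
```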
Okay, so hopefully I have motivated you enough to study stable instances. In the next part of the talk we will see some results and proof techniques for this notion. We start with results about k-median, k-means, k-center, and other center-based clustering problems; we will focus on k-median for simplicity, but a similar result holds for k-means.

Here the input is an n-point metric space (X, d), and I emphasize that now we do assume that d is a metric. The output is a set of k centers, c_1 to c_k, which may be required to lie in X or may come from some ambient space, minimizing the sum of distances from each point of X to its center, where the center of a point is the point of C closest to it.

Here we have a nice result by Awasthi, Blum, and Sheffet: there is an efficient algorithm for 3-stable instances of k-median. It is also true for k-means, and I'll try to describe a sketch of the proof. The first step is to show that 3-stability entails that every point is three times closer to its own center than to any other center. Using this property, it is possible to show that the single-linkage process, plus some post-processing, recovers the optimal solution of k-median. The single-linkage process is the following: we start with n clusters, each containing a single point, and at each step we merge the two closest clusters, where the distance between two clusters is the minimal distance between two points, one in the first cluster and one in the other. So again, we start with n clusters, at each step we merge the two closest clusters, and after n − 1 steps we finish with one cluster containing the whole space. Using the first property, it can be shown that every cluster of the optimal k-median solution is obtained as a cluster at one of these steps, and using some dynamic programming the optimal clustering can be extracted, as in the sketch below.
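Here is a minimal sketch, again not the authors' implementation, of the single-linkage process plus the dynamic program just described: single linkage builds a hierarchy of clusters, and the DP finds the cheapest way to split the hierarchy into k clusters, which recovers the optimum exactly when every optimal cluster appears as a node of the hierarchy, as the 3-stability argument guarantees. The naive O(n^3) merging and the dict-of-dicts metric are simplifications.

```python
import itertools

def single_linkage_tree(points, d):
    # Start from singletons; repeatedly merge the two clusters whose closest
    # pair of points is nearest. Record every cluster formed and its children.
    clusters = [frozenset([p]) for p in points]
    children = {c: None for c in clusters}
    while len(clusters) > 1:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: min(d[x][y]
                                      for x in clusters[ij[0]]
                                      for y in clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        children[merged] = (clusters[i], clusters[j])
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return children  # keys: every cluster ever formed; root = frozenset(points)

def cluster_cost(cluster, d):
    # k-median cost of one cluster: the best center within the cluster.
    return min(sum(d[c][x] for x in cluster if x != c) for c in cluster)

def best_k_median(points, d, k):
    # DP over the linkage tree: cheapest split of each node into j clusters.
    children = single_linkage_tree(points, d)
    memo = {}
    def solve(cluster, j):
        if (cluster, j) not in memo:
            if j == 1:
                memo[(cluster, j)] = cluster_cost(cluster, d)
            elif children[cluster] is None:
                memo[(cluster, j)] = float("inf")  # singleton: cannot split
            else:
                left, right = children[cluster]
                memo[(cluster, j)] = min(solve(left, a) + solve(right, j - a)
                                         for a in range(1, j))
        return memo[(cluster, j)]
    return solve(frozenset(points), k)
```

The sketch returns only the optimal cost; recovering the clustering itself just means remembering the argmin split at each DP step.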
Okay, so our next result is about metric max cut. Here, again, the input is an n-point metric space (X, d), and the output is a cut, a partition of X into a set S and its complement, that maximizes the sum of distances between pairs of points classified into opposite sides of the cut. Here we have a rather strong theorem: for every ε > 0 there is an efficient algorithm that correctly solves all (1 + ε)-stable instances of metric max cut. In some sense this is the best we can hope for, because the problem is NP-hard, so 1-stable instances cannot all be solved: every instance is 1-stable, and we know we cannot solve all instances efficiently. So this result is optimal in some sense, and I'll describe its proof, which is rather simple.

We start by sampling L = O(log n) points x_1 to x_L according to some distribution, not the uniform distribution. We then show that with high probability there is a partition of the sampled points into a set A and its complement that induces the optimal cut, where the cut induced by the partition (A, complement of A) is the following: on one side we put all the points that are on average closer to A, and on the other side all the points that are on average closer to the complement of A. We show that with high probability the optimal cut is of this form, and since we have sampled only O(log n) points, we can iterate over all cuts of this kind, which is only polynomially many, and find the optimal one. So this proves the theorem.

Okay, so we are almost finished. In the last slide I'll talk about future work we intend to do: we intend to work on max cut in general, not the metric version I mentioned before. So far we have assumed a rather strong restriction on the dissimilarity function d, specifically that d is a metric. It is actually enough to assume that d is Lipschitz, but we cannot throw away this assumption entirely: without this restriction we need roughly √(log n) stability to design an efficient algorithm for max cut, which is a rather large amount of stability that is usually not reasonable to assume in practice. However, we do conjecture that there exists some constant γ* such that there is an efficient algorithm that correctly solves all γ*-stable instances of max cut, and this is the problem we are currently working on.

Audience: Sorry, when you say you need that stability...?

Okay, so "need" here is about what we can prove. We don't have a strong hardness result, only a rather weak one, something like: (1 + 1/n)-stability does not suffice, but nothing stronger than that. On the algorithmic side, we only know how to design an algorithm that correctly solves all roughly-√(log n)-stable instances, or at least that's the best we know how to prove for this algorithm. Okay, so I'll finish here. Any questions?
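For completeness, here is a minimal sketch of the sampling-based algorithm for metric max cut described above: sample L = O(log n) points, try every partition (A, B) of the sample, assign each point to the side whose sample points are closer on average (so close points stay together and far points are separated), and keep the best induced cut. The talk says only that the sampling distribution is non-uniform; sampling a point with probability proportional to its total distance is a placeholder assumption, as is the constant in the sample size.

```python
import itertools
import math
import random

def cut_value(points, d, S):
    return sum(d[x][y] for x, y in itertools.combinations(points, 2)
               if (x in S) != (y in S))

def induced_cut(points, d, A, B):
    # Put each point on the side whose sample points are closer on average
    # (treating d(p, p) as 0).
    dd = lambda x, y: 0 if x == y else d[x][y]
    return {p for p in points
            if sum(dd(p, a) for a in A) / len(A)
               <= sum(dd(p, b) for b in B) / len(B)}

def sampled_metric_max_cut(points, d, c=3, seed=0):
    rng = random.Random(seed)
    L = max(2, math.ceil(c * math.log2(len(points))))
    # Placeholder distribution (assumption): probability proportional to a
    # point's total distance to the rest; the talk says only "non-uniform".
    weights = [sum(d[x][y] for y in points if y != x) for x in points]
    sample = list(dict.fromkeys(rng.choices(points, weights=weights, k=L)))
    best_S, best_val = None, float("-inf")
    for r in range(1, len(sample)):           # 2^L partitions = poly(n) work
        for A in itertools.combinations(sample, r):
            B = [s for s in sample if s not in A]
            S = induced_cut(points, d, list(A), B)
            v = cut_value(points, d, S)
            if v > best_val:
                best_S, best_val = S, v
    return best_S, best_val
```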