Thank you, Piyush. So, I will be speaking on planted models for k-way edge and vertex expansion. This is joint work with Anand Louis at IISc Bangalore. So, let me start with a bit of generic motivation before I actually define the objectives and the models that we use. Graph partitioning in general refers to the following: given an input graph, you break it into two or more parts while trying to minimize a certain objective measure related to the partition. In many cases, looking at a sparse partition of the graph is useful both for applications and theoretically interesting. There are a number of measures you can consider when you talk about sparsity, and here are a few of them. What we will be looking at in this talk are sparse cuts, or edge expansion, and sparse vertex cuts, or vertex expansion. So, the point is that most of the objectives I defined on the previous slide are actually NP-hard to compute exactly, so we look for approximations. In fact, even when we talk about approximations, most of the instances that occur in real life are not really worst-case instances for these problems. For instance, in this graph you can see that some kind of partitioning is already implicit, although I have not defined any measure yet. This is the kind of input that we want to look at in this particular result. Again, to name the problems that we will consider: the first one is k-way edge expansion, where you want to partition the graph into k parts while minimizing the maximum edge expansion of any part; I will define these objectives formally a little later. k-way vertex expansion is a similar task, except you want to minimize the maximum vertex expansion. Now, these are qualitatively slightly different problems; theoretically, there has been relatively less work on vertex expansion as compared to edge expansion.
So, for the purposes of this talk I will mostly talk about the vertex expansion objective. When I said that input instances are naturally clustered in some sense, the kind of models that we consider in this work are planted models. In planted models, we actually plant a solution: for the objectives considered here, the input graph G is guaranteed to have a k-way partition with low k-way edge or vertex expansion. And the goal is to recover a solution that is guaranteed to be a good approximation of this planted solution. So, we are looking at a restricted class of graphs here. This kind of analysis is often called beyond-worst-case analysis, because you are not looking at worst-case instances but, in some sense, at average-case instances, and it usually yields some insight into why well-known heuristics work in practice. Many problems have already been studied in such models: two-way edge expansion, coloring, planted clique, and so on, in a series of works; but k-way vertex expansion and edge expansion are new in this sense. So, as I said, I will be concentrating on vertex cuts. When you think of a graph that you want to partition into two or more parts in terms of sparsity, the natural thing is to measure the sparsity in terms of edges. However, in many cases this might not be a good indicator of what we want in practice. For instance, here is a graph which clusters into two parts; however, the interaction between the two sets of vertices goes almost entirely through a small set of hubs, or important vertices. These are the boundary vertices that divide the graph across the partition. With sparse vertex cuts, we are looking at sparsity measured in terms of the number of boundary vertices across the partition.
So, here is the formal definition of the objective that we consider. This is often called symmetric vertex expansion in the literature; there is also an asymmetric version, which turns out to be similar in terms of computational complexity. The vertex expansion of a cut (S, S-complement), a two-way cut in the graph, is defined as follows. You look at the vertices interacting across the boundary, that is, vertices in one part which actually have edges to a vertex in the other part. You count the number of boundary vertices in the numerator and divide it by the product of the sizes of S and S-complement in the denominator. There is a normalizing factor of the size of the vertex set that is multiplied in, but that is just for normalization. So, just to give you a feel for what this means in some cases: if both sets are of size n/2, and the sizes of T1 and T2, the sets of boundary vertices, are epsilon times n/2, that is, epsilon times the size of the part they lie in, then you can compute the vertex expansion to be equal to 4 epsilon. This is just to give you an idea of the kind of numbers we are looking at. And the vertex expansion of a given graph is the minimum, over all two-way partitions of the graph, of the vertex expansion of the partition. The factor of n is just a normalization; it is not really important in any way. So, if you look at this, the numerator looks like n? Yes, yes, that is exactly the reason we are doing this; if you had the edge expansion objective you would not be multiplying by the size of the vertex set. What do you mean by vertices on the boundary of a cut? Given this cut, they are exactly the vertices that have an edge to a vertex on the other side. So, I hope the definition is clear. What we will look at is a k-way version of this, where you want to partition the graph into k parts.
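Before moving to the k-way version, the two-way definition above can be made concrete with a minimal sketch (my own illustration, not code from the talk) that computes the symmetric vertex expansion of a cut directly from the definition:

```python
def vertex_expansion(adj, S):
    """Symmetric vertex expansion of the cut (S, S-complement).
    adj: dict vertex -> set of neighbours; S: set of vertices on one side."""
    V = set(adj)
    Sc = V - S
    T1 = {v for v in S if adj[v] & Sc}   # boundary vertices inside S
    T2 = {v for v in Sc if adj[v] & S}   # boundary vertices inside S-complement
    return len(V) * (len(T1) + len(T2)) / (len(S) * len(Sc))

# A 4-cycle split into two adjacent pairs: every vertex is on the
# boundary, so the expansion is n * n / ((n/2) * (n/2)) = 4.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(vertex_expansion(cycle, {0, 1}))  # 4.0
```

With |S| = |S-complement| = n/2 and |T1| = |T2| = epsilon n/2, this evaluates to 4 epsilon, matching the number quoted above.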
You partition the vertex set of the graph into k parts, and again you count boundary vertices, but now you look at the maximum, over all the parts of a particular partition, of the vertex expansion of that part. So, given a partition of the vertex set into k parts, we look at the worst part, the part with the maximum vertex expansion, and that is the k-way expansion objective of that particular partition. For a graph, you again want to minimize this quantity over all possible k-way partitions. So, just again to give you an idea of the ideal, or natural, case: if each of the sets has size n/k, and the number of boundary vertices within each part equals epsilon times n/k, you will find that this particular partition has a k-way vertex expansion of something like 2 epsilon k. So, what is known? Here I list some of the worst-case results that are known for vertex expansion. For k = 2, Feige, Hajiaghayi and Lee gave an O(sqrt(log n)) approximation, and Louis, Raghavendra and Vempala gave an O(sqrt(log d / OPT)) approximation, where d is the maximum degree of the graph. So, the degree turns up in some cases of vertex expansion, unlike the edge expansion version, if you are familiar with that. These are for k = 2. For k greater than 2 there is no explicit published work, but you can infer from a couple of works an approximation guarantee for the k-way expansion objective that looks like O(sqrt(log n)) times f(k) times the optimum, where this f(k) is usually a polynomial in k. In terms of lower bounds, again, no PTAS is expected to be possible, and there is no constant-factor approximation algorithm assuming the small-set expansion hypothesis.
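As my own illustration again (hedged, not from the slides), the k-way objective is just the worst two-way expansion over the parts of a given partition:

```python
def part_expansion(adj, S):
    # two-way symmetric vertex expansion of S against the rest of the graph
    V = set(adj)
    Sc = V - S
    boundary = sum(1 for v in S if adj[v] & Sc) + sum(1 for v in Sc if adj[v] & S)
    return len(V) * boundary / (len(S) * len(Sc))

def kway_vertex_expansion(adj, parts):
    # maximum, over the parts of the partition, of that part's expansion
    return max(part_expansion(adj, P) for P in parts)

# A 6-cycle cut into three arcs of two vertices each: each part has both
# of its vertices on the boundary, and by symmetry every part is equally bad.
c6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(kway_vertex_expansion(c6, [{0, 1}, {2, 3}, {4, 5}]))  # 3.0
```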
So, just to detour and state the results known for worst-case edge expansion, although I have not defined that objective explicitly: it is useful to keep in mind that the best known approximations are again of the form O(sqrt(log n)) times f1(k). Now, the common thread in both of these is that either your approximation guarantee depends on n, or you get a square-root-type guarantee, which has issues when OPT is really low: the approximation factor becomes bad. The latter kind of guarantee is usually derived from spectral algorithms for the problem, and the former from SDP-based techniques. So, what is known beyond the worst case, in the kind of setting we looked at before? Again, here the goal is to design algorithms that have better guarantees than the worst-case approximation, and we do it, as I said, only for a certain class of graphs, or graphs generated by some model. The previously considered models generally combine randomness with adversarial choices: you might have some part of the graph being a G(n, p), along with some adversarial edges added in some way. Peng, Sun and Zanetti, in a recent work, considered a k-way model similar to ours; however, the guarantees are slightly different, and if you are familiar with approximate recovery in such models, they show that you can recover the partitions in that sense. Our approximation guarantees are going to be a bit different. So, with that in mind, let me define the model that we work with; we feel it is quite a natural model to keep in mind. The motivation, again, is that within every part of the partition you have good connectivity, and you maintain sparsity across the partition.
So, this part is common, the same for both edge and vertex expansion, and as I said I will look at vertex expansion in particular. Here is a bit of text; I will show a diagram on the next slide which will make things clear. The idea is that you partition the vertex set into k parts. This is what the adversary does behind the scenes: the algorithm designer just knows that the graph comes from this class, but does not really know these sets, or the edges that were added, or how they were added. So, this is what the adversary does. He partitions the vertex set into k parts, each of size n/k, and to maintain connectivity within a part, adds a spectral expander within each. Calling the parts S1 to Sk: for each of these parts you choose a set of boundary vertices, and then add arbitrary edges connecting the boundary vertices to each other. In the last step there is a monotone adversary, who can further add edges inside each of the Si's. So, here is what the adversary does: the vertex set has size n; you partition it into k parts, each of size n/k; put a lambda-expander in each of them; choose some epsilon times n/k boundary vertices; and connect them arbitrarily. Then you add monotone adversarial edges, that is, you can just add edges inside each of the Si's, making them more connected in some sense; this makes the planted solution stand out even more. And what we want is to recover a k-way partition with guarantees close to what is expected. As I mentioned before, in this kind of model, where you have an equal partition into k parts of size n/k and an epsilon fraction of the vertices of each part on the boundary, you expect the k-way vertex expansion to look like 2 epsilon k or something similar. So, you want the algorithm to recover a partition with a similar guarantee, and that is what we do.
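The generation process just described can be sketched roughly as follows. This is purely illustrative: the talk assumes an arbitrary lambda-expander inside each part and arbitrary cross edges between boundary vertices, so the random matchings and the random cross-edge rule below are stand-ins of my own choosing, as are all parameter defaults.

```python
import random

def planted_instance(n, k, eps, d=6, seed=0):
    """Sketch of the planted model: k parts of size n/k, an expander-like
    graph inside each part, and eps * n/k boundary vertices per part joined
    by cross edges. n/k should be even so the matchings below are perfect."""
    rng = random.Random(seed)
    m = n // k
    parts = [list(range(i * m, (i + 1) * m)) for i in range(k)]
    edges = set()
    # Union of d random perfect matchings inside each part, a common
    # stand-in for "some spectral expander".
    for P in parts:
        for _ in range(d):
            perm = P[:]
            rng.shuffle(perm)
            edges.update((min(u, v), max(u, v))
                         for u, v in zip(perm[::2], perm[1::2]))
    # Pick eps * m boundary vertices per part and connect them across parts.
    boundary = [set(rng.sample(P, max(1, int(eps * m)))) for P in parts]
    for i in range(k):
        for j in range(i + 1, k):
            for u in boundary[i]:
                v = rng.choice(sorted(boundary[j]))
                edges.add((min(u, v), max(u, v)))
    # A monotone adversary could now add further edges inside each part.
    return parts, boundary, edges
```

Note that every cross-part edge here touches only boundary vertices, which is the property the model insists on.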
So, given some conditions on how well connected each of the parts is and how many boundary vertices there are, if the solution stands out well enough, intuitively, we can recover a k-way partition with vertex expansion close to the planted one. Here is the theorem: given that epsilon is less than lambda divided by 800 k, where, again, epsilon is the fraction of boundary vertices in each part, so given that this is small enough and the spectral gap is large, there is a polynomial-time algorithm that outputs a partition where each part has Omega(n/k) vertices, so it is large enough compared to the planted one, and, secondly, each part has at most O(k^2) times the optimal vertex expansion, where OPT is at most 2 epsilon k, as we said before. The important point is that the final approximation ratio of O(k^2) is independent of n, unlike previous worst-case approximation ratios of O(sqrt(log n)); this depends only on k, and the algorithm runs in time polynomial in both n and k. One could consider, if needed, algorithms which run in time exponential in k times a polynomial in n, but this particular algorithm is polynomial in both. And, as I said, I have not defined the edge expansion problem explicitly, but we give a similar guarantee even for the edge expansion version. So, let me go on to give you an outline of the proof. What we use is semidefinite programming relaxations. Let me start with the two-way case, just to warm up to what the relaxation looks like. The goal, as you might know, is to take this combinatorial objective and cast it as an optimization problem. We do that in the following way: this is the original objective, two-way vertex expansion; n here is the number of vertices in the graph. You can express the same objective in terms of Boolean, or rather plus-one/minus-one, variables in the following way.
It is the minimum over all possible assignments of +1 and -1, where the intended solution is: if the optimal solution partitions the graph into S and S-complement, then you assign x_i = +1 for vertices i that lie in S and -1 for vertices in the complement part. The interesting part here is the numerator. Notice that for an assignment of x_i = +1 and -1 to the vertices of S and S-complement respectively, the numerator, the sum over i of the maximum over j in N(i) of (x_i - x_j)^2, counts exactly, up to a factor of 4, the number of boundary vertices, because only for a boundary vertex does some neighbour lie in the opposite part. So, I hope this is clear; the denominator is something like 4 times |S| times |S-complement|. This is just an algebraic restatement of the original combinatorial objective; of course, it is still NP-hard, so we cannot run it through an optimization solver as is. What we do is relax it into a continuous space and then try to solve that. The relaxation is via a semidefinite program, in which, instead of a scalar as before, we assign a vector to every vertex of the graph. We call these vectors, which are the variables of your optimization problem, u_i, and the program is just a restatement of the previous one, replacing every x_i by the corresponding vector u_i. Notice that the term max over j in N(i) of ||u_i - u_j||^2 is very similar to max over j in N(i) of (x_i - x_j)^2: we have just changed the quadratic form into a squared norm. The ideal solution, of course, goes back to the quadratic form: if the u_i were just scalars, which is a subset of the space of feasible solutions of this optimization problem, with u_i = +1 if i is in S and -1 otherwise, you would of course recover the vertex expansion objective; but the relaxation is slightly more general.
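A quick numerical sanity check of the numerator claim (my own snippet, not from the talk): for x_i = +1 on S and -1 on the complement, the term max over j in N(i) of (x_i - x_j)^2 is 4 exactly when i is a boundary vertex and 0 otherwise, so the sum counts boundary vertices up to a factor of 4.

```python
def numerator(adj, x):
    # sum over vertices i of max over neighbours j of (x_i - x_j)^2
    return sum(max((x[i] - x[j]) ** 2 for j in adj[i]) for i in adj)

# Two triangles joined by the single edge 2-3; the cut {0,1,2} vs {3,4,5}
# has exactly two boundary vertices, namely 2 and 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = {v: (1 if v <= 2 else -1) for v in adj}
print(numerator(adj, x))  # 8, i.e. 4 * (number of boundary vertices)
```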
So, one point here: when I put in this normalization constraint, if you have noticed it, we are implicitly using the fact that the planted partition is a balanced partition of the vertex set into two parts. It is not really important for the purposes of this talk, but in case you are wondering why it turns up, we use the fact that you are partitioning the graph into two balanced components of size n/2 each. Again, just as a detour: if you look at the edge expansion objective, the main difference is that the max appearing here for vertex expansion is replaced by a sum. So, this was the two-way expansion, where the ideal solution was a scalar for every variable. For k-way expansion a scalar assignment does not work; rather, in the relaxation, you again have a vector for each vertex, but the ideal solution to the relaxation is: if a vertex lies in part S_i, then the corresponding vector should be the coordinate vector along that axis. So, it is a k-dimensional solution for k-way partitioning, and this is what the ideal solution to your optimization problem should look like for k-way expansion; this is the basic idea that we will use. Again, we adjust the constraints accordingly. I will not go into the details of this relaxation; if you have seen SDP relaxations before, you can verify that this is a valid relaxation for the k-way expansion problem. We add in a couple of constraints, such as the triangle inequality constraints, and again you can verify that all of these are valid for the ideal solution I mentioned.
So, here is the core of the proof outline: when the ideal solution looks like this, what we prove is that, for k-part instances satisfying the condition on the boundary vertices, that epsilon is less than lambda / (800 k), for such well-structured instances the actual solution of the SDP looks very close to the ideal solution. It looks like some perturbed version of it, and using that we can recover a good approximation guarantee. This will not hold for more general instances, but for the k-part instances we consider, such a structure does hold. Here is the technical lemma that we show. If you forget all the formulas, the one-line idea is that the solution vectors are clustered according to the sets of the planted partition: if you take mu_t to be the centroid of the vectors of the optimal solution for the t-th part, then the centroids are almost orthonormal, that is, they are close to unit vectors and far apart from each other in some sense. Using this, we can readily extract the sets. The key lemma we use for this is actually a very simple one, about the properties of spectral expanders. The idea is that if you have a spectral expander with a vector function, that is, you map every vertex of the set to a vector in d dimensions, and if these vectors are aligned across edges, then they are also, on average, globally aligned. This is called local-global correlation. The statement is nothing but a restatement of the fact that lambda is the second-smallest eigenvalue of the Laplacian of the graph. I will not go over it, but it is just a few-line proof, and the first part of the lemma basically only uses the simple fact that the average is at most the max in the objective.
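The local-global correlation statement can be illustrated numerically (my own check; the small graph and random vectors are arbitrary choices, not from the slides). With lambda_2 the second-smallest Laplacian eigenvalue, for any map f from V into R^d, the edge-wise disagreement dominates (lambda_2 / n) times the all-pairs disagreement, so small disagreement across the edges of an expander forces small disagreement globally:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3
# A cycle plus long-range chords: a crude expander-ish graph.
edges = [(i, (i + 1) % n) for i in range(n)] + [(i, i + n // 2) for i in range(n // 2)]
L = np.zeros((n, n))
for u, v in edges:                       # build the graph Laplacian
    L[u, u] += 1; L[v, v] += 1
    L[u, v] -= 1; L[v, u] -= 1
lam2 = np.sort(np.linalg.eigvalsh(L))[1]

f = rng.standard_normal((n, d))          # an arbitrary vector per vertex
local = sum(np.sum((f[u] - f[v]) ** 2) for u, v in edges)
global_ = sum(np.sum((f[i] - f[j]) ** 2)
              for i in range(n) for j in range(i + 1, n))
# This inequality is the restatement of lambda_2 being the
# second-smallest eigenvalue of the Laplacian.
assert local >= (lam2 / n) * global_ - 1e-9
print(lam2 > 0)  # True: the graph is connected, so there is a spectral gap
```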
So, I will not go into the details of this, but it is a relatively short proof that shows that the S_i's are clustered in some sense. Once you show that the S_i's are clustered, you have a couple more steps to get to the final answer. First, you show that each of these k disjoint, far-apart sets does indeed have small expansion. Secondly, using a greedy choice, you can iteratively extract disjoint sets which have at most k times the optimal vertex expansion. But wait: this gives us k disjoint sets; it does not give us a partition of the vertex set. We can move from these disjoint sets to a partition by losing a further factor of O(k) in the approximation, and that is the basis of the O(k^2) approximation I stated before. So, I will end with a summary of what we did and some open questions. What we did: for this restricted class of instances, which seems natural in some sense, we gave an O(k^2) guarantee on the k-way expansion objectives of the graph. Here are three open questions directly related to the model that we discussed. First of all, can you, of course, generalize the model? One thing that we used crucially, if you look at the proof, is that the set sizes were all fixed and known in advance; if you did not know this, we still do not have a proof of whether you can get a comparable approximation guarantee. The second thing is that, even in this particular model, we expect that one should be able to obtain an O(polylog k) approximation. And thirdly, as I mentioned very briefly, the Peng-Sun-Zanetti paper shows a sort of approximate recovery guarantee, that is, they find k sets that overlap with the planted sets. Notice that our algorithm gives no guarantee that the sets we recover overlap in any way with the planted solution.
So, whether such approximate recovery guarantees are possible for the vertex expansion model is an interesting question. Overall, as I said, all of this does help us understand how these problems behave, and what the hard and the easy parts of the problem are. In particular, the big open question is: in the worst case, can we get an O(polylog k) approximation for the k-way vertex expansion or edge expansion objectives? We will stop with that, and I am open for questions. So, that is, I think, very similar to the question that was asked earlier: we really cannot do that; we just show existence, and then the structure actually helps us get around just choosing subsets greedily. So, I did not discuss those parts, but technically what we need to do is go over every vertex, because the solution vectors are all clustered around some vertices in the cluster: look at every vertex, look at a ball around it, see whether it has small vertex expansion; if not, go on. So, it is a polynomial-time algorithm, but again, there is no guarantee that we are looking at the S_i's that we planted.
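As an aside, the ball-search described in this last answer might look roughly like the sketch below. This is entirely my rendering: the radius threshold, the choice of centre, and the use of the SDP vectors as coordinates are all assumptions, and the real algorithm would additionally check each candidate set's vertex expansion before keeping it.

```python
import numpy as np

def extract_clusters(vectors, k, radius=0.5):
    """vectors: dict vertex -> np.array (e.g. SDP solution vectors).
    Greedily peel off up to k disjoint sets of vertices whose vectors
    lie in a small ball around a chosen centre vertex."""
    remaining = set(vectors)
    clusters = []
    while remaining and len(clusters) < k:
        c = min(remaining)  # candidate centre; any choice rule works here
        ball = {v for v in remaining
                if np.linalg.norm(vectors[v] - vectors[c]) <= radius}
        clusters.append(ball)
        remaining -= ball
    return clusters

# Toy check: vectors sitting exactly on two coordinate axes, as in the
# ideal SDP solution, split back into the two planted groups.
e = np.eye(2)
vecs = {0: e[0], 1: e[0], 2: e[0], 3: e[1], 4: e[1], 5: e[1]}
print(extract_clusters(vecs, 2))  # [{0, 1, 2}, {3, 4, 5}]
```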