My name is Jaikumar Radhakrishnan. Our speaker this afternoon is Professor Robert Krauthgamer from the Weizmann Institute of Science. Professor Krauthgamer got his PhD from the Weizmann Institute in 2001. He then worked at IBM Almaden until 2007, returning to join the Faculty of Mathematics and Computer Science at the Weizmann Institute, in the Department of Computer Science and Applied Mathematics, which he now heads. Professor Krauthgamer is the editor-in-chief of the SIAM Journal on Computing. His interests are diverse: he is a leading expert in algorithms, with a focus on large data sets, analysis of randomized heuristics, and average-case analysis of heuristics. On behalf of the program committee, I thank Professor Krauthgamer for accepting our invitation. Thank you. Thanks a lot for the invitation. It's really an honor and a pleasure to be here at this wonderful conference. I work on a few things, and I've tried to focus on something for this talk. I think it's still going to be a bit broad, but there's an interesting theme here. The theme is going to be sketching graphs and combinatorial approximations. So let's start from sketching, the word sketching. I'm going to use it for, basically, data summarization. We have a lot of data, and we want to somehow get a smaller image of it, to summarize the data in a useful way, depending on what we want to use it for. This concept, in this generality, of course appears in many, many applications. The reason I'm interested in it is that it's a fascinating interplay between the combinatorial optimization problems we want to solve, which are going to be the focus today, and information theory; not the proper information-theoretic definitions, but the question of how much information about the object you need, so I don't know if information theory is exactly the right name here. There are many exciting results, and I'm going to tell you about some of them, and there are many open problems. I know everybody wants to hear about great results, and they want to hear about problems to work on, so I'll try to emphasize the open problems whenever there are any, and you'll see them here. Okay, so right into the subject. Suppose we have an input graph G. What do we usually want to do in a combinatorial optimization problem? We have a query, some question Q, and we are basically trying to find the optimal solution for this Q. Optimal solution means that among all feasible solutions, we find the best one, where best usually means minimizing or maximizing some objective. And I'm going to look at the setting where the input is a graph with n vertices; throughout the talk, there is a graph with n vertices somewhere in the background. Now here are two examples. Maybe start with number two: it is the shortest-path distance between two given vertices s and t. You have many paths, and you want to find the shortest one. Example number one is minimum s-t cut: among all cuts between s and t, find the one of minimum capacity. This is combinatorial optimization. Q here is a pair s and t, and for the same graph I may have many repeated questions: how about this s-t pair? How about another s-t pair, for the same graph?
And if I want to do some sort of summarization of the data, then I want to take this huge graph and somehow sketch it into a small object, a smaller representation of the graph, so that when I get the query, I can compute the optimal solution, for cut or distance, just from the sketch; I don't need the entire graph. Okay, so given the query, I compute, or maybe estimate. It is going to be particularly interesting if you can approximate, because then you can often use a smaller sketch, but it is technically more difficult, and that's why it's more interesting for many of the people in this room: you have to design a less trivial, more sophisticated algorithm that can estimate the optimum. So first you compute from the graph this sk(G), the sketch of G, and then use an estimation algorithm, as you can see there on the right, that runs on the sketch. You sketch the graph once, but then you may get many queries. Of course we're going to look at one query, but potentially there could be many. Now in this context, distance problems are very common and well studied. For example, you can use a spanner subgraph as a sketch, or you can use a distance oracle. But I'm going to focus in this talk on cut problems. Distances are a perfectly good example; I have just tried to restrict my attention today to cut problems, so basically example number one and variants of it. Okay, so let's start with this example number one, minimum s-t cut. That's the most basic problem; everybody studies it in undergrad, in computer science at least. But here's another very famous result, another, I mean, compared to minimum s-t cut, Ford-Fulkerson, max-flow min-cut. This one is a little later, from 1961, by Gomory and Hu. It's a seminal result; I think people don't teach it as often now, but it's really elegant. It says the following. Take a graph G and look at the minimum s-t cut for all possible pairs of vertices s and t; the same graph, but all possible pairs. The graph is undirected; it could have edge capacities, which I didn't draw, but it's undirected. So you have basically n choose 2 different questions: compute the minimum s-t cut for every possible combination of s and t. And you can actually summarize all these cuts, these n choose 2 cuts, by a tree. You can replace the graph G with a tree T. This tree is going to have exactly the same vertices, the same vertex set V, but it is going to have the same cuts. That's always possible. So you see the requirement there: the min cut in the tree T is equal to the min cut in the graph G for every pair of vertices. And here is a very simple example to illustrate this. Suppose you have an n-vertex clique as your graph G. What kind of tree could you use? For example, I could use a star. The star has to be on the same vertex set, so it has n minus 1 leaves, and one of the vertices is the center. I'm going to give every edge a capacity of n minus 1; I'm assuming the clique has unit edge weights. Now take any two vertices s and t in the clique. What is the minimum cut between s and t? Well, for example, you can separate s from all the other vertices, so the degree of s is your cut value: it's n minus 1. Let's look at the same pair in the tree. If s and t are leaves, you can just cut the edge going from s to the center of the star, capacity n minus 1.
And if t is the center, then it's the same thing. So the cuts have equal value, n minus 1. This is very surprising. Why? Because the tree has only n minus 1 edges, so it's very small to store. Actually, if you try to compute all the s-t cuts in the tree, you immediately see that the value you get always comes from a single edge. So there are only n minus 1 distinct values you can get from the tree. Even though you ask n squared questions about the same graph, it's the same tree, and the number of distinct answers is no more than n minus 1, which implies that the same holds in the graph G. This is true, it follows from this theorem of Gomory and Hu, and it is not obvious at all. But it shows redundancy. So instead of a graph with maybe n squared edges, you get a tree with only order n edges, or order n words if you want to store it. You can compute things very quickly on the tree, so it's very useful. But at the end of the day, what I'm saying is that this is, for me, a good sketch. It's a good summary of the data using only order n words, and it gives you the exact values, exact equality. I want to move to approximation later, but you cannot do better than this: exact values, and deterministic. We'll also look at randomized algorithms in a few minutes. So this is a wonderful result. Let me say a few words about the algorithm, really because it's such a key result and not taught as often these days; not the full details, just the high-level argument. The algorithm has n minus 1 iterations. You start with the graph G, and at every iteration you do two things. First, you compute a minimum cut; it is going to be either in G or in a graph derived from G, I'm not going to go into these details, but you do one minimum-cut execution. What you get from this execution is a partition of the vertices and the value of the cut. You use that to recover one edge of the tree, to build one edge of the tree T. With n minus 1 iterations, you get n minus 1 edges of the tree, and you are done. Here's the picture. You start with the vertex set V and compute some minimum cut there, which partitions the vertices into V1 and V2. You get these two blobs and put an edge between the two sets, a meta-picture, and the capacity of that edge is exactly the value you found in the minimum s-t cut. Then you keep going: you do the same thing in every iteration, say for V1 and later for V2. Say we start with V2. You partition it using one minimum-cut execution; that tells you how to split V2 into two sets, and the value you get from this min-cut computation goes on the new edge between what I call V21 and V22. But you also have to do something else, with the edge you had before: you have to decide whether it connects to the blob V21 or to V22. You have to make these decisions; I'm not telling you all the details, but it's like you keep the same edge and have to direct it in a finer way. So you keep doing that: n minus 1 iterations, eventually n minus 1 edges, exactly on the vertex set V, and you get a tree. Beautiful argument. But there are a lot of big open questions here.
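Before the open questions, here is the flavor of this iteration in code. This is a minimal sketch, assuming networkx is available and that edge capacities sit under the "capacity" attribute. It implements Gusfield's simplification (n minus 1 min-cut calls on the original graph, no contractions), which yields an equivalent flow tree, rather than the contraction-based construction just described; the function name is mine.

```python
import networkx as nx

def gomory_hu_sketch(G, capacity="capacity"):
    """Gusfield-style tree: one min s-t cut per non-root vertex."""
    nodes = list(G.nodes())
    root, rest = nodes[0], nodes[1:]
    parent = {v: root for v in rest}   # tentative tree: a star at the root
    weight = {}
    for i, s in enumerate(rest):
        t = parent[s]
        # One minimum-cut execution per iteration; S is the side containing s.
        cut_value, (S, _) = nx.minimum_cut(G, s, t, capacity=capacity)
        weight[s] = cut_value
        # Unprocessed vertices hanging off t that fell on s's side re-hang
        # on s; this is the "direct the old edge in a finer way" decision.
        for v in rest[i + 1:]:
            if v in S and parent[v] == t:
                parent[v] = s
    T = nx.Graph()
    T.add_nodes_from(nodes)
    for v in rest:
        T.add_edge(v, parent[v], capacity=weight[v])
    return T
```

For what it's worth, networkx also ships a built-in gomory_hu_tree; the point of the sketch is only to make the n-minus-1 min-cut structure of the iteration visible.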
One open question: can you do it faster? How much time does this take? It computes a minimum cut every iteration, repeated n minus 1 times, and the rest of the operations are cheaper. Now, to this day we don't even know how to compute one minimum cut in linear time. We do not know that, but we don't even have a reason to suspect it's impossible, so let's say it seems plausible that we could do it in linear time. Linear time means order of what we call V plus E, vertices plus edges; the graph is usually connected, so it's order E, or m if you want. And you have to repeat it n times, so the whole thing is much more expensive than linear time, which is potentially possible. So how could you attack this problem of finding better, I mean faster, algorithms? Maybe you can avoid computing a minimum cut from scratch every time. This algorithm works by doing a minimum s-t cut in every iteration; maybe you could compute all of them at once, not just using min cut as a black box. That's one option. These are open problems, right? I don't know how to solve them. Here is another thing that seems so obvious. We don't know how to do minimum cut in linear time, but we can compute a 1 plus epsilon approximate minimum cut fast, in near-linear time; for this there are good algorithms. So why not use them? You get a 1 plus epsilon approximation in every iteration, you have n minus 1 iterations, each near-linear time, and you have at least removed that roadblock. Well, it doesn't work. Why? The analysis, which I didn't show you, really relies on submodularity of cuts, and there is a step that depends on every iteration computing an exactly optimal min cut. If you use a 1 plus epsilon approximation, the proof completely breaks down; it's not as if the 1 plus epsilon just carries over. So you would like to do something fast with 1 plus epsilon approximation; we don't know how to do it exactly very fast, and we also don't know how to go even a little bit faster using approximation. Another thing we could try is to solve the problem really fast on only a subset of the inputs, say only planar graphs. Well, for planar graphs we actually know something. There is a result, which I didn't list among the open problems, so there's no confusion; also, if all the edges have unit weight, maybe it's easier, and there are results that beat the general case, though not linear time, and also bounded treewidth, et cetera. Okay, so this is what I wanted to say about the Gomory-Hu algorithm. Let's move to other concepts. For example, here is the next thing to look at: can we preserve more cuts when we do this sketching? I want the sketch to handle more cuts, not just the s-t minimum cuts. There are only n squared of those, one per choice of s and t. The total number of cuts is something like 2 to the n: all partitions of the vertices into S and its complement. So there's a lot more to do. Can you somehow represent all the cuts, 2 to the n many cuts, using one tree? Is that possible? That sounds fascinating, right? It must be false.
And the answer is: well, it's not true if you want exact values, but it is true to some extent, depending on the approximation you allow. This is actually possible with something people call a Räcke tree, because Räcke came up with the concept; the first paper is Räcke 02, and there are follow-up works that improve the quantities. What's known now is basically this: you can take an arbitrary graph and build a tree. Now the tree is not going to have exactly the same vertices; it is going to be bigger, with something like 2n vertices. Why? The leaves of the tree are your original vertices, but it has extra internal vertices, so the number of vertices is at most about 2n. It's a slight increase in the size of the object in terms of vertices, but it's a tree, so it is very small to store. And what you know is that if you compute the minimum cut in the tree T and compare it to the graph G, the number you get is always within a factor of approximately log n, slightly more than log n. So this tree T approximates every cut in the graph within a factor of roughly log n, simultaneously for all the cuts, all 2 to the n of them. [Question from the audience.] Yeah, exactly. Excellent question, and I have a slide for that. In G, when you compute the minimum cut between S and S-bar, it's just the capacity of that cut. But in T you have extra vertices, so you have to optimize: I want to separate the set S from S-bar, think of S as a super-source, as we all studied, and S-bar as a super-sink, and for the other vertices you have to decide which side of the cut to place them on. It's an optimization problem, but not a difficult one, because it's a tree, and the size is only about 2n vertices, so it's easy. [Audience: the parts that result need not be connected, because the partition is arbitrary.] Right, right, they need not be connected, but the value of the cut is a good approximation. Okay, so here the representation still uses a tree; it has more vertices, but a tree has, as I said, the advantage of supporting fast queries. It's very easy to process, store, and manipulate: queries in log n time, or log log n, or O(1), all these tricks. And this gives you a sketch of size order n, linear in the number of vertices. Now, the construction here is actually non-constructive: it requires solving an NP-hard problem. So if you want to implement it in, say, polynomial time, you have to pay an extra square root of log n factor, according to the results we know, the last one in the citations, by Räcke and Shah. Another thing is that this extends to multicommodity flow. That is something hovering around many of the results in today's talk; I've tried not to talk about multicommodity flow, because it takes more time to define and explain, and it's a more complicated object than cuts. So I'll mention along the way whether things extend or not, but I'm not going to go into the details. Okay, so we can do it with roughly a log n approximation.
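Incidentally, the optimization over the extra internal vertices really is a simple dynamic program on the tree. Here is a minimal sketch under my own conventions (the tree as an adjacency dict with edge capacities; leaves are the original vertices, and those in S are forced to one side); the function names are hypothetical.

```python
INF = float("inf")

def tree_cut(T, leaves, S, root):
    """Min capacity of tree edges separating the leaves in S from the
    remaining leaves; internal vertices may land on either side.
    T: {v: {u: capacity, ...}}, the adjacency of an undirected tree."""
    def solve(v, parent):
        # cost[c] = cheapest labeling of v's subtree with v on side c
        if v in leaves:
            cost = [0 if (v in S) == (c == 0) else INF for c in (0, 1)]
        else:
            cost = [0, 0]   # internal vertex: free to choose either side
        for u, cap in T[v].items():
            if u == parent:
                continue
            child = solve(u, v)
            for c in (0, 1):
                # keeping the child on the same side is free;
                # putting it on the other side cuts the connecting edge
                cost[c] += min(child[c], child[1 - c] + cap)
        return cost
    return min(solve(root, None))
```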
Actually, by the way, there is a result that says log n is optimal if you want to represent all the cuts using one tree, and this holds even for the grid. So in some sense the bound on the right-hand side, the log n, is optimal: with one tree, you cannot go below log n. But what if you insist on a better approximation, better than log n, maybe a constant approximation for all the cuts? We don't want to store the entire graph; it's too big. There is another result that says we can do that, but not using a tree. If you give up on the tree requirement, which is a strong structural requirement; if you remember, the way I motivated the whole thing at the beginning was that I want a small sketch. It doesn't have to be a tree. A tree was convenient, it's small, you have fast queries, but there is no reason, no constraint, to insist on a tree. So we can do better if we relax this and don't restrict ourselves to a tree. In particular, consider the following. Just as before, we care about all the cuts, so Q is the query from before, over all subsets. And we have this theorem; it goes back to Benczúr and Karger, who came up with the concept of a cut sparsifier, with later quantitative improvements by others. It says: for every input graph G, there exists a graph G-prime that has few edges, only about linear in n, order n divided by epsilon squared, on the same vertex set, such that every cut S, S-bar has approximately the same capacity in G-prime as in G, up to 1 plus epsilon. So you can approximate all the cuts simultaneously within a factor 1 plus epsilon, but now you don't have a tree; you have a sparse graph, with only about linearly many edges. How do you build this? That could be a talk by itself, and it's very interesting, but in one line: you take the graph G and subsample the edges. For every edge e, you sample it with some probability p_e that depends on the edge, and if you decide to keep the edge, you increase its capacity by a factor of 1 over p_e. So you are sampling, but rescaling the capacity inversely proportionally, which keeps the expectation equal to what you had before. The whole difficulty is to analyze the concentration. The expectation is right, but you have 2 to the n cuts, so you can't do a straightforward union bound. That's the approach of not all, but many of the algorithms. Okay, so with this you get a sketch of size about linear in n. You represent the graph by another graph G-prime, which is very useful, because you can run graph algorithms on G-prime if you want. This time it's approximate, not exact as we had before for Gomory-Hu, and, you'll see in a minute why, it's a randomized guarantee, in contrast to the deterministic one from before. And you can ask whether this trade-off is optimal. In terms of n it seems optimal; that should be easy, it is easy to prove that you need size at least n. But what about the epsilon squared? Can you drive it down? Usually we think of epsilon as a constant, and then we can say, oh, it's only order n.
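In code, the one-line construction reads roughly as below. This is a minimal sketch with my own names, and it deliberately leaves abstract the crux of the matter, namely how the probabilities p_e are chosen (in Benczúr-Karger they are inversely proportional to a notion of edge strength).

```python
import random

def subsample(edges, prob):
    """edges: {(u, v): capacity}; prob: {(u, v): sampling probability p_e}.
    Keep each edge independently with probability p_e and rescale its
    capacity by 1/p_e, so every cut keeps its capacity in expectation.
    The hard part (not shown) is proving concentration for all 2^n cuts."""
    H = {}
    for e, cap in edges.items():
        if random.random() < prob[e]:
            H[e] = cap / prob[e]
    return H
```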
But once you have this beautiful result, maybe it's important to get the dependence on epsilon down, because there are cases where epsilon is smaller than a constant, for example epsilon around 1 over root n. With epsilon equal to 1 over root n, the graph you get already has n squared edges, which means G-prime has n squared edges, which means you're doing nothing. And if epsilon is really, really small, this bound would extrapolate to something like n cubed, which does not even make sense. So maybe the right answer is actually n over epsilon; that would scale nicely, and it seems like a reasonable conjecture. But it's a wrong conjecture. We can actually prove a lower bound saying that you need n over epsilon squared; the dependence on epsilon has to be quadratic for such a guarantee. This was first shown in a paper I had with Andoni, Chen, Qin, Woodruff, and Zhang, and there is a simpler proof, a very nice one, which also gains an extra log n factor, by Carlson, Kolla, Srivastava, and Trevisan. It says the following: if you're trying to sketch all the cuts, all 2 to the n of them, with any sketch, then, whether or not you construct a graph G-prime, you can represent the data any way you want, for example three graphs where you take something from this graph plus something from that graph minus the third graph; however you represent your data, you need at least n times log n divided by epsilon squared bits to report all the cut values within a 1 plus epsilon approximation. So it's an information-theoretic lower bound, with an information-theoretic proof, on how much information you must store about the graph to be able to report all the cuts with this approximation. Here is one immediate corollary. If you want a cut sparsifier that is a graph, not an arbitrary sketch but a graph G-prime, then that graph must have at least about n over epsilon squared edges, which matches the construction I showed you on the previous slide. So for graphs G-prime, the optimal size really is n over epsilon squared. Why? Because if you could always do it with fewer than n over epsilon squared edges, I could just encode those edges. Encoding an edge means encoding its two endpoints, log n bits each, plus the weight of the edge, and a simple calculation shows that order log n bits suffice for the weight as well. So the size of the encoding would be the number of edges times order log n bits per edge, and if the number of edges were always small, you would contradict the theorem above. You immediately derive a lower bound on a combinatorial object. Let me say it differently. The corollary is about the size of a combinatorial object, a cut sparsifier, which is somewhat analogous to spanners, if you know those. The theorem above is about information: how many bits do you need, sort of like a distance oracle, a data structure of so many bits. Of course one implies the other; that's very natural. But you get a tight lower bound for the combinatorial object by using an information-theoretic argument.
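Just to spell out the counting behind that corollary (a sketch, with constants suppressed): a sparsifier with m edges can be written down in m times O(log n) bits, so

```latex
m \cdot O(\log n)
\;\ge\; \Omega\!\left(\frac{n \log n}{\varepsilon^{2}}\right)
\qquad\Longrightarrow\qquad
m \;\ge\; \Omega\!\left(\frac{n}{\varepsilon^{2}}\right).
```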
Okay, so we cannot shave this epsilon squared down to epsilon, but actually we can if we relax the requirement to a randomized one. What is the relaxed guarantee? If you're trying to estimate one particular cut S, S-bar within 1 plus epsilon, you succeed with high probability. Say high probability is three quarters; fix that. For every specific cut, I have probability at least three quarters of success. Of course this three quarters is easy to improve: you repeat, say, log n times and take the median result, which pushes the failure probability down from one quarter to 1 over poly n. If it's 1 over poly n, you can take a union bound over polynomially many cuts. But with 2 to the n cuts, all choices of S, you cannot take a union bound over all of them. So up to logs this is the same as three quarters or whatever, but if you want all the cuts, it's not good enough. Now, this randomized guarantee is what we call a "for each" guarantee. The name comes from compressed sensing, where they have "for all" and "for each," which I think in everyday English mean the same, but in compressed sensing they differ, so we borrowed the terminology. It's a for-each guarantee. And what we showed in that same paper is that for the for-each guarantee you can build a sketch of size O-tilde of n over epsilon. There are some log factors now, so perhaps it can be improved, but the dependence on epsilon, which was our emphasis, is linear and not quadratic. It bypasses the lower bound by relaxing the requirement. And this sketch really is a sketch, not a graph. It doesn't give you one graph G-prime; if you look into it, what it does is roughly this: you get a subgraph G-prime plus a list of all the vertex degrees, and then you do some calculation between the two to estimate the value of a cut. In one line, why does this help? The way you construct G-prime, as I told you before for the cut sparsifier, is by sampling the edges. So look at the cut around one vertex with a few edges, and subsample those edges. You get some variance, and this variance is relatively large; that is what usually gives the epsilon squared. To get epsilon error, by a standard variance calculation, you need 1 over epsilon squared samples, which basically means the degree of that vertex is going to be 1 over epsilon squared, and storing that for every vertex gives you n over epsilon squared edges. So how do we work around this? Well, the degrees we store exactly, so we have no error there. And what happens for other cuts, not just one vertex against all the rest? It turns out you can do better there, because if the cut is, say, n over 2 vertices against n over 2 vertices, there are many, many edges in it, and when you subsample, it is going to be highly concentrated. So really the concentration problem is only with cuts whose value is small. I mean, this is a very high-level argument; it's not the way the proof works.
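As a toy illustration of the "subgraph plus exact degrees" idea (my simplification, not the estimator from the paper), one can use the identity that a cut equals the sum of degrees on one side minus twice the edge weight inside that side; only the internal part is estimated from the sample, so a singleton cut is answered exactly.

```python
def estimate_cut(S, degree, sampled_edges):
    """Estimate cut(S, S-bar), assuming `degree` holds exact weighted
    degrees and `sampled_edges` is a rescaled subsample {(u, v): capacity}.
    Identity: cut(S) = sum of degrees over S - 2 * (edge weight inside S).
    The degree term carries no sampling error, so for S = {v} the answer
    is exactly degree[v]; only the internal term is noisy."""
    internal = sum(cap for (u, v), cap in sampled_edges.items()
                   if u in S and v in S)
    return sum(degree[v] for v in S) - 2 * internal
```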
But really the problem is the singletons, one vertex together with all its incident edges; these are the most difficult cuts, and for them we need a different argument, here using the exact degrees. Okay, I also told you that you can amplify the probability, and this actually extends to spectral queries, but I'm not going to go into that; it's another extension of cuts, like on the previous slides, that I don't want to get into. I want to keep the discussion all about cuts. Here is an example application. Suppose you have two graphs; this is a distributed instance. Two graphs on two different machines, G1 and G2, on the same vertex set, and you want to sketch them and then compute the minimum cut of the union, basically the union of the edge sets. If you use the cut sparsifier from before, you can do it: you approximate all the cuts of G1 with a cut sparsifier and send it, the same for G2, to some center. In the center you take the union of the two, and all the cuts are approximated within 1 plus epsilon multiplicatively, so in particular the minimum cut is approximated within 1 plus epsilon. You can optimize over the union of the two sparsifiers. That gives you a dependence of 1 over epsilon squared, because you're using cut sparsifiers with quadratic dependence on epsilon. What we could do using the sketches is get this down to 1 over epsilon. The idea is this. You do the same thing as before for G1 and G2, but with a constant epsilon, say epsilon equals 0.1, so you get a 1.1 cut sparsifier, which only gives a constant approximation to the cut. Now there is a theorem of Karger, mentioned here, from Karger 2000, that says the number of approximate min cuts, up to say a factor 2, is only polynomial in n. So from this constant-factor object we can produce something like n to the 4 candidate cuts, and we only need high-accuracy estimates on these candidates. For that we use the sketch I showed you before: we amplify its success probability enough to withstand polynomially many queries, and run the queries on the n to the 4 candidates. I don't want to go into the details, and it's not complicated: a usual cut sparsifier with constant epsilon equals 0.1, plus a high-accuracy sketch like the one I showed you before, which needs less space. Okay. So far we discussed approximating all cuts. Now I'm going to switch to a slightly different topic, where there is only a subset of the cuts I care about, and you'll see in a minute. This is another set of results and questions, involving a different concept, so if you got lost on the previous slide, you can come back now; this slide starts a slightly different setup. Suppose we have a graph, same as before, but now we have k important vertices, which we'll call terminals. These are the vertices that matter to us. Maybe the graph is huge, like the entire internet, but we own, I don't know, only 100 of the vertices, and we want to understand the connectivity between these 100. Of course, the routing between them does not use only our edges; it goes through the rest of the internet.
So there is connectivity between the terminals that goes through the other vertices. You can think of a road network where you have, I don't know, gas stations, and you want to go from one to another; but I don't want to think about distances here, so maybe gas stations are not the right example. Okay. So we have these terminals, only k of them; I'll use small k for the number of terminals and capital K for the set of terminals, so it's almost the same. In the picture, the black vertices are the terminals. Now let's look only at cuts that involve the terminals. What are those? I take the set of terminals and partition it in an arbitrary way into S and S-bar, where S-bar just means all the remaining terminals, and I want to find the minimum cut between them. What does that mean? You know S is going to be on one side and S-bar on the other, and you have to find the minimum cut in G separating them, which means you have to decide what to do with the other vertices. It's an optimization problem. In this picture, the yellow vertices are fixed to be S, but you want a minimum cut, so you must decide where to put all the remaining green vertices, on which side. So it's an optimization problem, but actually not a difficult one: you can make all of S a super-source and all of S-bar a super-sink, and then it's a standard minimum s-t cut, single-commodity flow. We've seen that; it's a typical undergrad exercise. I want to call these terminal cuts: we partition the terminals in a particular way, but of course we have to cut the entire graph. [Question from the audience.] Yeah, so here I'm coming to this. Given S, I'll write min-cut of S; I won't write S-bar explicitly in this notation, but it's the minimum cut between S and S-bar, an optimization problem over the other vertices. And the way I think about it, which should answer your question, is this: the set of terminals is known in advance, it's fixed, but I'm going to get queries. I want to optimize for this S versus its complement, or another S and its complement, left versus right, top versus bottom, whatever configuration. So I think of it as having many choices for S. How many choices? About 2 to the k, for k terminals. So it's an optimization problem. Now it turns out, when I was thinking about this, I found out that people had thought about it before, and they call the concept a mimicking network. What is it? I have a huge graph G with only k terminals. I don't care about the other vertices; I only care about the terminals, say how much traffic I can ship from this subset of terminals to that subset. What they came up with, the mimicking network, says: if you only care about cuts between the terminals, then maybe there is a smaller graph G-prime with the same properties, exactly the same min cuts. You want to replace G with G-prime such that for every S, the minimum cuts are the same.
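The super-source/super-sink reduction just mentioned is easy to write down. Here is a minimal runnable sketch, assuming networkx and that G's own edges carry a "capacity" attribute; the fresh vertex names _SRC and _SNK and the function name are my own.

```python
import networkx as nx

def terminal_min_cut(G, S, terminals):
    """Min cut of G separating terminal subset S from the remaining
    terminals, optimizing freely over all non-terminal vertices."""
    S, S_bar = set(S), set(terminals) - set(S)
    H = G.copy()
    src, snk = "_SRC", "_SNK"  # assumed not to clash with real vertices
    # Edges without a 'capacity' attribute are treated by networkx's flow
    # routines as having infinite capacity, so these attachments merge S
    # into a super-source and S_bar into a super-sink.
    for v in S:
        H.add_edge(src, v)
    for v in S_bar:
        H.add_edge(v, snk)
    cut_value, _ = nx.minimum_cut(H, src, snk)
    return cut_value
```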
So that answers your question, right? The terminals are fixed, but I have this guarantee for all subsets of the terminals; this "for all" ranges over 2 to the k requirements. And it's equality, it's exact. Here is a very simple illustration, very simple because the graph is a tree. Suppose I have the graph G on the left, where the black vertices are the terminals A, B and C, and I have edge weights. I want to simplify, to get rid of as many vertices as possible; I only care about cuts that involve A, B and C. For example, take these two vertices here, hanging off with the edges of weight 2 and 5: nobody is going to bother cutting those edges if the goal is to separate A, B and C, so I don't need those two vertices at all. Similarly, look at A and B, connected through an edge of cost 3 and an edge of cost 9. Maybe you don't want to separate A and B, that's fine; but if you do want to separate them, it's enough to cut one edge, the edge of cost 3. You're never going to cut the edge of cost 9. So in G-prime, I drop the weight-2 and weight-5 edges from before, and between A and B I connect directly with a single edge of cost 3. And so on, similarly. It's a very simple example, but it shows there are things you don't need; the 3 and the 9 formed a path, and you don't need the entire path, one edge suffices. That's the general principle here, but I want to do it in general graphs. The people who came up with this concept, Hagerup, Katajainen, Nishimura, and Ragde, and I hope I pronounced their names correctly, proved that a k-terminal network can always be replaced by a network whose size, I don't want to say smaller, but hopefully it's smaller, is only 2 to the 2 to the k. On the one hand, that's the second bullet, it is independent of n. On the other hand, it's pretty wasteful: 2 to the 2 to the k, I wouldn't advise anybody to try it at home. In particular, it's more wasteful than just listing all the cut values we care about, which are 2 to the k values; if you just want to store the answers somehow, store a list. But if you insist on a graph, not just a list, a graph with the same cut properties, then this is the bound we have: the best one known to date is still 2 to the 2 to the k. That is exponentially more expensive than the information you really need, and whether that's necessary is open; that's the next slide. The truth is somewhere between exponential and doubly exponential, and it seems baffling that this is still open, but okay. So this gives you a sketch, because the graph is a sketch, and it's exact. Originally the concept was introduced for directed networks, but I'm going to focus on undirected ones. And what is the argument for this 2 to the 2 to the k? It's actually quite simple; I can summarize it in two lines. Look at all these 2 to the k cuts you care about. What does each cut really do? It partitions the vertices into two sides, call them left and right, so it gives a labeling of the vertices by left or right, by one bit, left/right or 0/1, something like this.
And I have 2 to the k of these cuts. So I can associate to every vertex a vector of 2 to the k bits: left or right, left or right, for each cut. This buckets the vertices into some number of buckets; what is that number? 2 to the 2 to the k: all the vertices with exactly the same signature, the same label, go together, and the label has 2 to the k bits, so the number of signatures is 2 to the 2 to the k. And the argument is that if I merge vertices with the same signature, nothing happens, because they always go to the same side of every cut. What would be the reason not to merge two vertices? That there's a cut where one wants to go left and the other wants to go right, and merging them forbids that. But if they go to the same side anyway, you may as well treat them as one vertex. That's the whole argument: merge all vertices with the same signature, you get at most 2 to the 2 to the k vertices, and none of the cuts you care about change. I'll show this merging step as a small code sketch below. Okay, so here are the questions I promised. This gap between doubly exponential and singly exponential is still open; the only lower bound we know is exponential, 2 to the Omega of k. That was work I did with a student of mine, Inbal Rika, and in parallel a lower bound was proved by Khan and Raghavendra. That's one question. A related open question is whether you can do better for special graph families. What would be a natural family? Planar, of course, seems very reasonable. And for planar graphs, in that paper with my student, we improved the bound to singly exponential in k, something very close to 2 to the 2k. A couple of years later, there was a lower bound showing this is tight for planar graphs; the first lower bound I mentioned above was not planar. So for planar graphs we now basically know the answer. And there are other cases: if the graph is planar and all the terminals lie on one face, or only a few faces, you can do even better. What we don't know at all is excluded-minor graphs, graphs that exclude a fixed minor. Generally, if you have something for planar, the next step would be excluded minor, and you'd believe it's true, but the techniques we use are completely incompatible with excluded-minor graphs: they really use the drawing in the plane, and I don't know how to extend them. So we need new techniques here. Another thing we don't know is what happens for multicommodity flows; as I said, it's a stronger requirement that I don't want to define in detail, but there it's completely open, we have absolutely no idea what to do. Okay, moving on to another question. Everything on the previous slide was exact: all the cut values were exactly equal. That's a strong requirement, and that's why we got this exponential, forbidding size. Can we push the size down by allowing a constant-factor approximation, or maybe even 1 plus epsilon? Maybe then we can get around this bottleneck of exponential in k. That's a reasonable, wishful-thinking approach, right? So allow approximation here. By and large, I'd say this is still open; we don't know how to do that.
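Here is that merging step as a minimal sketch, in my own notation; cut_sides would hold, for each of the 2 to the k terminal bipartitions, the S-side of one chosen minimum cut (computing those sides is the expensive part, e.g. with a routine like terminal_min_cut above).

```python
from collections import defaultdict

def merge_by_signature(vertices, edges, cut_sides):
    """vertices: iterable; edges: {(u, v): capacity}; cut_sides: list of
    vertex sets, the chosen min-cut side for each terminal bipartition.
    Vertices with identical membership patterns never land on opposite
    sides of any cut we care about, so merging them changes nothing;
    parallel edges between merged classes have their capacities summed."""
    sig = {v: tuple(v in side for side in cut_sides) for v in vertices}
    merged = defaultdict(float)
    for (u, v), cap in edges.items():
        if sig[u] != sig[v]:                 # edges inside a class vanish
            merged[frozenset((sig[u], sig[v]))] += cap
    return merged  # a graph on at most 2^(2^k) signature classes
```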
On the approximation question we have only very partial answers, for a few intermediate questions. Let me briefly sketch what I mentioned before about planar graphs, this roughly 2 to the 2k upper bound; I won't go through all the slides, just quickly, to finish on time. Basically, the claim is that for planar graphs you can do something better than the 2 to the 2 to the k bucketing I showed you before. That's the theorem; what is the algorithm? It's almost the same algorithm. You look at all the cuts you care about and think of removing them from the graph; what remains are groups of vertices that always want to be on the same side of every cut. So the mental picture is this: I take the chosen cut for every bipartition, I draw all of them together, and then I look at the regions in between, like this region here. All the vertices inside one region want to be together anyway, so I can just merge them; in graph language, it's a connected component whose edges I contract into a single vertex. It's essentially the same argument and the same algorithm as before, but now we analyze it for planar graphs. So here is the idea, and here is what those circles I drew mean. If S is a subset of terminals you care about, E-sub-S is the set of edges of the cut: the edges you want to cut to separate S from S-bar, an optimal cut. You fix such a cut for every S; if there are ties, you break them somehow. And then I want to draw it. If the graph is drawn in the plane, it is well known that the dual of a cut is a cycle; here, for some technical reasons, not necessarily a single cycle but a union of cycles. So using planar duality, you can really draw each cut as a cycle or a union of cycles. Now I go over all 2 to the k sets S, and for each I draw its dual cycle; actually it can be up to k cycles. I draw all of them, and that's the picture you see, and it partitions the vertices into regions. The question is: how many regions do you see in the picture? Can you bound the number of contiguous regions? If you can bound that, you're done. Here we have two lemmas, and basically lemma number two says that if you look at the union of all these cycles, the plane is partitioned into only about 2 to the k regions. That's the main lemma. The high-level argument goes like this. Look at the picture; each curve is a cycle. In a cycle, vertices of degree two are not interesting; they tell us nothing about the number of regions. I can bound the number of regions by looking at all the vertices of degree three and above, strictly more than two, and summing their degrees. Basically, a vertex of degree four, like here, has four regions around it, so summing up the degrees of the high-degree vertices, those of degree more than two, upper-bounds the number of regions. So I want to bound this sum of degrees.
And this you can do, because every such intersection arises from some pair of cycles, one coming from S and one coming from some other set, S-prime. So how many pairs do I have? Only 2 to the k choose 2. You do this charging argument, and you have to make sure you don't charge the same thing over and over too many times, but that's how you bound it; the overcharging you do get is okay. So that's the argument, and it tells you something about planar networks: the algorithm for constructing them is still very simple, that's the point I want to make. Okay, so now what if you allow a bigger approximation? So far we had exact; constant approximation we don't know what to do with; so maybe logarithmic. If you allow a logarithmic approximation, then you can always construct a graph G-prime that lives only on the terminals; you don't need any extra vertices. This goes to the other extreme: the object, the sketch, is very, very small, but the approximation is, I don't know whether to call it large, I don't want to say terrible, but it's much bigger than before. In this case it's actually slightly below log: it's log over log log. That's always possible. And if the graph is planar, it gets better. We usually call this approximation factor the quality, or accuracy, and in planar graphs we can get it down to order 1, while in general graphs it's logarithmic in k, not in n, in k, the number of terminals. We don't know whether that's optimal; the current lower bound, written here, is square root of log k, so there's a gap, but it does seem a constant is impossible: constant-factor approximation you cannot achieve this way. But maybe with a sketch you can do better, an arbitrary sketch, not necessarily this kind of guarantee. For example, instead of a graph only on the terminals, which has k vertices and maybe k squared edges, maybe you can have a graph with 2k vertices, or k cubed vertices; that's still relatively small. We don't know what happens in that regime. We were hoping that with, say, k cubed vertices you could get an order 1 approximation. This is open; I don't know how to prove it, and I think it would be very interesting to prove such a thing. What we were able to show is one example: if the graph is bipartite, you can actually achieve high quality, 1 plus epsilon, with a graph of size only polynomial in k. We didn't really try to optimize it; maybe it was k to the 5 or something like that. So jumping to the open problems: it would be interesting to show more examples where you can get high quality. This result is only for bipartite graphs; it's very, very good, but it only holds there. We don't know how to deal with planar graphs, and we don't know lower bounds; that's what I mentioned before. I have some slides about this first theorem, what you do in bipartite graphs, just the algorithm without the analysis, because I'm running out of time. It's very different from the algorithms I've shown you before, which is why I want to show it. So suppose this is the graph: you have the terminals on the sides, and the other, non-terminal vertices in the middle. The algorithm used before was edge sampling: you sample every edge independently.
That would be very bad here, if you think of it as a flow argument: you're trying to maintain the flow from left to right, and if you sample edges independently, you're not going to have paths, so independent edge sampling performs very poorly. Instead, what you want to do is sample paths: once you sample the left edge, you want to also sample the right edge, to maintain the paths, this correlation. And because the graph is bipartite, we can do something: instead of sampling paths, or pairs of edges, I sample the vertices in the middle independently. If I sample a vertex in the middle layer, I actually take all the edges incident to it. So it translates to sampling non-terminals, and whenever I sample a non-terminal, I keep all the edges around it, but of course I increase their capacity inversely proportionally to the sampling rate; you want the expectation to stay the same. If I sample with probability 1 percent, then when that vertex is sampled, all its incident edges have their capacity amplified by a factor of 100. So in expectation everything is preserved, and it's all about analyzing the concentration. I'm going to skip that argument. The whole point is deciding on this probability p of v, how to sample; you have to analyze it, and we use importance sampling. Basically, you decide on the probability by seeing how important the vertex is. If a vertex is, say, the only vertex connecting terminal 1 to terminal 2, you must keep it, because if you don't sample it, the flow, or cut, between those terminals drops to zero. That's the importance-sampling argument. One last thing is that this has a surprising connection to hypergraphs. Recently there were results that extend these notions from graphs to hypergraphs, mentioned here, with the same definition: the cut capacities are preserved. This is edge sparsification of a hypergraph, and I'm going to make a connection between the two parts of the talk now. So going back to edge sparsification: you have a lot of hyperedges and you want to sparsify, so you sample down the edges and maintain all the cuts within 1 plus epsilon. You do have to say what the capacity of a cut in a hypergraph means, because now an edge touches many, many vertices. The definition is this: you pay the capacity of a hyperedge if the edge intersects both S and S-bar, but you pay it once. It doesn't matter how the edge is split, say one vertex here and a hundred on the other side, or about fifty-fifty; you just pay the capacity of the edge. What we are able to prove is a sparsifier with something like n squared many hyperedges, and whether you can improve that to order n is, I think, a fascinating open problem, because the only lower bound we have is the order-n bound from graphs, and we don't know whether hypergraphs differ from graphs in this sense. Now, there are several proofs by now; at least one of them works by taking what you know for graphs and extending it, repeating the same steps and verifying that each one still works, more or less. But there is also a connection to vertex sparsifiers, because you can represent a hypergraph by a bipartite graph; I'm sure all of you have seen that.
You put the vertices of the hypergraph, V of H, on one side and the hyperedges on the other side, and you connect a vertex to a hyperedge if the vertex belongs to it. Now it looks like a bipartite graph, and I'm going to call the original vertices terminals: I want to partition the terminals into two sets and make sure the cut values come out right. If you look at it, this requirement is very, very similar to what I had on the previous slide about terminal cuts. Not identical, there's a difference, but the difference is small, and the proof I had before actually works here too; it can easily be modified. So that gives yet another proof for this hypergraph result, using what we had on the previous slide with these importance-sampling arguments; it works immediately, just adjusting the quantities. Okay, so let me stop here. Basically, what I wanted to tell you is that there are very interesting combinatorial features that you want to estimate using a sketch, and there are many different questions here, for example the size of the sketch versus the approximation: better approximation will probably require a bigger object, a bigger sketch, and we want to understand this trade-off. Another question I really don't understand is the difference between graphs and non-graphs. You can have a sketch which is basically a graph G-prime, or something else, like a table of all the cut values, or a graph plus a list of degrees, which is not a graph anymore. Somehow graphs are more restricted, and a more general data structure sometimes helps you. And finally, I talked here only about cuts. Often there are connections between cuts and distances; I didn't touch on them here, and they do not seem directly relevant to these results, only to related ones in some sense. So it feels like there is yet another connection here that we haven't found yet. Thanks. [Question:] One of the things you mentioned at the end is the degree sequence. A degree sequence alone would also qualify as a sketch, because it gives you part of the information about the graph without telling you the whole graph. So there could be questions like determining from the degree sequence alone whether the graph is connected or disconnected, sometimes connected, sometimes disconnected, always connected, always disconnected, and other things like that. For instance, if the degree sequence is two everywhere, that won't tell you whether the graph is connected: it could be one Hamiltonian cycle or a union of disjoint cycles. So the degree sequence can tell you that the graph is always connected, never connected, or, as in that example, sometimes connected and sometimes disconnected. And other questions based on the degree sequence that people have been looking at recently would, in some sense, qualify as graph sketching. [Answer:] Yeah, I think definitely. I was looking at it from the other angle: I know what the questions are, and I'm trying to figure out what the sketch should be. You're asking in somewhat the opposite direction: we're told the sketch, say the degree sequence; given this information, what can you deduce, uniquely or not uniquely, et cetera. Yes, I have not thought about it, but of course. [Question:] Are there works on, if you update the graph, how the sketch would be updated? Yeah, very good question.
I think until a while ago there was no work on this, but recently people have looked at dynamic algorithms, trying to update these sketches, and there are some recent works. Of course they do not improve the best bounds; they're actually suboptimal in terms of size, maybe an extra log or something, but they try to bound the update time rather than reconstructing everything. There are two or three papers about it, I think, from the last two years. [Audience:] Basically, you don't want to reconstruct everything. Sorry? [Audience:] Basically, you don't want to reconstruct everything. Yeah. Also, in my talk I focused more on the existential results, and some of them are non-constructive or take exponential time, so that's definitely bad, right? Even constructing the sketch once takes exponential time there. But apart from the one case I mentioned where it's not obvious how to make it constructive, all the other results are actually immediately algorithmic. [Question:] Do you think the more general representation might have an advantage in such a situation, because you're extracting the information and making it abstract? Yeah, that could be a scenario. I don't know of such a case, but it could be that if you insist on a graph, it's difficult to update, whereas if you allow an arbitrary data structure, relaxing the constraint, then maybe the updates become easy to implement, because you're not restricted by the graphical structure. So that could be a place where this extra freedom buys you, let's say, being dynamic. [Question:] In this case, storing the degree sequence helped in saving: instead of 1 over epsilon squared, you got 1 over epsilon. Do you expect this idea to work in other settings, or have you tried it elsewhere? Because this 1 over epsilon squared comes up almost everywhere sampling happens. Yeah. It took me by surprise as well that it worked. We looked at a few other examples. You have to find a place where there is room; even here, we had to relax the requirements. You need a setting where there is a bound of 1 over epsilon squared and you suspect it's not optimal, which was the case back then. In many settings it is simply clear that 1 over epsilon squared is actually optimal, so it's a little difficult to find such a candidate question. [Question:] In your case, you're using a limited representation with the goal of reducing the data space. I'm not sure how directly related this is, but there is an old thing I saw long ago about actually using more space, which is counterintuitive. There's the graph reconstruction conjecture, where you drop each of the n vertices of the graph in turn, with replacement, so you get n subgraphs, each with one vertex dropped, and without the labels; the conjecture is that you can reconstruct the graph. That's more like cryptography: you hide information and ask when you can recover it. It just occurred to me as interesting because there you have an expansion of the data set. Right, so sometimes you do that, especially to get speed: with preprocessing you build a bigger object, but then you can answer queries faster. I didn't talk about speed here because I wanted to keep the story simple; of course it's important, I just treated it as a second-class citizen. But those cases are really interesting; I find them interesting.
I think it touches on this theme about combinatorics and information theory, but it's not the focus of the talk. I find it very interesting as well. Thank you.