OK, so hi, everyone. And Artur, while you set up, let me just mention that we have a few talks lined up. I believe the next talk is not yet announced; it's going to be by Shai Moran in two weeks, telling us about the query complexity of subset-sum and related problems. And, Artur, are you ready? It looks like you're ready. I hope I'm ready, yes. OK, so let me first introduce you. We usually go around the table so we get to see all the participants, but everyone is here just waiting to see you, and we are already a bit late, so let me start immediately and introduce you very quickly. We're very happy that, despite all the technical trouble, we managed to have Artur join us today from Warwick. Artur is a professor in the Department of Computer Science and the director of the Centre for Discrete Mathematics and its Applications at the University of Warwick. He's an expert in the design of algorithms, specifically randomized algorithms. And today he's planning to tell us about round compression for parallel matching algorithms, which appeared at STOC 2018. So without further ado, Artur.

Yeah, thank you very much. It's really a pleasure to be here, and I really apologize. Can you hear me well? Yes. Just checking. Excellent. So apologies for the technical problems. It's really great to be here. I never gave a talk like this before, so let's hope it will be smooth now. This is joint work with Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski, and as was mentioned, it appeared at STOC.

So what is this talk about? Let me just check that the slides are moving; good. The main idea behind the results I want to present is a general question we would like to understand: how can we speed up computation in distributed and parallel settings? We will be focusing on two things. We take a very representative model, the model of massively parallel computation (I will define this in detail later), and we take a benchmark graph problem, maximum matching, and we try to approximate maximum matching very efficiently in the parallel setting. In this talk I'll briefly tell you what the general challenges of distributed and parallel computation are, and at the same time, since this is really a very powerful model, I'll try to stress that we can achieve some, in my opinion, surprising results.

OK, so let me start with the model. The model was introduced by Karloff, Suri, and Vassilvitskii in 2010. In this model we have an input of size n, we have some set of machines, and we want to solve some problem on this set of machines. In the MPC model we assume that the input is allocated across the machines, and each machine has some limited space, capital S. These machines can perform computations. So initially each machine receives some n/M items, where M is the number of machines. The machines then do some local computation, and once the computation is done, each machine can send something to other machines; it can communicate with the other machines. And the amount of data a machine can send is proportional to its space: if the space of a machine is S, then every machine can send S information outside. And the way the machines send information is as follows: we have the same set of machines (these are the machines we had before), and now this is the new step, the new phase.
So they are the same machines, and every machine can send, to an arbitrary other machine, the input or the output of the computation it performed. And then in the next round we do the same. So in short: in each step, every machine does some local computation and then sends the information it computed to other machines. Now, this was inspired by the MapReduce model, and as people say, it's a clean abstraction that hides some technical details of MapReduce. To me, this is really a very natural way to depict parallel distributed computation: we run in rounds; in each round we split the input into small pieces; each machine, each computer, performs some small task on its small part of the input; and then we have to combine the partial solutions, hoping that in the end we will obtain a complete solution for the whole problem.

Now, the big issue people have considered here is really the amount of space used by these computers. In the original paper, Karloff et al. assumed an almost linear number of machines: we have an input of size n, we have n^{1-ε} machines, and each machine has almost linear space, so the total space was almost quadratic in the size of the input. So once again: n is the input size, we have n^{1-ε} machines, and the total work is almost quadratic in the input size. Soon after, Karloff et al. and others showed that some problems can be easily solved in this model. A refined version that people studied later asks what we can do with near-linear total space, meaning that the total space, which is the number of machines times the space per machine, is linear or almost linear in the input size.

OK, so our goal here is as follows: we want as small a number of rounds as possible, as little space per machine as possible, and we want to perform fast local computations. Now, the focus of this talk, and actually of quite a lot of research here, is graph problems. In graph problems, our input is a graph with n vertices and m edges, and the number of machines we would like to have is proportional to m/S, where m is the number of edges in the graph and S is the available space per machine. So our goal is for the total space to be proportional to the input size.

Before I describe the problem we want to study, let me argue why we are especially interested in graph problems, and graph matching in particular. I believe that problems on big graphs are maybe the most natural benchmark for parallel and distributed models. Imagine that you have some huge graph with hundreds of millions or billions of nodes; because this graph is so big, you have to use parallel computing, MapReduce, whatever, to be able to solve graph problems on it efficiently. And as we have seen in the past many times, matching is a benchmark problem for graphs. Actually, as we also show in this talk, it's a great test bed for many new algorithmic ideas that we first develop for matching problems and then can extend to many other problems. And I believe this is also helpful for understanding the power of the model.
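To make the model concrete, here is a minimal sketch, in Python, of the round structure just described. The names mpc_simulate and local_compute are hypothetical (they are not from the talk or the paper), and the only constraints enforced are exactly the ones above: each machine stores at most S items and sends at most S messages per round.

```python
# Hypothetical, minimal simulator of the MPC round structure described above.
# Each machine holds at most S items and may send at most S messages per round.

def mpc_simulate(items, num_machines, S, local_compute, num_rounds):
    # Initial allocation: each machine receives about n/M of the input items.
    machines = [items[i::num_machines] for i in range(num_machines)]
    for _ in range(num_rounds):
        outboxes = []
        for data in machines:
            assert len(data) <= S, "machine exceeded its space bound"
            # Local computation is free; it returns (destination, message) pairs.
            messages = local_compute(data)
            assert len(messages) <= S, "machine exceeded its communication bound"
            outboxes.append(messages)
        # Synchronous delivery: next round, each machine holds what it received.
        machines = [[] for _ in range(num_machines)]
        for messages in outboxes:
            for dest, msg in messages:
                machines[dest].append(msg)
    return machines
```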
OK, so the matching problem. I assume all of you know it: our goal is to find a maximum matching, where a matching is a set of disjoint edges. One general remark: we do know that any maximal matching is always of size at least half the maximum matching size, so it is a 2-approximation of the maximum matching size; the reason is that every edge of an optimal matching must have at least one matched endpoint, and each matched edge can account for at most two optimal edges. So quite often in the past, when people wanted to find an approximation of maximum matching, they would say: let's just try to find a maximal matching, or a good approximation of it. And to some extent, this is the underlying idea in this talk as well.

OK, good. So people have been studying matching problems, and already, I don't know, seven or eight years ago, in a sequence of papers, people showed that if the space per machine is bigger than the number of vertices, n^{1+Ω(1)} for some arbitrary constant in the exponent, then in a constant number of rounds we can find a (1+ε)-approximation of the maximum matching. This was by Lattanzi, Moseley, Suri, and Vassilvitskii; and then Ahn and Guha, I think, did it for the weighted problem, for weighted matching. If we have space which is just at least linear, then it has been known for a long time that we can always find a 2-approximation in a logarithmic number of rounds. Now, I don't know how many of you are familiar with PRAM algorithms or distributed models, but the first thing to note is that we can find a maximal matching in parallel very efficiently, in a logarithmic number of rounds. This was first shown by Luby, independently by Alon, Babai, and Itai, and by Israeli and Itai. They had a very cute algorithm; actually, it was first randomized, and then people showed that one can obtain a deterministic algorithm, where in the parallel setting we can find a maximal matching in a logarithmic number of rounds.

Sorry, can I ask? I think I was confused by the Ω(1). In this space bound there is an Ω(1), so I don't understand what the difference is. Let me put it this way: if you want to do it on a PRAM or in some distributed models, you can do it with constant space per processor, per machine. But as you see, there are two results. The first is that we can find a maximal matching on PRAMs and in distributed models in a logarithmic number of rounds. But if we want to simulate this in the model we're studying here, MPC, then we need to have larger space for the communication between the processors. So when you write space n^{Ω(1)}, what do you mean exactly? Do you mean constant space? I don't know anymore. As I'm saying, for PRAM and distributed models it's constant. So let me say the following: the reason I'm putting this result here is that we know how to simulate any PRAM or distributed computation in the MPC model, but for that we need to have slightly bigger memory, and how big the memory is, I don't remember anymore. I think part of the confusion is that in the first bullet you have n^{1+Ω(1)}, and in the second bullet you have n^{Ω(1)}. So let's just say, apologies, I don't really remember the exact bound here. I don't remember the details of the simulation, but I think the simulation requires, in the MPC model, super-constant space per machine to simulate a PRAM.
But having said that, I'm sorry, I don't remember. I guess I'm asking something very stupid: is it n^ε? Is it n^{1/100}? What are we talking about here? It's n^ε; the Ω(1) is a small constant. So again, the first result says that if you have n^{1+ε} space, strictly more than a linear amount of space, then, like with integer sorting, you can do everything in a constant number of rounds. And if you have, let's just say, a non-constant but small amount of extra space, you can come up with a 2-approximation in a logarithmic number of rounds by simulating PRAM or distributed algorithms, for example Luby's.

OK, here our goal, and here there is one more thing: our goal is really to think about the regime that people have considered the most challenging, the most interesting, which is when the space is linear. And again, what I would like to say here is that if I think about the MapReduce model, we imagine a small number of machines, but these machines have huge memory, super-constant memory. Still, it's infeasible to assume that the memory is really much larger than linear in the input size. So this is the setting we are interested in: we have n vertices, and we'd like to ask, can we do anything with space which is linear in the number of vertices, while the number of edges may be quadratic? And our goal is to understand whether the round complexity that follows from standard simulation results, which is logarithmic, can really be improved. So once again: if we take a PRAM algorithm and simulate it in another model of parallel computing, such as MPC, then we get a logarithmic number of rounds, because those PRAM algorithms take a logarithmic number of rounds. And our question here is: can we beat this bound? Can we obtain lower round complexity, a lower number of rounds, to find, for example, a good approximation of maximum matching? And here are the results, hopefully.

Just to make sure I understand the model again: when you say space, is that space per machine, or total space? Yes, space per machine. OK. Let's put it this way: this is the amount of the input that a machine can store. To some extent we're ignoring the computation on a single machine; what I'm saying is that I can send to every machine at most O(n) information, so each machine can store only such a part of the input. But if the graph is sparse, then it's easy, right? Because then one machine can do everything. Is that correct? Yes, to some extent. If the graph is sparse, all these problems are really easy: if the graph has a linear number of edges, send everything to a single machine, and that single machine will find the matching. The real problem is: what can we say about graphs where we have lots of heavy, high-degree vertices, and the number of edges may be n²? Right, and then you would need maybe m/n machines; the total space should be on the order of the number of edges, right? Yes, yes.
And so in this case, really the main challenge is that we want to keep the space low. The number of machines may change from round to round; it may be big, but we need a big number of machines because the input is big. So let me go back; I don't know how slowly it will go here. OK, one more. So our setting is: we have capital M machines, where every machine has space S, and the total number of machines, of course, must be of order m/S. Does that make sense? Right.

OK, now, people have argued that the most interesting, but also maybe the most challenging, regime is linear space, because this is the regime that captures a trade-off: with much larger space there are trivial solutions on a single machine, and we know we cannot afford that; and if you have sublinear space, then you know you cannot even output the entire solution. So we are thinking of linear space with respect to the number of vertices. Now another issue, and I'm not sure how important it is, is that this is something people have studied in the setting of semi-streaming algorithms: the gold standard for streaming algorithms is logarithmic space, but for graph problems we know this is infeasible, and people know that once we have space linear in the number of vertices (not the number of edges), this is the very first moment when we are able to solve some problems efficiently, with low complexity.

OK. So our main result, and in my opinion the most interesting result, for which I will try to give you some of the ideas behind it, is that with linear space we can find a constant approximation in o(log n) rounds. So we can beat the logarithmic bound, which, when I first thought about this problem, seemed essentially impossible. So we show that with linear space and constant approximation, we don't really need a logarithmic number of rounds; the specific bound is that our number of rounds is really (log log n)². And this works even if the space is slightly sublinear.

Now, the concrete results. First, the direct implementation of our algorithm, the way we present it in the paper, and maybe the way one should see it, is a (2+ε)-approximation algorithm for maximum matching using O((log log n)²) rounds and space n·polylog(n). Then, by using known reductions, we can actually improve it to a (1+ε)-approximation; the number of rounds is slightly bigger and the space is the same. And we can also come up with a (2+ε)-approximation for maximum weight matching in the same number of rounds and with space Õ(n). And actually, let me come back here: this is how one should see the result. We can push the space a little bit down, to n/polylog(n). So once again: we have a (2+ε)-approximation in O((log log n)²) rounds; we can improve it to a (1+ε)-approximation by known reductions; and we can also extend this to a (2+ε)-approximation of maximum weight matching in a similar number of rounds.
Now, I believe it's great that we can come up with a (1+ε)- or (2+ε)-approximation, but really the essence of the result is that we are able to obtain a constant approximation, and moreover, in my opinion, the most interesting thing is the way we show it. Our approach is essentially as follows. We take some simple constant-approximation distributed or PRAM algorithm, and we emulate it by taking a large, super-constant number of rounds of this parallel algorithm and simulating them in a single round of the MPC model. We call this round compression, and you should think of it this way: we take some parallel algorithm, in PRAM or whatever model you like, we look at its number of rounds, and we show that we can take some super-constant number of these rounds, squeeze them together, and perform them in a single round of our model. I hope that makes sense; I'm not sure whether I should be asking questions and expecting responses, but let's hope it does.

So what is the underlying idea? I will show you a few variants of simple parallel algorithms. You shouldn't really think of them as algorithms in any specific model; think intuitively about how these parallel algorithms behave. Let's take the following algorithm. We have an input graph with maximum degree Δ (capital Δ will always denote the maximum degree). We take all vertices which are heavy, meaning their degree is between Δ/2 and Δ, and we try to find a matching for them. Once we have found a matching, we remove all matched vertices and also all remaining heavy vertices. As a result, the maximum degree goes down to Δ/2, and we repeat in the next round with Δ/2. So once again, the whole idea of the algorithm is: take the vertices with the largest degrees, try to match them, then the degree is reduced, and repeat on the rest of the graph. It is not difficult to see that, assuming the matching subroutine here is any reasonable one (maybe even any maximal matching algorithm will do), you can show that this gives a constant-approximation algorithm for maximum matching, with a logarithmic number of rounds.

So this is the basic algorithm we'd like to start with. I'm not saying how to implement this algorithm on a PRAM or in distributed computing models. Think of it conceptually as an algorithm that works in rounds, where in each round we find some good matching and we get rid of the vertices with the largest degrees. Now our goal is to simulate a variant of this algorithm in a sublogarithmic number of rounds; or, in short, we would like to take some number of its rounds and run them on a single machine independently, after splitting the graph accordingly.

OK, good. So as I said, my main goal here is to show you how to get a constant approximation. (Sorry, apologies, what happened? OK.) Constant approximation, (log log n)² rounds, and space as stated; the reductions to the (1+ε)-approximation and to the weighted case are standard.
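To fix ideas, here is the peeling scheme above as a short Python sketch. This is only the conceptual, sequential rendering from the slide, not an MPC implementation, and match_heavy is a placeholder name for whatever reasonable matching subroutine (even a greedy maximal matching restricted to the heavy vertices) gets plugged in.

```python
# Conceptual peeling algorithm: repeatedly match vertices of degree in
# (Delta/2, Delta], remove matched and heavy vertices, and halve Delta.
# adj is a dict mapping each vertex to the set of its neighbors.

def peeling(adj, match_heavy):
    matching = []
    delta = max((len(nbrs) for nbrs in adj.values()), default=0)
    while delta >= 1 and adj:
        heavy = {v for v, nbrs in adj.items() if len(nbrs) > delta / 2}
        matching += match_heavy(adj, heavy)
        matched = {v for edge in matching for v in edge}
        # Removing matched and heavy vertices drops the maximum degree to delta/2.
        for v in heavy | matched:
            if v in adj:
                for u in adj.pop(v):
                    if u in adj:
                        adj[u].discard(v)
        delta /= 2  # O(log n) halvings in total
    return matching
```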
So this is the algorithm we want to study, as I told you before. Let's try to implement this algorithm on a parallel computer, in this MapReduce-style model, the MPC model. We'll be using ideas from Parnas and Ron, who had similar algorithms, in the parallel setting, for vertex cover, and then from Onak and Rubinfeld, who had this kind of algorithm for maximal matching.

OK. So we would like to exploit the fact that computations on a single machine are free, and the main challenge is really how to coordinate the computations on different machines. OK. Our idea is essentially as follows: we split the vertices among the machines at random. So we have n vertices; we split these vertices evenly among the machines at random, and we let each machine compute a maximum matching. Does that make sense? So we have all these machines, we have the input graph, and we say: take the vertices, at random, and allocate them to the machines; on each machine, solve the problem; and then remove all matched vertices and try to collect the information back from the machines. So this is how one could try to solve the problem. But actually, the useful thing is that, as I said, computation on a single machine is free, which means we can actually find a maximum matching, the best matching, on each single machine. If only we were able to analyze the quality of this matching, maybe with this we would be able to compress the number of rounds of our algorithm.

OK. So this is our approach. The real challenge, though, is that the maximum matching on a single machine will potentially miss many edges, potentially miss many vertices of high degree. Why? Because we may have a vertex that is unmatched (we cannot extend the matching on this single machine), but this vertex may have lots of edges to other vertices that have currently been allocated to other machines. So what we will also always be doing in this algorithm is: we try to find some matching, we remove all matched vertices, and we also remove some other vertices, of which we say: OK, we don't care about these other vertices. Now, our hope is that the number of these other vertices (the green vertices here on the slide) will be low. And if you think about it, if we could always argue that the number of these removed vertices is proportional to the size of the matching we found so far, this would give us a constant approximation. What I'm saying is the following: if we have found a matching with k edges, we can always additionally remove an arbitrary set of O(k) other vertices, because the loss in the maximum matching size is proportional to the number of vertices we have removed, and this is still within the bound we can afford to lose if we want to come up with a constant-approximation algorithm.

So we will try to emulate the rounds of the previous algorithm, and we will actually modify it; I'm calling this the peeling algorithm. In a typical algorithm like the one I showed you before, there are a few important things we'd like to have, and the first is that we'd like to take special care of vertices with high degrees.
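Before the refinements, here is a minimal sketch of the basic one-round primitive from a moment ago: split the vertices among the machines at random, let each machine match its own induced subgraph for free, and return the union. The names one_round and local_matching are mine, and the local solver is left as a black box, since computation on a machine costs nothing in the model.

```python
import random

# One "compressed" round: allocate vertices to machines at random, let each
# machine compute a matching of its induced subgraph for free, return the union.

def one_round(adj, num_machines, local_matching):
    part = {v: random.randrange(num_machines) for v in adj}
    local_graphs = [dict() for _ in range(num_machines)]
    for v, nbrs in adj.items():
        # A machine sees only the edges with both endpoints allocated to it.
        local_graphs[part[v]][v] = {u for u in nbrs if part[u] == part[v]}
    matching = []
    for g in local_graphs:
        matching += local_matching(g)  # free computation on a single machine
    return matching
```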
To take special care of high-degree vertices, we first need some mechanism that allows us to approximate vertex degrees, and only for vertices which are heavy, which have many neighbors. The second thing we need, to have an algorithm that works well in our MPC model, is the ability to select random neighbors of these heavy vertices. And if we have both, we can find the heavy-degree vertices and find a matching for a constant fraction of them, and our goal is to implement this in the MPC model. So once again, conceptually, the idea of the algorithm is: consider the vertices with high degree and try to find a good matching which matches a large number of them. For finding this matching, there are two types of edges we'd like to use: edges between high-degree vertices, and edges to the rest of the graph. And for the rest of the graph, we'd like to have a special pool of vertices that we consider as candidates to be matched with them, and we'd like this pool to be a random set of vertices.

OK, where are we? So this is how we would like to run this algorithm. We start with the graph. Initially, the maximum degree can be linear; we don't know anything about it. So what we do is partition the vertices at random into √n groups. (√n groups? Yes, good.) So we have √n groups, and each group will have at most a linear number of edges, because an induced subgraph on √n vertices has, of course, at most n edges. Now, what we'd like to do is find a maximum matching, or a very good matching, which hits the high-degree vertices in each group. So once again: we have √n groups, and each group has √n vertices and a linear number of edges. Now, to see which vertices have high degree: because this is a random partition, we know that the degrees will scale down by a factor of √n. We have n vertices in total, but each group takes only a 1/√n fraction of them, so the degrees scale down by √n. And our goal right now is to find the high-degree vertices, those whose degrees in the original graph are between n/2 and n, select a random set of their neighbors, and try to find a matching which will take care of most of them. Now, we remove these vertices and the matched vertices, and we move on to the graph with maximum degree roughly n/2. We repeat the same thing, partitioning the vertices into √(n/2) groups, and we repeat the process.

Now, suppose we keep running this. What happens after (log n)/2 phases? On each machine we can have up to √n vertices. We'd like to keep track of which vertices have high degree in the original graph, but because we're doing random sampling and, as before, scaling everything down by a factor of √n, we cannot really go below this scale. However, after so many phases, we will have reduced the maximum degree to something like √n. And if we were able to simulate all of these (log n)/2 phases in a single round of MPC, we'd be able to obtain O(log log n) rounds in total. So once again: in one single round, we'd like to simulate enough phases to reduce the maximum degree from n down to √n.
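The √n scale-down of degrees is easy to check empirically. Here is a tiny sketch (the complete graph and the vertex labeled 0 are just an illustrative choice of mine) showing that in a uniformly random partition into about √n groups, a vertex of degree k keeps roughly k/√n neighbors inside its own group.

```python
import random

# Empirical check of the degree scale-down claim on the complete graph K_n:
# a vertex of degree n-1 keeps about (n-1)/sqrt(n) neighbors in its group.

def induced_degree_demo(n=10_000):
    groups = int(n ** 0.5)                       # about sqrt(n) random groups
    group = {v: random.randrange(groups) for v in range(n)}
    v = 0                                        # any fixed vertex of K_n
    inside = sum(1 for u in range(1, n) if group[u] == group[v])
    print(f"degree {n - 1}, induced degree {inside}, "
          f"expected about {(n - 1) / groups:.0f}")

induced_degree_demo()
```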
And with this, if we perform all of that in a single round, then in a single round we reduce the maximum degree from n to √n. And if you repeat this recursively, each round takes the square root of the previous degree bound, so you end up with O(log log n) rounds. Anyway.

Artur? Yes. I'm not sure if we're supposed to see all the details; I'm just a bit confused. Let me know if it's something I'm missing or something you're trying to hide. Of course, after you finish one such round, you have degree √n, and you can't just apply the same thing, right? So, OK, sorry, coming back here: you are asking about what happens after one round? I mean, after one round we'll have reduced the maximum degree to √n. Right. But then I cannot apply the same procedure, because now I don't have a dense graph anymore. So the groups are going to be bigger; is that what you're doing next? So the groups are always of size √n. Why? Once the degree is small, I can take a bigger group, right? Yes, but you can still do the same; you can take more. So actually, in the second round you would be able to use fewer machines. So each machine sees more; it gets a bigger part of the graph, right? Sorry? Will you be able to include more of the graph in each machine? Currently each machine stores some very small part of the graph. OK, yes. So after the first round, you would be able to take, I don't know, n^{2/3} vertices, I think. The maximum degree is √n, but inside this small part of the graph the degrees are on average small, right? Yes, they are going down. So after the first round, you would be able to work with a smaller number of machines. So here I'm saying √n groups; I think after the first round you would have n^{1/3} machines. I think 1/3 is the right number; maybe √n, maybe n^{1/4}, but I think n^{1/3}. I'm not sure.

Anyway, just to cut it short: this is not an algorithm that we can implement; we are unable to really do it. Mind you, we have no reason to believe that this algorithm wouldn't work. But the challenge with making this random partition (something changed here; it's great to see some other people joining) is the following: when we move from phase to phase, the algorithm requires us to assume that we have a random partition of the vertices, and the vertices are really no longer randomly partitioned. So once again, what I'm saying is: we have these phases. In phase one, we randomly partition the vertices. In phase two, we randomly partition the vertices. In phase three, we randomly partition the vertices. But we cannot simulate this: we have to start with one single random partition and then keep it for the whole round, which makes things complicated.

And I'm assuming I should get closer to the presentation. So, even though this approach would be great, and it would give us O(log log n) MPC rounds, we don't really know how to do it. We achieve a slightly weaker goal. We have the following algorithm, and I'll present the main idea of it now. In each phase, we will have some small reference set, which we'll be using to estimate vertex degrees, and we'll have the set of vertices that we want to consider to be heavy.
And for these vertices, the decision of which vertices are treated as heavy will be made, to some extent, at random. And then we'll have a set F_i of friend vertices. The set R_i is a set of random vertices, a small random sample of the total set of vertices, and we'll be using it to estimate vertex degrees in the whole graph. Now, so once again, this is the description of a phase, of how we want to operate in each partition. We have the entire graph; we partition it among the machines. And now, on a single machine, we'd like to use the following information. We have the set R, which we use to tell us the degrees of vertices in the entire graph. Then we have the vertices for which, after estimating their degrees through the set R, we can say: these are vertices with degrees close to Δ/2 or more, in our case. And we'll always be using some small set of friend vertices, which are the potential candidates to be matched with vertices from H_i. And in our case, what we're trying to do is find a matching between the vertices of H_i and F_i, and then remove all three sets and repeat. So this is really what the algorithm does. And the main thing is, yes?

I'm going back: there was a question a couple of minutes ago, whether you can clarify why the degree goes down by √n. And also, let me say, we started late, so you can still take about 15 more minutes; no need to rush. OK. So, because I don't see the text (but maybe that's my fault), where was the question, about which part? The question was from Erfano; he's asking if you can explain why the degrees go down by √n. I think he was referring to the random sampling. Oh, yes, of course. So most likely it's about this line. We have n vertices, and we start with a maximum degree which may be as big as n. OK, so here we split everything into √n groups, so each group will have √n vertices. Now, you should think that if the original degree of a vertex was k, then, because we partition everything into √n different random sets, the degree of this vertex within its group will go down by a factor of √n. In the induced subgraph. Induced, that's the word, I think. Yes, in the induced subgraph. So once again: we have the complete graph, say. We take one vertex; now it is in one part; and because this part contains every one in √n vertices, its degree goes down by a factor of √n. So think about the complete graph: a vertex has degree n-1, and if we take groups of size √n, we end up with degrees of about √n. I hope this clarifies it.

So the idea is that once we sample, the degrees are of course scaled down. And in particular, and this is a step which is also important: once we have started this scheme, if the real degree is below √n, then we won't see any of that vertex's edges in a given part. So once again: we partition the vertices into √n groups, and in each group, instead of the real degree, we see only the degree divided by √n. So if the real degree is of order √n, we see only a constant number of edges, or none. With this scheme, for vertices with low degrees we just won't be able to see any edges. OK, hopefully.
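Here is a sketch of degree estimation through such a reference set, under the assumption (an illustrative choice, not the paper's exact one) that each vertex joins R independently with some small probability p: the estimate is the number of reference neighbors rescaled by 1/p, which concentrates well for heavy vertices by standard Chernoff bounds.

```python
import random

# Degree estimation via a random reference set R: sample each vertex into R
# with probability p, then estimate deg(v) by |N(v) ∩ R| / p. For heavy
# vertices the estimate concentrates, which is all the algorithm needs.

def sample_reference_set(vertices, p):
    return {v for v in vertices if random.random() < p}

def estimated_degree(v, adj, R, p):
    return len(adj[v] & R) / p

def heavy_vertices(adj, R, p, delta):
    # Vertices whose estimated degree is at least delta / 2.
    return {v for v in adj if estimated_degree(v, adj, R, p) >= delta / 2}
```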
Now, this is really the algorithm that we want to simulate. And let me, OK. So our algorithm is really the following algorithm. First, forget about the parallel implementation; think of the algorithm as running in rounds. In each round, in round i, we first take a reference set, which we'll use to estimate the degree of any vertex. Then we take the set of heavy vertices, which are the vertices with high estimated degree. Then we look at some of their neighbors, chosen to some extent at random; this is the set of friends. And for H_i ∪ F_i, we try to find a large matching. Here on the slide is the description of the matching subroutine, but you should just think of it as a large matching, of linear size. And if you run this algorithm, one can show that after a logarithmic number of rounds (with whole-input information, so no partition; we just run it as a sequential algorithm), it finds a constant approximation of maximum matching. And our goal is to simulate this algorithm.

Can you explain why you need to bother with estimating degrees? Can't you just store them? It's O(n), right? OK, good question. But let's think about it in the following way. We start with the first round; in the first round, every vertex can learn its degree. But then we move to the second phase, and we still would like to know the degree of each vertex. And moving to the fifth phase, to the (log log n)-th phase, we would still like to estimate the degree of a given vertex. OK, so at the beginning, as Oded is saying, we know exactly the degree of every vertex. But later, on a single machine (you should think of it this way), we have only some random subset of the vertices, and at some moment we'll be removing some of these vertices, and at each moment we would like to estimate the current degree. So the set R_i is used exactly as follows. We have a single machine, and on this single machine we have only a partial view of the entire graph, but we would still like a good estimate of the degree of every vertex, especially of the vertices of high degree. So we're using R_i as a reference set: essentially, every vertex is sampled into it with some small probability, and by checking how many of these reference vertices are in our neighborhood, we get, to some extent, a good estimate of our degree.

Let me move on, because the main part of the analysis (let's just say it's 30 pages, so you shouldn't think it's trivial) is that we are able to show that even though in phase one we remove some vertices, and in phase five we remove some more vertices, we are still able to keep the distribution of vertices across the machines almost uniformly random. So the main goal of our algorithm is essentially as follows: we start with a random partition at the beginning of a given round; then we remove some vertices, the matched and the heavy vertices; and we are still able to maintain an almost uniform distribution of the remaining vertices in every part. To some extent, this is really the main contribution of the paper.
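Putting the pieces together, here is a hedged sketch of one phase of this sequential, full-information view of the algorithm. The sampling probabilities p_r and p_f and the greedy matching between H_i and F_i are illustrative stand-ins of mine for the paper's exact subroutines.

```python
import random

# One phase, sequential view: build a reference set R, declare heavy the
# vertices with large estimated degree, sample a pool F of candidate partners,
# match H against F greedily, then delete all three sets from the graph.

def phase(adj, delta, p_r, p_f):
    R = {v for v in adj if random.random() < p_r}                 # reference set R_i
    H = {v for v in adj if len(adj[v] & R) / p_r >= delta / 2}    # heavy set H_i
    F = {v for v in adj if v not in H and random.random() < p_f}  # friends F_i
    matching, used = [], set()
    for v in H:                                # greedy matching between H_i and F_i
        for u in adj[v]:
            if u in F and u not in used:
                matching.append((v, u))
                used.update((v, u))
                break
    for v in R | H | F:                        # remove all three sets and repeat
        if v in adj:
            for u in adj.pop(v):
                if u in adj:
                    adj[u].discard(v)
    return matching
```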
So, to repeat this property: even though we remove some vertices from our partition in some possibly deterministic way, we are still able to say that for the remaining vertices, the choice of the machine to which they have been allocated is almost uniformly random after every phase, after the simulation of every phase. I hope that makes sense, because I'm sure it's quite vague, but this is really the most important part of this result. What we are doing is, once again, as follows. We want to simulate a super-constant number of phases. And each phase works like this: we randomly select some vertices, find some matching, and remove some vertices; then in the next phase, we randomly select some vertices, find some matching, and remove those vertices. Now, what we are able to show is that from the point of view of a single machine, of a single part, even after the second phase, although the selection is no longer perfectly random, it is almost perfectly random. And what we are showing is that in the first phase we use up a tiny bit of this randomness, in the second slightly more, but still, we are able to perform a super-constant number of phases and maintain a random, or almost random, partition of the vertices. And what we can show is that we can emulate Θ(log n / log log n) phases this way and still maintain this randomness on almost every machine.

So basically, in a sense, you're saying you're not really wasting all the randomness; it's still kind of random, because the algorithm doesn't really decide based on the exact choice of groups. For instance, the algorithm decides based on degrees, and degrees don't depend so tightly on the group you chose, because any random choice would give you a similar degree. So an approximation of the degree would also be good. Yes. So once again, we have this approximation of degrees; for this we have the reference set, so at every single moment, through this reference set, we can estimate very well, almost exactly, the degree of a vertex in the entire graph. And secondly, even though some vertices have already been removed from my part, the fact that I belong to this part rather than another is still almost random. And as I'm saying, this is only almost random, and this error accumulates in every phase, so we are able to emulate only some Θ(log n / log log n) number of phases. But still, this is super-constant, and with this we are able to obtain a total number of rounds proportional to (log log n)². In a single phase, I don't remember exactly, but I think we can show that the difference between the original uniform distribution and the distribution we have is something like maybe 1/n; in the next phase it gets bigger, and it keeps going up, but altogether this remains negligible. I hope that makes sense.

There are some technical details which I don't really want to go into here; for example, when we remove the reference set R_i in every round, it is important how the friend vertices select themselves. So there are some issues we have to handle in order to maintain this randomness from round to round. Let me move on. OK, yes: to some extent, in my opinion, the main idea is quite simple.
So once again: you take this algorithm working in rounds, which relies on, which depends on, the starting assumption of a random partition. And you know that in the second phase you won't have a truly random partition anymore; but what we're able to show is that it will be an almost random partition. And then in the third, the fourth, up to Θ(log n / log log n) phases, the partition stays almost random. And if it's almost random, we can still carry out the same analysis to show that the same algorithm will emulate the original algorithm, up to some small error.

OK, so that was really the last thing about the algorithm itself. Next, I wanted to mention some follow-up work and maybe also some related problems. Are there any questions now? I don't know how it usually works. OK, follow-up works. So we had this result last year, and since then there have been three very nice results. First, Assadi applied the same approach, this round compression approach, to obtain an O(log n)-approximation for vertex cover in O(log log n) rounds; this was a combination of the round compression technique and an approach of Assadi and Khanna from an earlier paper. Then, shortly after our result, Assadi, Bateni, Bernstein, Mirrokni, and Stein showed that one can find a constant approximation of maximum matching in O(log log n) rounds with space Õ(n); the space has later been improved to the same space as ours, n/polylog(n), though I'm not really sure whether it was last year or this year that they put out the updated version of the paper, so I'm not sure when this was done. They also obtained a (2+ε)-approximation for vertex cover. Their algorithm was quite different; it relies on techniques developed for dynamic matching algorithms by Bernstein and Stein. And then, most recently, there is a paper by Ghaffari, Gouleakis, Mitrović, and Rubinfeld, which has been on arXiv since February, where they show essentially the same result as ours. In my opinion the approach is similar to ours, but it simplifies some steps. They were also able to obtain a (1+ε)-approximation for the vertex cover problem, also with the same O(log log n) number of rounds and n/polylog(n) space. And what was very nice (I really liked it a lot; I had never really thought about this) is that they can also apply this result to the congested clique model, which is a classical model of distributed computing. So they obtain an O(log log n) number of rounds in the congested clique model for this problem.

Now, let me move to the last two slides. When I first started thinking about this, I thought a log log n bound would be great; actually, anything below log n is great. And now that I think about it, I don't know, maybe one can actually beat O(log log n) rounds; there is no real reason why log log n should be a barrier here. On the other hand, we know that in these parallel models, in this MPC model, it's extremely difficult to prove any lower bounds. So maybe one can actually try to argue that one can beat this bound, or maybe not. I also like the round compression approach a lot, so maybe one can apply it to some other related problems as well. And finally, one thing which I always thought was very annoying.
I really would love to see an analysis showing that maybe one can get an O(log log n) number of rounds using the algorithm that I showed in passing during the presentation, the one that keeps doing the following: you start with the vertex set, you partition it into √n sets, and you try to find as good a matching as possible. Now, the question is simply what counts as good, what the best matching is. And you try to do some tricks to reduce the degrees of all vertices, for example to √Δ, or maybe even to logarithmic. And maybe one can obtain some better bounds for the maximum matching problem here; I just don't really understand this process yet. OK, thank you.

Thank you, Artur. And this is a good time for more questions. Yes? I'm going to start before the questions show up. Just to make sure: you mentioned a (1+ε)-approximation for vertex cover, but that's NP-hard; but that's fine, right? Because computation is free. Yes, yes, yes. So you know that if you have n² space, you can do it in a single round. Yes. But your algorithm, the running time, is actually very good; you're not doing any exponential computations, right? Yes. So what is your question? Yeah, so this sounds like it might have actual applications, so you will probably at some point want to worry about running time; you don't want NP-complete subproblems. But I think, as far as I can tell, the algorithm you have is very efficient, right? Oh, our algorithm is very efficient. The algorithm itself, I mean, I essentially showed you the algorithm, with the sets R_i, F_i, H_i; there is nothing really deep there. The analysis is what is complicated: to show that we can actually capture the error in every single phase and estimate it properly. So the algorithm is efficient, yes. Yeah, of course; I'm just saying it's probably not efficient for the (1+ε) vertex cover approximation. No, no, I don't expect so. And the (2+ε), shouldn't that just follow immediately from maximal matching, the 2-approximation? Yeah, OK, so to get this, essentially one does the following: compute a constant approximation of maximum matching and then repeat it. So find a constant approximation of maximum matching, remove all the vertices you have matched, and run the same algorithm on the rest of the graph. And in a constant number of repetitions, you will get a (1+ε)-approximation of maximum matching; then you switch to (2+ε) for vertex cover. So for maximum matching we are getting (1+ε), and for vertex cover we are getting (2+ε), because of the factor-2 reduction. And there are no known tricks to do something to get a (1+ε)-approximation there.

And there's also a question from Gialag: is the communication cost of your algorithm known? Communication cost? So, in every round, all we're doing is selecting vertices: each vertex selects itself at random, and then the vertex sends all the... I mean, the communication cost is proportional to n, I think. Let me just think, because for the congested clique this was most likely what was used: it should be proportional to the memory, because all the algorithm needs is to remember, to memorize, the matching found so far, and which vertices are still in the game and which vertices have been removed.
And every time we do this partition, every vertex essentially comes with its neighborhood within the given part. So the total should be linear, or Õ(n), assuming we have memory which is Õ(n). I'm not sure if this answers your question. Gialag, welcome to the table. Every machine is sending essentially Õ(n) information to other machines, I think. Then the machines that listen, they pick whatever they need, right? They cannot store everything, right? Yes.

More questions? Clement, any questions? So I think it's this idea of, in a sense, blindfolding the algorithm, using a soft threshold: you don't want the algorithm to actually know the exact degree, right? You're saying that. I remember a similar thing being used in the paper on the shortest vector problem by Ajtai, Kumar, and Sivakumar. For similar purposes: basically, for the analysis, they want to make sure that they have enough randomness, so they force the algorithm not to actually see the input. Yes. So this choice of heavy vertices: we cannot afford it to be a deterministic function, because a deterministic function would reveal too much information. So we want to still maintain some kind of randomness, as much randomness as possible. You only want to use the most significant bit, in a sense, yeah? This is a beautiful idea, I think. And in their case, with points in space, they had to move the points randomly so that the algorithm doesn't actually know where the point is. Yes.

Any more questions? OK, so thanks, everyone, for staying despite the technical trouble. I hope this doesn't happen again; I'm not quite sure what went wrong. And thanks, Artur, also, for being brave and diagnosing it live with us. So thanks again, Artur; it was great. And thank you so much for the questions. Great. And if anyone wants, we can stay a bit longer, so I'll just turn off the YouTube stream. Also, thanks to everyone watching us on YouTube. And in two weeks, again, Shai Moran is joining us, Wednesday, April 11.