Okay, so welcome everyone to today's TCS+ with Alex Andoni. Before we start, I should thank all my fellow co-organizers: Anindya De, Thomas Vidick, Gautam Kamath, Clément Canonne, and Aviad Rubinstein. And as usual, our tradition is to go around the table, so let me try to introduce all the groups. We have Ehsan with the group from USC. Hello, Ehsan. We have Erfan with the group from Indiana. Hello, Erfan, thanks for joining. We have G with everyone from the Simons Institute. Hello, everyone there. We have Reza, and I'm not sure actually where from, I can't find it on my list, but welcome. We have Shravas from NYU, a few floors above me here. Hi, guys. Slobodan from MIT. Hello, everyone. And this looks like, yes, it's Thomas Vidick with the group from Caltech. We have Toronto, Chris joining us with the group from Toronto. Hi, everyone, welcome. We have Yasemin joining us from Johns Hopkins. Hello, welcome. And Yi-Jun Chang from the University of Michigan. Good to see you. And I think that's probably all for now. So today's speaker is Alex, Alex Andoni. Just a few words to present Alex before we start the talk. Alex did his PhD at MIT with Piotr Indyk and graduated in 2009. He was a postdoc at Princeton for a year and then spent about five years at Microsoft Research. He's now a professor at Columbia. His interests are in high-dimensional geometry, metric embeddings, and algorithms for massive data, and today he's going to tell us about graph connectivity in log-diameter rounds. So welcome, Alex. All right. Hello, everyone, and thanks for inviting me to the famous TCS+ talks. It's a pleasure to finally give a talk here; definitely my most distributed talk. I'll talk about parallel algorithms for graph connectivity, and I should mention that this is joint work with Zhao Song, Cliff Stein, Zhengyi Wang, and Peilin Zhong. So let's start. First of all, the general motivation for this research, this paper and some previous papers, came from the wild success of modern parallel systems, in particular systems which were designed with many computers in mind, all trying to solve one big task. I'm sure you have heard of many of these modern systems before, such as MapReduce, Hadoop, Dryad, Spark, and many others. Of course, I'm here not to talk about the systems themselves, but about the theory of these systems. In particular, the success of these systems has forced us to rethink a little bit the theory of modern parallel computing. As a theoretician there are many questions to ask, but some of the basic ones are, first of all, to establish a formal theoretical model for these systems, and then to think about where this model leads to new algorithmic techniques. Okay, so first of all, let me describe the model. This talk's contribution is mostly on the algorithmic front, but I'll describe the computational model, which has been established roughly over the last nine or ten years. You think about the computation being done by P machines, or processors, and each machine has some amount of space, which we denote by capital S. We think of the total space, basically the number of machines times the space per machine, as being about proportional to the input size, basically as much space as there is input. In other words, you cannot really replicate your input too much. Okay, so we have these machines, and they store the data among themselves.
And this talk is about graph algorithms, so we'll always think about the input as being a graph, vertices and edges. The input is some graph; this is our abstract graph, and at the beginning this graph is distributed across the space of these machines. It is split between the machines; here it is depicted by the edges being split between different machines. The computation proceeds in rounds, where each machine does some local computation and then does what is called a shuffle, an all-to-all round, which basically reshuffles the data between the computers. Then, again, the computers do some computation on their local information, then maybe reshuffle the information again, then do some more computation. Note that this shuffle, from the perspective of communicating across the network, is very expensive, and the general principle is to try to reduce as much as possible the number of these shuffle operations, basically the number of rounds. I should also mention that the output will often be something which is also proportional to the input size, for example O(m) in this case. In particular, it doesn't fit on one machine, so the answer is not stored on one machine either, but rather across different machines. For example, one problem we'll be talking about is connectivity: here the machines compute the connectivity of this input graph up here, and the edges are now colored by the name of their connected component, so these edges represent the coloring of the graph corresponding to the two connected components. Perhaps this is then used for some other application down the road. So this is the model, and this is how computation proceeds: local computation and a shuffle, repeated for a few rounds. The main model constraint, or the main cost measure, first of all, is the number of rounds. As I already suggested, we'd like to reduce the number of rounds as much as possible. We'll also think of the space per machine as being some small polynomial, S equal to n to the delta; think of delta as being, say, 0.1. Or, say, S equal to the square root of m; that's another good number to have in mind. For example, the space being the square root of m comes out just from the requirement that the space be at least as large as the number of processors, for example because a machine keeps a list of all the other machines participating in this computation. Putting this together with the total-space bound, we immediately get that the space per machine is at least the square root of m, where m again is the size of the input. But in general you can think of the space per machine as being some small polynomial in the input size. This bound S bounds the space per machine, but naturally it also bounds, for example, how much information a machine can receive in a round: the total incoming communication, summing the sizes of all messages a machine receives in a round, is also bounded by O(S); that's what this red bound represents. And we might want other things from our model, for example that each machine runs in linear time locally per round, but this will not be the main constraint for us, and usually it is easy to achieve in our algorithms. Okay, so the main cost that we measure is really the number of rounds; this will be the focus in this model.
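Spelling out the square-root arithmetic just mentioned, with P the number of machines and m the input size (this only restates the argument above):

```latex
S \ge P
\quad\text{and}\quad
P \cdot S = \Theta(m)
\;\Longrightarrow\;
S^{2} \ge P \cdot S = \Theta(m)
\;\Longrightarrow\;
S = \Omega(\sqrt{m}).
```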
Okay, and this model is the culmination of a number of formalizations of parallel models. In particular, it is a form of what is called the bulk synchronous parallel model, dating from the 90s. Later, about 10 years ago, after these systems had appeared, a number of authors introduced a MapReduce framework, a model which is a precursor of the current one. And the current model was really introduced in this paper; it is called massively parallel computation, or MPC, and basically since about 2013 this has been the model of choice for studying modern parallel systems. I should mention that I include citations here but won't read out the names, just in the interest of time; later on, some references will be abbreviated if they appeared earlier in the talk. All right, so these are the model constraints, and what can we do, first of all? The first natural question is the following: there was a lot of research done in the 80s and 90s on parallel algorithms, in particular under the umbrella of what are called PRAM algorithms, and the question is, can we reuse those algorithms in our new model, MPC, or is there more to be done? First of all, the good news is that yes, we can reuse the PRAM algorithms that were developed before; in particular, for any PRAM algorithm, you can get the number of rounds in our model to be roughly proportional to the parallel time in the PRAM model. Basically, you can simulate PRAM algorithms without a significant slowdown. This is very good: it means we can reuse a large body of very nice algorithms. One caveat, in a sense, is that typically those algorithms require at least logarithmic parallel time. I guess the most classic example is the problem of computing the XOR of n bits. Here a lower bound from '89 showed that even on the fanciest available PRAM, which is the CRCW PRAM, we need parallel time which is roughly logarithmic in n. That's a nice running time, but the question is whether we can get even better running times in the MPC model, given that its setting is somewhat different from the PRAM model. Basically, this has led to the research goal of trying to develop faster algorithms in the MPC model. In particular, for this XOR problem, we can solve XOR in constant time in the MPC model, improving from logarithmic to constant, and this is done basically by computing it via a tree. Here is an example, just considering the case when the space per machine is something like n to the 0.5, the square root of the number of bits; then it is essentially done in just one round. At the beginning, each input machine holds a portion of these bits, which satisfies the space bound. Locally, each machine computes the XOR of the bits that it holds, and then it sends this one bit, the XOR of its local bits, to one last machine, which will compute the final answer. So each of these machines computes its XOR and sends it to this machine; this machine can receive at most about the square root of n bits of input, but since there are about square root of n machines here, it is all good, and this machine can compute the final answer, basically the XOR of the inputs it receives. And obviously, if S is less than the square root of n, you can generalize this and build a tree of depth log base S of n, which is constant as long as S is polynomial in n.
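To make the tree aggregation concrete, here is a toy sequential simulation of the scheme just described (this is not MPC code, and the function name and interface are made up for illustration):

```python
def mpc_xor(bits, S):
    """Toy simulation of computing XOR of n bits with space S per machine.

    Each 'machine' locally XORs the at-most-S bits it holds (free local work),
    and then the partial results are aggregated up a tree of fan-in S.
    The number of communication rounds is the tree depth, about log_S(n),
    which is a constant when S = n^delta.
    """
    # local computation: every machine XORs its own block of <= S bits
    layer = [sum(bits[i:i + S]) % 2 for i in range(0, len(bits), S)]
    rounds = 0
    while len(layer) > 1:
        rounds += 1  # one shuffle: each machine receives <= S partial results
        layer = [sum(layer[i:i + S]) % 2 for i in range(0, len(layer), S)]
    return layer[0], rounds

# Example: with n = 10**6 bits and S about sqrt(n) = 1000, a single
# aggregation round suffices, matching the picture on the slide.
```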
Okay, and in general you can think about MPC as being a circuit where you have different gates: each gate has fan-in S, that is n to the delta, basically the bound on the space per machine, and the function computed at every gate is, in a sense, an arbitrary function. Ideally we'd like this function to be a nice function, say a linear-time algorithm, but in general we can think of an arbitrary function at each gate. Okay, so this is something to have in mind. So, can we really design better, faster algorithms in the MPC model? Let me try to recap some of the research progress that has been made over the years since MPC was introduced. First of all, to repeat something I already suggested, we can simulate any classic PRAM algorithm; in particular, for the connectivity problem on graphs, we can obtain a logarithmic-round algorithm just by simulating the classic PRAM algorithm for it. Another algorithm, or rather a set of algorithms, that obtained a constant number of rounds in this setting where the space per machine is n to the delta was our work from 2014. It applied to problems such as minimum spanning tree and Earth-Mover distance, but it really applied only to geometric graphs, that is, graphs which are implicitly defined by a set of points and the distances in the corresponding metric. So it does not apply to general combinatorial graphs. For the situation where the space per machine is the number of vertices to the power delta, for small delta, these are really the only results for general graphs. There is more work when the space is somewhat larger than our ideal n to the delta, in particular when the space per machine is a little bit larger than the number of vertices: n to the power one plus epsilon, which is still less than, say, the number of edges or the total space. Then it is possible to solve connectivity, minimum spanning tree, and some other problems in a constant number of rounds. Note that this result really only makes sense for graphs which are relatively dense, where the number of edges is at least n to the power one plus epsilon. More recently, a nice line of work managed to break through this barrier of n: they obtained algorithms that use space per machine a little bit less than linear in the number of vertices, namely n divided by polylog n, and obtain a number of rounds which is log log n to some constant power. Some of these papers obtained log log n to some power, which was then improved in subsequent papers, and this applies to problems such as approximate maximum matching and vertex cover. So these are really the results so far, and the obvious question, the main challenge, that has remained since the first simulation of the PRAM connectivity algorithm in the MPC model is to obtain MPC algorithms for graph problems which run in a number of rounds much less than logarithmic when the space per machine is really less than the number of vertices, so n to the power one minus some constant, ideally n to the delta. And this is not known even for connectivity.
In fact, it is not even known whether we can get such an algorithm to distinguish these two cases: a graph which is one big cycle versus a graph which is composed of two cycles of length roughly n/2. We don't know how to do better than logarithmic in n when the space satisfies this bound; the best algorithm is just simulating a classic PRAM algorithm. In fact, there has been work proving that a log n bound is necessary for a restricted class of algorithms; think of algorithms that are only allowed to send edges around but are not allowed to do any kind of coding, for example. The hardness of this problem in fact suggested using this particular problem as a hard problem from which to prove conditional hardness of other problems in the MPC model, which was done in these two papers. And of course, we'd like to prove general lower bounds. Ideally, it would be a very nice result to prove that, in general, any algorithm requires log log n rounds to solve connectivity. Unfortunately, this seems to be a very hard problem, because a general lower bound in the MPC model would imply circuit lower bounds, which we know is a very hard problem, and we don't know how to prove these kinds of statements at the moment. A somewhat side remark is that the space per machine being proportional to n, and whether we are above or below this threshold, also seems to be a natural barrier in a related model, namely the streaming model. In particular, it is a barrier because we know that many problems, including connectivity, cannot be solved when the space is less than the number of vertices, but many more problems become solvable when the space is a little bit larger than the number of vertices, including connectivity and many other nice problems. In fact, this barrier is so fundamental for graph problems in the streaming model that there is an actual name for the streaming model with space a little larger than that: it's called the semi-streaming model. I won't mention the streaming model anymore in this talk, so I won't go into more detail here. Okay, so this is the state of the art, and what is our main result? Our main result is to design faster MPC connectivity algorithms for graphs which have additional structure. In particular, we focus on graphs which have small diameter D; capital D will always denote the diameter of the graph. Formally, we prove the following theorem: for space which is, as we'd like it to be, n to the delta for some constant delta, we get an MPC algorithm for solving connectivity which runs in O(log D times log log n) rounds, and in fact it obtains a clean O(log D) rounds if the number of edges, or the total space, is larger than the number of vertices by some polynomial factor, that is, at least n to the power one plus epsilon.
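Restating the main result in symbols, as a paraphrase of what was just said (n is the number of vertices, m the number of edges, D the diameter, S the space per machine):

```latex
\text{For any constant } \delta > 0 \text{ and } S = n^{\delta}:\quad
\#\text{rounds} = O\!\big(\log D \cdot \log\log n\big) \text{ w.h.p.},
\qquad\text{and}\qquad
\#\text{rounds} = O(\log D) \quad\text{if } m \ge n^{1+\varepsilon}.
```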
Okay, so this is the main result and this is what I will talk about for the rest of the talk. I should mention that having the running time depend on D is a natural consideration. In particular, there is a paper from 2013 that proposed an algorithm and conjectured that it runs in log-diameter rounds; it turned out that the algorithm does not actually achieve log-diameter rounds, but this parameterization of the problem had appeared before. And of course, having running times depend on the diameter is very common in distributed algorithms, where there is a natural lower bound saying that your running time has to be at least proportional to the diameter of the graph. Yes? Sorry, can I ask a question here? Okay, thank you, yes. Yeah, I love questions, and everyone here please do ask questions, because my questions are usually stupid, so I encourage others to ask. You're saying that if m is big enough, then it helps you somehow, but how can that be? I mean, I can artificially add edges without changing connectivity. Yeah, it means that my space is larger, but remember that the space of the entire system, the space over all the machines, is at least the input size, so if I have many edges, this means my total space is large. I see, so S is just for one machine and it's n to the delta, but then effectively we have more processors when m is bigger. Yes, yes, exactly. If you fix S to be n to the delta, then having more edges means you must have more machines, because you have to store the input somewhere. Right, of course, thanks. And the way the input is partitioned, you said it, but can you say again, how is it partitioned? It's arbitrary at the beginning. So you cannot choose it, it's given to you in the worst case? Yes, it's worst case, yeah. Thanks. I mean, usually in one or a couple of rounds you can re-sort things. For example, sorting is a primitive that you can do in a constant number of rounds in this model. Again, not the point of the talk, but usually you can re-sort, so without loss of generality you might as well assume that it's arbitrary, and I'll talk a little bit about the implementation a bit later, in one slide. Okay, thanks. A question here? Yes. Hey Alex, so you mentioned MPC and you compared it to PRAM, but there are other models for distributed algorithms, such as, for example, the congested clique. How do they compare? Good question. I won't be able to answer it well on the spot, since I don't know them that well, but those are more distributed models, right? In distributed models, the general difference is that the graph itself is not only the input, but it's also a restriction on how the machines can communicate between themselves. That is the usual restriction in distributed algorithms: you can only communicate with neighbors, you can't communicate with anybody else. Whereas here, in a sense, we can communicate with anybody else. In the clique model, it's exactly that everybody can communicate with everybody else. So I don't know the details of that model, to be honest. Maybe I can take it offline; if you tell me the model, maybe I can say what the differences are. It feels like maybe we should do it at the end of the talk. Hello, I have a question.
I don't know if you can hear this. Yes, yes, I can hear, yeah. Okay, so I'm Gopal from the University of Houston; actually that's Reza's group, I have Reza as my student. So one question: I think he was asking about the congested clique model. There it was recently shown that O(1) rounds can be done, which is obviously optimal, for MST and connectivity. But what I'm asking about is something called the k-machine model. I'm one of the authors of that. It has been around for a while, and I'd suggest you also look at this model. I think it's, in some sense, more realistic than the MPC model, but in some sense it also subsumes it. Basically, the kind of result that you're showing here is also sort of implied there; of course, there we don't care about the log factors. But basically, in the paper, we show that connectivity and MST can be solved in an optimal number of rounds, basically O-tilde of n over k squared rounds, where n is the number of nodes in the graph. It's regardless of m; the number of edges is irrelevant. And k is the number of machines. So basically it's the optimal thing: n over k squared is optimal, up to log factors. Of course, here the main game is to play with the log factors; there we don't care about the log factors, whether it's n over k squared or much larger. But the point is that it's more realistic, and that's one of the reasons why the MPC and MapReduce models are not that relevant right now. Because in the Pregel model, you don't send everything; in the MPC model, you completely shuffle everything, and that's not smart. So Spark and Pregel don't do that, because shuffling everything is completely costly. Generally, the MapReduce kind of thing is not used for graph algorithms because of this shuffling; it is good for constant-round algorithms, and that's where MapReduce is fine, when we just do a constant number of rounds. But for a larger number of rounds, the k-machine model type of algorithm, which models Pregel- and Spark-like systems, is more realistic, because you only send what is needed; you don't transfer the graph by shuffling, so there's basically no shuffle operation. You try to minimize the number of messages, the communication that goes across the network every round. And in that model, I think you cite the paper but you don't mention it; I think this is also a very relevant model, in fact more relevant than MapReduce and MPC. So that's what I mean. Yeah. I mean, I understand, and you're right that there are models, or systems, which are more specifically designed for graphs, for graph algorithms. I just don't know of a formalization of those models which is substantially different from MPC. So this is the one you cited. I just checked your paper; it's one of the citations. The first proposal was in SODA 2015, so it's sort of more recent, and then the actual result is in the paper in SPAA. In SPAA we show the algorithm achieving O-tilde of n over k squared. So I do care about this; it's a more general model than the congested clique. The congested clique has been around for a while, but it's completely unrealistic for big data, because you can't assume that there are n machines for n nodes; if n is large, that's completely unrealistic.
Let's take this offline, because for non-experts this is not very useful. You're welcome to stay here after the talk, but I'd like Alex to continue now. So let's just take this offline and focus on more questions for now, and after the talk we can have discussions like that. So, if that's okay, let's try to continue. Yeah, I'm happy to discuss this after the talk, and at the moment this will be the model we'll be focusing on; and of course, afterwards we can discuss algorithms which try to minimize the communication per round, so rather than having all-to-all communication, trying to reduce the communication is a natural goal. But we can discuss that later on. Okay, so continuing, I should mention that there is independent parallel work; I guess it's only proper to mention it. There is parallel work on parallel algorithms that obtains faster connectivity algorithms for graphs which have special properties. One paper obtains an algorithm that runs in log log n rounds for graphs which have a good spectral gap, think expanders for example, and there is another paper which obtains log log n rounds for random graphs, which again would also typically have a good spectral gap. I should mention that our result, on these two types of graphs, would give log log n squared rounds, just by the standard connection between the diameter and the properties these graphs have. Okay, so I'll talk about the algorithm now. The algorithm is based on a classic idea for designing parallel algorithms for graph problems: the leader contraction framework, which was brought to the MPC world in particular by this paper. The leader contraction framework works as follows. At the beginning we have a graph, and the algorithm proceeds in stages, in iterations; the graphs at the different iterations will be called G1, G2, and so forth. We continue with iterations until there are no more edges, that is, until we get a graph with no edges. What we do in each iteration is the following. First, we choose a random set of leaders from the current vertex set. Then every non-leader selects one of its adjacent leaders and contracts into that adjacent leader, if one exists; if none exists, the non-leader doesn't do anything. The graph for the next stage is the graph where we have performed these contractions, basically the contractions of the non-leaders into some adjacent leader. At the end of the day, when there are no more edges, we get a bunch of isolated nodes, and each surviving node v corresponds to a connected component, which is exactly all the nodes that contracted into v, directly or indirectly. Okay, so this is the algorithm, and I'll just go through an illustration of it on the next slides. Let's say this is our graph G1, the graph we start with, on nine nodes. We proceed in stages. In the first iteration we pick a set of leaders; let's say these are the leaders, the starred nodes. Now every non-leader, every node without a star, needs to choose a leader which is adjacent to it, if one exists. For node 1 there is no adjacent leader, so it does nothing.
For node 3 there are two adjacent leaders, and let's say it chooses one of them. For node 4 there is exactly one adjacent leader, so it must choose that leader. Now we go to node 2, which again chooses one of its two options. Node 5 doesn't have any adjacent leader, so it doesn't do anything. Now we contract all the nodes: G2 is obtained from G1 by contracting all these non-leaders which have an adjacent leader, so in particular nodes 3, 4 and 2 have been contracted, and of course this contraction operation also inherits the edges, so the graph G2 that we obtain is the following graph. So this was the first iteration; now we continue with the next iteration (sorry if the animation is a little too fast sometimes). This is the graph G2. Again we choose a set of leaders, let's say these three nodes are the leaders. Now the non-leaders, basically these three nodes 8, 5 and 9, each choose an adjacent leader: 8 chooses this one as its leader, 5 chooses the only option it has, and the same for 9, and we obtain G3 by contracting these three nodes; we obtain this graph on 1, 7 and 6. Then in the last iteration we have these three nodes, we choose a leader, say 7 is the leader, and we contract the two other nodes into their adjacent leader. The graph G4 that we obtain is just the node 7, and there are no more edges in this graph, so we are done. We know that all the nodes were contracted into 7, directly or indirectly, and this means that all these nodes are part of the connected component together with 7. Yes, so this is the random-mate technique, which is well known from the classic log n algorithms? Yes, this is the random-mate idea; here I'm describing a previous algorithm. This is a classic idea that has been used many times since the 80s, indeed. But this is the framework: our algorithm will use this framework and do something in addition to that.
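As a sanity check on the framework just illustrated, here is a toy sequential simulation of leader contraction (this is not the MPC implementation described next; all names are made up, and leaders are picked with probability p, which the classic version sets to 1/2):

```python
import random
from collections import defaultdict

def leader_contraction(n, edges, p=0.5, seed=0):
    """Toy sequential simulation of the leader-contraction framework.

    Returns a dict mapping every original vertex to the surviving vertex
    (component representative) it was contracted into.
    """
    rng = random.Random(seed)
    rep = {v: v for v in range(n)}                      # current name of each vertex
    cur = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    while cur:                                          # iterate until no edges remain
        alive = set(rep.values())
        leader = {v: rng.random() < p for v in alive}   # random set of leaders
        adj = defaultdict(set)
        for u, v in cur:
            adj[u].add(v)
            adj[v].add(u)
        merge = {}                                      # non-leader -> chosen adjacent leader
        for v in alive:
            if not leader[v]:
                cands = [u for u in adj[v] if leader[u]]
                if cands:
                    merge[v] = cands[0]
        rep = {x: merge.get(r, r) for x, r in rep.items()}
        cur = {(min(a, b), max(a, b))                   # contract edges, drop self-loops
               for u, v in cur
               for a, b in [(merge.get(u, u), merge.get(v, v))]
               if a != b}
    return rep

# Example: a path 0-1-2-3 plus an edge 4-5; vertices with the same
# representative end up in the same connected component.
# leader_contraction(6, [(0, 1), (1, 2), (2, 3), (4, 5)])
```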
Okay, so before doing the analysis of this algorithm, let me first describe in a little more detail how you would implement such an algorithm in the MPC model. The general rule of thumb, or the way you should think about this, is that any node- or edge-local operation can be done in O(1) rounds, a little bit like in distributed algorithms, except that we may change the graph structure from time to time, as long as the total space bound is satisfied. At some moments we'll be adding more edges, and as long as we don't add too many, so that the total space bound is not violated, we are okay. That's the rule of thumb; for more detail, let me show roughly how you'd handle one iteration of leader contraction in a constant number of rounds. The general principle is that each node and its incident edges, say node 2 and the edges from 2 to 8, 2 to 5 and 2 to 6, are assigned to what I'll call a handler machine, and of course one machine can be the handler for a bunch of nodes. The handler machine for a vertex v basically performs the operations that are local to this vertex v. In particular, it flips a random coin to choose whether v is a leader or not, and if it's a leader, it communicates this to the handlers of its neighbors; so if 2 happens to be a leader, it communicates this to 8, 5 and 6. If v happens to be a non-leader, then it has to select a unique adjacent leader: if 2 is a non-leader, then after it has learned from 8, 5 and 6 whether they are leaders or not, it chooses one of them as its preferred leader. Once it has the preferred leader, it sends its incident edges to that leader, basically for the contraction operation, records that v is in the same connected component as the leader, and so forth, these kinds of local operations. One obvious issue is: what if there is a node whose degree is very large, in particular much larger than the space bound per machine, so that this amount of information does not fit in a single machine? A node may potentially have degree which is something like n, whereas the space per machine is, say, the square root of n. So what do we do in this situation? In this situation we store the information in a distributed fashion: for one such node, if its block of information is too large, we partition it among many handlers, in particular, say, d over S handlers, and then the processing of these operations is done via a tree of depth log base S of d, which is a constant, again because d is bounded by n and S is at least polynomial in n, pretty much like we did with the XOR operation. For example, selecting a unique adjacent leader can be done in log base S of d rounds via the same kind of tree that we used for the XOR operation. And one last remark about the implementation: how do we do the assignment to handlers? Hello? Yes, I can hear you, yeah. So how is the graph distributed: completely arbitrarily partitioned? Or is it in a vertex-centric fashion? For example, in the k-machine model, a vertex and its incident edges are on the same machine, of course up to the space constraint. Is it like that? So, the MPC model does not have a restriction on how we store the graph; any reasonable way is okay. What I'm describing here is how the algorithm would be implemented; it is not an absolute must that the MPC model imposes, but it is one way to do things, and it's relatively common for MPC algorithms. Okay, so in particular, it is normal to store a node together with its incident edges, but they may not fit. In a sense this is a pretty big deal, and it is important to deal with this issue: the node and its incident edges may not fit on one handler machine, and then the storage of the node with its edges is distributed as well. We use d over S handlers, and then the processing, say selecting one of the d neighbors, is done in a tree fashion, in levels: we first select among local neighbors, each machine selects one and transmits it to the machines on the next level, then those select among the ones they receive, and so forth. It's like an OR operation that you would implement in a tree, in exactly the same fashion as I described the XOR operation. But it's not necessary that the node and its incident edges are stored together? It depends what you mean by not necessary: the MPC model does not enforce anything, it is our choice of how to design the algorithm. The algorithm is always for worst-case graphs; it's the classic kind of worst-case guarantee, basically saying that for any worst-case graph we get this bound. This in particular means that there may be, for example, nodes with degree which is almost linear, which in turn means that all the edges incident to a node may not fit on one machine.
And at the beginning we assume that the input is distributed in an arbitrary fashion, of course respecting the space bound per machine, so the edges can be partitioned in an arbitrary, worst-case fashion at the beginning. So then how will the leaders be elected? Suppose you store the graph by distributing edges arbitrarily, so every machine gets, let's say, an equal part of the partition of the edges; then to elect a leader it has to be consistently agreed which node will be the leader, right? Right, so we need to first do the assignment: at the beginning, each node and its incident edges are assigned to a handler machine, and this is done in a constant number of rounds at the start. Oh, okay, so you do a standard pre-processing. Yeah, in a sense it's a standard trick. You can think of it as sorting, let's say sorting by a key which is equal to the node: first you duplicate each edge twice, because it appears once as incident to one endpoint and once as incident to the other, and then you sort, for example, by the node. This means that all edges incident to a particular node will be consecutive, so they will be on one machine, or on two machines, or, if they don't fit on one machine, on many machines, but then we have to process that node a little bit more carefully, via these trees of small depth. And in particular, the assignment to handlers can be done either in a random fashion, if the maximum size of an information block (by an information block I mean this red square, basically the whole neighborhood of a node) is well below the space bound, in which case we can just distribute the blocks randomly, and the usual bin-packing arguments tell us that all machines will have roughly the same load, in the sense that they handle roughly the same amount of information; or, if the maximum block size after this partitioning is close to the space bound, then random assignment is not good enough, because it would introduce another log factor, and then it is usually done by more careful load balancing, which is a common thing for such systems in general. So in the random mating, each node is elected leader with what probability? I'm getting to this on the next slide; perfect question, so let me go on. So this was about the implementation; I won't describe much more about it, and I'll describe the algorithm at a slightly higher level of detail from here on. So how many iterations of leader contraction do we need? The number of vertices in the graph at the next iteration is exactly the number of leaders (the leaders survive for sure) plus the number of non-leaders which have no adjacent leader in the graph. Suppose the probability of a node being a leader is exactly one half. Then, in expectation, the fraction of nodes that survive is upper bounded by three quarters. Why is this the case? For every vertex, with probability one half it is a leader, and then it definitely survives; with probability one half it is a non-leader, and then, if that node happens to have degree one, there is another probability of one half for whether its single neighbor is a leader or not. Take node 2 in this example:
with probability one half it will contract into 5, and with probability one half it will survive. So overall its survival probability is one half from the first case plus a quarter from the second, and therefore the fraction of surviving nodes is upper bounded by three quarters. This means that, at least in expectation, after log n iterations we will be done: either there are no more edges, or the number of nodes would drop below one, which is not possible. So there are at most about log n iterations. And basically this is pretty much the best possible for a graph which is a path: if we have a graph which is a very long line of nodes, or for example a cycle of length n, then this is the best possible; there is no way to choose the probability of being a leader differently so as to improve the number of rounds below log n. This is exactly why the problem of distinguishing one cycle versus two cycles was hard. Okay, so this is the situation. But, as I already suggested, what happens if the degree is large? If we have a lower bound on the degree, say this node 2 has a sufficiently high degree, at least d, then we can set the probability of being a leader to be much smaller. For example, here it is enough to choose the leader probability to be roughly a quarter to have a reasonable probability that node 2 has an adjacent leader. And basically this will be the idea, so let's explore it further. Suppose we do leader contraction in a graph which is a high-degree graph; in particular, suppose that all degrees are at least d, and think of this d as being sufficiently large, not a constant but, say, polynomial. In that situation we can set the leader probability to be roughly proportional to log n over d. Then, with high probability, the number of leaders will be an order log n over d fraction of the vertices, so the number of leaders is a very small fraction of the vertices we started with; and at the same time, every non-leader, just because it is adjacent to at least d nodes, each of which is a leader with this probability, will have an adjacent leader with high probability. So this means that the number of vertices in the next round drops by this fraction; if you think of d as being very large, it drops by a factor roughly proportional to the degree. So the higher the degree we can guarantee in our assumption, the faster we make progress, the faster the graph size drops. So this will be our new strategy (sorry, PowerPoint jumped ahead). In particular, we will do a step that I call densification, namely a step that increases the degree of all nodes above a certain threshold, and then we will do leader contraction; and this will be the iteration: densification, leader contraction, then again densification and leader contraction, and so on.
Basically, this is the algorithm that we get. It is pretty much the same algorithm as the one I showed when describing leader contraction, except for these two red boxes, the process I call densification. In particular, before picking the random set of leaders and then contracting the non-leaders, we convert the input graph G_i into G_i', which will have minimum degree at least some bound D_i, a number we'll fix later, and will have the same connected components as the graph G_i; basically, we add many more edges to increase every degree to at least D_i. The probability of being a leader will be, as I suggested on the previous slide, log n over D_i, or rather we take the minimum of one half and this quantity, in case D_i is not high enough. This is the main algorithm, and I'll now describe and analyze it. Okay, are there questions at this moment? Okay, so let me first state the densification lemma, whose algorithm I'll describe in the next two slides. The densification lemma says that there is an MPC algorithm that can convert any graph G_i into a graph G_i' which is defined on the same vertex set, has exactly the same connected components, but has the additional property that every degree is at least some bound d, which is an input parameter to this algorithm. The number of rounds is O(log D), the log of the diameter, which is where the log-diameter factor appears, and the total number of extra edges is O(n d squared). So the graph will get more edges, and the number of edges added is O(n d squared). Note that if we want to make every node have degree d, then we'd expect to add roughly n d edges; in this case we do a little bit worse, in particular n d squared, but it will still be good enough.
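Here is a rough schematic of the densify-then-contract loop just described, again as a sequential toy rather than the MPC algorithm. The densify function is only a stub for the densification lemma (a toy version of the truncated-broadcasting procedure that could fill it in is sketched after its description below), and the constant c in the leader probability is a placeholder:

```python
import math
import random
from collections import defaultdict

def densify(vertices, edges, d):
    """Stub for the densification lemma: it should return an edge set on the
    same vertices, with the same connected components, in which every vertex
    has degree >= d (or knows its whole component).  See the sketch below."""
    raise NotImplementedError

def connectivity(n, edges, c=2.0, seed=0):
    """Schematic sequential version of the densify-then-contract loop."""
    rng = random.Random(seed)
    m_total = max(len(edges), n)                  # stands in for the total space O(m)
    rep = {v: v for v in range(n)}
    cur = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    while cur:
        alive = set(rep.values())
        d_i = max(2, int(math.sqrt(m_total / len(alive))))  # extra edges ~ n_i * d_i^2 = O(m)
        dense = densify(alive, cur, d_i)                    # min degree >= d_i, same components
        p = min(0.5, c * math.log(max(n, 2)) / d_i)         # leader probability ~ (log n) / d_i
        leader = {v: rng.random() < p for v in alive}
        adj = defaultdict(set)
        for u, v in dense:
            adj[u].add(v)
            adj[v].add(u)
        merge = {v: next((u for u in adj[v] if leader[u]), None)
                 for v in alive if not leader[v]}
        merge = {v: u for v, u in merge.items() if u is not None}
        rep = {x: merge.get(r, r) for x, r in rep.items()}
        cur = {(min(a, b), max(a, b))
               for u, v in dense
               for a, b in [(merge.get(u, u), merge.get(v, v))]
               if a != b}
    return rep
```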
Okay, so assuming this lemma for the moment, let me finish the analysis of our overall algorithm for the connectivity problem, and then I'll describe how to obtain the densification lemma. So again, how does the algorithm proceed? We have a graph G1; we do the densification step, which increases the number of edges, to get the graph G1'; then we do the leader contraction, which decreases the number of nodes, to obtain the graph G2; and then we repeat: we obtain a graph G2' by densification, in particular increasing the number of edges, then decrease the number of nodes, and so forth, alternating. So let's analyze this process. The total available space is O(m); remember, this is what we start with, the total space over all the machines, which is at least the number of edges (this is related to the earlier question). In iteration i, our graph G_i has, say, n_i vertices. This means that we can afford densification with a degree bound D_i roughly equal to the square root of m over n_i: this is because the densification lemma produces a total number of extra edges which is O(n_i D_i squared), so if we want this bound to stay around O(m), we can set D_i to be about this value. Most importantly, note that the smaller the number of vertices, the higher the degree we can afford; this will be important. Now we do leader contraction in a graph which has minimum degree D_i; this reduces the number of vertices to n_{i+1}, which is roughly n_i divided by D_i, since, as I mentioned on the previous slide, leader contraction in a high-degree graph is, in a sense, much more aggressive. Now how does this proceed over the iterations? It's natural to look at how the following quantity changes: the ratio between the total available space and the current number of nodes, m over n_i. Plugging in n_{i+1}, which is roughly n_i over D_i, we get that m over n_{i+1} is equal to the previous fraction m over n_i times the D_i that we can afford; and plugging in this D_i, which is the square root of m over n_i, we get that the new ratio between the total space and the number of vertices is the previous ratio raised to the power 1.5. So whatever the previous fraction was, we raise it to the power 1.5. Now, how many times can you raise a number to the power 1.5 before it becomes about m? After log base 1.5 of log m iterations, this ratio would be at least m; and of course it cannot be more than m, because the number of vertices has to be at least one. So the total number of iterations is at most log log m; and since m is polynomially related to n, this is also O(log log n) iterations. Overall we have O(log log n) iterations of densification plus leader contraction, and each densification step runs in O(log D) rounds, so the total number of rounds is log diameter times log log n. Just to explain roughly what happens: as we go through the steps, the size of the graph drops, in particular the number of nodes drops; as the number of nodes drops, we can do much more aggressive densification, so we can add more edges; and the more edges we add, the faster the leader contraction step decreases the number of nodes. This is, in a sense, how we get this bound: the number of iterations of densification plus leader contraction is only O(log log n) because of this exponential increase of the ratio.
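Writing out the progress measure from this argument in the notation above (n_i vertices at iteration i, total space m, densification target D_i = sqrt(m/n_i)):

```latex
n_{i+1} \approx \frac{n_i}{D_i}
\quad\Longrightarrow\quad
\frac{m}{n_{i+1}} \approx \frac{m}{n_i}\cdot D_i
  = \frac{m}{n_i}\cdot\sqrt{\frac{m}{n_i}}
  = \left(\frac{m}{n_i}\right)^{3/2},
\qquad\text{so}\quad
\frac{m}{n_i} \ge m \ \text{after}\ O(\log\log m)\ \text{iterations.}
```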
Okay, so now let me describe the densification step, the densification algorithm. Again, what we want to do is take a graph and add edges so that every degree increases to at least d; the total extra space will be O(n d squared), and it will proceed in O(log D) rounds. The basic idea, which I'll call truncated broadcasting, is related to some broadcasting algorithms that were studied in these models before. What it means is that we want to add more edges while preserving the connected components. An unrestricted broadcasting would basically have every node tell all of its neighbors about all the neighbors it knows: a node v tells each of its neighbors (Gamma(v) will denote the known neighborhood of v) about all the neighbors it knows, and this creates more edges, since the neighbors learn about my neighbors, and the process continues in iterations. We will truncate the set of neighbors that is sent at d: as soon as the set becomes larger than d, we don't need to send more than d nodes, just because we only need to ensure that the degree is at least this lower-case d. And here is how the algorithm works. We maintain this known neighborhood, and it proceeds, again, in iterations. At the beginning, Gamma_1(v) is the node v together with all of its neighbors. Then we iterate. If it happens that all vertices have a known neighborhood Gamma of size at least d, then we are done; this is exactly what we wanted. Otherwise, for each vertex v: if v's own known neighborhood has size at least d, then we are all set, and we just copy the same neighborhood into the next iteration as well. If this is not the case, but there exists some neighbor, some u in the neighborhood of v, whose known neighborhood has size at least d, so there is a very knowledgeable neighbor of our vertex v, then this node u sends its neighborhood, and the new neighborhood of the node v will be its old neighborhood together with whatever the knowledgeable neighbor knows; I write it with a superscript d just to denote that this set is truncated to d elements, so the node u doesn't have to send its entire known neighborhood, which could be very large, of size up to n, say; it can just truncate it to d elements. And the third case is when neither of these cases happens: then the new neighborhood of v will be just the union of the known neighborhoods of all of its neighbors; its neighbors communicate to it all they know, and this creates the new neighborhood of the node v. At the end of the day, we output the graph G' whose edges are (v, u) for every node v and every u in the known neighborhood of v. I realize it is 2 o'clock now; what is the procedure for running slightly over time? Just go on, this is fine. Is this where you get the d squared from: there can be d neighbors and each one might have d neighbors of its own? Yeah, I have an illustration of exactly this on the next slide. This is exactly the same algorithm, so let me just illustrate exactly where the extra space comes from. The first case is kind of obvious; we just don't do much there. In the second case, let's say v is the node 6, and 9 is this very knowledgeable neighbor which has a very large neighborhood, let's say over here. Then in this iteration, node 9 sends part of its neighborhood, let's say d is equal to 4, and node 6 creates these new edges; these are the additional edges that we add to the Gamma of node 6. The third case is exactly where the additional space appears: node 6 is adjacent to many nodes, each of which has a neighborhood of size at most d (otherwise we would be in case 2), and then it creates edges between 6 and all of these vertices, basically the vertices at distance 2 from 6. And this is exactly where the n d squared space comes from, because the maximum number of edges that we can add to node 6 in this way is about d squared: there are at most d neighbors adjacent to 6, and each of the neighbors of 6 has at most d known neighbors of its own, so overall it's O(d squared), exactly.
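And here is a toy sequential version of the truncated-broadcasting step just illustrated (not the MPC implementation, where each iteration costs O(1) rounds; it could stand in for the densify stub in the earlier sketch):

```python
from collections import defaultdict

def densify(vertices, edges, d):
    """Toy sequential version of truncated broadcasting.

    Grows each vertex's known neighborhood Gamma(v) until it has size >= d
    or contains v's whole connected component, then outputs an edge from v
    to every known neighbor.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    gamma = {v: {v} | adj[v] for v in vertices}          # Gamma_1(v) = v and its neighbors
    while any(len(gamma[v]) < d for v in vertices):
        new, progress = {}, False
        for v in vertices:
            if len(gamma[v]) >= d:                        # case 1: v already knows enough
                new[v] = gamma[v]
                continue
            big = next((u for u in gamma[v] if len(gamma[u]) >= d), None)
            if big is not None:                           # case 2: a knowledgeable neighbor
                new[v] = gamma[v] | set(list(gamma[big])[:d])   # truncated to d elements
            else:                                         # case 3: union of neighbors' sets
                new[v] = set().union(*(gamma[u] for u in gamma[v]))
            progress = progress or (new[v] != gamma[v])
        gamma = new
        if not progress:      # small components have been fully explored
            break
    return {(min(v, u), max(v, u)) for v in vertices for u in gamma[v] if u != v}
```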
Okay, and this is basically the last technical part, so let me just quickly show why this runs in a logarithmic number of rounds, log of the diameter rounds. It's actually a relatively simple proof, and it works by induction. The inductive hypothesis is that Gamma_{i+1}(v) either already has size at least d, or, if it did not reach this desired bound, then it is composed of exactly all the nodes at distance at most 2 to the power i from v, where by distance I mean the hop distance; these are unweighted graphs. We can prove this by induction. Of course this is true for i equal to 0. Otherwise, for bigger i, we fix the node v and fix some node x at distance at most 2 to the i from our vertex v. Since the distance is at most 2 to the i, there must exist some node in the middle, basically a node u which has distance at most 2 to the i minus 1 from both v and x; think of u as being the midpoint between v and x. Now either the known neighborhood of u has size at least d, and then the conclusion follows by case 2: namely, in the next stage, Gamma_{i+1}(v) will contain at least d nodes from this node u, which is very knowledgeable. Or, if this is not the case, then by the inductive hypothesis Gamma_i(u) has size less than d and consists of all nodes within distance 2 to the i minus 1 of u; in that case x must belong to Gamma_i(u), because it is at distance at most 2 to the i minus 1 from u, and likewise u must be in the neighborhood of v, and then by case 3, Gamma_{i+1}(v) is the union of all the neighborhoods of v's neighbors, and therefore the node x will be included in Gamma_{i+1}(v). And once you believe this inductive step, it means that, since for any node v all the nodes of its component are at distance at most capital D (because the diameter is D), in log D rounds every node v either has a neighborhood of size at least d, or it has exhausted its entire connected component. This completes the analysis. Okay, so this was the last technical slide; the next slide is the conclusion. Are there questions so far? Hello, a quick question, one minute, sorry. Yes. So one question is, would this also give something in the PRAM model, this algorithm? No, it won't, because we cannot collect this information fast enough there, so it won't give you better than log n. It won't do anything better even for low-diameter graphs? I don't know. What is the second question? What would be the consequence if the number of extra edges, the space, were only bounded by n d instead of n d squared? If you could get n d, then only some constants would be improved, that's all; so it doesn't really matter. Okay, so let me go to the conclusion slide, since we're running out of time; some people might have to leave, and we can continue with informal discussion afterwards. Okay, so the last slide. The main statement that I showed you today is that we can solve graph connectivity on graphs with diameter capital D in the MPC model, where the space per machine is an arbitrary polynomial, n to the delta for a constant delta, and the number of rounds is basically log diameter times a quantity which is at most log log n; and if m divided by n is at least polynomial in n, this quantity becomes a constant, so it becomes just a clean O(log D) number of rounds. We also showed that you can obtain similar running times for some related problems, some simple extensions of connectivity, in particular spanning forest, DFS sequence, and minimum spanning tree, where the diameter is now with respect to the diameter of the spanning tree. There are some natural open questions which remain: in particular, to obtain a clean O(log D) bound, without this additional factor; also, whether it is possible to obtain a deterministic algorithm, since so far this leader contraction algorithm relies very heavily on randomness. And the other natural problem, kind of the next step as an extension, but which feels like a very interesting problem to solve in the MPC model, is the shortest path problem: given a graph and two nodes, find the shortest path between them. As far as I know, we don't know even an approximation algorithm that would run in a polylogarithmic number of rounds; so even independently of the log-diameter question, even polylog n rounds does not seem to be known. And I'll stop here, thank you. Thank you, Alex. So before we continue to the question session, let me just mention: thanks everyone for joining, and we also thank all the viewers on YouTube; we can't see you, but we know you're there, 10 of you this time, thanks for joining through the YouTube live broadcast. In two weeks we have C. Seshadhri here, and two weeks after that, on the last day of October, it's Michal Kutki. Some of you might have to leave, but we are still here, so feel free to go on with some questions for Alex. Any more questions? Yeah, I mean, I just want to say a little bit more about the k-machine model.
I think it would be good to compare it a little bit more with this one here, because I think that's a more realistic model than MPC, and it's the one that captures systems like these. Okay, I mean, I should probably be more aware of it; I think I've seen it at some point a while ago, but I don't remember the details now. I know it's a different model, but I'll look at it a little bit more in detail and understand the exact differences; I did look at it before, but I just don't remember now off the top of my head. Second, regarding your paper: I did look at it, it's one of the citations, but I didn't read it in detail, so I don't remember exactly in what context it comes up. So there is a series of three papers; there are other papers, but there is a series of three: the original paper came in SODA 2015, the connectivity and MST algorithm came after that, and there's a recent 2018 paper on other graph problems showing lower bounds and upper bounds for problems like triangle enumeration, PageRank, and things like that. Okay, okay. So I think I'll let you stay here for the discussion and just turn off the broadcast, so Google is not too angry with us.