So it's my pleasure to introduce Nancy Lynch, who's joining us from MIT. Nancy has been doing fabulous work in the area of theory of distributed systems for, I don't know, let's just say a long time. She's been awarded the Knuth Prize, the Emanuel Piore Prize, and the Dijkstra Prize twice, which is more than any of their respective namesakes ever won, as far as I know. Dijkstra did win the Dijkstra Prize? Yes, he did. So, let's see, Nancy is well known for lots of things. It's kind of hard to summarize such a great career, but she's probably most well known for various impossibility results, like the impossibility of distributed consensus in the presence of faults. More recently she's been looking at wireless distributed problems, problems that arise with wireless communication and mobility, and I think that's what we'll be talking about today. Okay, thank you. Okay, so this is about, oops, so I guess you can hear me, okay? Okay, so this is about distributed computing theory. A distributed system is a collection of components that compute individually and interact, and the idea is they're supposed to all be cooperating to solve some problem. Practically all computer systems now are distributed. Distributed computing theory is a part of theoretical computer science, and it studies the theory underlying these types of systems. Okay, so this is the cover of my book on distributed algorithms. Distributed computing theory traditionally contains a whole lot of algorithms, lower bounds, and impossibility proofs for problems that these kinds of systems are supposed to solve. Here's a list of some typical problems that have been widely studied in that area. Also, because the algorithms are complicated and difficult, with race conditions that make them hard to understand, there's a lot of work in this area on techniques for modeling and analysis.
The traditional theory has involved fixed networks, but now distributed systems are changing, and we'd like the theory to also work for wireless networks, mobile networks, robots, etc. So there's more to do. Okay, so why should it be so different? Why can't we just carry over results from standard distributed computing theory? Well, wireless networks, instead of sending messages over individual wires, have a different sort of communication model. You've got local broadcast communication, where a node sends a message and it reaches many nearby nodes. Who it reaches depends on the signal propagation patterns. And then you also have the issue of contention. If you've got some node that's sitting there and different nearby nodes are sending at the same time, the receiver may receive two things at once, which it can't actually interpret as messages; it will just get a collision, so contention causes message loss. You may also have to deal with not knowing who the participants are. The network may be changing: nodes can fail and recover, and join and leave the system. You may have to deal with mobility. So there's lots of stuff that's different. It's complicated. We'd like to have a theory that's similar to the traditional theory, but that's not so easy to do. So how do you approach something like this? It's not a solved problem; I'm not going to show you a solution to everything today. Well, you can try to identify individual problems and solve those. You can get upper and lower bounds for those, and then you can see how the different problems fit together. And then you can identify some abstraction layers that can help you decompose the design and analysis of algorithms. So what I'll do here is just describe some recent work, a lot from our group, some from others, just beginning a theory for wireless networks. This is just a picture of some of the kinds of problems you might want to study in wireless networks.
You could start with basic issues of managing the message contention, and then you might have to synchronize the clocks among the nodes. Then you might want to establish some structures, like routes for sending messages between nodes across the network, or other structures like spanning trees. You might want to just broadcast a message globally, or compute a function in the network. At a higher level, you might want to manage serious amounts of data in dynamic networks. Then you might have some control applications, like robot coordination. So there's lots of stuff. The picture on the right is an attempt at describing the kinds of layers: you have basic services like clock synchronization and contention management down here, and then you could go directly to global communication, or you could start building network structures and then move on up. Okay, so in this talk I'm going to basically cover two parts. The first part, which will take up most of the talk, is some results on dynamic distributed systems where you assume reliable communication. And then we'll look at what happens when you deal with collisions and unreliable communication. Okay, so we'll start with the first topics, on reliable communication. There are basically four things I'll describe pretty quickly: how do you compute functions in dynamic graph networks, how do you maintain connectivity among robot swarms, how do you maintain consistent memory in dynamic networks, and then one example of an abstraction layer that can work for wireless networks. Okay, so first, computing in dynamic graph networks. This is from a STOC paper from '09. It illustrates how nodes can solve problems in a network that keeps changing. So the model is you have a dynamic graph. What's a dynamic graph? Well, you have a fixed set of graph nodes, and communication works in rounds. At every round you have a different set of edges in the graph.
So the graph is completely changing at every step, and you imagine you have some adversary that's controlling the changes. Okay, but you know, if the graph isn't connected, you're not going to be able to do anything. You need some kind of assumption. So we'll make a pretty basic assumption: the graph at least is connected at every round. It can be totally different each time, but each time it's connected. You could weaken that, but okay. Then the algorithm consists of a bunch of processes. These are state machines, automata, associated with the graph nodes. We assume they have unique identifiers. They don't know anything about the graph. Well, for some results we assume they know a little bit about the graph, like an upper bound on the number of nodes, but not in general. Okay, so what happens is, at each round, each process looks at its state and decides on a message to send. The message gets sent to all of its neighbors at that round. We assume here reliable communication, so the message is reliably received by all of its round-r neighbors. Does that all make sense? So here's a simple example, the global broadcast problem. You have some arbitrary node starting with a message, and the problem is for the message to eventually arrive everywhere. The algorithm is very simple: any node that has the message keeps transmitting it, at every round from when it gets the message, forever. Okay, so the theorem is that this algorithm solves the global broadcast problem in connected dynamic graph networks. In fact, all the nodes receive the message within n minus 1 rounds. I don't know if that's obvious. The key idea is that at every round, some new node receives the message. Some new process gets the message. Okay, why? Well, let's look at any round, round r. Let's say A is the set of nodes that received the message before round r.
We know that since the whole graph is connected, there has to be some edge between a node in A and a node that's not in A. You just partition the nodes into the two sets, the ones that have it and the ones that don't. There has to be some edge between them. Okay, so the node u that has it is going to transmit, and the node at the other end of the edge, v, is going to receive it. So that's at least one new node that gets it. That's the key idea for a lot of these algorithms. Here's just a picture of A and V minus A, and there has to be some edge between them. Okay, a similar example: the problem of computing the minimum. Let's say everybody starts with an input value, and every node is eventually supposed to determine the minimum of everybody's inputs in the network. The algorithm is similar: every node transmits the minimum value that it's ever seen, at each round. And if it has to actually perform an output, it has to know when it's done. So for this, let's assume a known upper bound on the number of nodes, and after this number of rounds, everybody outputs whatever it has as the current minimum. So this solves the minimum problem in connected dynamic graph networks, and the key idea is again the same: at every round, some new node receives the minimum. I mean, different nodes are receiving different things, but we only care about where the minimum goes. And at every round, some new node is going to receive the minimum. You can see A is the nodes that start out having the minimum, and V minus A the nodes that don't have the minimum. There has to be some edge between a node in A and a node not in A, so somebody new gets the minimum, okay? All right, so building on this idea, you can do more complicated things. Suppose you have these nodes in a network and, well, it wasn't even clear that you could solve this problem initially: every node should output the exact number of nodes in the network. They don't start out knowing who else is there.
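To make the one-new-node-per-round argument concrete, here is a minimal simulation, not from the talk; the adversary's connected graph at each round is modeled as a random path, and all names are my own.

```python
import random

def dynamic_min(inputs, seed=0):
    """Each node repeatedly transmits the smallest value it has seen; the
    adversary picks a fresh connected graph (here, a random path) every
    round.  After n - 1 rounds every node holds the global minimum, by the
    same argument as for broadcast: each round at least one new node,
    adjacent to the set A that already has the minimum, receives it.
    (Global broadcast is the special case where the 'minimum' is the
    message itself.)"""
    n = len(inputs)
    rng = random.Random(seed)
    seen = list(inputs)            # seen[i]: smallest value node i has seen
    for _ in range(n - 1):         # n here plays the known upper bound's role
        order = list(range(n))
        rng.shuffle(order)         # this round's connected graph: a path
        nxt = list(seen)
        for i in range(n - 1):
            u, v = order[i], order[i + 1]
            nxt[u] = min(nxt[u], seen[v])   # v's transmission reaches u
            nxt[v] = min(nxt[v], seen[u])   # u's transmission reaches v
        seen = nxt
    return seen
```

Note that the updates in a round use only the values from the start of that round, matching the synchronous model where every node sends before anyone receives.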
Okay, well, there is an easy solution if everybody can just send everything that they've ever heard. So nodes could keep sending all the IDs they've ever heard at every round. And then the same kind of argument we've just seen will show that everybody is going to get all the IDs. But maybe that's cheating. It seems much harder if the messages have to actually be small. So suppose the nodes can only send small messages; for example, they're small enough that you can only fit one ID in each round. Well, then it becomes hard, because a process is sitting there wondering which ID to send. It doesn't know who it's connected to. It doesn't know what its neighbors have already gotten. But there is such an algorithm. I'm not going to go into details, but the general flavor is this. The algorithm is based on a size verification sub-protocol. To figure out what the number of nodes is, let's guess that it's k, and we'll keep doubling the guess. So we have a sub-protocol that verifies whether or not the number of nodes is less than or equal to k, and we'll just use that for k equals 1, 2, 4, 8, and so on, and we get the exact count that way. This actually solves the counting problem in order n-squared rounds, which is pretty nice. So the trick, of course, is how do you solve the k-size verification? It's based on another sub-protocol, which is called k-committee election. k-committee election is a simple problem where the nodes have to form committees of size at most k. So they all have to record in their state the name of a committee. The size of each committee, meaning the number of nodes in it, can be at most k. But if k is big enough, if k happens to be greater than or equal to n, then it happens that all the nodes are, in fact, in the same committee. So that's the heart of the k-size verification.
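The doubling strategy just described can be sketched as a small driver. This is only the outer loop; `size_at_most` is a hypothetical stand-in for the distributed k-committee-based verification sub-protocol, which is where the real work happens.

```python
def exponential_search(size_at_most):
    """Doubling driver for the counting protocol: keep doubling the guess
    k until the size-verification sub-protocol confirms that the number
    of nodes is at most k.  `size_at_most(k)` abstracts the distributed
    check built on k-committee election."""
    k = 1
    while not size_at_most(k):
        k *= 2
    return k

# With a network of 11 nodes, the driver stops at the first power of
# two that is at least 11:
n = 11
print(exponential_search(lambda k: n <= k))
```

Each verification run costs rounds proportional to k, and the guesses sum geometrically, which is consistent with the overall order n-squared round bound mentioned in the talk.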
Okay, and just to give you a brief flavor, the protocol begins with a leader election phase that elects one or more leaders. These will be the leaders of committees. But it will manage to elect only one leader if k is, in fact, greater than or equal to n. Okay, so each leader then starts a committee, and then it tries to recruit other processes, other nodes, to join the committee. So you go through a bunch of phases, each one takes order k rounds, and in each phase the leader tries to get one more node to join its committee. So there's a lot of sequential stuff going on. And the protocol uses ideas that are very similar to the ones I just showed you for computing the min and for global broadcast. Okay, so with this sort of technique, you can actually solve non-trivial problems. It turns out you can solve basic problems in dynamic networks, as long as the network remains connected. And actually we can weaken the connectivity requirement: if we know that the nodes are connected every so often, and you can formalize that, that would be good enough. It just slows the algorithm down. Okay, but then that leads us to the next problem. Why would we expect the mobile nodes to remain connected? In fact, in some cases, maybe we could ensure that the nodes remain connected. So here's the next topic, as an example of how you could actually ensure connectivity. This is from a completely different area. The first work was by my grad student Rotem Oshman; this is now work by another grad student, Alex Cornejo. So in his work, he's considering a collection of robots. They're cooperating to do something, like explore an environment. They have to communicate. They have to determine properties of the environment. They have to coordinate their activities. And that requires them to solve all kinds of problems, like global broadcast, electing leaders, etc. Okay, so how are you going to design algorithms like this?
Well, that's much easier if the robot swarm maintains a connected communication graph. So he's worried about the problem of making sure that the robots do remain connected. His idea is to use a separate service, a connectivity service, that you can combine with a motion planning algorithm. The idea is that the connectivity service is going to be the piece that makes sure they maintain connectivity, but you still want to allow the robots to follow their motion plan, at least approximately, as much as possible. Okay, so how does this work? He defines a communication graph based on Euclidean distance. So you have a geometric graph that has an edge between robots u and v if the distance between u and v is at most some fixed bound D. And the assumption is that nodes within distance D are able to communicate reliably. Okay, so what's happening? The robots are trying to do something. Each robot has a motion planner program running on it. And at each round, the motion planner proposes where the robot should try to move; it proposes a target location. But before the robot actually follows the trajectory, it proposes this location to the connectivity service to try to get approval. The connectivity service either approves it or modifies the plan, trying to modify it as little as possible, but it is guaranteed to preserve connectivity. Okay, so the connectivity algorithm is a true distributed algorithm; it just works based on local rules. The way it works is, at each round, every robot figures out which of its local edges are supposed to be maintained. It's going to guarantee that it maintains some of the edges between itself and some of its neighbors. It doesn't have to maintain all of them. And then the connectivity service is going to guarantee that all the critical edges are preserved. So the key problem here is determining the critical edges.
Okay, so each robot is supposed to determine enough edges so that when you put them all together, the graph as a whole is going to remain connected. But you don't want too many of the edges to be called critical. You don't want to be forced to preserve too many edges, or you may not be able to make such great progress toward your targets. Well, it turns out that solutions to this problem can be based on many different geometric conditions, and this is a problem that's been worked on by other people. In the literature you have critical edges based on what are called Gabriel graphs, another one based on relative neighborhood graphs. There's a whole sequence of papers by Li, Wattenhofer, and others which talk about cones starting at each node; you try to maintain connectivity to some node within each cone, each sector of a certain angle. And then you can also consider local minimum spanning trees. So what we do is to preserve the local minimum spanning tree edges. You basically compute a minimum spanning tree where the weights depend on the Euclidean distances, so you get a local minimum spanning tree at each node. It turns out the union of the local minimum spanning trees of all the nodes must contain the global minimum spanning tree of the entire network, so that union has to maintain connectivity. Now, what Alex discovered is that all the other geometric conditions that other people had studied, Gabriel graphs, et cetera, and all these papers on cones, all produce sets of edges that happen to include the local minimum spanning tree edges. So this shows that the local minimum spanning tree is provably at least as good as any of these other choices. So what you can do then is prove that the local minimum spanning tree is good enough, and separately prove that all these other conditions imply the local minimum spanning tree. You don't need to go through separate proofs that all of these different kinds of conditions work. Okay.
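The local-MST rule just described can be sketched in a few lines. This is my own illustrative version, not the paper's algorithm: each robot builds a minimum spanning tree over its one-hop neighborhood and marks its incident tree edges as critical.

```python
import math

def lmst_edges(points, D):
    """Local-MST sketch: each node u builds an MST (Prim's algorithm,
    Euclidean edge weights) of the subgraph induced by u and its
    neighbors within distance D, then marks its incident MST edges as
    critical.  Returns the union of all marked edges as frozensets."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    nbrs = {u: [v for v in range(n) if v != u and dist(u, v) <= D]
            for u in range(n)}
    critical = set()
    for u in range(n):
        cluster = [u] + nbrs[u]
        in_tree, tree_edges = {u}, []
        while len(in_tree) < len(cluster):
            best = None
            for a in in_tree:
                for b in cluster:
                    if b not in in_tree and dist(a, b) <= D:
                        if best is None or dist(a, b) < dist(*best):
                            best = (a, b)
            if best is None:     # neighborhood graph may be disconnected
                break
            in_tree.add(best[1])
            tree_edges.append(best)
        critical |= {frozenset(e) for e in tree_edges if u in e}
    return critical
```

On four collinear robots spaced one unit apart with D = 1.5, the rule keeps exactly the three consecutive edges, which is the global MST, consistent with the containment claim above.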
So the point here is you can get inexpensive local algorithms to maintain connectivity for robot swarms, which supports the robots carrying out algorithms like the ones we saw that require connectivity. You can also extend this to a robust form of connectivity, where you're k-connected instead of just connected: connected by k different paths. Okay. So what else can we do in dynamic networks? Now we think about the kinds of things we do in traditional distributed computing theory, in fixed networks, not robots. One of the main things we do is try to maintain consistent shared data somehow in the network. Well, can we do that when the network is changing? Let's take a look at another project, on maintaining atomic memory in dynamic networks. So what's atomic memory? Well, you want to be able to read and write shared objects, and you want the results that you get to make it look like you're actually accessing a centralized shared memory. So these objects are going to be stored somehow in the network, and they'll be accessible by read and write operations. The read and write operations can be concurrent. You want the operations to act as if each one is actually performed at some fixed point in time, sometime in its interval. So this is the standard definition of atomicity, or linearizability, if you've heard of that. For example, somebody might do a read and get a response of zero. Let's say an object starts out with an initial value of zero. Somebody reads it, gets a value of zero. Concurrently, somebody tries to write the value eight to the object, and succeeds in doing that. Or you can have the same picture where somebody does the same write, but now the read gets the value eight. Either one of these is okay. Either one is atomic, or linearizable, because I can find a point in the interval in each case where the operation behaves like it was performed at exactly that point.
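The "find a point in each interval" condition can be checked mechanically on small histories. Here is a brute-force sketch of my own (not from the talk) for a single register with initial value zero; it searches for a serialization order consistent with both real time and register semantics.

```python
from itertools import permutations

def linearizable(ops):
    """Brute-force atomicity check for one read/write register with
    initial value 0.  Each op is (start, end, kind, value) with kind
    'r' or 'w'.  Returns True iff some total order of the operations
    respects real-time precedence (a before b when a ends before b
    starts) and each read returns the most recently written value."""
    def precedes(a, b):
        return a[1] < b[0]
    for order in permutations(ops):
        # Reject orders that reverse a real-time precedence.
        if any(precedes(order[j], order[i])
               for i in range(len(order))
               for j in range(i + 1, len(order))):
            continue
        cur, ok = 0, True
        for _, _, kind, v in order:
            if kind == 'w':
                cur = v
            elif cur != v:       # a read must return the current value
                ok = False
                break
        if ok:
            return True
    return False
```

Both pictures from the example pass: a read of zero concurrent with the write of eight, and a read of eight concurrent with that same write.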
So here the read is serialized before the point for the write, and in the other case, it's the other way around. Makes sense? Okay. And these are examples that are not atomic. You can't write the eight, complete it, and then have a read come in and get a zero. You also can't have a write happening concurrently with two reads, where the first read gets the eight and the second one reverts to the old value zero. So this is non-atomic behavior. There's no way to put those points in these intervals; these aren't behaving as if they occurred at some point in their intervals. Okay. So how does one do this? Let me just review what happens in static networks. The algorithms are basically designed to give you fault tolerance, so you can't just have one copy of each object. You usually replicate it at many nodes. Okay. So we define quorums of nodes: read quorums and write quorums. And the key property here is that every read quorum has to have a non-empty intersection with every write quorum. So in order to read, in a simplified version of the algorithm, you would access a read quorum of the object's replicas, and you would take the latest version that you happen to see. To write a new version, you would write it not just to one copy but to a write quorum of copies. So the idea is, if you've managed to write to a write quorum of the copies and then you read, you're guaranteed to see the latest version that was written. Makes sense? Okay. Actually it's a little more complicated than that. You have to attach tags to the versions to keep track of which ones are later. And then, excuse me, a write needs to figure out what tag to use. So it first accesses a read quorum to find the biggest tag, so that it can then make sure to pick a bigger tag. So that's an extra phase that a write needs. It can't just write.
It has to first read. And in fact, though you can add optimizations to avoid this in most cases, in general a read has to propagate the version that it's about to return to a write quorum before it returns the result. Because if it doesn't do that, it could be that it happens to see the only replica that has that version, and someone else could come later and not see it. So now both reads and writes are two-phase operations: they both access first a read quorum in the first phase and then a write quorum in the second phase. You can do some optimization on this. Okay. So what happens in dynamic networks? What do we want? Well, we still would like to guarantee atomicity in all cases, still would like to tolerate faults and have good availability and good performance, as long as things aren't changing too rapidly. Why might you want such data? Well, if you have robots exploring and collecting data, they might want to maintain the data so other robots can read it and get the latest value. There are other possible applications too. Okay. So an algorithm for managing this in dynamic settings is the Rambo algorithm, which stands for Reconfigurable Atomic Memory for Basic Objects; the name came from a scenario of soldiers running around trying to maintain battle data. So again, we have each object replicated at several locations, and we still use quorums. For each object, we now have a particular set of nodes that will store copies of the object, as the members of the configuration, and then we have read quorums and write quorums as we did before. So this is a lot like the static algorithm, and you can have concurrent operations. This is good if you have just small, transient changes. But if things are changing more permanently, bigger changes to the system, you might want to actually migrate the copies somewhere else. So you might want to reconfigure: to change the quorum configuration. Okay.
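The static two-phase quorum algorithm described above can be sketched as follows. This is a simplified in-memory model of my own, with majorities serving as both read and write quorums so that any two quorums intersect; it ignores concurrency control and networking.

```python
class Replica:
    def __init__(self):
        self.tag, self.value = 0, 0        # initial value 0, tag 0

class QuorumMemory:
    """Sketch of the static two-phase quorum algorithm.  With n
    replicas, any ceil((n+1)/2) of them form a quorum, so every read
    quorum intersects every write quorum."""
    def __init__(self, n=5):
        self.replicas = [Replica() for _ in range(n)]
        self.quorum = n // 2 + 1

    def _read_phase(self):
        # Contact one quorum (here: the first quorum-many replicas)
        # and take the version with the largest tag.
        copies = self.replicas[: self.quorum]
        best = max(copies, key=lambda r: r.tag)
        return best.tag, best.value

    def _write_phase(self, tag, value):
        # Propagate to another quorum (here: the last quorum-many
        # replicas); the two slices necessarily overlap.
        for r in self.replicas[-self.quorum:]:
            if tag > r.tag:
                r.tag, r.value = tag, value

    def write(self, value):
        tag, _ = self._read_phase()        # phase 1: find the biggest tag
        self._write_phase(tag + 1, value)  # phase 2: write a newer version

    def read(self):
        tag, value = self._read_phase()    # phase 1: collect versions
        self._write_phase(tag, value)      # phase 2: write-back, then return
        return value
```

The write-back in `read` is exactly the propagation step described above: the read makes sure a full write quorum holds the version it is about to return.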
And you have to worry about maintaining atomicity even though the configuration is changing. You want to be able to do the reconfiguration concurrently with the reads and writes. Okay. So, yeah. No, the graph can be changing, but we assume it's not changing in some extreme, crazy way. Configuration change here means changing the configurations for the object: formally, which locations are keeping copies of the object, and what the quorums are. So you want to be able to change the configuration for each object. Yeah, probably some nodes are leaving or dying, or new nodes have entered. So you might want to change who the owners are based on whatever is happening. It doesn't have to be that; it could be. Okay. So the Rambo algorithm, I guess the biggest innovation in it is that it separates the reconfiguration from the main read-write algorithm. The reconfiguration part just determines the configurations. It determines, for each object, a sequence of configurations, based on proposed configurations, requests to form new configurations, which could come from a variety of sources, like monitoring what the network looks like. And then it tells the main algorithm about the new configurations. The main algorithm does the reads and the writes. It manages the reads and the writes of all the objects. It's basically running the static quorum-based protocol, but instead of just using one configuration, if there's more than one configuration active because a configuration is changing, it will use both configurations. Okay. So during changes you can have more than one current configuration, like an old one and a new one if the change hasn't been completed yet. So then we just do the reads and writes using both the old and the new configurations. It's kind of overkill, but it's enough to maintain the atomicity.
So this separation allows you then to just garbage-collect the old configurations in the background, when you know they're not needed any longer. So just briefly: reads and writes have two phases again. Phase one, you collect the object values from read quorums, maybe not of just one configuration, now of all the active ones. Phase two, you propagate the latest value to write quorums of the active configurations. And you can do lots of reads and writes concurrently; quorum intersection makes sure you get atomicity. If a new configuration arrives, you just continue till you're finished, but now you may have to incorporate the new configuration as well. Okay. How do you get rid of the old configuration? Well, the main algorithm is going to garbage-collect the old configurations in the background, concurrently while the reads and writes are going on. So garbage collection is basically a two-phase algorithm itself, but now phase one is a little more complicated. You have to tell a write quorum of the old configuration about the existence of the new configuration, and at the same time you're collecting the object values from a read quorum of the old configuration. So to do a reconfiguration, you have to access both a write quorum and a read quorum of the old configuration, both to collect values and to inform the old configuration about the new configuration. Phase two, you just propagate the latest value to a write quorum of the new configuration. Yes, yeah. Right. We didn't do that, but we basically said, under certain assumptions about the rates of change, you're guaranteed to be able to do it. So we gave some sufficient conditions based on rates of change. But yeah, I'm sure there's more you could do to characterize more precisely what's sufficient and what's necessary. Okay. Well, implementing reconfiguration, it turns out, uses distributed consensus.
Now, distributed consensus is in general a hard problem, but we're just using it for the reconfiguration, not for the reads and writes. So this is just something going on in the background; it doesn't delay the reads and writes. So old configuration members propose a new configuration, and the reconfiguration service uses consensus. Consensus actually can be implemented. Well, you know, our impossibility results say you can't absolutely guarantee that you're going to reach consensus in an asynchronous system, but you can guarantee that you don't violate the safety properties of consensus, and you're guaranteed to terminate if the system stabilizes for long enough. So yes, this can be delayed while the system is unstable, but once the system stabilizes, you should be able to finish consensus. In the normal case, you'll finish consensus quickly. But even if you don't, it just delays the reconfiguration; it doesn't delay the reads and writes while this is happening. Of course, it introduces the danger, if you wait too long, that the old configuration will be dead and you can't use it anymore. Okay. So what does this say? You can get atomic memory in dynamic networks at a pretty reasonable cost. But now these algorithms are sort of tricky, and the proofs are not easy. So the other question that comes up when designing these kinds of algorithms is: what do we do to simplify algorithm design for these sorts of networks? An important approach is to introduce some abstraction layers. So here's an example of an abstraction layer. We did a lot of work on virtual node layers over a few years. The idea is you have a mobile network, nodes moving every which way, and you're going to overlay that with a virtual static network. So basically you're going to have stationary virtual nodes at fixed locations in the network. And they're not really there; the mobile nodes are going to emulate them. How do they do that?
Well, they can use a replication strategy, a replicated state machine strategy as Leslie Lamport developed, or they can elect a leader. Every virtual node can be emulated by real mobile nodes in the vicinity of the virtual node's location. Okay, so now you need to be in a setting, a model, where the nodes have access to geographical information. They need to know their own positions and where the virtual node is supposed to be. Once you have the virtual nodes, it becomes much easier to write algorithms over the network, because the virtual nodes are stationary and you assume they're pretty reliable. So you can write algorithms for problems like message routing across a mobile wireless network, global broadcast of a message throughout the network, maintaining atomic memory, robot coordination, et cetera, using virtual nodes. Now I'm not going to show you anything formal, but I'll just show you some pictures. So here's a bunch of mobile nodes running around in all directions. Now imagine that the virtual nodes just spring into existence. What's happening is the mobile nodes are emulating them, but now you can have a higher abstraction layer where you just pretend they're there. So what can you do with this? Well, suppose you want to route a message through the network to a certain vicinity, a certain geographical region. Instead of just passing it from one mobile node to another or flooding the message, you can send it systematically along virtual nodes in the direction of that location. So you can do sensor-net-type data collection this way, just sending on the virtual nodes. You can do coordination applications. You can have a virtual node coordinating the activities of robots or vehicles or aircraft in the vicinity of that virtual node's location. This puts some structure on the network; it's pretty haphazard if everybody is trying to negotiate what they should do.
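The geographic routing idea above can be sketched with a hypothetical square-grid layout of virtual nodes, my own illustration rather than anything from the talk: a mobile node's position maps to a grid cell, and a message hops one virtual node at a time toward the destination region.

```python
def region_of(pos, cell=10.0):
    """Map a mobile node's position to the grid cell of the virtual
    node covering it (assumes a square-grid layout of side `cell`)."""
    x, y = pos
    return (int(x // cell), int(y // cell))

def greedy_route(src_cell, dst_cell):
    """Forward a message from virtual node to virtual node, each hop
    stepping one cell toward the destination region, instead of
    flooding among the mobile nodes."""
    path = [src_cell]
    x, y = src_cell
    while (x, y) != dst_cell:
        x += (dst_cell[0] > x) - (dst_cell[0] < x)
        y += (dst_cell[1] > y) - (dst_cell[1] < y)
        path.append((x, y))
    return path
```

The number of virtual-node hops is just the Chebyshev distance between the two regions, regardless of how the mobile nodes underneath are moving.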
Of course, if all the mobile nodes leave a region, then the virtual node dies. And when others come back into the region, it comes up again, but it's forgotten anything that it knew before. So you have to be able to program in a model where a virtual node can lose its state; you have to be able to adapt to that. The higher-level algorithm is going to have to have some resilience to it as well. Some applications would have the virtual node duplicating its data at nearby virtual nodes. There are different ways to approach that. Okay. So another example is coordinating the motion of robots. A little example: suppose you have a curve in the plane and you have some robots anywhere. Now the objective is to move the robots so that they're all on the curve, or almost all on the curve, and they should be evenly spaced around the curve. This could arise, for instance, if there's a hazardous waste area and this is the perimeter; you might want to have the robots surround the area. A virtual node approach here would be that every virtual node coordinates the robots that happen to be in its region. What will it do? Well, it will direct them to go to the curve, and it will space them on the curve. This is a lot better than the robots trying to just negotiate with each other about where they should go. If one virtual node has too many robots in its region compared to its neighbors, it can send some of the extra robots to its neighbors' regions. It should also ensure that at least one robot remains in its region, to make sure that it can keep emulating the virtual node. That sounds kind of circular, right? But now we're in a setting where we have control of where the robots go, so we can ensure that the regions remain occupied. Another example: a virtual traffic light. If you have an intersection without a real traffic light, you can have the computers in the cars programmed to emulate a virtual node, and separately, the virtual node can be programmed to be a traffic light.
So you can easily put in any policy that you want, like 30 seconds in each direction, and each car just sees red or green on the computer inside the car. Here the virtual traffic light dies when there are no cars around, but who cares, right? There are no cars around. So this is an example where you don't have to do anything for backup; you can just lose state. Okay, so carrying this to its extreme, suppose we want virtual air traffic controllers. Over the ocean there are no air traffic controllers, right? So you have aircraft flying in airspace where there are no controllers. How are you going to control where they go, control access to regions? You could use virtual air traffic controllers, which can be emulated by computers on any aircraft in their sectors. You can program the virtual air traffic controller to behave just like a human air traffic controller would, with simple policies. What does the human do? It keeps track of all the aircraft in a local region. It tells the neighboring air traffic controllers when they can send another aircraft into the region. It tells the aircraft how to move within the region: spiral around while you're waiting to go to the next region, whatever. A question here: everyone needs to be executing the virtual machine for this to work, so is there any way to combine this notion of a virtual node with getting a guarantee from the other guy that he's running it too? Well, yes, this is what the FAA has regulations for; everybody runs TCAS software now, so this is a setting where you actually could do this. Everybody should just be running the compatible protocol. I'm a theoretician; there are a lot of practical reasons this might not work, but the virtue of it is that it's actually compatible with the existing air traffic control system over land. Okay, so all right.
So I hope you see that virtual node layers are a way of supporting simple algorithms for a whole range of problems. What they do is hide the complexity of the dynamic motion of the nodes. And they're easy to implement: these replicated state machine and leader-based strategies are easy, provided you have reliable local communication. We actually did a couple of projects involving simulating these, and one of them used NS2, a simulator that's supposed to give accurate simulations of wireless networks, so it has collisions. And of course this wasn't working completely right when messages got lost; we had to change the algorithms to try to make them fault tolerant. Okay, another remark: you can also use a virtual node layer to get an easier implementation of reconfigurable atomic memory, easier than RAMBO, by having the virtual nodes store the object replicas. Then you can just use a static quorum-based algorithm with quorums of the virtual nodes, and the reconfiguration is handled automatically by the underlying emulation algorithm, rather than the heavyweight, brute-force kind of reconfiguration that we did in RAMBO. Okay, so all right. So virtual nodes are nice theoretically, but we come up against this issue of message collisions, message losses. Seth Gilbert's PhD thesis has a whole big section on how you do virtual node emulations for collision-prone wireless networks, and this got us started on a bigger study of algorithms in wireless networks with collisions. So this is the last part of the talk. If you have unreliable communication with message collisions, let's look at a few problems that arise. Message collisions: you have a static network, a static graph, and communication still goes in synchronous rounds. At every round, some nodes are transmitting and the others are just listening. A transmitter can't hear anything except its own message. So what does a listener hear?
Well, if none of its neighbors in the graph are transmitting, it hears nothing, just silence. If exactly one of its neighbors transmits, it gets the message. If two or more of its neighbors transmit, it gets a special symbol indicating a message collision, so it doesn't get the message in that case. Okay, a simple example, actually from a very recent paper, is leader election. Think of a single-hop, n-node, complete network. Everybody can communicate with everyone else, but the nodes don't know who else is there. The problem is that the nodes want to agree on one of them to be a leader, and everybody should output the leader's ID. Well, think about it: you don't know who else is there at all, right? So you have to do something to figure out who's the leader, who's got the biggest ID or the smallest ID, without knowing who's there. If you didn't have to worry about collisions, if you were in the model like the one I've been talking about so far, every node could just transmit its ID, everybody would receive all the IDs because communication is reliable, and then they would just pick the minimum one and you're done. With collisions, you've got a problem. If a node decides to transmit, others could be transmitting as well; you get a collision, so you've wasted the round. If the node decides not to transmit, maybe nobody else transmits either, so again you've wasted the round. So you need to do something a little smarter. Oh, this is just an animation from my co-author, the leader. All right, so leader election. Let's just imagine that everybody, even the senders, hears the transmission results; you can get around that assumption pretty easily. So everybody hears everything. Now you can work in terms of the bits of the IDs of the nodes. What each node does is it starts out active and it systematically goes through the bits of its ID.
Okay, when it gets to a certain bit position, if it's still active and it has a one in that bit position, then it transmits its entire ID; otherwise, it doesn't transmit anything. Then it sees what it receives. The received message could be nothing, it could be an actual message, an ID, or it could be a collision. If it's an ID, then, since we're assuming everybody hears the same thing, they all get the ID, and that elects the process with that ID as the leader. If you hear a collision and your bit is zero in this position, you just drop out: the collision means there are two or more other nodes that have a one in that position, so all the nodes with a zero can drop out and just let the remaining ones fight it out. You proceed this way through all the bits, and by the end, you're guaranteed to isolate one node as the leader. There's also a matching lower bound: b minus log n, the number of bits minus the log of the number of nodes, is a lower bound. The kind of proof you do here is: suppose you have a faster algorithm, and use a pigeonhole argument to show that there is some set of nodes that exhibit exactly the same transmission pattern, send or not send, for all of the rounds. If you pick a system consisting of just those processes, they're always going to stay silent together or collide together, and that delays any useful information from being conveyed. That's an oversimplification, but it's the basic idea. Okay. All right, the next and very key problem is the problem of reliable local broadcast. Now let's consider an arbitrary graph, not just a single-hop graph, and we have some bound on the number of neighbors of any node. The simplified problem here is that some of the nodes in the graph have messages to send. Really, for local broadcast, we'd like to have everybody sending and everybody receiving those messages.
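The bitwise protocol just described can be sketched as a simulation (a hypothetical reconstruction from the description above, not the authors' code), assuming everyone, including senders, hears the channel outcome. Scanning from the most significant bit: a lone transmitter is elected immediately, and on a collision the active nodes with a zero bit drop out.

```python
def elect_leader(ids, b):
    """Simulate bitwise leader election on a single-hop collision channel.

    ids: distinct b-bit node IDs. All nodes hear the same channel outcome
    each round: silence, a single ID in the clear, or a collision."""
    active = set(ids)
    for pos in reversed(range(b)):                 # most significant bit first
        transmitters = [i for i in active if (i >> pos) & 1]
        if len(transmitters) == 1:
            return transmitters[0]                 # one clear message: elected
        if len(transmitters) >= 2:
            active = set(transmitters)             # collision: zeros drop out
        # silence: every active node had a zero here; move to the next bit
    (leader,) = active                             # distinct IDs leave one node
    return leader
```

Note that with a most-significant-bit-first scan, the survivor is the node with the maximum ID.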
But in the simplified problem, some of the nodes have messages to send, and all we want is for every receiver who has a sending neighbor to get something: every receiver that has a sending neighbor should get at least one message in the clear, with no collision. I'm not saying every message has to be correctly delivered, just that every receiver has to receive something. Okay, an important algorithm to handle this problem is what's called the decay algorithm, by Bar-Yehuda, Goldreich, and Itai. This goes way back to 1987. The algorithm runs in phases, and in each phase each node transmits its message with decreasing probabilities: probability one, a half, a quarter, an eighth, exponentially decreasing. You can show that, with very high probability, if you do enough phases, every receiver with a sending neighbor receives some message. Why does that work? Well, look at it from the point of view of any receiver u. In any phase, one of the rounds is going to hit the sweet spot where the total transmission probability of u's neighbors is close to one, between a half and one. If you get that, then there's a constant probability that exactly one of u's neighbors transmits in that round and u does not transmit. You don't know how many neighbors are transmitting, but at some point during the phase the probability is going to be just right to give a good likelihood that exactly one neighbor is isolated for transmission. So you get a constant probability that u receives a message, and you can make that probability as high as you like by doing enough phases. This guarantees, with high probability, that every node receives a message within a constant number of phases, and in fact you can show all the messages get delivered to everybody within enough phases. Okay, global broadcast. Now we're back to the problem where a single node starts with a message.
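A single decay phase, seen from one receiver, can be sketched like this (a simulation of the idea, with hypothetical names): each of the receiver's sending neighbors transmits with probability 1, 1/2, 1/4, ... in successive rounds, and the receiver succeeds in any round that has exactly one transmitter.

```python
import random

def decay_phase(num_sending_neighbors, num_rounds, rng=random):
    """One phase of the decay protocol from a single receiver's viewpoint:
    every sending neighbor transmits with exponentially decreasing
    probability; return True if some round has exactly one transmitter
    (a clear, collision-free reception)."""
    p = 1.0
    for _ in range(num_rounds):
        transmitters = sum(rng.random() < p for _ in range(num_sending_neighbors))
        if transmitters == 1:
            return True
        p /= 2.0
    return False
```

With num_rounds around log2 of the neighbor bound plus one, some round lands in the sweet spot where the total transmission probability is between a half and one, giving a constant success probability per phase.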
The message should arrive everywhere, but now we're worrying about collisions. Bar-Yehuda et al. developed an algorithm based on decay: when any node first receives the message, it retransmits it using decay for enough phases, and then it halts. That's the whole algorithm. The claim is that, with high probability, the message gets delivered everywhere within time that's pretty small, depending on just the diameter of the network and the log of the total number of nodes. The algorithm is this fast because decay ensures that every receiver gets something quickly anytime any neighbor is sending. So for instance, here's a graph where we want to get this message broadcast to everybody. At any point, some nodes have the message and some don't. This node sitting here will, within a small amount of time because of decay, hear from one of its neighbors. It doesn't matter which neighbor it hears from; receiving the message from any neighbor is good enough. So if you look at any receiver j at the far end of the network, how long does it take to get the message? Look at the shortest path, say the upper path, to j. The message progresses one hop along the path within a few decay phases with high probability. So it goes there, a few decay phases, a few more decay phases, and the total number of phases is small with high probability. Okay, so there are many other problems we'd like to solve in collision-prone networks, but we have to cope with collisions, and now things are starting to get complicated. You already have decay mixed in with the global broadcast strategy, but if you want to solve these other problems and at the same time worry about managing the contention, it's just going to be too hard to develop any sensible sort of theory.
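The whole broadcast can be sketched as a toy simulation (my reconstruction, with simplifications: informed nodes never halt, and the phase length is a parameter rather than tuned to the degree bound as in the real algorithm):

```python
import random

def global_broadcast(adj, source, phase_len, max_phases, rng):
    """Decay-based global broadcast on a static graph: every node that holds
    the message retransmits it with decaying probability; an uninformed node
    learns the message in any round with exactly one transmitting neighbor.
    Returns the number of phases until everyone is informed, or None."""
    informed = {source}
    for phase in range(max_phases):
        p = 1.0
        for _ in range(phase_len):
            tx = {v for v in informed if rng.random() < p}
            newly = set()
            for v in set(adj) - informed:
                hits = [u for u in adj[v] if u in tx]
                if len(hits) == 1:        # clear reception, no collision
                    newly.add(v)
            informed |= newly
            p /= 2.0
        if len(informed) == len(adj):
            return phase + 1
    return None
```

On a path graph, for example, the first round of each phase (probability one) always pushes the message one hop along the frontier, so the message is guaranteed to spread.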
So the idea here is to mask the contention management within a reliable local broadcast abstraction layer, and use that layer to separate the issues of contention management from the higher-level algorithm issues. This is a reliable local broadcast layer; we called it an abstract MAC layer, because a MAC layer is the part of a real wireless network that manages the contention, intending to provide reliable local broadcast communication. So if you have a physical network and you can build an abstract MAC layer to hide all the collisions, then it's much easier to build the high-level algorithms like the ones I've been talking about. Since I'm running out of time, here's just a quick description of one example of an abstract MAC layer. You send a message to it and it gets delivered; the sender also gets an acknowledgement, so it knows that it's ready to send the next message. What does it guarantee? You get reliable delivery of the messages to your neighbors. This is not global broadcast, it's just local broadcast. And the sender gets an acknowledgement after all the neighbors have received the message, but the sender doesn't need to know who actually received it. So this is not like unicast with an acknowledgement; you send, you get an ack back, your neighbors have the message, but you don't have to know who they are. And what do you analyze for this? Well, you can talk about the acknowledgement time, and you can talk about what I was describing before, the time for a receiver to receive some message when some neighbor is sending; call that the progress time. That could be a lot smaller than the acknowledgement time. Okay, so what can you do with these layers? Well, you can design algorithms to implement the layers over various kinds of low-level networks. You can use decay-style algorithms; we have one paper on that.
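As a sketch, the interface might look like the toy below: a collision-free, in-memory stand-in with hypothetical names. Of course, the whole point of the real layer is to provide these guarantees over a collision-prone network, which this toy skips.

```python
class ToyAbstractMAC:
    """Toy emulation of an abstract MAC layer interface: bcast delivers a
    message to every graph neighbor (reliable local broadcast), then records
    an anonymous ack for the sender, who never learns who the receivers
    were or even how many there are."""
    def __init__(self, adj):
        self.adj = adj                         # node -> set of neighbors
        self.inbox = {v: [] for v in adj}      # what each node has received
        self.acks = {v: [] for v in adj}       # acks seen by each sender

    def bcast(self, sender, msg):
        for v in self.adj[sender]:             # guaranteed delivery to neighbors
            self.inbox[v].append((sender, msg))
        self.acks[sender].append(msg)          # ack: all neighbors now have msg
```

In the real layer one would also bound the acknowledgement time and the possibly smaller progress time; this toy delivers instantly, so both are trivially zero.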
You can also use completely different techniques for different sorts of low-level networks. So this kind of abstraction can really separate the physical network layer issues from the higher-level algorithm issues. We have a paper with Muriel Medard and people from her group on using network coding algorithms to implement reliable local broadcast. So all right, you can get reliable local broadcast and then go ahead and design algorithms to solve higher-level problems. Once you have those, you can use a general composition theorem to combine the two levels of algorithms and get algorithms for the high-level problems automatically over the collision-prone, or whatever, physical model you're using. So here are examples of some higher-level problems that we've worked on. Out of time, right? So I should wind up. Okay, another example of a high-level problem we can solve is single-message broadcast and multi-message broadcast; we get good bounds for these, but I'm going to skip them. So the summary here is that message collisions complicate the design of algorithms; you need to do contention management, and you can use abstraction layers to try to separate the issues. Okay, does that resolve everything? No, we've neglected at least one other important thing, which is uncertainty. The models so far don't have a lot of uncertainty in the communication range: everything is based on having a single communication graph that represents exactly where the messages reach. But for signal propagation there's uncertainty involved. So you could go to another extreme and talk about two graphs, where the first graph represents where messages must be delivered, and a bigger graph with more edges represents where the messages may be delivered.
Okay, so imagine an adversary: every time a message is sent, it might get to some of the nodes that are connected by the dotted edges, and it will get to all the ones connected by the solid edges. That's the distinction. It's under adversary control: somebody sends a message, and imagine an adversary says, fine, I'm going to let it get to here and not to there. The adversary gets to choose which of the dotted edges. Yeah, I think of it as just having an adversary controlling where the messages go, as long as they go on the solid edges; they cannot go outside of G prime. Okay, and then, just flipping through this, a lot of the results change. For local broadcast, decay doesn't work very well, because there isn't a single sweet spot anymore for the probabilities. Here are a couple of cases that don't work, and another example that doesn't work for local broadcast. We have a bunch of papers that show things that do and don't work for this model, and this is what we're working on now. So this is basically the end of the summary. These slides will be available, so people can see the references if they're interested. Okay, so all I did was overview some recent work, which looks like bits and pieces, on distributed algorithms for wireless networks, in two parts, with reliable and unreliable communication. We still need a lot of work to develop a complete theory for wireless networks that's as good as the traditional distributed computing theory. We'd like to have complete sets of algorithms, lower bounds, and ways to fit the algorithms together, and it should span all the way from the physical network up to high-level applications. Some are, some aren't; yeah, you have to be careful about that. Decay, for instance, doesn't require any feedback at all; some of the results do.
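One round of the dual graph model can be sketched like this (a hypothetical formalization of the description above): G gives the edges where a transmission must arrive, G prime, a supergraph of G, gives the edges where it may arrive, and the adversary decides each optional edge.

```python
COLLISION = "top"   # stand-in for the collision symbol

def dual_graph_round(G, Gp, transmitters, adversary):
    """One synchronous round in the dual graph model. G[u] and Gp[u] are
    neighbor sets with G[u] a subset of Gp[u]; adversary(u, v) decides
    whether an optional Gp-only edge carries the transmission. Returns a
    dict mapping each node to (sender, 'msg'), COLLISION, or None."""
    out = {}
    for v in G:
        arrivals = [u for u in transmitters
                    if v in G[u]                          # must deliver
                    or (v in Gp[u] and adversary(u, v))]  # may deliver
        if v in transmitters:
            out[v] = None                  # transmitters hear only themselves
        elif len(arrivals) == 1:
            out[v] = (arrivals[0], "msg")
        elif len(arrivals) >= 2:
            out[v] = COLLISION
        else:
            out[v] = None
    return out
```

The same send can thus produce a clear reception or a collision depending on the adversary's choice, which is exactly why the decay analysis, with its single sweet-spot probability, breaks down here.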
Yeah, I tried to give a little bit of a flavor with the pigeonhole argument; there are these adversarial arguments that show that you can choose some nodes whose behavior conflicts in a certain way. An example I like: we have some good impossibility results for the dual graph model, because it's a much harder model to deal with, so it's easier to get impossibility results. In PODC, I think last year, it might have been 2010, there are some lower bound results for global broadcast in networks based on dual graphs; those use pretty interesting techniques. I mean, in practice, I guess if the distance is long enough, the signal is not going to reach, though I know there's always some small amount of signal that propagates to the ends of the earth. Oh, okay, no, I wasn't modeling repeaters; that would be separate. I'm just trying to model simple signal propagation characteristics. Yeah, there's a lot of work on the SINR model, where you actually model the drop-off of the signals. It's interesting to consider how well you can implement local broadcast in that type of model; we haven't done that work. You'd have to make sure you put uncertainty in that model as well. Often that model is just the signal drop-off, which is pretty deterministic, and maybe you'll add in Gaussian noise, so that's probabilistic, but there's no adversarial behavior there; it maybe doesn't capture all the uncertainty, so in a sense it's a better-behaved model than you might see in some real network. Yeah, okay, so space has these nodes, and when an airplane gets near one of them, that's when the node comes into being. Exactly. And then if there are no airplanes around, it goes away. Goes away, right. But you could find a plane where the air traffic controller could just go away. Well, I guess, you know, it's just somebody arbitrating among a bunch of planes in the vicinity; they need some protocol to do that.
With TCAS right now, when two planes come together, one goes up and one goes down; they have some kind of protocol, but it doesn't always work, and it's basically just an advisory, since the pilots are supposed to use common sense because the advice can be wrong. So maybe if you had something that looked more centralized to coordinate the behavior of the nodes, it would be better than this sort of haphazard negotiation strategy. I don't even know what TCAS does with more than two planes.