Good afternoon. The work I'm going to present here is a project in which I have been involved since it started four years ago. Actually, the funding from the original sources is finishing this year, and what we are doing now is thanks to the support that we are getting through the María de Maeztu, although the María de Maeztu is only just starting. What I'm going to do, and really spend most of the time on, is give context to what we are doing by describing what has happened in the whole project during the four years: what the idea, the goal, of the project is. Then, at the end, I will describe specifically the work that we are going to do under the María de Maeztu. So it's about distributed computing, but we are taking a view that is somehow driven by data.
So let's start with a simple motivation. This is how routing works in networks. If you have a network of nodes like that, each node stores a table, like the one there. This is a standard routing algorithm called the distance vector protocol: you store a table so that, when you receive a message that is going to, let's say, node d, the table says where to send it next, okay? It's called distance because this table has been built using the values that are there, the numbers on the links, so as to use the shortest path. So each node has these tables, okay? There are other algorithms or protocols, and this one is called path vector routing. It is a little bit different: instead of storing just the next hop, it stores the whole path to the destination. And the reason is that in many situations there are policies that say: hmm, I am not going to use the shortest path.
I'm going to use this other path. Let's say I have contractual agreements with that part of the network, so I have to use it and cannot use the other one; or, for security reasons, I don't want to use the paths that go outside and want to use a different one. But again, in order to do this, each node has a table of this form, which says: okay, if a message comes from here and has to go in this direction, this is what I need to do next, this is the path I'm going to follow.
So the question is: how are these tables computed? How is this information computed? If you have the information centralized, so you have all the data about the network (what the nodes are, what the links are, what the values are), there are many standard computer science algorithms to do that, okay? Dijkstra's algorithm, and many, many others. And there is this language based on logic, which is somehow an extension of relational databases, that is able to describe the centralized version of these algorithms very succinctly, okay? So this algorithm here is centralized: if you have the information about the whole network in one place, you are able to compute all these tables centrally. It's a simple recursive procedure. You start by collecting the paths that use direct links; this is the first rule that you see there. The second rule extends a path by an extra link, okay? At that point you have collected all the possible paths, and then there is an extra computation that picks the best cost. The best cost in this particular case is the number of hops, but you could put in your own policy, in which best could be something specific, like security or contracts, and you define what is best based on that.
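The recursive procedure described above (direct links first, then extension by one link, then selection of the best path) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rules shown on the speaker's slides are not reproduced in the transcript, and the network `LINKS` and the function names `all_paths`, `best_paths` and `routing_table` are hypothetical. Paths are kept loop-free so that the iteration reaches a fixpoint.

```python
# Hypothetical example network: directed (source, destination) links.
LINKS = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "a")}


def all_paths(links):
    """Collect loop-free paths by applying two rules until nothing changes."""
    # Rule 1: every direct link is a path.
    paths = {(src, dst) for src, dst in links}
    while True:
        # Rule 2: extend a known path by one extra link at the front.
        new = {
            (src,) + p
            for src, mid in links
            for p in paths
            if p[0] == mid and src not in p
        }
        if new <= paths:  # fixpoint: no new path was derived
            return paths
        paths |= new


def best_paths(paths):
    """Pick, per (source, destination), the path with the fewest hops."""
    best = {}
    for p in paths:
        key = (p[0], p[-1])
        if key not in best or len(p) < len(best[key]):
            best[key] = p
    return best


def routing_table(best):
    """Map (source, destination) to the next hop on the chosen path."""
    return {key: p[1] for key, p in best.items()}


TABLE = routing_table(best_paths(all_paths(LINKS)))
```

A different policy (security, contracts) would only change the comparison inside `best_paths`; the two path-collecting rules stay the same.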
And then, after this is computed, you create the routing tables, okay? This is a notation for relational databases in which you have a relation called routing table, like this one, with its arguments, okay? And this is what you are computing if you have all this stored in a relational database.
The observation that was made around eight years ago is that with this logical system, which is based on mathematical logic, by a simple reinterpretation of some of the symbols you can actually do the computation in a distributed way, so that you don't need to have all the information locally. What you do is copy the same program, the same set of rules, into every node, and you add annotations; what an annotation means is where this particular computation is going to land, okay? I don't think it's very important to understand exactly what this means. The point is that we take the same program that we had before for the centralized construction and, by just a simple reinterpretation of how the computation is done, we are able to do this computation in a distributed manner. That is actually the way it is computed in reality. In reality, when you have a switch or a router, there is a program inside, usually written in, let's say, C or a language of that sort, that computes this. They don't use this type of programming language; they use standard programming languages to generate the table. And these programs are running all the time, because there are updates. Here it is the same. The difference, or the interest of our work, has been: aha, now we have a description of the system that is not C, that is something more formal, more mathematical, which we can interpret using mathematical logic and analyze.
At the same time, we can use it, because you can implement this in a relational database. So you can use more or less the same syntax that we had here and actually run the program without doing anything else.
So, more motivation. Why is this important? Why think about this new language, this formalization of distributed programs like this simple routing program? For that I give you another simple example, in a completely different situation. This is a distributed voting algorithm. The algorithm has five steps. First, each node has an initial opinion, either good, which is the light color, or bad, which is the dark color. Then the nodes communicate their opinions to their neighbors: they say, I have a good opinion, or a bad opinion. Next, the nodes receive the different votes from their neighbors and decide, by majority, whether they stay good or bad. If they switch, they have to inform the neighbors that they changed. And then they repeat. So this is a simple algorithm to do voting, and the question is: does this algorithm terminate? Actually, in distributed computing this is called convergence, because it's a repeat loop: the nodes keep working, working, working, and at some point they do not need to compute anymore; they reach a stable situation. As you see, this algorithm is very simple. Rather than a table you have a single value, and what you are doing is sending the value around according to a simple criterion, which is counting something that you keep in a table, the number of votes of the neighbors, and then you repeat. Answering this question is difficult because in distributed computing, as opposed to uniprocessor computing, computational nodes lack knowledge of the global state.
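To make the termination question concrete, here is a minimal simulation of the voting algorithm just described. It is a sketch under simplifying assumptions that the talk does not state: all nodes update in synchronous rounds, a node counts only its neighbors' votes (not its own), and on a tie it keeps its current opinion. The function names `step` and `run` are made up for the illustration. Even this idealized version can fail to terminate.

```python
def step(neighbors, opinion):
    """One synchronous round: each node adopts its neighbors' majority."""
    new = {}
    for node, nbrs in neighbors.items():
        good = sum(opinion[n] for n in nbrs)  # True counts as 1
        bad = len(nbrs) - good
        if good > bad:
            new[node] = True
        elif bad > good:
            new[node] = False
        else:
            new[node] = opinion[node]  # tie: keep the current opinion
    return new


def run(neighbors, opinion, max_rounds=100):
    """Return ("converged", state), ("oscillates", state) or ("undecided", state)."""
    seen = []
    for _ in range(max_rounds):
        nxt = step(neighbors, opinion)
        if nxt == opinion:
            return "converged", opinion  # stable: nobody changes anymore
        if nxt in seen:
            return "oscillates", nxt  # a global state repeats: a loop
        seen.append(opinion)
        opinion = nxt
    return "undecided", opinion


# Two nodes with opposite opinions swap forever (True = good, False = bad).
pair = {"a": ["b"], "b": ["a"]}
print(run(pair, {"a": True, "b": False})[0])

# A triangle with two good nodes settles on good.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(run(triangle, {"a": True, "b": True, "c": False})[0])
```

With asynchronous message delivery the same starting configuration can have several possible runs, which is what makes the real question harder than this deterministic sketch suggests.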
So when you do a computation, you don't know what is happening in the other nodes. There is no global time frame available to the nodes: in general, the clocks are not synchronized, and synchronizing nodes is a hard problem. And there is no determinism, because when you send a message you don't know when it is going to arrive. So for the programmers, when they look at the program, it is very difficult to say: is this program correct? Is the program doing what I want it to do? Because there are factors that are difficult to visualize.
So what has the project focus been? The focus is to try to use this idea of a more declarative description of the problem in order to be able to answer questions such as: is this program going to converge? The standard process in software engineering is this. You describe your algorithm in some pseudocode; sometimes it could be a formal specification. Taking that, you prove correctness: that the algorithm is going to do what you want. After you make sure that the algorithm is correct, you implement it: you take this algorithm and write it in C or in Java. After that, you have to check that the implementation matches the algorithm that you described at the very beginning. So there is a mismatch there: you have to make sure that the actual code that you are writing captures precisely what the algorithm does, and this is not an obvious step. And finally, of course, you have to deploy and execute. The idea here is that we want to reduce these steps. We want to describe the algorithm in a declarative language, in something mathematically precise, in which we can symbolically prove that a particular piece of code written in this declarative language is correct, and then, after that, deploy it.
So we don't have to do this translation to C, because the language in which we describe the problem, which is an extension of SQL, is the same language that implements the algorithm. There is nothing to change. So you prove that your program is correct. What happens in general is that you have to do checks or proofs about the C or Java program to show that it coincides with the algorithm. Here we don't have that, because we are writing the algorithm directly in the same language in which it is going to be implemented. Of course, this is not a general-purpose programming language. What it is, is an SQL-inspired distributed computational model. What we are thinking is that at each node we have a relational database, and you do queries there and, depending on the results of the queries, you send messages to other nodes. The messages are data; they are also pieces of databases. So the specification is directly executable. And at the same time, because the specification is written in this precise language based on mathematical logic, we can actually analyze it using many techniques that exist in theorem proving.
So what we have in our SQL-based computing framework is this language in which we can describe the protocols or the programs, and a particular environment. This environment could be, let's say, a network of computers, or a network of virtual machines, or a single machine in which you have many processes talking to each other: something in which you want to do distributed computation. It could be in a network or it could be in a different setting; in any case, what you are implementing is a distributed program.
So there we have to deal with the engineering part: how the description of the algorithm, written in this language, communicates underneath using standard protocols. It could be remote procedure calls in one particular application, or it could be TCP in a different situation; it depends. So in our system we have the programming language, but underneath we need all these tools in order to be able to actually deploy the system and run it.
In addition to that, we have a simulation environment. The simulation environment is oriented to networks, in the sense that the communication is going to pass through switches or through routers, the nodes can move, and the network can change; you can add to it. So the simulation helps you to at least test what is going to happen inside this simulated environment. We have developed that.
The last part, and the most relevant to what we are doing, is the abstraction that we get because we are writing this in an SQL-like language. In this SQL-like language we have the description of the protocol, which is in this logic. We also need the initial description of the global state of the network, so how the network is set up. In addition, in order to do the actual, complete analysis, we need to know something about the communication. This is standardly done with a description of how the links behave, using, as it says there, an I/O automaton. I will explain this in more detail, but the idea is that we need to capture the mechanisms of communication, because the communication can be synchronous or asynchronous, and it can be reliable or unreliable.
There are many conditions that could hold there, and in order to do the analysis you need to know them. So this part is also required for the system analysis. Then we take all this and use techniques from model checking, which is very common in theorem proving, to build a global transition graph of the whole system, everything. And based on this global transition graph, we are able to ask queries that give us answers to questions like: is this program going to converge? That's the idea of the whole framework.
Now, getting more into the details of the different components of the framework. The first one is this distributed state machine system in which, as I said before, before deploying the application you have to describe how the communication happens. You give directives about how the network in which the deployment takes place is going to work, what type of communication there is. For example, you can say that the topology is a tree, so the communication goes in one direction, or only between parent and child, or that it is a regular network in which everybody talks to everybody else. All these things have to be specified. The other thing is data: you can inject data into each local node, so you have to specify how to inject the data and how to store it locally. We have two types of implementation: one in which the data is actually stored as a relational database, and another in which we use just Java objects as the data. In addition, there is something independent, again more on the engineering side: you are able to put in policies that stop traffic based on meta-level properties of the system, in order to protect it. Okay, but actually our analysis system doesn't work with all this.
We are not able to analyze a program that, for example, uses Java objects as the data model or uses policies. So we stay in the simplest setting, in which everything is based on relational databases and there are no policies to filter the communication between nodes.
To give you a better understanding of how it works: a node is a database that contains the state, the cornerstone of the system. It is modeled like an input/output state automaton in which the state is represented by a relational database, and there are two types of input. One type of input comes from the application itself, okay? At the local node I can say, okay, I am right now joining, let's say, Skype; I am at this moment connecting to Skype. So my local node will send information to a node that says Jorge is connecting to Skype. So this is local. The other type is messages: you can receive messages from other nodes saying, okay, this has happened in my node, or, I want this from you. This input, which could come from the application or from other nodes, triggers a change of state. As I said, the state is represented by relational database tables, and the messages are snippets of relational tables, so what you send are also relational tables. What the programmer has to define is the transition function: if this message arrives, what do I have to do? If this is the input, what do I have to do? That's the program. But the program is limited to SQL: the only thing you can do is query the current state of the system and the state you are moving to. So you are able to say, okay, if this is happening, I'm going to start computing something and collect information that is going to be part of the current state.
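The node model described above (state held in relational tables, messages that are snippets of tables, and a transition function restricted to queries) can be illustrated with Python's standard `sqlite3` module. This is a hedged sketch, not the project's implementation: the class `Node`, the table `path` and its columns are assumptions made for the example, and the real system uses its own SQL-like language.

```python
import sqlite3


class Node:
    """A node whose whole state is a small relational database."""

    def __init__(self, name):
        self.name = name
        self.db = sqlite3.connect(":memory:")
        # Hypothetical schema: known paths with their next hop and cost.
        self.db.execute(
            "CREATE TABLE path (dest TEXT, next_hop TEXT, cost INTEGER)"
        )

    def best(self):
        """Query the current state: cheapest known cost per destination."""
        rows = self.db.execute(
            "SELECT dest, MIN(cost) FROM path GROUP BY dest"
        )
        return dict(rows.fetchall())

    def transition(self, message):
        """Apply one input (rows for `path`) and return the rows to send out."""
        before = self.best()
        self.db.executemany("INSERT INTO path VALUES (?, ?, ?)", message)
        after = self.best()
        # Only entries whose best cost changed are announced to neighbors.
        return sorted(set(after.items()) - set(before.items()))


node = Node("a")
print(node.transition([("d", "b", 3)]))  # first path to d: announce it
print(node.transition([("d", "c", 5)]))  # worse path: nothing to announce
print(node.transition([("d", "c", 1)]))  # better path: announce the change
```

The point of the sketch is the shape of the computation: an input arrives as rows, the state changes, and the outgoing message is itself the result of a query over the old and new state.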
It is this possibility of iterating that allows us to build the table. For example, in the routing problem, a message to a node could say: okay, I found this path to this destination. The node receives it and says: okay, I need to check my other paths, see what my current paths are, and start computing to see what the resulting path is. Maybe I have to change my mind and change my table, because now I am learning about a better path. That's the process, but all of this is being done with SQL, and the computation relies on an extension of SQL that has been available for more than 10 years: you are able to write recursive SQL queries. That's the only thing, okay? I think I'm going to skip this example and go to the simulation.
The simulation environment is based on an open-source system called CORE, which was developed in the United States by the Army Research Lab. It is a network emulator. In the emulation you are able to describe switches and networks, and how good the connection between the nodes is: throughput, delays, loss of traffic and all this. What we are able to do is deploy the different state machines inside this emulator. You can then run the emulator and modify things on the fly: you can see the current state of the network and move elements around. For example, if it's a wireless connection, you can move a node far away from the network so that the connection breaks because it's too far away, or bring it closer so the connection appears. So messages and connections can appear and disappear on the fly in order to test the program.
So in the end there is a full system in which you can retain logs, analyze them, and see how the system is working. That allows the programmer to debug how the system works. In that sense it is very programmer-oriented: it is a tool for the programmer to debug this particular problem, that the system is fully distributed.
Now, doing analysis. Given that we have a formal description of how each node in the network does its local computation, can we verify whether the local computations will terminate? That was the initial question, so let's start with that problem. As I said before, the question cannot be answered if you don't know how communication happens in the system: what the protocol is. In the way the system is devised there is one basic consideration: we consider only message passing as the communication mode. There are other architectures that can be used, but the one that we study in our system is just message passing, which is the most common in distributed systems made up of different computers, let's say. After this is decided, what remains in the communication is how the channels, the links, behave. One of the things we assume in the analysis we have done is that the order is preserved, because another thing that could happen is that you send two messages in one order and they arrive in a different order. These are possibilities; so far we have been doing work in which the order is preserved. The important part is that the description of the dynamics of the network can be written in the same language that we are using to describe the protocol. So we can write rules,
relational database rules, that describe this behavior of the system. The only part that is important in order to be able to do this is that non-determinism has to be allowed: there has to be a way of describing probabilistic outcomes of the communication, okay? So, in order to be able to simulate this, we have defined rules for synchronous models and asynchronous models. And not just that: even though we have built the analysis tools for the case where the order is preserved, it is very easy to modify the models to get, for example, a model in which order is not preserved, or a model in which the communication is unreliable, where you lose messages. All these things can be added relatively easily to the system in the same formal language, and the important part is that we can then take that and do analysis.
The analysis takes this mathematical model in logic and uses standard techniques to describe the state of the system. The state of the system is the union of all the node states together with the union of the link states. These are the standard definitions used in distributed computing to do manual proofs of correctness of algorithms. The advantage that we have is that we don't have to do it manually: all this description can be done automatically. We describe all these states based on this mathematical model and, from there, use tools that come from model checking to do the analysis. And even though there could be millions and millions of states, we are able to build a representation of the system with all these nodes and links, because we have a very compact representation of the states, very, very compact. What happens then is that, instead of doing the analysis on
the symbols, on the structure of the program, what we do is an analysis of the graph. A convergence state is a global state in which all the queues are empty. So what we have to identify is states in this graph where the queues are empty, because empty queues mean that there are no messages: nobody is putting messages in, so everything stops. So, if we want to check, for example, whether the system sometimes converges (because of non-determinism it might not converge in all cases), we have to look for whether there exists a state in which these queues are empty: we traverse the graph and see if we find such states. For oscillations, what we have to do is look for loops in the graph. All these algorithms are very, very fast, because now you are doing graph traversal, okay? The existence of permanent oscillations means that you are able to stay in a loop with no possibility of reaching one of these convergence states. Always converges means that you are not able to get into loops, and never converges is the absence of such states. So we are transforming all the problems related to convergence into properties of the graph. The catch is that the graph is humongous: it could be very large, because there are many states.
To give you an idea: this problem has been known to be computationally very hard, so it's not that we are dealing with easy problems. What we are doing is taking advantage of all this knowledge about how to do theorem proving, how to represent states, and how to do model checking, to try to solve the problems at least in a reasonable way and see how far we can get. So what we did is we took one of these
algorithms; it is called BGP, the Border Gateway Protocol, and it is very similar to what I described: you can put in policies to decide what the paths are, and the users can define their own policies. What happens is that in this algorithm you can set policies such that it doesn't converge. What has been proved is that there is a sufficient property of the network configuration that ensures that there is no divergence, that the system will always converge, and the sufficient property is that a particular structure, which they call the dispute wheel, does not exist. It's not important to know what a dispute wheel is; what is important to know is that if the network doesn't have a dispute wheel, the system will converge. The thing is that this is only a sufficient condition, meaning that if the dispute wheel is there you don't know what is going to happen, and it's very difficult to check. So what we did was to describe the BGP algorithm in our system and introduce dispute wheels to see whether we were able to analyze what happens, and here is what we get. We tried larger ones, but the largest network in which we were able to say something meaningful had eight nodes, okay? So you can imagine how difficult it is. But we are state of the art, okay? Nobody can do more than that. There are other techniques in which you have to take the topology and reduce it, removing unnecessary things and simplifying, building something like super nodes as opposed to the whole network, in order to make the network small enough to do this, okay? And you can see the time it took to work with a network of eight nodes; this is in minutes, okay? There is another typical configuration that
is used in the field, called the surprise gadget. The surprise gadget is a network topology that contains one of these dispute wheels, but it is very difficult to see whether it is going to converge, okay? Sometimes, when the surprise gadget appears in a particular configuration, the system actually terminates, and sometimes it doesn't, so it's very difficult to determine manually whether this is going to happen. We did experiments on this, and we were able to do things that people were not able to do before: again, an analysis over a practical system, the BGP protocol, and again we have results about whether it always converges, sometimes converges, or never converges, okay? As you see, these are in the end small networks, so there is still a lot of work to do manually: when somebody wants to do the analysis, they have to isolate the area of the network that they want to analyze in order to be able to do this.
However, this type of computation is not only for routing algorithms. This is a general computational method that uses relational databases. So the goal of our project in the María de Maeztu is to formally establish how difficult it is to do analysis under the conditions of this computational model: relational databases as states, messages that are relational databases too, and computations restricted to these recursive SQL rules, okay? Our language is actually not the only one; there are several of them, developed in different parts of the world. UPenn has one, Berkeley has one, and there is another in Germany, that do this type of computation based on this idea of
relational databases. So, the collaboration here: I started this with Victor because I am not a specialist in complexity, and I need help in the complexity area. At the same time, I also contacted people who have worked on web services, which has a very similar flavor, and who are specialists in complexity, in order to help us with the work as well. We have some preliminary results, from about half a year ago, with the collaborators at the University of Bolzano, and the results are very limited. We assume reliable communication and a fixed topology during the execution, so the network cannot change. We also classify the systems by input class. A closed system is one that doesn't receive input from the outside; the only thing that happens in the system is messages going between nodes. An autonomous system is one in which input happens only at the very beginning. There are many situations in which this happens, but it is still limited. And the last one is interactive: you can introduce input at any moment during the execution, okay? The thing that, again, is new for the distributed computing community is that we are using a concept inspired by databases, called boundedness. Boundedness is related to the size of the language, the number of symbols, that is present in a particular snapshot of the system. If you bound that, if you say that you cannot have more than, let's say, 20 million symbols, and this is fixed for the whole life of the system, then you are able to do proofs. So there are decidability questions that are resolved just by imposing that. And notice, this is not saying that there cannot be an infinite
number of symbols. You can have an infinite number of symbols overall, but you can never have more than a fixed amount, which could be very large, in any particular state, okay? Based on that, what we obtained was that the analysis of convergence is undecidable in many of the different situations, but in other situations we got very precise answers, and most of the problems are PSPACE-complete, so a little bit harder than P (polynomial space, which in practice can mean exponential time). This is what we have right now; it is very, very limited. What we really want to do is study complexity on specific types of topologies and see how this affects the results: rings, trees, meshes, where you have more information to work with. Also, we want to study the complexity when the system is changing, when the network is changing. And last, we want to look at properties other than convergence. Sometimes you don't need to converge; you just need properties that each node maintains, okay? For example, the node never reaches this particular state. These are more local properties that are very useful. So we are looking to understand the complexity better in order to help develop new analysis tools. I think that's it, and with zero time left.
Can you use a general language to express these properties as formulas of some logic? Say I want to test whether an LTL formula that I give holds for this dynamic system, things like that.
Right, right, yeah. Actually, the formal complexity proofs that we have are based on an extension of linear temporal logic, because we need first-order linear temporal logic, okay? So the proofs are based on that; the implementations are not based on that, okay, but the proofs are. So we have that language, so if you express a
particular question, like the question about convergence, but yes, yeah, we use
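The graph-based convergence checks described earlier in the talk (convergence states are global states with empty queues, oscillations are loops) can be illustrated with a small sketch. It works on an explicit toy graph rather than the compact state representation the speaker mentions, and it assumes a closed system in which convergence states have no outgoing transitions and every other state has at least one successor. The names `reachable`, `has_cycle` and `classify` are hypothetical.

```python
def reachable(graph, start):
    """All global states reachable from `start` by graph traversal."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state not in seen:
            seen.add(state)
            stack.extend(graph.get(state, ()))
    return seen


def has_cycle(graph, states):
    """Detect a loop inside `states` by repeatedly removing sink states."""
    remaining = set(states)
    while True:
        sinks = {
            s for s in remaining
            if not any(t in remaining for t in graph.get(s, ()))
        }
        if not sinks:
            # Nothing left means no loop; otherwise every state left
            # has a successor that is also left, so there is a loop.
            return bool(remaining)
        remaining -= sinks


def classify(graph, start, quiescent):
    """Classify a system given its transition graph and empty-queue states."""
    states = reachable(graph, start)
    if not states & quiescent:
        return "never converges"
    if has_cycle(graph, states - quiescent):
        return "sometimes converges"  # some run can loop forever
    return "always converges"


# Toy graphs: "done" stands for a state in which all queues are empty.
line = {"s0": ["s1"], "s1": ["done"], "done": []}
mixed = {"s0": ["s1", "done"], "s1": ["s0"], "done": []}
loop = {"s0": ["s1"], "s1": ["s0"]}
print(classify(line, "s0", {"done"}))
print(classify(mixed, "s0", {"done"}))
print(classify(loop, "s0", {"done"}))
```

The traversals themselves are cheap; as the talk stresses, the difficulty lies in the size of the graph, which this sketch does nothing to address.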