Yeah. Thank you. Oh, thank you. Thank you, Candice, for the introduction. I'm the Marcus here in this webinar, and I just wanted to add that Stefan and I combined have about three decades of experience with TLA+. In this webinar, the two of us will take turns showing you what TLA+, the Temporal Logic of Actions, is and what it, combined with its toolset, can do to make sense of reactive systems, that is, concurrent and distributed systems, which are notoriously hard to get right. I guess this is the reason why you're here: in your career, in your tenure as a software engineer, you've realized that building correct concurrent and distributed systems is super difficult, if they ought to be reliable so that the operation and the maintenance of these systems is relatively cheap. It's very hard. And this is really the bread and butter of TLA+: designing these systems so that they live up to the expectations. For example, our colleagues in the industry have used TLA+ to design planetary-scale distributed databases, concurrency primitives that exist in the standard libraries of your favorite programming languages, but also client-facing applications, distributed applications running on cloud services, and microservice architectures. All of these have been designed with TLA+. Unfortunately, we don't have time to look into these systems today because we really only have one hour, so we can't study real-world systems. But we selected a representative concurrent system on which we will demonstrate the powers of TLA+. And I could even imagine that some of you might have already implemented such a system in one way or another. It's easy to explain. We have a front-end and a back-end system. In the front end, we have producers that create some sort of data, the Ps over here, in this case there are four, and they want to send data to the back end.
This could be a user request, for example, that the back end has to act upon. The front end, the producers, add the data, the requests, into some data structure. Let's call this the buffer today. There it will be queued until the consumers remove the individual elements from the buffer. If you like, in the multi-threading world, this would be an array that sits between the threads, the front-end and the back-end threads, or in the multi-process world, this could be inter-process communication, IPC, right? But then also in the distributed world, the four nodes, the four circles here, could be nodes that send data via some middleware, some messaging bus or so, to some back-end system. The unifying property of the middleware here, the array, the IPC, is that it has some fixed size that causes the producers to block at some point. If the buffer is full, the producers have to wait. And they wait until they get a signal from the buffer that there is room available once a consumer removes an element from it. This is all fine and dandy, and things go back and forth between the producers and the consumers until the point where this thing unfortunately deadlocks. Okay, I will now switch over from the diagram here, from the abstract world that we like to study in TLA+, to the nitty-gritty details of the concurrent program. So you see a bit of Rust code over here, and everything that I don't discuss is irrelevant. Here is the implementation of the algorithm I just explained. The producers, a set of producers, execute this function here. What the producer does is it locks the buffer. It gets exclusive access to the buffer. There is no concurrency going on with other producers or consumers. And when the producer has exclusive access to the buffer, it checks whether or not the buffer is empty. Sorry, whether the buffer is full.
And if the buffer is full, the producer knows that it has to wait, in which case it yields the lock and waits for a signal from some other producer or consumer. In the case that there is room in the buffer, an element gets added to the buffer and the producer signals to one other party that there is now one more element in the data structure. And like I said, mutually exclusive. The consumers carry out this piece of code here. They also lock the buffer and check if the buffer is empty. If it's empty, they wait, wait for the signal. If there is an element in the buffer, they remove it and send a signal in turn. So now, where's the deadlock? I've already consulted with our favorite AI buddy. And it says, as an answer to the question whether or not this code is deadlock free: "Based on the code you provided, it appears that the code is deadlock free. The producer and consumer threads are using a shared buffer and a condition variable to synchronize access to the buffer. The producer thread waits on the condition variable if the buffer is full and the consumer thread waits on the condition variable if the buffer is empty. This ensures that the producer and the consumer threads do not access the buffer at the same time," yeah, that's what I said, they are mutually exclusive, "which would cause a race condition." That's also true, right? If we had no lock, then there would be garbage in the buffer eventually. "Therefore, the code is designed to avoid deadlocks." Well, this sounds a bit fuzzy to me. There are either deadlocks or there are no deadlocks. Avoiding deadlocks is like, we might have a good day or not. So let's see if we can get some more clarity on that by, for example, just testing this piece of code. How about we run our Rust implementation here with a buffer of size three, four producers, and three consumers? Now this indeed deadlocks.
And we get this debug output here that is, I guess, a couple thousand, 10,000 lines long. Now we would somehow have to make sense of the output here to diagnose where the deadlock is coming from. But maybe I should just start with a smaller configuration, right? Maybe just a buffer of size one, one producer, and one consumer. And now we wait for this thing to deadlock. In the meantime, I will hand over to Stefan, who will use the time while we wait for this to deadlock to explain to you what the TLA+ specification of this producer-consumer system looks like.
Okay, thanks Marcus. Let me just share my screen. Okay. So here you see a TLA+ version of the system that Marcus has been explaining to you. Now TLA+ is not like a programming language. It's a bit more mathematical. So what we are doing here is, forget about this EXTENDS, this is just some import of standard libraries. We are declaring some parameters: a set of producers, a set of consumers, and the capacity of the buffer that they're going to use. As you can notice, there are no type annotations, right? These are just names, identifiers that are introduced here. Everything in TLA+ is a set, or at least it is an untyped language. Okay, so what we are doing here is we introduce an assumption on these constants, which says: okay, the producers should be a non-empty set, the consumers should be a non-empty set. I mean, if we have no producers and no consumers, the problem is not interesting. Also we are assuming producers and consumers to be disjoint. Think of that as at least their roles being disjoint. And also there's a buffer and it has a capacity, which is a natural number. And it shouldn't be zero, a zero-place buffer doesn't make sense either, right? So these are explicit assumptions that we have to state. We are going to model the actual system, and for modeling the system we are using two variables here. One variable that represents the buffer.
And the other variable represents the set of waiting processes: either producers that are blocked because the buffer is currently full, or consumers that are blocked because the buffer is currently empty, or was empty at the time when they tried to access the buffer. Okay, this definition of vars is just a tuple containing both of the variables. More important is the definition here, the set of running threads. That's all the producers and consumers, so producers set union consumers, this backslash cup means set union in TLA+. Remember that those are sets, right? But we subtract the set of waiting processes. So this is the set of producers and consumers that are not currently waiting. And then we have Notify and Wait operations similar to what Marcus showed in the Rust code. So Notify, what will it do? Well, if some process is waiting, some producer or consumer, then it will wake up one of the waiting processes. We don't care which one, and there's no notion of random numbers or probabilities in TLA+, so we just pick an arbitrary one. We say, okay, we remove some element of the wait set from the current wait set. So wait set prime here denotes the value of the wait set after that Notify action takes place. The new value of the wait set is the old value with that element x removed, where x is just an arbitrary element of the wait set. And we are sure there is one because we just checked that the wait set is not empty. Well, if the wait set is empty, Notify is just a no-op, so we don't change the wait set. And similarly, the Wait operation just adds a thread or a process t to the wait set. The two main operations that we have in the system are Put and Get. Put will be invoked by a producer. This models producer t adding a data item d to the buffer, and there are two cases.
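For readers following along without the screen share, the declarations and the Notify and Wait operators described so far might look roughly like the sketch below. It is reconstructed from the spoken description and is not the presenters' file: the module name and the spellings `BufCapacity`, `waitSet`, and `Assumption` are assumed, and the `UNCHANGED buffer` conjunct in `Wait` is an assumption added so that a waiting step also says what happens to the buffer.

```tla
------------------------ MODULE BlockingQueueSketch ------------------------
EXTENDS Naturals, Sequences   \* standard modules for Nat, Len, Append, Tail

CONSTANTS Producers,     \* set of producer identities
          Consumers,     \* set of consumer identities
          BufCapacity    \* maximum number of items the buffer can hold

ASSUME Assumption ==
       /\ Producers # {}                  \* at least one producer
       /\ Consumers # {}                  \* at least one consumer
       /\ Producers \cap Consumers = {}   \* the two roles are disjoint
       /\ BufCapacity \in (Nat \ {0})     \* capacity is a positive natural

VARIABLES buffer,    \* sequence of queued items
          waitSet    \* set of threads that are currently blocked
vars == <<buffer, waitSet>>

\* All producers and consumers that are not currently waiting.
RunningThreads == (Producers \cup Consumers) \ waitSet

\* Wake up one arbitrary waiting thread, or do nothing if nobody waits.
Notify == IF waitSet # {}
          THEN \E x \in waitSet : waitSet' = waitSet \ {x}
          ELSE UNCHANGED waitSet

\* Thread t blocks; the buffer is left as it is (assumed conjunct).
Wait(t) == /\ waitSet' = waitSet \cup {t}
           /\ UNCHANGED buffer
=============================================================================
```

The `\E x \in waitSet` in `Notify` is what makes the wake-up non-deterministic: TLC will explore every possible choice of `x`.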
So this is a disjunction. In TLA+ we write that as a list, where disjunctions and conjunctions are basically the bullet items of a list. So there are two cases. Either there's room in the buffer, so the current length of the buffer is below the capacity, and then we can just append the data item to the buffer. That means there's now an item available, and we notify one of the processes using our operator Notify that we just defined above. The second case is if the buffer is full, so if the length of the buffer equals the capacity. Then the producer has to wait, and we invoke the Wait operator that we just defined. Get is symmetric: Get means a consumer trying to retrieve a data item. For simplicity, we actually just ignore the data item, because at the end of the day we're interested in finding a deadlock or not, so which item is read is really not important. This is a message: whenever you're modeling a system, think about what is relevant and what you leave out. There's no universal model of a system. You write a model or a specification for a certain purpose, and here we're interested in deadlock checking, so we can actually forget the data values. We still included data values in the Put operation, simply because it will make interpreting the counterexamples that we get easier. Okay, otherwise the two operations are symmetric. And then we tie everything together. We are describing a state machine. So we describe the initial state in which the system starts: initially the buffer is empty, so the empty sequence is written like that, and the wait set is the empty set, which is written like this. And then we describe all the possible transitions in the system. Traditionally, these operators are called Init for the initial condition and Next for the next-state relation, but really these are just names, you can call them whatever you like.
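As a rough illustration of the bulleted disjunction and conjunction style just described, Put, Get, and Init could be written along the following lines. This is a sketch under assumptions: the names `buffer`, `waitSet`, `BufCapacity`, `Notify`, and `Wait` are assumed spellings of what the talk describes, and the fragment is meant to sit inside a module that already declares them.

```tla
\* Producer t tries to add data item d.
Put(t, d) ==
    \/ /\ Len(buffer) < BufCapacity        \* case 1: there is room
       /\ buffer' = Append(buffer, d)
       /\ Notify                           \* wake up some waiting thread
    \/ /\ Len(buffer) = BufCapacity        \* case 2: buffer is full
       /\ Wait(t)                          \* t joins the wait set

\* Consumer t tries to remove an item; the item itself is ignored.
Get(t) ==
    \/ /\ buffer # <<>>                    \* case 1: something to take
       /\ buffer' = Tail(buffer)
       /\ Notify
    \/ /\ buffer = <<>>                    \* case 2: buffer is empty
       /\ Wait(t)

\* Initial state: empty sequence, empty wait set.
Init == /\ buffer = <<>>
        /\ waitSet = {}
```

Each `\/` item is one case and each nested `/\` item is a condition or an update that must hold together with its siblings.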
So the transition relation says, well, there exists, this backslash E means exists, some running thread t, so some thread that is not in the wait set, remember, such that either t is a producer, and then it will do a Put. Here we just insert its identity into the buffer, which makes it easier to find out which process just executed. Or t is a consumer and will get something from the buffer. Okay, so this is our TLA+ specification. Now the question is what can we do with it. We are interested in deadlock, so let's do deadlock checking with TLA+. For that we'll use the model checker, TLC. If I go back up to the start of my module, I declared these parameters here: the producers, the consumers, and the buffer capacity. For the model checker, I have to instantiate those parameters. Just like Marcus said, well, I'm running the system for, I don't know, four producers and three consumers, I believe, with a certain buffer size, I have to do the same for TLC. That is done in the configuration file, and you see the configuration file over here. So we declare the constants that we want to use. Here I'm just taking basically the smallest possible configuration that makes sense. I'm saying, well, let's try with a buffer capacity of one, one producer that I'm just calling P1, and one consumer that's called C1. And then I have to tell TLC, the model checker, what the initial-state predicate and the next-state relation are that are going to be used. Okay, once this is done, I can just run the model checker. I'll do that here from the interface, so "Check model with TLC". Here you can see the output of TLC. It has finished and it tells us, well, everything went fine, it didn't find any problem. And it gives us a few statistics about the number of states, etc. Okay, so with this small configuration everything seems to work. So I'm going to try a slightly larger configuration. Let's, for example, add a second producer. I'm just adding a second producer here.
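To make the next-state relation and the model-checker setup concrete, here is a hedged sketch. The operator names (`RunningThreads`, `Put`, `Get`) are assumed spellings of what the talk describes. TLC reads its settings from a separate `.cfg` file; its contents are shown here as TLA+ comments so that everything stays in one notation, and the model values `p1`, `p2`, `c1` are illustrative names.

```tla
\* Some running thread takes a step: producers put their own identity
\* into the buffer, consumers take an element out.
Next == \E t \in RunningThreads :
            \/ /\ t \in Producers
               /\ Put(t, t)
            \/ /\ t \in Consumers
               /\ Get(t)

\* Contents of the TLC configuration file (a sketch), smallest instance:
\*
\*   CONSTANTS
\*       BufCapacity = 1
\*       Producers   = {p1}
\*       Consumers   = {c1}
\*   INIT Init
\*   NEXT Next
\*
\* Slightly larger instance with a second producer:
\*
\*       Producers   = {p1, p2}
```

TLC checks for deadlock by default, so no extra property has to be listed for this first experiment.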
And I'm restarting my model checker. Aha, now we see something: it tells us "deadlock reached". Okay. So deadlock appears to be possible in this configuration. And not only does it tell us that a deadlock exists, it gives us a trace. We could follow and explore what it's doing, but if we just look at the end of the trace, we see that indeed both producers and the consumer are in the wait set. So nothing can happen anymore, right, because our next-state relation says, well, you have to pick some running thread, and now there's no running thread left. So this is a deadlock. Okay, so for that configuration we get a deadlock. Let's try what happens if we increase the buffer capacity, if we give it a bit more space. Again. Okay, no deadlock. Okay, let's try the example that Marcus had. I believe it was four producers and three consumers. Deadlock. Marcus, have you found the deadlock in your...? No, still running. Okay, so TLC is quite fast at finding this, right, it is faster than just running the program and waiting for the deadlock to occur. Okay, so let's investigate a bit further what's really going on here. Let me just switch to another configuration. By the way, all of these specifications are available on GitHub. We'll probably post the URL, and actually you get the history with the different steps that we are showing, and even more steps than we are showing here, and you can redo these experiments for yourself. So I'm just switching to a later version of this repository. I should probably force it to do that because I made some changes. Okay. So this is still the same as before, nothing has really changed here. But there are two predicates that were added to the TLA+ specification, because often you want to do a bit more than just running TLC without any property. Typically you want to tell it, well, verify some properties for us. And so here we give it two properties to be verified.
Well, the first one is just some bookkeeping, basically. As I told you, TLA+ is untyped, so it's very easy to get the types wrong. And so it's always a good idea to write a type-correctness predicate and then let the model checker check it for you. What this type-correctness predicate tells us is, okay, the buffer will always be a sequence of producers, remember that the producers insert their own identity into the buffer. The length of the buffer is actually bounded, it's always between zero and the capacity, so we don't overrun the buffer. And the wait set, the other variable, well, it's a set of producers and consumers, right, it's a set of threads. So this is the boring invariant, but it's still useful to verify. The other invariant, the one that we are really interested in, is that it's never the case that all threads are waiting. This hash mark means different. Think of it graphically: it looks a bit like the not-equal sign you would write on paper. So the wait set is different from the union of producers and consumers. And if I go back to my configuration file, okay, we have the same configuration that we just ran, and now we tell TLC to verify these two invariants. Of course we already know that in this configuration it will fail, because we get a deadlock, and now TLC actually tells us, okay, your invariant is violated, and here's a trace that shows how it can be violated. With this relatively large configuration, the trace is much longer than the one we had before with just two producers and one consumer. So it is usually a good idea to check your specifications for very small instances. Model checking, unlike testing, will explore exhaustively within the restricted state space that you indicate by instantiating your parameters. You are likely to find violations of invariants, or deadlocks in our case, already for very small configurations.
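The two predicates described above can be sketched as follows. The names `TypeInv` and `Invariant` and the variable spellings are assumptions for illustration; the conjuncts follow the spoken description.

```tla
\* "Boring" but useful: the shape of the two variables.
TypeInv == /\ buffer \in Seq(Producers)        \* producers store their own id
           /\ Len(buffer) \in 0..BufCapacity   \* never more than capacity
           /\ waitSet \subseteq (Producers \cup Consumers)

\* The interesting one: it is never the case that every thread is waiting.
Invariant == waitSet # (Producers \cup Consumers)

\* Additional lines in the TLC configuration file (a sketch):
\*
\*   INVARIANTS
\*       TypeInv
\*       Invariant
```

Because `Invariant` is checked on every reachable state, a violation comes with a finite trace that ends in the state where all threads are in the wait set.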
Okay, and again, let's give it a bit more room. Let's move to a buffer capacity of four and rerun the model checker. The error disappears. Okay, so there's a correlation between the size of the buffer and the existence or non-existence of a deadlock. We know that there is a problem with our system, at least for certain configurations. What can we do to get rid of this? What's the reason for the deadlock? Well, a good way to find out is to examine the counterexample that you get. So let me go back to our small configuration, because then it's easier to read. Right, this was our small configuration. Let's rerun TLC. Okay. It tells us, okay, here we are in this deadlock state where everybody is waiting. And what happened just before? We had one producer and one consumer waiting, and then apparently P2 tried to deposit, but the buffer was already full, so it was just added to the wait set. What happened before? Here is an interesting transition, right, in this step here. Because here the buffer is empty, and then apparently producer P1 came along and inserted an item, and it notified, right, because it was able to deposit an item. But what happened here is that producer P2 disappeared from the wait set, so the producer actually woke up another producer. And of course that will not be helpful, because it should have woken up the consumer, because the consumer would have emptied the buffer, and then the system would have been able to make progress. So examining the counterexample a little bit, we can see that the problem is this non-deterministic Notify here, that just says, well, wake up some process. Now there are several solutions to that. One possible solution is to wake up all waiting processes.
This would be a valid solution, and indeed, if you look at the web page, you'll find that it is discussed there. But of course it comes with a performance penalty, because now you wake up all the processes, so you potentially get a hit on your system because they all try to execute now. So what we could do is, well, if, like here, a producer succeeds in depositing an item, what it should actually do is wake up some consumer, right, and vice versa: if a consumer succeeds and removes an item, then it should wake up a producer, because then the producer has a chance to deposit another item. So let's see how we can model that in TLA+, just jumping ahead in the history. Okay, so what changed here is in the Put and also in the Get operations. Instead of just notifying some process, this has been replaced by NotifyOther, so the idea is that we notify some instance of the other class of processes. And this is how NotifyOther is defined. You see down here, what appears inside this LET expression is pretty similar to what we had before. If the set S here is non-empty, then we remove some element of that set from the wait set, and otherwise we leave the wait set unchanged. But instead of taking S equal to the wait set as we had before, we are a bit more clever here and we say, okay, if the thread that calls this operation is actually a producer, then we take the consumers in the wait set, which we can define as the wait set with the set of producers removed. We could also have written the wait set intersected with the set of consumers, that would be the same. In the other case, if it's a consumer, we'll wake up a producer, so the set of interest is now the waiting producers, the wait set without the consumers. Okay, so let's make sure that this is actually a fix. I'm going back to my configuration here that failed before, that had a deadlock, and I'm checking the same invariants as before. I'm going to go back to my model checker.
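The NotifyOther operator described above might be written like this. It is a sketch based on the spoken description; the spellings `NotifyOther`, `S`, and `waitSet` are assumed.

```tla
\* Wake up one waiting thread of the *other* role, if there is one.
NotifyOther(t) ==
    LET S == IF t \in Producers
             THEN waitSet \ Producers    \* t is a producer: waiting consumers
             ELSE waitSet \ Consumers    \* t is a consumer: waiting producers
    IN  IF S # {}
        THEN \E x \in S : waitSet' = waitSet \ {x}
        ELSE UNCHANGED waitSet
```

In Put and Get, the call to the old notify operator would then become `NotifyOther(t)`. The choice of `x` is still non-deterministic, but only among threads that can actually make use of the change to the buffer.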
And yep, it says there's no deadlock anymore, at least for this configuration. So that looks good. Apparently we got rid of our deadlock. But can we be sure of that? Actually, we just checked this one instance, right. We could, of course, try the small instance that we had before, two producers and one consumer. Okay, that also works. And you can go on and check some other instances, but are we convinced that this is now really true? How do we know that it works for an arbitrary combination of set of producers, set of consumers, and buffer size? One way is to say, okay, we don't find any errors anymore. TLC is great for finding errors because it's easy to launch and gives you counterexamples that you can inspect to understand what the source of the error actually is. But once TLC doesn't find any problem anymore, you don't know so much, right. What you can do then is switch to theorem proving. That takes a bit more effort, let me warn you, but still, let's do it for this example, it's not too difficult here. Unfortunately, I have to change the interface now. So far I was doing everything from the VS Code extension. The proof system for TLA+ is not yet integrated in the VS Code extension, so I have to switch to another interface, an Eclipse-based interface, the TLA+ Toolbox. Okay, so this is exactly the same TLA+ specification that we saw before. To actually prove something, I guess I should go one step ahead, that makes my life a little easier, I don't have to type so much. Okay, let's go back here. As I said, the same specification. What has changed now is this part down here, where we are writing a lemma, in particular at the end here. The first theorem that we are proving is type correctness. We have two correctness predicates: one is type correctness, the other one is the absence of deadlock.
So let's start with the easy one, the type correctness. Now we have to write an explicit proof of that. The standard way of proving such an invariant is to say, well, okay, the invariant is implied by the initial condition, that's what the first line here says. Let me just remove that for the moment. And the type invariant is preserved by the next-state relation: whenever I start in a state where the type invariant is true, so I'm assuming the type invariant here, and I'm also assuming that I'm taking a step of my specification, which is expressed by this formula Next here, then the type invariant will still be true in the state after that transition. Remember, these primes mean it's true in the state after the transition. And once I have proved those two steps, I can conclude the theorem, which says Spec implies that this type invariant is always true. This empty box here means always. This is just a consequence of these first two steps, but I have to explain why this is true, I have to explain that to the prover. So I'm writing a proof here. I'm saying, okay, use the first two steps here, one-one and one-two, and something that's called propositional temporal logic, because this always operator is an operator of temporal logic. I launch the prover and, come on. I thought I got rid of this. Oh yeah, I know what I did, I switched to the wrong version. Let me fix that, because otherwise we'll have that box popping up all the time. Okay, now we should have gotten rid of these annoying boxes. Okay, so this proof went through. But we still haven't proved the other two steps. So if I try to prove the entire theorem, the interface tells me, well, there's some yellow here, because there are two steps that you haven't proved yet. The QED step is fine, but the other two steps you still have to prove. Okay, and in this case these proofs are very easy. They go through automatically, and actually I can just write OBVIOUS.
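The three-step proof structure described above could look roughly like the following TLAPS sketch. Several things here are assumptions: the theorem name, the definition of `Spec` as `Init /\ [][Next]_vars`, the list of definitions after `DEF`, and the use of `[Next]_vars` in step `<1>2`. The `PTL` backend comes from the `TLAPS` standard module, which the enclosing module would need to extend. Whether `OBVIOUS` is enough for the two base steps depends on the prover version and backends.

```tla
\* Assumed shape of the specification formula.
Spec == Init /\ [][Next]_vars

THEOREM TypeCorrect == Spec => []TypeInv
\* Make the parameter assumption and all definitions available to every step.
<1> USE Assumption DEF TypeInv, Init, Next, vars, RunningThreads,
                       Put, Get, Wait, NotifyOther
<1>1. Init => TypeInv                       \* holds initially
  OBVIOUS
<1>2. TypeInv /\ [Next]_vars => TypeInv'    \* preserved by every step
  OBVIOUS
<1>3. QED                                   \* combine with temporal reasoning
  BY <1>1, <1>2, PTL DEF Spec
```

Steps `<1>1` and `<1>2` are ordinary non-temporal proof obligations; only the QED step needs propositional temporal logic to get from "initially" and "preserved" to "always".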
So these proofs are obvious. Okay, it is obvious. And this one will also be obvious. Okay, and so we have proved our theorem. Why are they obvious? Well, actually we told the prover up here to use everything that it has at its disposal, so our assumption about the parameters and also all the definitions that appear here. It is usually not a very good idea to silently expand everything, but for a small specification such as this one it works. And this saves me some typing, otherwise I would have had to tell the prover in these steps here which definitions it should expand. So I can rerun the prover on the entire theorem and everything will be green. Okay, great. So our type invariant is correct. That's reassuring, but what we are really interested in is our invariant. Okay, so I can write it like that, saying what we want to prove is that our specification implies that the invariant is always true. And so we can try to do the same: we can try to prove that our invariant is true in the initial state, and that if we start from a state where the invariant is true and we take a step, our invariant will be preserved. And then everything should follow as before. So let's check the QED step first. And that is fine. Okay, let's be bold: OBVIOUS, OBVIOUS. Great. So something turned red here, which means the prover has not been able to prove it. It will try one backend after the other, and I'll just kill it here because it will not be able to prove it. Okay, so this step does not go through. The prover is unable to prove it. On the right-hand side here you see the actual formula that it's trying to prove. You can stare at it for a while and try to find out why this doesn't go through: is it the prover that's too stupid, or is there a problem? That's not so obvious. But fortunately, there's one thing we can try. Maybe it's missing the type invariant, right, maybe the type invariant would have helped here.
So let's throw in the type invariant as well. Now I have to justify why I'm allowed to use this extra assumption that the type invariant holds in the state before. But this is okay because we just proved our type-correctness lemma that says, well, the type invariant is always true, so I add this to the justification of the QED step. Yep. Okay, this step still works. And now I can try this step again. But unfortunately it still doesn't work. Okay, let's try to understand what the issue is here. Let's go back to TLC, because TLC can be a great help for understanding what the problem is. I hope I'm at the same point here, right, yep. So what we are trying to prove here is that whenever we start from a state which is type correct and in which our invariant holds, and we do a step, then the invariant will still be true in the next state. Now we can enlist the help of TLC in proving that by saying, well, this is actually something like the invariant checking that we did before, we just replace our initial condition by this predicate that we are starting from, so this conjunction of type invariant and invariant. I think I defined a predicate up here that is just this conjunction, so that I don't have to type it. I'm going to put it here in the configuration file. So I just tell TLC, well, assume that we are starting from a state in which this invariant holds: do we preserve our invariant in the next state? Of course we do that again just for a finite-state model, but we are looking for a counterexample now, and as we said, TLC is great for finding counterexamples. Well, TLC doesn't give us a counterexample. It actually couldn't handle what we are trying to do. It tells us the right-hand side of \in is not enumerable, at line 62, which is up here, so this is this definition here that says, well, the buffer should be a sequence of producers. Now, unfortunately...
This is an infinite set, right, because we say, well, this is just a sequence, it can be of arbitrary length. So we should make this a finite set. Let me just introduce an auxiliary definition here, and this is a bit of TLA+ hackery, you have to know the expression language of TLA+. I'm saying, well, it's a finite sequence, so it's a function from one to n to the set of producers, for some n that is between zero and the cardinality of producers. Right. And then let's replace this here by that definition for the moment. There is one more problem here: TLC also cannot handle this subset-equal here. So I write "in SUBSET", because it can handle "in" expressions, and writing this is exactly the same as writing subset-equal, so that doesn't make a difference. Okay, let's try again. TLC was able to run and it gives us a counterexample. Well, it says, if you start from this state here, where the buffer is empty and all the processes except for one consumer are in the wait set, that consumer may come along and try to retrieve an element, and it won't succeed, so it will be added to the wait set, and so your invariant will be false in the successor state. Okay, but wait a minute. I mean, if the buffer is empty, how can it be that all the producers are waiting? Producers are waiting essentially because the buffer was full when they tried to deposit something. So that should not happen when the buffer is empty. So let's rule that out by adding a clause to our invariant and say, well, if the buffer is actually empty, then there should be at least one producer that is not waiting, that is not in the wait set. And that works. Okay, we get a different counterexample.
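The trick of using TLC to look for counterexamples to inductiveness, with the changes made so far, can be sketched as below. All names (`BoundedSeq`, `TypeInvTLC`, `IndInit`) are hypothetical. The bound `BufCapacity` is a choice made for this illustration because the length conjunct already restricts the buffer to that size; the talk, as transcribed, states its bound in terms of the number of producers.

```tla
\* Finite replacement for Seq(S): all sequences over S of length at most n,
\* written as functions from 1..m to S.
BoundedSeq(S, n) == UNION {[1..m -> S] : m \in 0..n}

\* Enumerable variant of the type invariant, usable as a TLC initial predicate.
TypeInvTLC == /\ buffer \in BoundedSeq(Producers, BufCapacity)
              /\ Len(buffer) \in 0..BufCapacity
              /\ waitSet \in SUBSET (Producers \cup Consumers)

\* Deadlock invariant, strengthened with the first extra clause.
Invariant == /\ waitSet # (Producers \cup Consumers)
             /\ buffer = <<>> => \E p \in Producers : p \notin waitSet

\* Start from *any* state satisfying both, not just from Init.
IndInit == TypeInvTLC /\ Invariant

\* TLC configuration for the inductiveness check (a sketch):
\*
\*   INIT IndInit
\*   NEXT Next
\*   INVARIANT Invariant
```

With `IndInit` as the initial predicate, any trace TLC reports consists of a state satisfying the candidate invariant followed by a single step to a state violating it, which is exactly a counterexample to the induction step.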
It's the symmetric one, right: here we have the last producer that's coming along in a state where the buffer is full and all the consumers are waiting. Okay, so we probably need the symmetric condition as well. So if the buffer is full, the buffer length equals the buffer capacity, then there exists a consumer that is not waiting. Let's call it c, that's a little nicer for consumers. Okay, and now TLC at least tells us you have a chance of proving that, because apparently there's no counterexample anymore, at least TLC doesn't find a counterexample. Okay, so that's a candidate now. So let's go back to the prover. Let me just revert this, because maybe the prover will have a hard time with that. For the prover it will be easier to write it the way it was before. Right. And okay, the interface messed up the colors, but this theorem still goes through. Let's see what happens with the second lemma. Oh, success. Okay. So I rushed you a little bit because we have so little time. I wanted to show you that if you're really interested in arbitrary configurations, then you can use the proof assistant to show correctness for an arbitrary configuration, not just for a fixed finite configuration as with the model checker. I'll hand over to Marcus again, who will tell us more about what his Java program is doing and whatever else he wants to show us.
Okay. Yes, can you hear me? I think the Java, the Rust program, the test of the Rust program, at some point also deadlocked. When I increased it to a bigger configuration, it kept running forever, forever, forever, no chance. I'm not reproducing the deadlock with bigger configurations because the counterexample gets so long. And now think about this, for example, in a system that dynamically scales up and down.
Your system might run into the deadlock when there's high load, but then it's guaranteed to never exhibit the deadlock when there's low load on the system, for example. But maybe to switch gears here. We demonstrated the deadlock, used TLC to find the counterexample, then verified that, for a finite instance of the system, notifying all producers and all consumers at every step is a fix, and that notifying other parties is also a fix, and then proved the solution correct: that there is guaranteed to be no deadlock for any configuration. And we might declare victory, since we are almost out of time, call it a day, go home, and be happy. Yeah, contrary to other tools, we now know that this thing is deadlock-free. But there might actually be a second kind of problem here in our system. Even though the system as a whole does not deadlock and there's always forward progress, it's possible that some of the producers or some of the consumers do not make forward progress and that they get stuck: individual processes get stuck, but not the system as a whole. And TLA+ has first-class language support to state that the system is starvation-free. What I just described we call starvation freedom: for all producers, they eventually and repeatedly make forward progress. More formally, for all p in Producers, repeatedly, always eventually, a Put action happens and the variables change. And for all consumers, they repeatedly get an element out of the buffer. And we can even check this kind of property with TLC. With the deadlock we saw earlier, the counterexample was a prefix of a behavior, a finite sequence of states where the last state violated our correctness property, the invariant.
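To make the spoken formula concrete, here is a minimal sketch of how such a starvation-freedom property could be written. Everything except the shape of the property is an assumption: the module name, the identifiers, the `Put` and `Get` actions, the notify-all fix, and the fairness condition are reconstructed from the verbal description and are not taken from the specification shown in the talk.

```tla
-------------------------- MODULE StarvationSketch --------------------------
EXTENDS Naturals, Sequences

CONSTANTS Producers, Consumers, BufCapacity  \* assumed constant names
VARIABLES buffer, waitSet                    \* assumed variable names
vars == <<buffer, waitSet>>

\* Assumed variant of the fix: wake up every waiting thread.
NotifyAll == waitSet' = {}

\* A thread that cannot proceed joins the wait set.
Wait(t) == /\ waitSet' = waitSet \cup {t}
           /\ UNCHANGED buffer

\* Producer t appends d if there is room, otherwise it waits.
Put(t, d) == /\ t \notin waitSet
             /\ \/ /\ Len(buffer) < BufCapacity
                   /\ buffer' = Append(buffer, d)
                   /\ NotifyAll
                \/ /\ Len(buffer) = BufCapacity
                   /\ Wait(t)

\* Consumer t removes the head if there is one, otherwise it waits.
Get(t) == /\ t \notin waitSet
          /\ \/ /\ buffer # <<>>
                /\ buffer' = Tail(buffer)
                /\ NotifyAll
             \/ /\ buffer = <<>>
                /\ Wait(t)

Init == buffer = <<>> /\ waitSet = {}

Next == \/ \E p \in Producers : Put(p, p)
        \/ \E c \in Consumers : Get(c)

\* Weak fairness on Next rules out behaviors that simply stop.
Spec == Init /\ [][Next]_vars /\ WF_vars(Next)

\* []<> reads "always eventually": each producer takes a Put step, and
\* each consumer a Get step, that changes the variables, again and again.
StarvationFree ==
    /\ \A p \in Producers : []<>(<<Put(p, p)>>_vars)
    /\ \A c \in Consumers : []<>(<<Get(c)>>_vars)
=============================================================================
```

A formula like `StarvationFree` would be listed as a temporal property for TLC to check against `Spec`, rather than as an invariant.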
So here we no longer get a finite prefix, but an actual loop, because here's a behavior where some producer from the set of all of our producers never gets to add an element to the buffer. This is what's called a liveness property. Deadlock freedom is a safety property, and a violation of it is a finite prefix of a behavior. This one here is a liveness property, and a violation of a liveness property is an infinite behavior without the desired thing. We can check liveness properties with TLC, and there's also support for reasoning about the correctness of liveness properties in the upcoming TLAPS release. And I think with that, since we only have 10 minutes left, it's probably a good idea if we open the floor for questions. I think we covered some of the questions during the talk offline, but now, folks, please feel free to ask more questions. Okay, I see a question about TLC, so, the languages that they use. Both tools use TLA+. Well, TLA+ is a really expressive language; actually, it is full mathematical set theory. So no tool will be able to handle all of it. In particular, the model checker handles a subset of the language, and the proof assistant, TLAPS, can in principle handle a larger subset of TLA+. For fundamental reasons, actually, the two fragments are not completely comparable. So there are some limitations to what TLC can handle, and there are some limitations to what TLAPS can handle. There's another tool that we didn't show here that's called Apalache, which is what's called a symbolic model checker, and it has yet another subset of the language. So that comes with tools: they don't handle the full expressiveness of the language. And then I go next and answer the question: do we write a spec for implemented code, or write a spec and then translate the spec into code? Well, you get the most bang for your buck if you write the specification before you write the code.
Yeah, make sure that your design is right before you make the effort of implementing it in some programming language. To some degree you could argue that TLA+ is the ultimate agile tool, a way to prototype without writing code. But obviously, the world is full of existing code that has to be maintained, and many teams use TLA+ to study a certain kind of bug in an existing system by extracting it into a high-level specification. And this we've said a couple of times throughout this talk: because TLA+ is not code, we can abstract away, in this mathematical model, all aspects of the code that don't matter, and then really only focus on the part that we care about. For example, in a distributed system, it's super easy to abstract away in the TLA+ model certain kinds of failure that are not relevant for the kind of problem we're interested in right now. Do you want to take the next one, Stefan? Okay, next question: when do you use pure TLA+, and when PlusCal? Okay, so in this presentation we didn't talk about PlusCal. PlusCal is a kind of algorithmic language, and it is a front end to TLA+, so there's a syntax that is more similar, I would say, to imperative pseudocode. Some people prefer writing their specifications in PlusCal; some people prefer to write the specification in TLA+. TLA+ is the more general language, so you can always write your specification in TLA+. For certain systems, in particular concurrent programs, it is probably easier to use PlusCal. We didn't show it here because we didn't want to introduce a second language. Sometimes we call it a gateway drug to TLA+. Okay, then the next question is: is testing arbitrary configurations for counterexamples the main motivation to move from model checking to proof checking?
Like Stefan said, a model checker can only verify a finite instance of the problem; for our particular problem here, the producer-consumer example, for some value of n. Now, we can run the model checker for multiple values of n, but in reality we can never check it for all values of n, and even for bigger ones it will probably run a very long time. If one needs higher assurance than model checking gives, then you resort to theorem proving to make sure that your algorithm works for any value of n in this particular case. But there's obviously a cost associated with theorem proving. It's a more laborious process, a human process, whereas model checking is really throwing compute at the problem and waiting for an answer. So there's really a human effort when it comes to theorem proving. And at the end of the day it's engineering judgment whether you need to go and do theorem proving or whether model checking is good enough. Okay: compare TLA+ and TLC to other model checkers. Well, TLA+ is the most expressive language that I know of that is supported by a model checker. If you compare to Spin, well, the input language, the modeling language that Spin uses, Promela, is kind of a toned-down C dialect. So it's at a much lower level of abstraction, and that has advantages and disadvantages. The main advantage is that Spin can be much more efficient than TLC. If you look at the number of states per second that the model checkers process, Spin will be much faster. However, you have to keep in mind that the steps that you take tend to be much smaller, because you're at a lower level of abstraction. So there's a trade-off here, and there's no universal answer. I mean, Spin is a great tool. I've been using it a lot in teaching and so on. And if it does what you want to do, that's awesome, then use it.
TLA+ is a more broad-spectrum language, and you can write your specifications at higher levels of abstraction. And one thing that you can do in TLA+ that you cannot do in Spin is compare two specifications at two different levels of abstraction. We didn't have time to talk about that, but there's a concept of refinement: you can write a very high-level specification of your system and then a lower-level specification that is closer to your implementation, and show that your implementation corresponds to the abstract specification that you wrote first. Spin will just not be able to do that, because its input language cannot handle that. Okay, then I go next. I think there are two related questions here. Could you please recommend any good source for extracting specs from existing code bases? Are there any projects that attempt to automate this process? And then, related: have you used tools like ChatGPT to generate TLA+ specifications? If so, what's your experience? So, lifting a specification out of existing code is, as of today, a manual task. Tools such as ChatGPT do generate a TLA+ specification, but there is fundamentally an impedance mismatch here, because code has all the detail that's required to efficiently execute an algorithm on silicon. For a high-level TLA+ specification, "high-level" implies that we don't care about many of the idiosyncrasies of programming languages and programs. We care about the algorithm; we think at the level of the diagram that we started this talk with. So any tool that we would build would have to know which details don't matter and can be dropped. And as far as I know, there is no tool out there that can actually do that. If you have ChatGPT generate a specification, your specification will look like the code, just encoded in TLA+, and then there is no way.
Okay, so I'll take the other one: use cases of TLA+ other than distributed systems and concurrent algorithms. Indeed, most documented uses of TLA+ are for those classes of systems, but TLA+ has, for example, also been used at Intel and at Arm for describing processor architectures. So it's not restricted to that; it's just that in that community that is true. Okay: are there easy ways to visualize a model checking instance and state transitions, similar to Alloy? There are a few visualizations built into TLC: you can visualize the state graph, and you can also visualize the action space, with no effort whatsoever. Perhaps what we also have, and for that I will quickly... Oh, I'm still sharing, perfect. I think I have it somewhere here. We have a way to animate these counterexamples in TLA+. It requires you to write a little bit of TLA+ that lays out this diagram here, but if you look, this is our visualization, generated from the TLA+ counterexample, that visualizes what's going on in this counterexample. And then we have several frames that show the actual error trace graphically. For bigger systems, in my experience it usually pays off at some point to come up with these visualizations to reason through counterexamples. We also have integrations with third-party tools, like interactive communication graph tools, to make sense of the communication going on in distributed systems. Okay. When will the next one be? I have no idea. We've been discussing having some other webinars on TLA+, so if you're interested, please let us know, and let us know your suggestions. There's now this TLA+ Foundation within the Linux Foundation; that's why we are hosted here by the Linux Foundation today. And indeed, one of the purposes of the TLA+ Foundation is to produce more educational material about TLA+.
And of course it will not just be the two of us; there are other people as well who could give such presentations. But thanks for your feedback. Thank you so much, Marcus and Stefan, for your time today, and thank you, everyone, for joining us. As a reminder, this recording will be on the Linux Foundation's YouTube page later today. We hope you'll join us for future webinars. Have a wonderful day. Thank you. Thank you very much.