So, good morning, and welcome to this talk. Today I'll present joint work with John Fearnley, Sven Schewe, and Lijun Zhang. Before we actually come to continuous-time Markov games, I want to start with a simpler model, continuous-time Markov chains. You all know discrete-time Markov chains, the standard notion of a Markov chain. In a continuous-time Markov chain, we simply add a notion of probabilistic time. The time at which the next transition happens is no longer a unit distance, but follows a probability distribution, the exponential distribution, and the rate from one state to another is the parameter by which the distribution is scaled. If we have several transitions, like here in this small example, we let both experiments run in parallel and take the transition that fires first. This is the same as adding the two rates and running a single experiment; when it fires, we jump to one successor or the other depending on the ratio of the rates. If we now consider games, the only thing that comes into play are the actions. In each location, we offer several actions from which we can choose, and the rate matrix takes this into account: for every action, we may offer different rates to our successor states. The problem we consider is the time-bounded reachability problem, or the time-bounded reachability probability problem. Here our two players come into play. We divide the set of locations into the minimizing locations and the maximizing locations, that is, the locations of the minimizing player and of the maximizing player, respectively. These players try to minimize or maximize the time-bounded reachability probability: we have a time bound, and we want to be in a certain state, or a certain set of states, when the time runs out. The good news is that an optimal solution to this exists, so there exist optimal strategies for the players.
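To illustrate the race semantics just described, here is a small simulation sketch. The two rates are made-up illustrative numbers, not from the talk's example; the point is that racing two exponential experiments gives the same successor distribution as one experiment with the summed rate, split by the ratio of the rates.

```python
import random

# Hypothetical race between two transitions with rates r1 and r2
# (example rates are assumptions for illustration).
r1, r2 = 2.0, 3.0
N = 100_000
random.seed(0)

# Method 1: run both exponential "experiments" in parallel and
# take whichever fires first.
first_wins = 0
for _ in range(N):
    t1 = random.expovariate(r1)  # expovariate takes the rate parameter
    t2 = random.expovariate(r2)
    if t1 < t2:
        first_wins += 1

# Method 2: one experiment with rate r1 + r2; pick the successor by the
# rate ratio.  Transition 1 should win with probability r1 / (r1 + r2).
print(first_wins / N)   # empirical winning frequency of transition 1
print(r1 / (r1 + r2))   # 0.4
```

The empirical frequency from the race matches the analytic ratio, which is exactly why the two views of the semantics coincide.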
And in this talk, we'll cover the efficient approximation of these probabilities. If we take a look at the simple model of continuous-time Markov chains, this corresponds to the transient probabilities. There, of course, we have no optimization problem; it is just one value we can compute. And we know that the transient probabilities in continuous-time Markov chains satisfy the Kolmogorov equations, or, to be more precise, the Kolmogorov backward equations. We know the probability values at one certain point in time, namely when the time has run out: if we have no time left, we either are in the goal region or we are not, so there the probabilities are fixed. Then, with the Kolmogorov backward equations, we can develop these probabilities backwards in time until we reach our time bound. So if you have not understood what the model is all about, you can also just look at these equations; this is all we are going to approximate, of course in the controlled version. For a fixed scheduler, we can easily extend these equations to continuous-time Markov games. The only thing that changes is that we ask the scheduler which action he wants to choose, and he, of course, may choose a different action for every location and for every point in time. The control problem I was talking about, the maximal reachability probability problem, now looks like this. It is not surprising that the pointwise optimization of the choice of action leads to the optimal reachability probabilities. This is also a special case of the Bellman equations, if you know them. A small example: let's consider this small continuous-time Markov game, which actually has only one player, so we could also call it a continuous-time Markov decision process. If time has run out, so if our time bound of one has passed, we know that we either are in the goal region or not.
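To make this concrete, here is how the backward characterization typically looks for time-bounded reachability, written in my own notation (an assumption; the exact formulation on the slides may differ). Writing $f_s(t)$ for the optimal probability to reach the goal region $G$ within remaining time $t$ from location $s$, and $\mathbf{R}(s,a,s')$ for the rate matrix:

```latex
% Boundary condition: with no time left, we are in G or we are not.
f_s(0) = \begin{cases} 1 & \text{if } s \in G, \\ 0 & \text{otherwise.} \end{cases}

% Bellman-style backward equation; opt is max or min,
% depending on which player owns location s:
\frac{d}{dt}\, f_s(t) \;=\;
  \operatorname*{opt}_{a \in \mathit{Act}(s)}
  \sum_{s'} \mathbf{R}(s,a,s')\,\bigl( f_{s'}(t) - f_s(t) \bigr),
  \qquad s \notin G.
```

For a fixed scheduler, the opt operator disappears and the scheduler's chosen action is substituted, which recovers the plain Kolmogorov backward equations.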
And if we consider, as here in this plot, the reachability probabilities for location A, then at the time bound there is, of course, a probability of zero to reach it, and from this point we can develop these probabilities backwards. Here I've done it for three schedulers: one that always chooses action B, one that always chooses action A, and the optimal strategy, which is to switch at this point in time, taking action B in this region and action A when we have more time left. A brief overview of the results of our paper: we developed an approximation algorithm that has an improved worst-case complexity. In terms of the time bound T and the maximal error π we want to allow for, previous methods had a worst-case complexity of T/π, which is quite bad if we want a really precise result. We improved it to the k-th root of T/π, where, of course, we cannot use just any k; there is no free lunch, but I'll come to that later. For now, let's just say we can take small constants, like 3, 4, 5, maybe 10. Let's look at different approximation methods. For CTMCs, there is the well-known approximation technique of uniformization, which works amazingly fast. However, it does not extend to control problems; it is difficult to extend. I know that there is some related work on this, and we will compare to it later, but it is not applicable in the direct sense. Runge-Kutta and similar numerical methods could, in principle, be applied. However, their analysis of precision relies on the fact that all the derivatives exist, at least up to the fourth derivative, and here we see that the second derivative of our reachability probabilities is not continuous. Therefore, the result that Runge-Kutta algorithms could give us is not essentially better than the Euler method's. Speaking of the Euler method, we will take a closer look at it now.
The Euler method, as you all know, partitions the total available time into small intervals of length ε. We assume that we know the reachability probabilities at the right fringe of an interval, compute the reachability probabilities at the left fringe, and iterate over the state space. In our case, this means that we replace the latter part of the equation by constants, that is, by the optimal reachability probabilities at the right fringe. So we assume these to be constant throughout the interval, and then we just have a linear combination over constants. We maximize over different constants, which is easy, we integrate, and we have our reachability probabilities for the left fringe. Assuming a constant estimate introduces a linear error, that is, an error of order ε in the derivative of the reachability probability. If we integrate over that, it becomes a quadratic error per interval, which is rather small, but it means that if we want to achieve a global error of π, we have to choose T/π many intervals, so we have to iterate T/π many times over our state space. If you now imagine we want a precision of, say, 10 to the power of minus 9, that is quite bad; we have to iterate a lot of times. So we want something better here. Now comes the key idea. Replacing the reachability probabilities in the equation by constants is, one could argue, essentially the same as assuming, or at least one assumption behind it is, that at most one step may happen during the interval: just one jump of our process may happen within the interval. And how can we justify that? For that, we take a look at the Poisson distributions.
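The Euler scheme just described can be sketched in a few lines. This is a minimal illustration on an invented two-state CTMDP with a goal state, not the example from the talk: per interval, the right-fringe values are treated as constants, the best action is taken pointwise, and we step backwards in time.

```python
# Euler-style discretization sketch for maximal time-bounded reachability
# in a tiny CTMDP.  The model (states, actions, rates) is invented for
# illustration; it is not the talk's example.

# States 0 and 1 are controlled; state 2 is the goal.
# rates[s][a][s'] is the rate of moving from s to s' under action a.
rates = {
    0: {"a": [0.0, 2.0, 0.0], "b": [0.0, 0.0, 1.0]},
    1: {"a": [0.0, 0.0, 3.0]},
}
GOAL = 2
T = 1.0       # time bound
eps = 1e-4    # interval length

# Reachability probabilities at remaining time 0: 1 in the goal, else 0.
f = [0.0, 0.0, 1.0]
for _ in range(int(T / eps)):
    new_f = f[:]
    for s, acts in rates.items():
        # Euler step: treat the right-fringe values f as constant over the
        # interval and maximize over the actions pointwise.
        best = max(
            sum(r * (f[t] - f[s]) for t, r in enumerate(rvec))
            for rvec in acts.values()
        )
        new_f[s] = f[s] + eps * best
    f = new_f

print(f[0])   # approximate maximal probability to reach the goal within T
```

Note that this sketch also exhibits the switching behaviour from the earlier plot: with little time left, the direct transition ("b") is best from state 0, while with more time left the two-hop route ("a") wins.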
The Poisson distribution indicates how probable it is to have 0, 1, 2, ..., n steps within a certain time interval. That is, if we execute repeated probability experiments with an exponential distribution, we get the Poisson probabilities. Of course, we need a uniform rate for these experiments, but here we can just take the maximal rate in our system to obtain an upper estimate. For small time intervals, like a small ε interval, it is very probable that no step happens, but for this case we don't have to optimize at all, as our actions, our choices, don't have any effect. So we only have to care about the last three cases here. We see that these probabilities decrease really, really fast, even for a moderate ε. In fact, for a small ε, the Poisson probability of taking more than k steps is (λε)^(k+1) / (k+1)!. The numerator decreases really fast, and the denominator increases really fast, so this converges amazingly fast to zero. And that was our motivation. First of all, it justifies considering just one step per interval: make ε small enough, and we know that we only omit a really small proportion of the probability mass. But we thought it might be an advantage to consider more than one step per interval. For example, one could allow two steps per interval. Let's consider what happens if we allow two steps to happen in an interval. We have these equations again, and we want to approximate for the case that two or fewer steps may happen. That is, if one step happens, we end up in one of the successor states, and so we need an estimate for the successor states that accounts for the case that another step may happen. But we have computed this already: that was the Euler approximation.
So we just make one pass over the state space and compute the Euler approximations. Then we make a second pass, like here, and replace the reachability probabilities by the linear estimates we got from the Euler approximation. What happens in this case? We then have a linear combination over linear functions and maximize over these. So we need the pointwise maximum over a set of linear functions, which is easy to compute, and we can also easily integrate over this function. This results in an improved error bound: mainly because of the Poisson probabilities I showed you before, the error bound per interval improves from an ε-square error to an ε-cube error. This allows us to take far fewer intervals, namely the square root of T/π instead of T/π. Now, of course, the natural question is: can't we allow more? And indeed we can, and this leads to the improvement to the k-th root of T/π many intervals we have to consider. However, the cost of the calculations per interval increases dramatically once we pass a certain threshold; I'll come to that later. First, I want to show you the actual impact this has on a real, if small, example. Here we have one of the standard examples for CTMDPs in the literature. We compare our tool, which allows up to two steps to happen within an interval, to the state-of-the-art tool MRMC, the Markov Reward Model Checker. MRMC actually implements some really nice heuristic speedups and is already much faster than the standard Euler approximation. We see that even when allowing only up to two steps per interval, we improve over it, though not so dramatically, one could argue. If we allow three steps, the picture looks quite different. For example, for a precision of 10 to the minus 7, the speedup is already in the order of 100. For 10 to the minus 8, the picture looks even better for our tool.
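The per-interval operation for the two-step scheme, the pointwise maximum over linear functions and its exact integral, can be sketched as follows. The line coefficients are made-up illustrative numbers:

```python
# Sketch: pointwise maximum of a set of linear functions on [0, eps] and its
# exact integral, the operation needed per interval in the two-step scheme.

def integrate_max_of_lines(lines, eps):
    # lines: list of (a, b) pairs representing f(t) = a + b*t on [0, eps].
    # Collect breakpoints: pairwise intersections that fall inside [0, eps].
    cuts = {0.0, eps}
    for i, (a1, b1) in enumerate(lines):
        for a2, b2 in lines[i + 1:]:
            if b1 != b2:
                t = (a2 - a1) / (b1 - b2)
                if 0.0 < t < eps:
                    cuts.add(t)
    pts = sorted(cuts)
    total = 0.0
    for lo, hi in zip(pts, pts[1:]):
        # Between breakpoints one line wins everywhere; find it at the
        # midpoint and integrate it exactly on [lo, hi].
        mid = (lo + hi) / 2
        a, b = max(lines, key=lambda ab: ab[0] + ab[1] * mid)
        total += a * (hi - lo) + b * (hi * hi - lo * lo) / 2
    return total

lines = [(0.2, 1.0), (0.3, -1.0)]          # these two lines cross at t = 0.05
print(integrate_max_of_lines(lines, 0.1))  # exact integral of the upper envelope
```

The same structure carries over to the higher-degree estimates, except that finding the breakpoints means intersecting polynomials rather than lines.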
Though I'm not sure whether that is an artifact or something. So what actually happens if we allow up to three steps per interval? We have to make another pass over the state space and replace the reachability probabilities by the quadratic estimate we computed before. This means that we have to find intersections between small polynomials of degree 2, but this still works fine, and it also works fine for polynomials of degree 3. So if we now wanted to add another level, this would work fine as well; we could still find the solution analytically. However, if we go to higher degrees, we would have to employ approximation methods for finding the intersections between the polynomials, which would probably be costly, so I think it would not lead to improved performance. Additionally, the fewer intervals we have, the more intersections per interval we will find, which also decreases performance. It is even the case that the number of intersections per interval could, in theory, increase dramatically; in practical examples, this has been found to be rather irrelevant, although one could probably construct examples where it is dramatic. So we argue that a small constant like 2, 3, 4, 5, or so should do. OK, that's it. To summarize, we propose a parametric approximation algorithm that improves the worst-case complexity, and we have shown that it is practical on reasonable examples and can be implemented. So thanks.
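The degree-2 case mentioned above can be handled with nothing more than the quadratic formula. A sketch, with made-up coefficients (not from the tool):

```python
import math

# Per-interval subproblem at three steps: intersections of two degree-2
# estimates, found analytically via the quadratic formula.

def quadratic_intersections(p, q, eps):
    # p, q: coefficient triples (c0, c1, c2) for c0 + c1*t + c2*t^2.
    # Returns the crossing points that lie strictly inside (0, eps).
    a = p[2] - q[2]
    b = p[1] - q[1]
    c = p[0] - q[0]
    if a == 0:                       # difference degenerates to a line
        return [t for t in ([-c / b] if b else []) if 0 < t < eps]
    disc = b * b - 4 * a * c
    if disc < 0:                     # no real intersection
        return []
    r = math.sqrt(disc)
    return sorted(t for t in ((-b - r) / (2 * a), (-b + r) / (2 * a))
                  if 0 < t < eps)

# Example: p(t) = t versus q(t) = 0.01 + t^2 on (0, 0.5).
crossings = quadratic_intersections((0.0, 1.0, 0.0), (0.01, 0.0, 1.0), 0.5)
print(crossings)
```

For higher degrees there is no such closed form in general, which is exactly why the analytic approach stops paying off beyond small k.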