Okay, so what I will be talking about is a popular framework for dealing with uncertainty in optimization, called the online framework, and in particular I want to focus on a central problem in this area, the k-server problem, and present to you a new algorithm for it. This algorithm is joint work with Nikhil Bansal, Niv Buchbinder and Seffi Naor. The starting point of this talk is the classical view of optimization. According to this view, optimization works as follows: if we have some problem we want to solve, we are presented with an input instance of that problem, we feed this instance to our computer, we run some algorithm on it, and this produces an answer, which might be optimal or maybe only approximately optimal. This is the way we usually think about optimization, and this point of view has been extremely influential in computer science and beyond. But the point I want to make today is that, despite all of its success and influence, there is one aspect of a large fraction of real-world problems that this framework completely misses: the fact that uncertainty is a part of life, and as a result a part of many optimization problems. To give you an example, imagine you are a trader at Wall Street, so your task is to buy and sell stocks to maximize profit. If you look at the stock quotes from last month, it is very easy for you to see which stocks you should have sold and which you should have bought to maximize your profit.
But that is not the core of the problem, because the core of the problem is that you have to do it for the next month. You have to decide what to buy and what to sell before you see the full impact of your decisions. The common characteristic of the problems we try to capture here is that we do not know the whole input in advance; we only learn it gradually, yet as we learn it we already have to make irrevocable decisions along the way, even though sometimes we cannot evaluate what their impact on the performance of the final solution will be. This is clearly an aspect of optimization that the classical view does not capture. So the question is: how can we capture these kinds of challenges? How can we deal with them? As you might imagine, this is not a new question, and there are already some well-known answers. In particular, there are two prevailing frameworks for addressing it. The first one is called stochastic optimization, and what this framework tells us is the following: you do not know exactly what your input will be, but you at least know that it comes from some underlying distribution, and furthermore you actually know what that distribution is. So all you have to do is devise a procedure that leverages this knowledge of the distribution to optimize your expected cost with respect to it.
So this is the framework, and the nice thing about it is that quite often it allows you to obtain a positive result, a performance guarantee that is very close to what you would be able to achieve if there were no uncertainty involved at all. That is the nice part, but the problem is that it is not really clear how meaningful such positive results are: how do you know that the input you are seeing comes from some fixed distribution? And even if this distribution exists, how can you access it, let alone know exactly what it is? This problem brings us to the second framework for dealing with uncertainty in optimization, called online optimization, which takes exactly the opposite stance. What it tells you is that you make absolutely no assumptions on what your input will look like; it can even be adversarial, really as bad as possible, and your task is still to optimize the worst-case cost of the solution. There is no notion of expectation here, because there is no distribution; you really try to optimize the worst case. Clearly, the problem is that once you are in such a restrictive setting, more often than not you end up not with positive results but with negative ones, and pretty strong ones at that. But on the flip side, if you are able to get some positive result, then it is an extremely meaningful statement.
Yes, because it just holds without any assumptions, and you do not have to worry about whether it applies to your particular problem. As you might guess from the title of the talk, the framework I want to focus on today is the online optimization framework, and in particular let us start by discussing one of the classical problems in the area, namely the caching problem. The caching problem tries to capture the difficulty of maintaining a cache in a situation where the data access patterns are unknown to us. As probably all of you know, caching is a very popular technique for speeding up memory access when the memory has very large latency: whenever you fetch some data from memory, you try to store it locally in your cache, so that whenever you need it again you can access it quickly. Of course, the catch is that the cache has only limited size, and you cannot afford to simply store everything you ever fetch. You have to make sure you keep the most important stuff; but if you do not know the data access pattern, it is hard to assess what the important stuff really is. That is exactly the difficulty this problem tries to capture. Formally, the setup is as follows. We have a universe of n pages, so we can see at most n different pages throughout the execution of the algorithm, and we have a cache that can hold at most k pages at any given time. To make this interesting, we assume that k is strictly smaller than n. This is the setup, and now we play the following game: in each round a page request comes in, and two things can happen.
Either we already have this page in the cache, in which case we are happy: we can fetch it quickly and do not have to do anything. Or we are slightly less lucky and the requested page is not in our cache. Then we have a cache miss, and we have to fetch this page from memory and put it in our cache. If there is room in the cache, that is all we have to do; but sometimes the cache is already full, and we have to decide which page to evict to make room for the new one. This is the game we are playing, and it keeps repeating. Note that the only decision we make in this model is which page to evict whenever we have to evict one; everything else is predefined by the input. Our goal is to devise an eviction policy that leads to the minimal total number of cache misses over the sequence we are presented with. So this is the problem, and once we know it, you might wonder what we actually mean by a good solution to it. What are the good algorithms we are looking for here? Clearly, if we knew all the future requests in advance, it would be very easy to get an optimal solution; there is a very simple policy that achieves it. But of course that is not what we are after here: we are trying to do well despite having no idea what the future will be. This is the online setting.
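The very simple offline-optimal policy alluded to here is Belady's furthest-in-future rule: on a miss with a full cache, evict the page whose next request lies furthest ahead. A minimal sketch (the function name and interface are my own):

```python
def belady_misses(requests, k):
    """Offline optimum: on a miss with a full cache, evict the page
    whose next use lies furthest in the future (Belady's rule)."""
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= k:
            def next_use(p):
                # Index of p's next request, or infinity if never again.
                for j in range(i + 1, len(requests)):
                    if requests[j] == p:
                        return j
                return float('inf')
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses
```

For instance, on the cyclic sequence 1, 2, 3, 1, 2, 3 with a cache of size 2, this policy suffers only 4 misses.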
And in particular, once you have no idea what the future is, there is no hope of always matching the offline optimum, that is, the best solution when you know the whole request sequence in advance. So what we are after here is to design an online algorithm that never does much worse than the offline optimum: we do not hope to always be optimal, but we at least want to be sure we are never far from the optimal solution. To make this precise, we introduce the notion of competitiveness, and we say that an algorithm is c-competitive, for some value c, if on every request sequence the number of cache misses it suffers is never bigger than c times the optimal number of cache misses for that sequence. What we will be after are algorithms with minimal competitiveness. I should also mention that competitiveness is only one measure of an algorithm's performance; there are other measures that I will not talk about, in particular regret minimization, but I will not go into that in this talk. So now we know what we mean by a good algorithm; what is known here? If you think about good algorithms for this problem, probably one of the first reasonable approaches you will find is the so-called LRU policy, in which you evict the page that was least recently used. And indeed, Sleator and Tarjan, in their seminal paper, analyzed this policy and proved that it is k-competitive.
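To fix intuition, here is a small sketch of LRU miss-counting (the interface is my own):

```python
from collections import OrderedDict

def lru_misses(requests, k):
    """Cache misses incurred by the LRU policy with a cache of size k."""
    cache = OrderedDict()              # keys ordered least- to most-recent
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)    # a hit refreshes recency
        else:
            misses += 1
            if len(cache) >= k:
                cache.popitem(last=False)   # evict the least recently used
            cache[page] = None
    return misses
```

On the cyclic sequence 1, 2, 3, 1, 2, 3 with a cache of size 2, LRU misses all 6 requests, while the offline optimum misses only 4, a tiny instance of the gap behind the k-competitiveness bound.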
OK, so k-competitiveness is an interesting statement, because note that the competitiveness does not depend on the length of the sequence: the sequence can be very long, but the competitive ratio depends only on the size of the cache, and that is very interesting. On the other hand, if your cache has, say, 10,000 lines, this is not the most satisfying performance guarantee ever, so the obvious question is: can we do better than that? Sleator and Tarjan thought about this question too, and they showed that you cannot beat this factor of k, at least as long as your algorithm is deterministic. The key reason is that deterministic algorithms are completely predictable: if you fix the algorithm and look at any particular input sequence, you can predict exactly what the state of the algorithm's cache will be at every time step. Everything is uniquely defined, so it is not hard to design a sequence that causes the algorithm to have a cache miss in every single round. And when you do the appropriate math, you notice that you can construct a sequence on which the algorithm misses every single time, while the optimal solution has a cache miss at most every k-th round, which leads to the lower bound of k. So this is the deterministic picture, but of course the natural question now is: can randomness help?
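The adversary construction just described can be sketched concretely. This is a hedged sketch: it assumes the algorithm starts with pages 0 through k-1 cached, and it models the deterministic algorithm purely by its eviction rule, a function from cache contents to the evicted page (both assumptions are my own framing):

```python
def adversary_requests(evict, k, rounds):
    """Build a request sequence that forces a deterministic policy to
    miss every round: with n = k + 1 pages, always request the unique
    page currently absent from the algorithm's cache."""
    pages = set(range(k + 1))
    cache = set(range(k))              # assumed initial cache contents
    sequence = []
    for _ in range(rounds):
        (target,) = pages - cache      # the one page not cached: a miss
        sequence.append(target)
        cache.remove(evict(cache))     # simulate the algorithm's eviction
        cache.add(target)
    return sequence
```

Against, say, the rule "evict the smallest page id" with k = 2, this yields the sequence 2, 0, 1, 0, 1, ..., a miss in every single round by construction.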
And it turns out that it indeed can. What we consider here are eviction policies that can make random choices: our algorithm can toss random coins, and what we measure for a given sequence is the expected number of cache misses. Once we look at this model and give this extra power to our algorithm, we can do much better, in fact exponentially better. There is a very simple algorithm that achieves O(log k) competitiveness, and one can prove that this O(log k) is actually the best possible here. So for the caching problem we essentially know everything: we know the best competitiveness in both the randomized and the deterministic model. It is probably time to move to a slightly more challenging problem, namely the k-server problem, which will be the focus of the remainder of the talk. Probably the easiest way to think about the k-server problem is to view it as the task a fire truck dispatcher is facing. The setup is that we have n locations in some metric space, so there may be different distances between any two points, and we have k servers, or fire trucks. We want the number of servers to be strictly smaller than the number of locations, again to make this interesting. Now we play the following game: in each round, a request, or a fire, appears at one of the locations, and once again one of two things can happen. Either we already have a server, a fire truck, at this location, and then we are happy, modulo putting out the fire.
Or we do not have a fire truck at this location, and then we have to decide which of the servers, or fire trucks, to move there to take care of the fire. So here is a fire, and we move this truck; there is another fire, and we move that truck. This is the game, and our task is to devise a way of moving the servers around that leads to the minimum total distance traveled by the fire trucks while serving the requests. And of course we want to do all of this in the online setting, where we do not know where future fires will be. So this is the problem, and a good question is: why do we even care about it? What is the motivation for studying this kind of problem? There are a couple of reasons. One reason is that even though the problem is very artificial and very simple, or maybe precisely because of that, it captures quite a variety of online scenarios: you can imagine the servers being technicians, service trucks, copies of frequently changing data, et cetera, and each of these instantiations yields some interesting online scenario, all captured by the k-server problem. In particular, you can view caching as a special case of the k-server problem: essentially, caching is the k-server problem on a uniform metric. To see why this is the case, you just identify locations with pages, and you interpret having a server at location i as having page i in your cache.
And then it is easy to see that whenever there is a request, it corresponds both to a fire and to a request for a page, and the operation of moving servers corresponds exactly to the eviction-and-fetch procedure. Since the metric is uniform, every such move counts the same, so the total cost simply counts the number of cache misses while serving the requests. So this is one type of reason why we care about this problem. The second type of reason, which makes this problem interesting at least to me as a theoretician, is that despite its very simple formulation it has an extremely rich structure. In particular, the more you look into this problem, the more it seems to capture something about optimization that we do not really understand but should. That is another reason why we really want to understand what is going on here. So now that we know what the problem is and why we might care about it, let us talk about what is known. Clearly, as caching is a special case of the k-server problem, all the lower bounds transfer over: we know that we cannot hope for a better than k-competitive deterministic algorithm, or a better than Omega(log k)-competitive randomized algorithm. The interesting thing is that, despite the k-server problem seeming much, much more general than caching, we do not know a single better lower bound; all the lower bounds for this problem come directly from the lower bounds for the caching problem. People were quite surprised by that, and at first they could not believe it.
But then this state of affairs made them put forward quite a bold conjecture: if we cannot prove better lower bounds than the ones we have for caching, then maybe these bounds are actually tight. This question has two versions. One is the deterministic k-server conjecture, which asks for an exactly k-competitive deterministic algorithm, and the other is the randomized k-server conjecture, which asks for an O(log k)-competitive randomized algorithm. These are the big questions we are after here too. So let me give you a very brief history of what is known about both conjectures, because, as you might imagine, there is a multitude of work on this problem and I do not have time to present it in any meaningful manner. In terms of the deterministic k-server problem, the conjecture that one can hope for k-competitiveness was already put forward in the paper by Manasse et al. in 1988, but they did not provide any general k-server algorithm; they only had some indications to believe that this is the right answer. It took a while before the first algorithm whose competitiveness depends only on k emerged, and as you can see, that competitiveness is quite large, actually exponential in k, so it seems very far from what we hope is correct. But shortly thereafter there was a breakthrough paper by Koutsoupias and Papadimitriou, who showed that you can actually get a deterministic algorithm whose competitiveness is only 2k - 1.
So clearly here we are done up to a factor of 2, and in fact for some special cases of the problem, for instance when the metric comes from a tree, we know that k is actually the right answer. So the state of affairs is pretty good here already. How about the randomized k-server problem? For randomized algorithms there has been a lot of work on various special variants of the problem. In particular, quite recently we learned that you can get an O(log k)-competitive algorithm for an extension of the caching problem called the weighted paging problem; this is a result of Bansal et al. But if you ask about bounds for general metrics, the best known result is still just the deterministic algorithm of Koutsoupias and Papadimitriou. So essentially, even though it is intuitively obvious that randomization should help here, we have no idea how to take advantage of it; we do not understand how to use it. In fact, the state of affairs is even more embarrassing than that, because we do not know any o(k) guarantee even for such simple metrics as a two-level tree or a line. Really, despite a lot of effort, the state of affairs is not very promising here. So once you look at this, you start to wonder whether this bold conjecture, that one can get O(log k) competitiveness, is just too far-reaching and too optimistic; maybe k actually is the right bound here. So why are we optimistic about getting O(log k)? Well, there is no good reason, but we are.
And the result that I want to present to you today gives you some hope that this polylog competitiveness is perhaps not that far-fetched after all. What we do is present a randomized algorithm for the general k-server problem that is polylog-competitive, but of course the catch is that this polylog depends on both k and n, the number of points. So as long as the number of points is sub-exponential in k, we improve over the 2k - 1 bound, but in general the two bounds are incomparable, because of this dependence on n. So this is the result; let me now proceed to giving you some idea of how this kind of result can be obtained. When you look at the history of the problem and what people have struggled with, it is pretty clear to everyone that the hard part here is dealing with an arbitrary metric: if the metric is uniform, we know exactly what to do. So the guiding principle of what we will be doing here is to trade the complexity of the underlying metric for the complexity of the problem we are solving. What do I mean by that? We will try to reduce the k-server problem over an arbitrary metric to a different, more difficult problem over a very simple metric, namely the uniform one. So how can we go about such a reduction? It comes essentially in two big steps. The first step has to do with so-called HST metrics. So what is an alpha-HST metric?
Well, it is the metric induced on the leaves of the following type of tree. The tree is a leveled tree: all the edges at the first level have length 1, all the edges at the second level have length alpha, all the edges at the third level have length alpha squared, and so on, with this exponential growth at rate alpha at each level. Any metric induced on the leaves of such a tree is an alpha-HST metric. The reason we bring up this rather bizarre-looking metric is that there is a well-known, beautiful theorem telling us that, as long as we are okay with losing a factor of alpha times log n, any metric on n points can be embedded into a metric of this kind. So essentially, if you want to get an algorithm for the k-server problem on general metrics and you are willing to pay this alpha log n factor in the competitiveness, then we can constrain ourselves from now on to alpha-HST metrics: whenever we get an algorithm for this kind of metric, it immediately gives us an algorithm for general metrics. That is the first step, so from now on we only look at alpha-HST metrics. Now, to understand the second step, we have to take a look at how a k-server solution looks on an alpha-HST. Clearly, by the definition of the metric, both the requests and the servers reside at the leaves of this tree.
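To make the metric concrete, here is a hedged sketch of the leaf-to-leaf distance. It encodes each leaf as the tuple of child indices on its root-to-leaf path (an encoding of my own choosing) and adopts the convention that edges adjacent to the leaves have length 1, with lengths growing by a factor alpha toward the root:

```python
def hst_distance(leaf_a, leaf_b, alpha):
    """Distance between two leaves of an alpha-HST.  Each leaf is the
    tuple of child indices on its root-to-leaf path; in a depth-D tree
    the edge entering depth d has length alpha**(D - d), so leaf edges
    have length 1 and edges grow by a factor alpha toward the root."""
    D = len(leaf_a)
    assert len(leaf_b) == D, "leaves of an HST sit at the same depth"
    lca = 0                            # depth of the lowest common ancestor
    while lca < D and leaf_a[lca] == leaf_b[lca]:
        lca += 1
    # The path pays every edge from each leaf up to the LCA.
    return 2 * sum(alpha ** (D - d) for d in range(lca + 1, D + 1))
```

So in a depth-2 tree with alpha = 2, two leaves sharing a parent are at distance 2, while leaves under different children of the root are at distance 6: the higher the lowest common ancestor, the more dominant the top edges.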
And if you observe a k-server solution just at the leaves, the process of servers moving around may be very difficult to grasp. So what we will try to do instead, to understand what is going on a bit better, is to view this complicated process on the leaves as one generated by relatively simple processes happening at each of the internal nodes. We will break this complicated process down into simple processes, one for every internal node. This might sound a bit mysterious, so let us take it slowly. In particular, let us focus our attention on one particular internal node, say the root, and ask what this root sees when it constrains itself to its local neighborhood. In its local neighborhood it does not see anything except its children; in particular, it does not see the leaves. So what is going on from its point of view? It does not see at which leaves the requests arrive; it only sees into which subtree, corresponding to one of its children, each request falls. If there is a request somewhere in the subtree of a child, the root only knows that there is a request corresponding to that child, and that is all it sees. And similarly, it does not see where exactly the servers are located; all it sees is how many servers are in each of the respective subtrees.
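This local view can be sketched concretely. The sketch below uses my own encoding of tree nodes and leaves as tuples of child indices on the path down from the root; it simply projects a leaf-level configuration to what one internal node can observe:

```python
def local_view(node_path, server_leaves, request_leaf):
    """What an internal node sees: into which of its children's subtrees
    the request falls, and how many servers each child's subtree holds.
    Nodes and leaves are tuples of child indices from the root down."""
    d = len(node_path)

    def child_of(leaf):
        # The child of node_path whose subtree contains this leaf,
        # or None if the leaf lies outside node_path's subtree.
        return leaf[d] if leaf[:d] == node_path else None

    counts = {}
    for s in server_leaves:
        c = child_of(s)
        if c is not None:
            counts[c] = counts.get(c, 0) + 1
    return child_of(request_leaf), counts
```

For example, the root of a depth-2 tree with servers at leaves (0,0), (0,1), (1,0) and a request at (1,1) sees only "two servers under child 0, one under child 1, request under child 1".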
So from its local point of view, the root only sees configurations that assign a number of servers to each of its children's subtrees. In particular, whenever servers move around, it sees nothing of this movement as long as the movement does not transfer servers between different subtrees. This is the local view of the root. Now we ask: what does this root see when it observes an optimal solution to the k-server problem? What it sees is a certain dynamic emerging in these configurations; namely, there are two conflicting processes at play. On one hand, the subtrees that tend to get a lot of requests also tend to get more servers, because that makes taking care of those requests easier. On the other hand, the number of servers is fixed, so we cannot give a lot of servers to every subtree, and also moving servers around is expensive. So there is some balancing act that tries to decide which subtrees need most of the resources. We see that we have these two conflicting goals, and now, when we want to actually recover a good solution to the k-server problem, we will try to set up an algorithm that guesses what the best assignment of servers would be over time. We will try to capture this algorithmic task as a problem in its own right, and when you try to do that, the problem you are likely to end up with is the so-called allocation problem, which is the third and last of the problems I want to introduce today.
And by the way, I should mention that, you know, that, you know, like this general approach was pioneered here by, you know, Kota and Myersen and Poplowski, and in particular, they were the first to introduce this allocation problem in the context of the K server problem. Okay? So what is the allocation problem? So in the allocation problem, well, you can think that this is a job of some manager that oversees, you know, while taking care of some projects, you know, well, by some team of, you know, workers in a couple of offices. So the setup here is that we have a delocations over a uniform metric now. So the metric is uniform again. And now, you know, well, we have K servers or workers. And now it will actually make sense to have more than one worker at a given location. Okay? And now, you know, the game it displays is as follows. So in each round, we get a request. So the request is just, you know, what it consists of. Well, it consists of a location to which it corresponds to. And also, you know, together with this location, we are presented with K plus one values that are non-increasing. So there is no ST0 up to STK. And the meaning of these numbers is, you know, STJ is just the cost of completing the project that's happening at this, you know, requested location if we have exactly J workers there. Okay? So essentially, you know, what's going on in each round. So, you know, the manager, whenever he sees the, you know, the request and the corresponding service costs, he has to decide, you know, whether I want to move some workers around and, you know, any moving of workers will cause a move cost. And then once he says, okay, I'm done now, then, you know, we just look at the location that was, you know, that was requested and we pay the service cost given by the respective value of STJ. Okay? So here, you know, maybe seeing this request at first location, he decides to, well, to move, you know, to move two workers there. And now he says, I'm done. 
And he pays s_t(3) as the service cost here, okay? And he also paid the move cost of moving two workers. So this is the game we are playing, once again in the online setting, and our goal is to play this game so as to minimize the sum of the move cost and the service cost. Okay? So this is the problem. And note that at some point I mentioned that the problem we would end up working with would be a generalization of the k-server problem, and indeed it is. It is easy to see that if we consider the special case of this problem in which the requests are of the form "pay infinite service cost if there is no worker at the location, and zero service cost whenever there is at least one worker at the location", then what we recover is exactly the caching problem, which is k-server on the uniform metric. But of course, the allocation problem is much more than that, and that will be the price we pay. Just one thing to note. One thing that I ignored here, which is actually needed to make this problem useful to us, is that in addition to the service costs and the requests, we should also allow the number of available workers to change over time. So in each round we are also told: "this round, you have only this many workers available to you". If you have more than that, you have to return some of them to headquarters, and if you have fewer than that, you get some workers for free. Okay? But I will ignore this part in this talk. So this is the problem we are interested in, and the question is: why do we care about this problem?
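Since a round of the allocation problem is defined purely in terms of a move cost and a service cost vector, the bookkeeping can be sketched in a few lines. All names (`move_cost`, `serve`) and the example numbers below are mine, purely illustrative, and assume the uniform metric, where moving any worker between any two locations costs one.

```python
# A minimal sketch of one round of the allocation problem over d locations
# with k workers on a uniform metric. Names and numbers are illustrative.

def move_cost(old, new):
    """Cost of changing the worker assignment on a uniform metric:
    each worker moved between two locations is counted once."""
    return sum(max(new[i] - old[i], 0) for i in range(len(old)))

def serve(config, location, costs):
    """Service cost of a request at `location` with cost vector
    costs = [s_0, s_1, ..., s_k] (non-increasing): we pay s_j where
    j is the number of workers currently at the requested location."""
    return costs[config[location]]

# Example: d = 3 locations, k = 4 workers.
old = [2, 1, 1]
new = [1, 3, 0]            # the manager moves workers before serving
costs = [10, 6, 2, 1, 0]   # s_j: cost with exactly j workers present

total = move_cost(old, new) + serve(new, 1, costs)
```

Here two workers are moved (one from each of the other locations into location 1), and the request is then served with three workers present, so the round costs 2 + s_3 = 3.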
Well, the reason we care about this problem is that Coté et al. showed that if you are able to get a good enough algorithm for the allocation problem, then it gives you what we are looking for: a polylog-competitive algorithm for the k-server problem over a general metric. Note that this polylog depends on k and n, as we would expect, but it also depends on Delta, the aspect ratio of the underlying metric. So this is the theorem, and essentially what it means is that from now on, all we really need to care about is finding a good enough algorithm for the allocation problem. Note that we now have a problem over a uniform metric, but it seems to be slightly more difficult than the original problem we were dealing with. So this is the theorem that Coté et al. proved, and they also made the first step: they showed that this good enough algorithm indeed exists, at least when the problem has exactly two locations, so d equal to two. This does not give us anything in the general case; it only gives an algorithm for a very special type of metric. But it is the first step. So all we have to do is make the next step and get a good enough algorithm for the general case. That is what we set out to do, and that is exactly where we failed. Because after quite a lot of effort, we realized that we don't know how to do it. It seems to us that, in the course of this reduction, the problem we ended up with just becomes too hard. We simply don't know how to get this good enough guarantee for the problem I just stated to you. So that's not good, yes?
Because essentially this seems to be a dead end: we reduced our problem to another problem that we don't know how to solve, which would mean all the effort was wasted. But maybe it's not that bad, so let's try to recover; let's prolong the agony a bit. We don't know how to solve the actual problem, so let's look at some fractional relaxation of it and see what we can do there. What kind of fractional relaxation can we consider here? Probably the simplest one you could imagine. Instead of keeping track of the actual integral configuration of the workers, we will just keep track of some marginals: the marginal probabilities x_{i,j}, for each i and j, where x_{i,j} is the probability that we have exactly j workers at location i. Okay? What constraints do we want to enforce? One type of constraint says that, for each location i, this is indeed a probability distribution: the x_{i,j} for fixed i sum up to 1. The second constraint says that if we independently sample the number of workers at each location from these distributions, the expected total number of workers we get is exactly what it should be, namely k. Okay? And since we are now dealing with this more general notion of configuration, we have to extend the notions of service cost and move cost, and we do it in an absolutely natural manner: we take the expected service cost, and we take an earth mover distance between configurations as they change. So now we have this fractional relaxation, and the good news is that for this relaxation, we actually do know how to get a good enough algorithm.
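The two constraints of this relaxation are easy to state in code. Below is a minimal feasibility check; the function name and tolerance are my own illustrative choices.

```python
# Feasibility check for the fractional relaxation: x[i][j] is the marginal
# probability of having exactly j workers at location i.

def is_feasible(x, k, tol=1e-9):
    # (1) each location carries a probability distribution over {0,...,k}
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in x)
    # (2) the expected total number of workers equals k
    expected = sum(j * p for row in x for j, p in enumerate(row))
    return rows_ok and abs(expected - k) <= tol

# Two locations, k = 1: each location holds "half a worker" in expectation.
x = [[0.5, 0.5],
     [0.5, 0.5]]
```

Note that a state like this is exactly where the relaxation departs from any integral configuration: no integral solution can place half a worker anywhere.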
So far, so good. But the problem is that we can't really use this fractional relaxation to get a good solution for the integral case, because it has a very bad integrality gap: an integrality gap of Omega(k). So there is no hope of a rounding procedure that always gives us a good enough solution for the integral case. We tried to do something, and we got stuck again. So what now? Do we abandon this whole approach? Well, not really. The key observation to make is: why do we even care about this allocation problem? We don't really want to solve the allocation problem; it is just some kind of proxy for what we really want to solve, namely the k-server problem. We want an integral solution to the k-server problem, not necessarily an integral solution to the allocation problem. Once we realize that, here is the outline of our algorithm. We start from a solution to the fractional allocation problem, and then we extend the reduction of Coté et al. on the HST, so we get a fractional solution to the k-server problem. And now we show that once you are dealing with the fractional k-server problem on an HST, there is actually a simple rounding procedure that gives you an integral solution to the k-server problem on the HST while incurring only some acceptable overhead. So this is how the algorithm works, and it does work. But where did the integrality gap go?
Well, essentially what happened here is that once we were able to defer the rounding step to the moment when we deal with the k-server problem, and not the general allocation problem, then for the configurations corresponding to the k-server problem this integrality gap is simply not present. It is present only for some configurations of the allocation problem. So, by being able to defer this step, we got around the integrality gap and could get what we wanted. Let me just note one thing that I omitted: as you probably noticed, the reduction of Coté et al. also introduces a dependence on the aspect ratio of the metric, and we actually have a way of getting rid of that dependence, but I will not talk about it here. So this is how the algorithm works, and in the remaining time I just want to give you some glimpse of what this fractional allocation algorithm actually looks like. Maybe it wasn't clear from the way I was explaining things, but this is really the heart of our algorithm; this is where the most interesting things are happening. So let me try to give you a glimpse of it here. I will do so just for a special case of the problem, in which I assume that all the service cost vectors are of the following form: you either pay some cost c_t for not having a worker at the location, or, once there is at least one worker at the location, you pay zero.
So this special case corresponds to caching with rentals: a caching problem in which, whenever we have a request and no server there, we have the option of paying c_t to take care of this particular request without having to move a server. So this is the special case we will focus on here. And really, once we are in this special case, it never makes sense to put more than one server at any given location, so we can assume that all the x_{i,j} for j bigger than one are just zero. Essentially, the complete description of a configuration is then given by stating the value of x_{i,0} for every location: x_{i,0} is the probability of having no server at location i, and 1 - x_{i,0} is the probability of having one server there. So we know everything we want to know. Okay? Also, just to get rid of corner cases, let me assume that even though the probabilities x_{i,0} are numbers between zero and one, they are never exactly zero or exactly one; they always lie strictly between these two boundaries. Okay? So this is the special case we will be dealing with; I have just reproduced it over here. The picture to look at is something like this: the brown volume corresponds to the amount of server mass we have at each location, and the x_{i,0} are just the gaps at each of the locations. So now, how does the algorithm work? How do we describe what our algorithm does? In each round t, it is presented with a request that corresponds to a location and a service cost vector.
Usually, what I would do now is just tell you: if the old state was this, then the new state will be this. But that's not what I will do. Instead, the way our algorithm is defined, it provides a way of evolving the old state into the new state via some continuous process that depends on the request. So whenever we see a new request, we set up an evolution of the current state, along which the state evolves for some period of time, and whatever it evolves into is our new state. This infinitesimal evolution is a two-step process. As a first step, we have a so-called rising step, in which we decrease the value of x_{i_t,0} by the following quantity. Just to decipher what this quantity is: note that x_{i_t,0} is the probability that we have zero servers at the requested location, and c_t is the penalty we pay for not having a server at the location. So this expression is proportional to the expected service cost we would pay in the current configuration. So what we are doing is trying to decrease x_{i_t,0}, thereby reducing the service cost we will incur, and we do it proportionally to the service cost we are actually likely to incur at this moment. So this is sort of a gradient descent step. That's good, because reducing the service cost is something we definitely want to do. But of course, the problem is that this makes the state infeasible: we have just put more volume into the system, and now we have to withdraw some to make the solution feasible again.
The way we do that is in the fixing step: we increase each of the x_{i,0} proportionally to the value of x_{i,0} plus some additive term, and we keep doing this infinitesimally for as long as there is too much volume in the system. So clearly, by doing that, we make the state feasible again. The reason the fixing step does the withdrawal proportionally to x_{i,0} is that whenever a location has almost a full server there, we want to be very conservative about withdrawing volume from it; we are much happier to withdraw more aggressively from a location that already does not have much server mass. This kind of withdrawal rule is what you get from the multiplicative weights method. So this is the algorithm: we just keep repeating these steps for some period of time. And now the question is how to analyze it, how to prove some competitiveness here. Actually, the competitive analysis of this algorithm is quite simple, and it is based on a potential, so it is a potential-based argument. Our potential is the expression written over there, and it depends on two things: the configuration of the algorithm and the configuration of some fixed optimal solution. Our algorithm does not know what this optimal solution is, but there is some fixed optimal solution, and we just want to measure the distance of our configuration from the configuration of this optimal solution.
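The two infinitesimal steps can be sketched as a crude discretization. The step size `eta`, the stopping tolerance, and the function name below are my own simplifications of the continuous process described above, not the actual dynamics from the paper.

```python
# A crude discretization of the two-step dynamics for caching-with-rentals:
# x0[i] is the probability of having NO server at location i.

def step(x0, i_req, c, k, eta=1e-3):
    x0 = list(x0)
    n = len(x0)
    # Rising step: push volume toward the requested location, at a rate
    # proportional to the expected service cost c * x0[i_req].
    x0[i_req] = max(0.0, x0[i_req] - eta * c * x0[i_req])
    # Fixing step: while there is too much volume in the system (the gaps
    # x0 sum to less than n - k), withdraw from every location at a rate
    # proportional to x0[i] + 1/k.
    while sum(x0) < n - k - 1e-12:
        deficit = (n - k) - sum(x0)
        rates = [x0[i] + 1.0 / k for i in range(n)]
        total = sum(rates)
        for i in range(n):
            x0[i] = min(1.0, x0[i] + deficit * rates[i] / total)
    return x0

x0 = step([0.5, 0.5, 0.5, 0.5], i_req=0, c=1.0, k=2)
```

Note the shape of the dynamics this reproduces: the requested location gains server mass, every other location loses a little, and locations with small gaps (nearly full servers) lose the least.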
So, just to decipher what this potential is: note that phi is just the sum of the contributions of the individual locations, and a location contributes zero if the optimal solution currently has a server at i. So at every location where the optimal solution has a server, the contribution to the potential is always zero, no matter whether we have a server there or not. On the other hand, if the optimal solution does not have a server at the location, then the contribution is logarithmic, given by this expression. Just to decipher this expression: it's over there, and this is an example graph of it. This expression decreases as x_{i,0} grows. If x_{i,0} is one, the logarithm just evaluates to zero, and if x_{i,0} is zero, which corresponds to having a server at location i, then the expression is something that is O(log k). Okay? And we get everything in between for fractional values of x_{i,0}, as the logarithm function dictates. Okay? For those of you who are familiar with it, there is some similarity here to relative entropy; the correspondence is not exact, but the feeling is the same. Okay? So this is our potential, and now, how can we use it to prove competitiveness? Assume we want to establish O(log k)-competitiveness of our algorithm.
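One concrete per-location potential with exactly this boundary behaviour (contribution zero when x_{i,0} = 1, and Theta(log k) when x_{i,0} = 0, and zero at every location covered by the optimal solution) is sketched below. This is my reconstruction from the properties stated in the talk; the exact constants in the actual potential may differ.

```python
import math

# A per-location potential matching the described boundary behaviour:
# zero contribution at x0 = 1, roughly log(k) contribution at x0 = 0,
# and zero whenever the optimal solution has a server at the location.

def phi_location(x0, k, opt_has_server):
    if opt_has_server:          # covered locations never contribute
        return 0.0
    return math.log((1.0 + 1.0 / k) / (x0 + 1.0 / k))

k = 8
```

The additive 1/k inside the logarithm keeps the potential bounded at x0 = 0 and is the same shift that appears in the fixing step's withdrawal rate; this pairing is what makes the derivative terms cancel so cleanly later in the analysis.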
As potential-based arguments go, all we have to show is that, throughout the whole execution of the algorithm, the following differential inequality is preserved. And just to avoid too many expressions, let me prove this inequality to you in the special case corresponding to caching, where all the costs for not having a server are actually infinite. Essentially, what happens there is that we never want to pay any service cost, so we always make sure we pay no service cost at all. Then the inequality we have to show simplifies to the following, and once we show it, this is a proof of O(log k)-competitiveness for the caching problem. Okay? So how can we show this inequality? We fix some round of our algorithm, corresponding to a request at location i_t, and we divide what the algorithm and the optimal solution do during this round into three stages. In the first stage, we just look at whether OPT moves or not. OPT only moves if it does not already have a server at i_t, and in that case it moves one server there from some other location i'. How does this move affect our inequality? Well, clearly our algorithm does not move, so the change in the algorithm's move cost is zero. And the optimal solution just moved one server, so the change in OPT's move cost is just one. Now, what is the change in the potential? In the worst case, once OPT withdraws a server from location i', that location will start to contribute something to the potential.
But no matter what the value of x_{i',0} is, in the worst case this is just an O(log k) term. So whichever location OPT withdraws the server from, the increase in the potential is at most that factor. So on the right we have O(log k) and on the left we have O(log k), and everything works out. Okay? So this stage is fine. In the second stage, the algorithm decreases the value of x_{i_t,0}, and since it wants to pay zero service cost, essentially it sets x_{i_t,0} to zero. Okay? How is this reflected in our inequality? Clearly, OPT does not move, so we don't have to worry about that. The potential also does not change, because we know that OPT has a server at location i_t, and locations where OPT has a server contribute nothing to the potential. So nothing changes there. The only thing that changes is the move cost, because we brought some volume to location i_t. But we can actually charge this increase in volume to the withdrawal step, because whatever we bring here, we will have to withdraw in the next stage. So we just charge this increase to the withdrawal in the next stage, and we don't have to worry about it here. Okay? So this stage is fine too. Now only one stage is left, in which the algorithm withdraws volume from the locations. Remember how this happens: as long as the state is infeasible, that is, it has too much volume, we increase each x_{i,0} proportionally to x_{i,0} + 1/k. Okay? So what is happening? Once again, OPT does not move, so we don't have to worry about it. What is the move cost of the algorithm? Well, each location contributes this factor of (x_{i,0} + 1/k) times the proportionality factor d tau.
And now, when we sum this up, the 1/k terms clearly sum up to n/k, and the x_{i,0} sum up to at most n - k. Why is that? Because we know that if the state is infeasible, what does that mean? It means the sum of the x_{i,0} is too small, actually strictly smaller than n - k. So we use this to upper bound our move cost by this quantity. Now all that remains is to estimate the change in the potential as we withdraw. What is this change? We just sum over all the locations, and for each one we take the derivative of our potential times the change in the corresponding variable. And when you do the math and take this derivative, the magic happens: each of the terms is exactly equal to one. Of course, this is the way the potential was chosen, but it is a nice fact. So the change in the potential is just a sum: we get a minus sign, and then a sum over all locations of x*_{i,0}, which describes the configuration of the optimal solution. And we know what this has to sum up to: exactly n - k. So the change in the potential is minus 2 times (n - k). Now we want to compare these two things, the move cost and the change in the potential, and it is an easy exercise to see that these quantities work out the way we want: the sum of the change in the potential and the change in the move cost is at most zero, so the inequality is preserved.
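The two sums used in this accounting can be checked numerically. The concrete numbers below are only an illustration of the convention that, at a feasible state, the gaps x_{i,0} sum to exactly n - k (since the server mass 1 - x_{i,0} must sum to k) while the additive 1/k terms sum to n/k.

```python
# Numeric check of the accounting in the withdrawal stage: at a feasible
# state the gaps x0[i] sum to n - k, and the additive 1/k terms sum to n/k.

n, k = 5, 2
x0 = [0.6, 0.6, 0.6, 0.6, 0.6]   # feasible: gaps sum to 3 = n - k

gap_sum = sum(x0)                            # contributes n - k
extra_sum = sum(1.0 / k for _ in range(n))   # contributes n / k
```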
And what we use here is the fact that n is at least k + 1; if n is exactly k, the problem is not interesting anymore, so we assume n is always strictly bigger than k. Okay? So that's all; that's the whole analysis. Extending this analysis to the case where c_t can be arbitrary, or where the x_{i,0} can be exactly zero or exactly one, is very easy. What is not so easy is extending the algorithm to handle general cost vectors. There is a natural way of extending it, but the first natural extension does not work; there are some difficulties, and you need further ideas to make everything work. But it can be done; please see the paper for the details. So, that's it. Let me just conclude and state some open problems. What we did here is provide the first polylog-competitive algorithm for the k-server problem. The grand challenge and natural open question is: can we finally settle the k-server conjecture, either by proving the right upper bounds or maybe finally improving the lower bounds, even by something small? That would be very interesting. And a less ambitious but still very interesting challenge would be to decide whether we can get rid of the dependence on n, or actually prove that it is necessary. Having a lower bound telling you that you must have even, say, a log-star dependence on n in your competitive ratio would be very interesting, because I think the general belief is that there should be no dependence on n at all. So that would be very interesting.
So, these are questions regarding the k-server problem, but more generally, one thing I think is very important in the grand scheme of dealing with uncertainty in optimization is that I am not exactly sure whether the stochastic model or the online model are the right models to begin with. I think these models sometimes work out, but they are not the ultimate models we could have. And the big open problem here is to find some meaningful model that will, on one hand, allow us to obtain nontrivial results, and on the other hand correspond quite well to what is happening in reality. Okay, thank you.