What I will be talking about is a popular framework for dealing with uncertainty in optimization, the online framework, and in particular a central problem in this area called the k-server problem. I will present a new algorithm for this problem; this is joint work with Nikhil Bansal, Niv Buchbinder, and Seffi Naor. The starting point of this talk is the classical view of optimization. According to this view, optimization works as follows: we are presented with an input instance of the problem we want to solve, we run some algorithm on it, and this produces an answer, either an optimal one or perhaps only an approximately optimal one. This is the way we usually think about optimization, and this point of view has been extremely influential in computer science and beyond. But the point I want to make today is that, despite all of its success and influence, there is one aspect of quite a fraction of real-world problems that this framework completely misses: uncertainty is a part of life, and as a result it is a part of many optimization problems. To give an example, imagine you are a trader on Wall Street, and your task is to buy and sell stocks to maximize profit. If you look at the stock quotes from last month, it is very easy to see which stocks you should have bought and which you should have sold to maximize your profit.
But that is not the core of the problem, because you have to do it for the next month: you have to decide what to buy and what to sell before you see the full impact of your decisions. So the common characteristic of the problems we try to capture here is that we do not know the whole input in advance; we only learn it gradually. And yet, as we learn the input, we already have to make irrevocable decisions along the way, even though we often cannot evaluate what their impact on the performance of the final solution will be. This is clearly an aspect of optimization that the classical view does not capture. So the question is: how can we capture this kind of challenge? How can we deal with it? As you might imagine, this is not a new question, and there are already well-known answers. In particular, there are two prevailing frameworks for settling this kind of question. The first one is called stochastic optimization, and it tells us the following: you do not know what your input will be, but you at least know that it comes from some underlying distribution, and furthermore you actually know what that distribution is. So all you have to do is devise a procedure that leverages this knowledge of the distribution to optimize your expected cost with respect to it.
So this is the framework, and the nice thing about it is that quite often it allows you to obtain positive results with performance guarantees very close to what you could achieve if there were no uncertainty involved at all. That is the nice part. The problem is that it is not really clear how meaningful such positive results are. For your particular problem, how do you know that the input you are seeing comes from some fixed distribution? And even if this distribution exists, how can you access it, and how can you be sure you know exactly what it is? This difficulty brings us to the second framework for dealing with uncertainty in optimization, called online optimization, which takes exactly the opposite stance. It makes absolutely no assumptions about what your input will look like: the input can even be adversarial, as bad as possible, and your task is still to optimize the worst-case cost of your solution. There is no notion of expectation, because there is no distribution; you really optimize the worst case. Clearly, the problem here is that once you are in such a restrictive setting, more often than not you end up not with positive results but with negative ones, and pretty strong ones at that.
On the flip side, if you are able to obtain a positive result, it is an extremely meaningful statement, because it holds without any assumptions, and you do not have to worry about whether it applies to your particular problem. As you might guess from the title of the talk, the framework I want to focus on today is online optimization. In particular, let us start by discussing one of the classical problems in the area, namely the caching problem. The caching problem tries to capture the difficulty of maintaining a cache in a situation where the data access patterns are unknown to us. As probably all of you know, caching is a very popular technique for speeding up memory access when the memory has large latency: whenever you fetch some data from memory, you store it locally in your cache, so that whenever you need it again you can access it quickly. The catch is that the cache has only limited size, so you cannot afford to simply store everything you ever fetch; you have to make sure you keep the most important items. But if you do not know what the data access pattern is, it is hard to assess what the important items really are. There is a real difficulty coming in here, and this problem tries to capture exactly that. Formally, the setup is as follows: we have a universe of n pages, so we can see at most n different pages throughout the execution of the algorithm, and we have a cache which can hold at most k pages at any given time. To make this interesting, we assume that k is strictly smaller than n.
Given this setup, we play the following game. In each round, a page request comes in, and one of two things can happen. Either we already have this page in the cache, in which case we are happy: we can fetch it quickly and do not have to do anything. Or we are slightly less lucky and the requested page is not in our cache; then we have a cache miss. First of all, we have to fetch the page from memory, and then we want to put it in our cache. If there is room in the cache, that is all we have to do, but sometimes the cache is already full, and then we have to decide which page to evict to make room for the new one. This game keeps repeating, and note that the only decision we make in this model is which page to evict whenever we have to evict one; everything else is predetermined by the input. Our goal is to devise an eviction policy that minimizes the total number of cache misses over the request sequence we are presented with. So this is the problem, and now you might wonder what we actually mean by a good solution to it. What are the good algorithms we are looking for? Clearly, if we knew all the future requests in advance, it would be very easy to compute an optimal solution; there is a very simple policy that achieves it. But of course, that is not what we are after here.
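The game just described can be made concrete with a small sketch: the only freedom an algorithm has is the choice of which page to evict on a miss with a full cache, so we can model a policy as a callback. This is my own illustration, not code from the talk; the function and parameter names are mine.

```python
def count_misses(requests, k, evict):
    """Play the caching game and return the number of cache misses.

    requests : sequence of page ids
    k        : cache capacity
    evict    : function(cache_set, request_index) -> page to evict
    """
    cache = set()
    misses = 0
    for i, page in enumerate(requests):
        if page in cache:
            continue                      # cache hit: nothing to decide
        misses += 1                       # cache miss: fetch the page
        if len(cache) == k:               # cache full: the policy must choose
            cache.remove(evict(cache, i))
        cache.add(page)
    return misses
```

For instance, `count_misses([1, 2, 3, 1, 2, 4], 2, lambda c, i: min(c))` plays the game with a (rather arbitrary) evict-the-smallest-id policy; any eviction policy plugs into the same interface.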
What we are after is to do well despite having no idea what the future will be; this is the online setting. And once you have no idea about the future, there is no hope of always matching the offline optimum, which is the best solution computed with the whole request sequence known in advance. So what we are after is an online algorithm that tries never to do much worse than the offline optimum. We do not hope to always be optimal, but at least we want to be sure that we are never far from the optimal solution. To make this precise, we introduce the notion of competitiveness: an algorithm is C-competitive, for some value C, if on every request sequence the number of cache misses it suffers is never more than C times the optimal number of cache misses. And we will be trying to find algorithms with minimal competitiveness. I should also mention that competitiveness is only one measure of the performance of an algorithm; there are other measures that I will not talk about, in particular the so-called regret measure, but I will not go into that in this talk. So now we know what we mean by a good algorithm; how can we get a nice algorithm for this problem?
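The "very simple policy" for the offline case mentioned above is Belady's farthest-in-future rule: on a miss with a full cache, evict the page whose next request lies farthest in the future (or never occurs again). A sketch, with names of my choosing; the quadratic next-use scan is kept naive for clarity:

```python
def belady_misses(requests, k):
    """Offline-optimal number of cache misses (Belady's farthest-in-future rule)."""
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) == k:
            def next_use(p):
                # index of the next request for p, or infinity if none
                for j in range(i + 1, len(requests)):
                    if requests[j] == p:
                        return j
                return float("inf")
            # evict the page needed farthest in the future
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses
```

Since this is the offline optimum, the competitive ratio of any online policy on a fixed sequence is its miss count divided by `belady_misses` on that sequence.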
If you think about good algorithms for this problem, probably one of the first reasonable approaches you will find is the so-called LRU policy, in which you evict the page that was least recently used. And indeed, Sleator and Tarjan, in their seminal paper, analyzed this policy and proved that it is k-competitive. This is an interesting statement because the competitiveness does not depend on the length of the sequence: the sequence can be very long, but the competitive ratio depends only on the size of the cache. That is very interesting. On the other hand, if your cache has, say, 10,000 lines, this is not the most satisfying guarantee, so the obvious question is whether we can do better. Sleator and Tarjan thought about this question, and their answer was that you cannot beat this factor of k, at least as long as your algorithm is deterministic. The key reason is that deterministic algorithms are completely predictable: if you fix an algorithm and look at any particular input sequence, you can predict exactly what the state of the algorithm's cache will be at every time step. Everything is uniquely determined, and as a result it is not hard to design a sequence that causes the algorithm to have a cache miss in every single round.
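The LRU policy described above has a standard idiomatic sketch in Python using an ordered dictionary as a recency queue (the helper name is mine):

```python
from collections import OrderedDict

def lru_misses(requests, k):
    """Number of cache misses the LRU policy suffers on `requests`."""
    cache = OrderedDict()   # keys kept in least- to most-recently-used order
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)        # refresh recency on a hit
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return misses
```

On the sequence `[1, 2, 3, 1, 2, 4]` with `k = 2`, LRU misses on every request, which is exactly the kind of behavior the adversarial lower-bound argument exploits.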
When you do the appropriate math, you notice that you can construct a sequence that causes the algorithm to have a cache miss every single round, while the optimal solution has a cache miss at most every k-th round; this yields the lower bound of k. So that is the deterministic picture, and the natural question now is: can randomness help? It turns out that it indeed can. What we consider here are eviction policies that can make random choices: the algorithm can toss random coins, and what we measure for a given sequence is the expected number of cache misses. Once we look at this model and give this extra power to our algorithm, we can do much better; exponentially better, in fact. There is a very simple algorithm that achieves O(log k) competitiveness, and one can prove that this O(log k) is the best possible here. So for the caching problem we essentially know everything: we know the best competitiveness in both the randomized and the deterministic model. This is probably the time to move to a slightly more challenging problem, the k-server problem, which will be the focus of the remainder of the talk. Probably the easiest way to think about the k-server problem is to view it as the task a fire-truck dispatcher is facing. The setup is that we have n locations in some metric space, so there may be different distances between different pairs of points, and we have k servers, or fire trucks.
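The "very simple algorithm" achieving O(log k) that I believe is being referred to is the randomized marking algorithm: requested pages get marked; on a miss with a full cache, a uniformly random unmarked page is evicted, and when every cached page is marked, all marks are cleared and a new phase begins. A single-run sketch under that assumption (seeded for reproducibility; in the analysis one takes the expectation over the coins):

```python
import random

def marking_misses(requests, k, seed=0):
    """Cache misses of one run of the randomized marking algorithm."""
    rng = random.Random(seed)
    cache, marked, misses = set(), set(), 0
    for page in requests:
        if page not in cache:
            misses += 1
            if len(cache) == k:
                if not cache - marked:   # every cached page is marked:
                    marked.clear()       # start a new phase
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)     # evict a uniformly random unmarked page
            cache.add(page)
        marked.add(page)                 # the requested page becomes marked
    return misses
```

Marked pages are never evicted within a phase, which is what protects recently requested pages and drives the O(log k) bound.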
We want the number of servers to be strictly smaller than the number of locations, again to make things interesting. Now we play the following game. In each round, a request, a fire, appears at one of the locations, and once again one of two things can happen. Either we already have a server, a fire truck, at this location, and we are happy, modulo putting out the fire. Or we do not have a server at this location, and then we have to decide which of the servers, or fire trucks, to move there to take care of the fire. Here is a fire and we move this truck; there is another fire and we move that truck. So this is the game, and our task is to devise a way of moving the servers around that minimizes the total distance traveled by the fire trucks while serving the requests. Of course, we want to do all of this in the online setting, where we do not know where future fires will be. So this is the problem, and a good question is why we even care about it; what is the motivation for studying this kind of problem? There are a couple of reasons. One is that even though the problem is very artificial and very simple, or maybe precisely because of that, it captures quite a variety of online scenarios: you can imagine the servers to be technicians, service trucks, copies of frequently changing data, and so on.
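The cost accounting of the game above can be sketched in a few lines for the special case of the line metric. The greedy policy shown here (always move the nearest server) is only an illustration of the interface; it is not the algorithm discussed in this talk, and greedy is in fact known not to be competitive. All names are mine.

```python
def serve_greedy(requests, servers):
    """Total distance traveled serving requests on the real line greedily."""
    servers = list(servers)     # current server positions
    cost = 0.0
    for r in requests:
        # pick the server closest to the request...
        i = min(range(len(servers)), key=lambda j: abs(servers[j] - r))
        cost += abs(servers[i] - r)   # ...pay the distance it travels...
        servers[i] = r                # ...and it now sits on the request
    return cost
```

With servers at 0 and 10 and requests at 2, 1, 2, greedy keeps dragging the left server back and forth while the right one never moves; sequences like this, repeated, are exactly how one shows greedy's total cost can grow unboundedly relative to the optimum.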
Each of these instantiations yields some interesting online scenario, and all of them are captured by the k-server problem. In particular, you can view caching as a special case of the k-server problem: caching is essentially the k-server problem on a uniform metric. To see why, identify locations with pages, and interpret having a server at location i as having page i in your cache. Then whenever there is a request, you can see it either as a fire or as a request for a page, and moving a server corresponds exactly to the eviction-and-fetch procedure. Since the metric is uniform, every such move costs the same, so the total cost just counts the number of cache misses incurred while serving the requests. So that is one kind of reason why we care about this problem. The second reason, which makes the problem interesting at least to me as a theoretician, is that despite its very simple formulation, the problem has an extremely rich structure. The more you look into it, the more it seems to capture something about online optimization that we do not really understand, but should. That is another reason why we really want to understand what is going on in this problem. So now that we know what the problem is and why we might care about it, let us talk about what is known. Clearly, since caching is a special case of the k-server problem, all the lower bounds transfer over.
So we know that we cannot hope for a better than k-competitive deterministic algorithm, or a better than Ω(log k)-competitive randomized one. The interesting thing is that, despite the fact that the k-server problem seems much, much more general, we do not know a single better lower bound: all the lower bounds for this problem come directly from the lower bounds for the caching problem. People were quite surprised by this, and could not believe it at first, but eventually this state of affairs led them to put forward quite a bold conjecture: if we cannot prove better lower bounds than the ones we have for caching, then maybe these bounds are actually tight. This question has two versions: the deterministic k-server conjecture, which asks for an exactly k-competitive deterministic algorithm, and the randomized k-server conjecture, which asks for an O(log k)-competitive randomized algorithm. These are the big questions we are after here too. Let me give just a very brief history of what is known about both conjectures, because, as you might imagine, there is a multitude of work on this problem and I do not have time to present it in any meaningful manner. On the deterministic side, the conjecture that you can hope for k-competitiveness was already put forward in the paper of Manasse et al. in 1988, but they did not provide any general k-server algorithm; they only had some indications that this is the right answer.
It took a while before the first algorithm whose competitiveness depends only on k emerged, and that competitiveness was quite large; exponential in k, in fact, so seemingly very far from what we hope is correct. But shortly thereafter came a breakthrough paper by Koutsoupias and Papadimitriou, who showed that you can get a deterministic algorithm whose competitiveness is only 2k - 1. So here we are done up to a factor of 2, and in fact, for some special cases of the problem, for instance when the metric comes from a tree, we know that k is the right answer. The state of affairs is pretty good on the deterministic side. How about the randomized k-server problem? For randomized algorithms there has been a lot of work on various special variants of the problem. In particular, quite recently we learned that you can get an O(log k)-competitive algorithm for an extension of the caching problem called the weighted paging problem; this is a result of Bansal et al. But if you ask about bounds for general metrics, the best known result is still just the deterministic algorithm of Koutsoupias and Papadimitriou. Essentially, even though it is intuitively obvious that randomization should help, we have no idea how to take advantage of it; we do not understand how to use it here. In fact, the state of affairs is even more embarrassing than that: we do not know any o(k) guarantee even for such simple metrics as a two-level tree or the line. So really, despite a lot of effort, the situation is not very promising.
Once you look at this, you start to wonder whether the bold conjecture that you can get O(log k) competitiveness is just too far-reaching and too optimistic; maybe k really is the right bound here. So why are we optimistic about getting O(log k)? Well, there is no very good reason, but we are, and the result I want to present today gives some hope that polylogarithmic competitiveness is not that far-fetched after all. What we do is present a randomized algorithm for the general k-server problem that is polylog-competitive, but the catch is that this polylog depends on both k and n, the number of points. As long as the number of points is sub-exponential in k, we improve over the 2k - 1 bound, but in general the two bounds are incomparable because of this dependence on n. So this is the result; let me proceed to giving you some idea of how such a result can be obtained. When you look at the history of the problem and what people struggled with, it is pretty clear to everyone that the hard part is dealing with an arbitrary metric, because if the metric is uniform, we know exactly what to do. So the guiding principle of what we will be doing is to trade the complexity of the underlying metric for the complexity of the problem we are solving. What do I mean by that?
We will try to reduce the k-server problem over an arbitrary metric to a different, more difficult problem over a very simple metric, namely the uniform one. How can we go about such a reduction? It comes in two big steps. The first step has to do with so-called HST metrics. What is an α-HST metric? It is the metric induced on the leaves of the following type of tree: a leveled tree in which all edges at one level have length 1, all edges at the next level have length α, the next α², and so on, with edge lengths growing geometrically at rate α from level to level, the longest edges sitting nearest the root. Any metric induced on the leaves of such a tree is an α-HST metric. The reason we bring up this rather bizarre-looking metric is a well-known and beautiful theorem which tells us that, as long as we are willing to lose a factor of α·log n, any metric on n points can be embedded into a metric of this kind. So if we want an algorithm for the k-server problem on general metrics, and we are willing to pay this α·log n factor in the competitiveness, we can from now on constrain ourselves to α-HST metrics: whenever we get an algorithm for this kind of metric, it immediately gives us an algorithm for general metrics. That is the first step, and from now on we only look at α-HST metrics.
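To make the HST distance concrete, here is a small sketch computing the distance between two leaves. It assumes the convention that leaf-level edges have length 1 and lengths grow by a factor α per level toward the root, and it addresses leaves by their root-to-leaf paths; both the encoding and the names are mine.

```python
def hst_distance(leaf_a, leaf_b, alpha, depth):
    """Distance between two leaves of a depth-`depth` alpha-HST.

    Leaves are given as length-`depth` tuples of child indices, read
    from the root down, e.g. (0, 1) = child 0 of the root, then child 1.
    """
    if leaf_a == leaf_b:
        return 0.0
    lca = 0                        # level of the lowest common ancestor
    while leaf_a[lca] == leaf_b[lca]:
        lca += 1
    # Each leaf climbs to the LCA; the edge ending d levels above the
    # leaves has length alpha**d, so sum a geometric series per leaf.
    climb = sum(alpha ** d for d in range(depth - lca))
    return 2 * climb
```

Note the HST property this exposes: two leaves in different subtrees of the root are at distance dominated by the top-level edges, regardless of where exactly they sit, which is what makes the local views discussed next meaningful.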
To understand the second step, we have to look at what a k-server solution looks like on an α-HST. By the definition of the metric, both the requests and the servers reside at the leaves of the tree. If you observe a k-server solution and just watch the leaves, the process by which the servers move around can be very difficult and not easy to grasp. So what we will try to do instead, to understand what is going on a bit better, is to view this complicated process on the leaves as one generated by relatively simple processes, one happening at each internal node. We break the complicated process down into simple processes, one for every internal node. This might sound a bit mysterious, so let us take it slowly. In particular, let us focus our attention on one internal node, say the root, and ask what the root sees when it restricts itself to its local neighborhood. In its local neighborhood it sees nothing except its children; in particular, it does not see the leaves. So what is going on from its point of view? It does not see at which leaf a request arrives; it only sees in which child's subtree the request shows up.
If there is a request somewhere in the subtree corresponding to a child, the root only knows that there is a request corresponding to that child; that is all it sees. Similarly, it does not see where exactly the servers are located; all it sees is how many servers are in each of the respective subtrees. From its local point of view, it only sees configurations that assign a number of servers to each of its children's subtrees. In particular, whenever servers move around, the root sees nothing of this movement as long as no servers move between different subtrees. This is the local view of the root. Now we ask: what does the root see when it observes an optimal k-server solution? It will see a certain dynamic emerging in these configurations; namely, two conflicting processes. On one hand, subtrees that tend to receive many requests will also tend to receive more servers, because that makes taking care of those requests easier. On the other hand, the total number of servers is fixed, so we cannot simply give many servers to every subtree, and moving servers around is expensive. So there is a balancing act that tries to decide which subtrees need most of the resources.
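The root's local view described above is literally a projection from leaf positions to per-child server counts. A tiny sketch, reusing the root-down tuple encoding of leaves (the first coordinate is the root's child index); this encoding and the names are my own:

```python
from collections import Counter

def local_view(server_leaves, num_children):
    """Project server positions (leaf tuples) to counts per root subtree."""
    counts = Counter(leaf[0] for leaf in server_leaves)
    return [counts.get(c, 0) for c in range(num_children)]
```

For example, servers at leaves `(0, 1)`, `(0, 0)`, and `(2, 1)` of a root with three children project to the configuration `[2, 0, 1]`; any movement among the leaves of child 0 leaves this view unchanged, exactly as described.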
We see these two conflicting goals, and when we want to actually recover a good solution to the k-server problem, we will set up an algorithm that tries to guess what the best assignment of servers over time would be. When you try to capture this algorithmic task as a concrete problem, the problem you are likely to end up with is the so-called allocation problem, which is the third and last problem I want to introduce today. By the way, I should mention that this general approach was pioneered by Coté, Meyerson, and Poplawski; in particular, they were the first to introduce the allocation problem in the context of the k-server problem. So what is the allocation problem? You can think of it as the job of a manager who oversees projects carried out by a team of workers spread over a number of offices. The setup is that we have d locations over a uniform metric; the metric is uniform again. We have k servers, or workers, and now it actually makes sense to have more than one worker at a given location. The game is as follows: in each round we get a request, which consists of a location together with k + 1 non-increasing values h_0 ≥ h_1 ≥ … ≥ h_k. The meaning of these numbers is that h_j is the cost of completing the project at the requested location if we have exactly j workers there.
So essentially, what goes on in each round is this: whenever the manager sees the request and the corresponding service costs, he has to decide whether to move some workers around, and any moving of workers incurs a move cost. Then, once he says "I'm done," we look at the location that was requested and pay the service cost given by the respective value c_t(j). So here, for example, seeing a request at the first location, he decides to move two workers there; now he says "I'm done," pays c_t(3) as the service cost, and has also paid the move cost of moving two workers. This is the game we are playing, once again in the online setting, and our goal is to play it in a way that minimizes the sum of the move cost and the service cost. Okay, so this is the problem. Note that I mentioned at some point that the problem we would end up working with would be a generalization of the k-server problem — and indeed it is. It is easy to see that if we consider the special case in which every request has the form "pay infinite service cost if there is no worker at the location, and zero service cost whenever there is at least one worker there," then what we recover is exactly the caching problem, which is k-server on the uniform metric. But of course the allocation problem is much more than that, and in some sense that is the price we will pay. Just one more thing to note: one detail I ignored here, which is actually needed to make this problem useful to us, is that this setup has to be generalized slightly.
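The round structure just described can be sketched in a few lines of code. This is a minimal sketch, assuming a uniform metric with unit move cost per worker; the names (`AllocationState`, `play_round`) are my own and not from the talk or the paper.

```python
# Toy model of one round of the allocation problem on a uniform metric.
class AllocationState:
    def __init__(self, workers_per_location):
        self.w = list(workers_per_location)  # w[i] = workers at location i

    def play_round(self, i_t, cost_vector, new_assignment):
        """Serve a request at location i_t with service costs cost_vector[j]
        (non-increasing in j), after moving workers to new_assignment.
        Returns (move_cost, service_cost)."""
        assert sum(new_assignment) == sum(self.w), "workers are conserved"
        # On a uniform metric, the move cost is the number of workers that
        # change location: half of the total l1 change in the assignment.
        move_cost = sum(abs(a - b) for a, b in zip(new_assignment, self.w)) // 2
        self.w = list(new_assignment)
        service_cost = cost_vector[self.w[i_t]]
        return move_cost, service_cost
```

The caching special case mentioned above corresponds to cost vectors of the form `[inf, 0, ..., 0]`: infinite cost with zero workers at the requested location, zero cost otherwise.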
Namely, in addition to the service costs and the requests, we should also allow the number of available workers to change over time. So in each round we are also told, "this round you have only this many workers available" — if you have more than that, you have to return some of them to headquarters, and if you have fewer, you get some workers for free. Okay? But I will ignore this part in this talk. So this is the problem we are interested in, and the question is: why do we care about it? The reason is that Coté et al. showed that if you can get a good enough algorithm for this allocation problem, then it gives you what we are looking for — a polylogarithmically competitive algorithm for the k-server problem over a general metric. Note that this polylog depends on k and n, as we expect, but it also depends on Δ, the aspect ratio of the underlying metric. So this is the theorem, and essentially what it means is that from now on, all we really need to care about is finding a good enough algorithm for the allocation problem. Note that we now have a problem over a uniform metric, but it seems to be slightly more difficult than the problem we started with. Coté et al. proved this theorem and also made the first step: they showed that this good enough algorithm indeed exists, at least when the instance has exactly two locations — that is, d = 2.
This doesn't give us anything in the general case — it only yields an algorithm for a very special type of metric — but it is a first step. So all we had to do was make the next step and get a good enough algorithm for the general case. That's what we set out to do, and that's exactly where we failed. After quite a lot of effort, we realized that we don't know how to do it. It seems that, in the course of this reduction, the problem we end up with just becomes too hard; we simply don't know how to get the required guarantee for the problem I just stated. That's not good, because it looks like a dead end: we reduced our problem to another problem we don't know how to solve, so all the effort seems wasted. But maybe it's not that bad — let's try to recover, or at least prolong the agony. We don't know how to solve the actual problem, so let's look at a fractional relaxation of it and see what we can do with that. What kind of fractional relaxation do we consider? Probably the simplest one you could imagine. Instead of keeping track of the actual integral configuration of the servers, we keep track of some marginals: marginal probabilities x_{i,j}, for every i and j, where x_{i,j} is the probability that we have exactly j servers at location i. Okay. So what kind of constraints do we want to enforce here?
Well, one type of constraint says that for every location i, this is indeed a probability distribution: the x_{i,j}, for fixed i, sum up to 1. The second constraint says that if we sample the number of servers at each location independently from these distributions, then the expected total number of servers is exactly what it should be, namely k. Now that we are dealing with these more general configurations, we have to extend the notions of service cost and move cost, and we do it in an absolutely natural manner: we take the expected service cost, and for the move cost we take an earth mover distance between configurations as they change. So now we have this fractional relaxation, and the good news is that for this relaxation we actually do know how to get a good enough algorithm. So far so good. The problem, however, is that we can't really use this fractional relaxation to get a good solution for the integral case, because it has a very bad integrality gap — an integrality gap of Ω(k). So there is no hope of a rounding procedure that always gives us a good enough solution for the integral case. We tried to do something and got stuck again. So what now — do we abandon this whole approach? Well, not really. The key observation to make is: why do we even care about the allocation problem? We don't really want to solve the allocation problem; it is just some kind of proxy for what we really want to solve, namely the k-server problem. We want an integral solution for the k-server problem — not necessarily an integral solution for the allocation problem.
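The constraints and costs of this relaxation can be written down compactly. The notation below is my reconstruction of what the slides presumably show, using the c_t(j) service costs introduced earlier:

```latex
% State: marginals x_{i,j} = \Pr[\text{exactly } j \text{ servers at location } i].
\sum_{j=0}^{k} x_{i,j} = 1 \quad \text{for every location } i,
\qquad
\sum_{i=1}^{d} \sum_{j=0}^{k} j\, x_{i,j} = k.

% Expected service cost of a request (i_t, c_t) in state x:
\mathrm{serv}_t(x) = \sum_{j=0}^{k} x_{i_t, j}\, c_t(j).

% Move cost between consecutive states x and x': an earth mover
% (transportation) distance between the two configurations.
```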
So, once we realize that, here is what we do — and this is the outline of our algorithm. We start from our solution to the fractional allocation problem, and then we extend the reduction of Coté et al. to turn this fractional allocation solution into a fractional solution to the k-server problem on the HST. So we get a solution that is fractional, but for the k-server problem. Now we show that once you are dealing with the fractional k-server problem on an HST, there is a simple rounding procedure that gives you an integral solution to k-server on the HST while incurring only some acceptable overhead. So this is how the algorithm works, and it does work — so the question is: where did the integrality gap go? Essentially, what happened is that by deferring the rounding step to the moment when we deal with the k-server problem, rather than the general allocation problem, the integrality gap is simply not present for the configurations corresponding to the k-server problem; it is only present for some configurations of the allocation problem. So by being able to defer this step, we got around the integrality gap and obtained what we wanted. Let me also note one thing I omitted: as you probably noticed, the reduction of Coté et al. also introduces a dependence on the aspect ratio of the metric. We actually have a way of getting rid of that dependence, but I will not talk about it here.
So this is how the algorithm works, and in the remaining time I just want to give you some glimpse of how the algorithm for the fractional allocation problem works. Maybe it wasn't clear from the way I explained things, but this is actually the heart of our algorithm — this is where the most interesting things happen. So let me try to give you a glimpse of it here, for a special case of the problem. In this special case, I assume that all the service cost vectors have the following form: you pay some cost c_t for not having a worker at the location, and once there is at least one worker at the location, you pay zero. This special case corresponds to caching with rental: a caching problem in which, whenever there is a request and we don't have a server there, we have the option of paying c_t to take care of this particular request without having to move a server. This is the special case we will focus on here. Clearly, in this special case it never makes sense to put more than one server at any given location, so we can assume that all x_{i,j} for j greater than 1 are zero. And then the complete description of a configuration is given by specifying the value of x_{i,0} for every location i, where x_{i,0} is the probability of having no server at location i — because once we know this, we also know the probability of having one server at location i, so we know everything we want to know. Okay?
So this is the special case we will focus on, and just to get rid of corner cases, let me also assume that even though the probabilities x_{i,0} are numbers between zero and one, they are never exactly zero or exactly one — they always lie strictly between these two boundaries. Okay? I have just reproduced the setup over here, and the picture to look at is something like this: the brown volume corresponds to the server volume we have at each location, and the x_{i,0} are the gaps at each of the locations. So how does the algorithm work? In each round t, it is presented with a request that corresponds to a location and a service cost. Usually, at this point, I would just tell you: if our old state was this, then our new state will be that. But that's not what I will do. Instead, the way our algorithm is defined, it provides a way of evolving the old state into a new state via a continuous process that depends on the request. So whenever a new round happens and we see the new request, we set up an evolution of the current state; the state evolves for some period of time, and whatever it evolves into becomes our new state. This evolution is infinitesimal, and it is a two-step process.
As the first step, we have a so-called rising step, in which we decrease the value of x_{i_t,0} by the following quantity. To decipher what this quantity is: note that x_{i_t,0} is the probability that we have zero servers at the requested location, and c_t is the penalty we pay for not having a server there. So this expression is proportional to the expected service cost we would pay in the current configuration. What we are doing is decreasing x_{i_t,0} by this quantity — essentially, we try to reduce the service cost we will incur, and we do it proportionally to the service cost we are actually likely to incur at this moment. So this is something like a gradient descent step. That's good, because reducing the service cost is something we definitely want to do. But of course, the problem is that this makes the state infeasible — we have just put more volume into our system. So now we have to withdraw some volume to make the solution feasible again, and we do that in the fixing step: we increase each of the x_{i,0} proportionally to the value of x_{i,0} plus some additive factor, and we keep doing this infinitesimally as long as there is too much volume in the system. Clearly, by doing that we make the state feasible again. The intuition behind this fixing step is that we do the withdrawal proportionally to x_{i,0} because whenever a location has almost a full server there, we want to be very conservative about withdrawing volume from it.
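The two-step continuous dynamics just described can be simulated by discretizing time. This is only a toy sketch: the step size `dt`, the fixed round duration, and all names are my own simplifications, not the exact process from the talk. The state is the vector of gaps `x[i] = Pr[no server at i]`, which is feasible when it sums to n − k.

```python
# Discretized simulation of the rising + fixing dynamics for the
# caching-with-rental special case of the fractional allocation problem.
def run_round(x, i_t, c_t, k, dt=1e-3, duration=1.0):
    """Evolve state x (x[i] = Pr[no server at i], sum(x) = n - k)
    for one round with a rental-cost request (i_t, c_t)."""
    n = len(x)
    target = n - k  # feasibility: the total "gap" volume must equal n - k
    for _ in range(int(duration / dt)):
        # Rising step: decrease x[i_t] proportionally to the expected
        # service cost c_t * x[i_t] currently being incurred.
        x[i_t] = max(0.0, x[i_t] - dt * c_t * x[i_t])
        # Fixing step: withdraw volume (increase the x[i]) at rate
        # proportional to x[i] + 1/k until the state is feasible again.
        deficit = target - sum(x)
        if deficit > 0:
            rates = [xi + 1.0 / k for xi in x]
            scale = deficit / sum(rates)
            x = [min(1.0, xi + scale * r) for xi, r in zip(x, rates)]
    return x
```

After a round, the requested location has lost gap volume (more server mass there), the other locations have gained some, and feasibility is restored.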
We are much happier to be more aggressive and withdraw from a location that does not have much server volume there in the first place. If you are familiar with the multiplicative weights update method, this is exactly the kind of withdrawal rule you would get out of it. So this is the algorithm: we just keep repeating these steps for some period of time. Now the question is how to analyze it — how to prove some competitiveness here. Well, the analysis is actually quite simple, and it is based on a potential function; it is a potential-based argument. Our potential is the expression written over there, and it depends on two things: the configuration of the algorithm, and the configuration of some fixed optimal solution. Our algorithm doesn't know what this optimal solution is, but there is some fixed optimal solution, and we want to measure some distance between our configuration and the configuration of this optimal solution. So, to decipher what this potential is: note that Φ is just a sum of contributions of the individual locations, and a location contributes zero if the optimal solution, at the given time, has a server at i. So at all locations where the optimal solution has a server, the contribution to the potential is always zero, no matter whether we have a server there or not.
On the other hand, if the optimal solution does not have a server at the location, then the contribution is logarithmic, given by the expression over there — and this is an example graph of it. This expression decreases as x_{i,0} grows: if x_{i,0} is one, the logarithm evaluates to zero, and if x_{i,0} is zero — which corresponds to us having a server at location i — the expression is something that is O(log k), with everything in between for fractional values of x_{i,0}, as the logarithm dictates. Okay. For those of you who are familiar with it, you can see some similarity here between this potential and relative entropy; the similarity is not exact, but the feeling is much the same. So this is our potential, and now, how can we use it to prove competitiveness? Suppose we want to establish O(log k)-competitiveness of our algorithm. As potential-based arguments go, all we have to show is that throughout the whole execution of the algorithm, the following differential inequality is preserved. And just to cut down on the number of expressions, let me prove this inequality to you in the special case corresponding to caching, where all the costs for not having a server are actually infinite. What happens there is that we never want to pay any service cost, so we always make sure we pay no service cost at all.
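One natural choice of potential consistent with the properties just described — each term is zero at x_{i,0} = 1, is O(log k) at x_{i,0} = 0, and has a derivative matching the fixing-step rate — is the following. This is my reconstruction, not necessarily the exact expression on the slide; the constant c in front is left unspecified:

```latex
\Phi(x, \mathrm{OPT}) \;=\;
  c \sum_{i \,:\, \mathrm{OPT\ has\ no\ server\ at\ } i}
  \ln\!\left( \frac{1 + 1/k}{\,x_{i,0} + 1/k\,} \right).

% Each term is 0 when x_{i,0} = 1 (the algorithm also has no server
% there) and \ln(k+1) = O(\log k) when x_{i,0} = 0; its derivative in
% x_{i,0} is -c/(x_{i,0} + 1/k), the reciprocal of the fixing-step
% rate x_{i,0} + 1/k up to the constant c.
```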
Then the inequality we have to show simplifies to the following, and proving it gives us O(log k)-competitiveness for the caching problem. Okay, so how can we show this inequality? We fix some round of our algorithm, corresponding to a request at location i_t, and we divide what the algorithm and the optimal solution do in this round into three stages. In the first stage, we look at whether OPT moves or not. OPT moves only if it doesn't already have a server at i_t, in which case it moves one server there from some other location. How does this move affect our inequality? Clearly, our algorithm doesn't move, so its change in move cost is zero, and the optimal solution moved one server, so its change in move cost is one. Now, what is the change in the potential? In the worst case, once OPT withdraws a server from some location, that location starts to contribute something to the potential — but no matter what the value of x_{i,0} there is, this contribution is at most some O(log k) factor. So no matter which location OPT withdraws the server from, the potential increases by at most that factor. We see that on the right we have O(log k) and on the left we have O(log k), so everything works out. Okay? So this stage is fine.
In the second stage, the algorithm decreases the value of x_{i_t,0}; since it wants to have zero service cost, it essentially sets x_{i_t,0} to zero. Okay? So how does this reflect on our inequality? Well, clearly OPT doesn't move, so we don't have to worry about that. The potential also doesn't change, because we know OPT has a server at location i_t, and locations where OPT has a server contribute nothing to the potential. So nothing changes there. The only thing that changes is the algorithm's move cost, because we brought some volume to location i_t. But we can charge this increase in volume to the withdrawal step: whatever we bring here will have to be withdrawn in the next stage, so we just charge this increase to the withdrawal there, and we don't have to worry about it here. Okay, so this stage is fine too. Now only one stage is left, in which the algorithm withdraws volume from the locations. Remember, the way this happens is that as long as the state is infeasible — it has too much volume — we increase each x_{i,0} proportionally to x_{i,0} + 1/k. Okay, so what happens? Once again, OPT doesn't move, so we don't have to worry about it. What is the move cost of the algorithm? Each location contributes a factor of x_{i,0} + 1/k times the proportionality factor τ. When we sum this up, the 1/k terms clearly sum up to n/k, and the x_{i,0} terms sum up to at most n − k. Why is that? Because if the state is infeasible, it means the sum of the x_{i,0} is too small — strictly smaller than n − k.
So we use this to upper bound our move cost by this quantity. Now all that remains is to estimate the change in the potential during the withdrawal. What is this change? Well, we sum over all the locations, and each term is the derivative of our potential times the change in the corresponding variable. When you do the math and take this derivative, the magic happens: each of the terms is exactly equal to minus τ. Of course, this is not a coincidence — the potential was chosen to make it so — but it is a nice fact. So we see that the change in the potential is a sum, with a minus sign, over the locations where the optimal solution has no server — this is what describes the configuration of the optimal solution — and we know how many such locations there are: the gaps of the optimal solution sum up to exactly n − k. So the change in the potential is minus τ times n − k, up to the constant in front of the potential. Now we want to compare these two things, the move cost and the change in the potential, and it is an easy exercise to see that they work out the way we want: the sum of the change in the potential and the change in the move cost is at most zero, so the inequality is preserved. One fact we use here is that n is at least k + 1 — if n were exactly k, the problem would not be interesting anymore — so we assume n is always strictly bigger than k. Okay, so that's all. That's the whole analysis.
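The "magic" cancellation above can be checked numerically: with a potential term of the form f(x) = ln((1 + 1/k)/(x + 1/k)) — my reconstruction, as noted earlier — the derivative f′(x) times the fixing-step rate (x + 1/k) is identically −1, independent of x.

```python
import math

def term(x, k):
    # One location's contribution to the (reconstructed) potential,
    # when OPT has no server at that location.
    return math.log((1.0 + 1.0 / k) / (x + 1.0 / k))

def rate_times_derivative(k, x, h=1e-7):
    # Central finite-difference approximation of f'(x), multiplied by
    # the fixing-step rate x + 1/k; should be ~ -1 for every x in (0, 1).
    deriv = (term(x + h, k) - term(x - h, k)) / (2 * h)
    return deriv * (x + 1.0 / k)
```

Analytically, f′(x) = −1/(x + 1/k), so the product is exactly −1; this is what makes each withdrawal term contribute exactly −τ to the potential change.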
Extending this analysis to the case where the c_t can be arbitrary, or where the x_{i,0} can be exactly zero or exactly one, is very easy. What is not that easy is extending the algorithm to handle general cost vectors. There is a natural way of extending the algorithm to general cost vectors, but the first natural extension does not work — there are some difficulties — so you need some further ideas to make everything work. It can be done, though; please see the paper for details. So, this is it. Let me just conclude and state some open problems. What happened here is that we provided the first polylogarithmically competitive algorithm for the k-server problem. The grand challenge and natural open question is: can we finally settle the k-server conjecture? Either by proving the right upper bounds, or maybe by finally improving the lower bounds — even by something small, it would be very interesting. A less ambitious but still very interesting challenge would be to decide whether we can get rid of the dependence on n, or actually prove that it is necessary. Having a lower bound telling you that you must have some dependence on n in your competitive ratio — even something as small as log* n — would be very interesting, because I think the general belief is that there should be no dependence on n at all.
So, this is, these are questions that are regarding the case error problem, but in general, sort of, you know, the one thing that I think is very important in the grand theme of dealing with uncertainty optimization is that actually I'm not exactly sure whether, you know, stochastic model or online model are the right models to begin with. So, I think that still, you know, sometimes these models work out, but I think these are not the ultimate best models that we could have. And you know, the big open problem here is actually like to find some, you know, meaningful model that will, on one hand, allow us to get, you know, well, allow us to get any theoretical result, but then we will also correspond quite well to what's happening in the reality. Okay, thank you.