Okay. Thank you very much for coming. This is joint work with Mohit and Sina, both from Georgia Tech. So let's start by seeing what circuit switching is. The motivation comes from switching, or routing, in data centers, and you'll see in a second why the switches are, for example, optical or wireless. On a high level, what we have here is a collection of senders and a collection of receivers. At any given time, we want to decide which sender sends information to which receiver: each receiver can receive information from at most a single sender, and each sender can send information to at most a single receiver. So essentially what we're looking for is a matching, right? This matching dictates which sender sends information to which receiver, and we keep it for some duration of time. Then at some point we want to switch to a different matching, which dictates a different routing in the data center. But what happens in modern data centers and networks is that if you use optical or wireless methods, this switching actually incurs a cost, what we call the switching cost: there is some fixed amount of time during which nothing can be routed, essentially nothing can be sent. The high-level goal in circuit switching is to find a sequence of matchings that maximizes the total data you can send within a given fixed time interval, okay? So let's see the formal definition of the problem. We get a collection A of senders and a collection B of receivers. We also get a demand matrix: for every sender i and receiver j, d_ij denotes how much data needs to be sent from sender i to receiver j. We also get the size of the interval, or what we call the time window, which is W.
And we also get the fixed switching cost: every time we change the matching, there's an interval of delta time in which nothing can be sent. So before we formally define the objective, let's see what a solution, or in other words a schedule, looks like. This is our time window from 0 to W. Let's say we first choose matching M1, you can see it here, and we wish to send data over this matching for a duration of alpha_1 time. Once we're done with that, we pay the fixed switching cost of delta time, and then we can choose any other matching we like, say M2, a different matching, and we keep transmitting over it for a duration of alpha_2. Then again we pay the fixed delta switching cost, choose a different matching, pay delta, and so on. But we cannot go beyond the interval, W time in total. So this is what a schedule is. In other words, a schedule is just a collection of pairs, or you might call these configurations, where each pair is a matching M_l and a duration alpha_l. A schedule is feasible if the total durations plus the time it takes to switch do not go beyond W. The goal is to maximize the total transmitted data. Think of a specific edge, say ij; you can think of A and B as forming a complete bipartite graph. Summing alpha_l over all chosen matchings M_l that contain this specific edge tells you the total duration during which the edge was active. But it can happen that for some edges this duration is more than the demand, in which case we cap the transmitted data at the demand, because we cannot send more than the demand over a specific edge. So per edge this is a truncated linear function, and the sum over edges is our objective.
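To make the objective and the feasibility condition concrete, here is a minimal sketch in Python. It assumes a schedule is given as a list of (matching, duration) pairs, with a matching represented as a set of (sender, receiver) edges; these names and representations are illustrative, not from the talk.

```python
# Sketch of the circuit-switch scheduling objective and feasibility test.
# A schedule is a list of (matching, alpha) pairs; a matching is a set of
# (sender, receiver) edges; demand maps each edge to its demand d_ij.

def schedule_value(schedule, demand):
    """Total transmitted data: for each edge, the total time it was matched,
    capped at its demand (a truncated linear function per edge)."""
    transmitted = {}
    for matching, alpha in schedule:
        for edge in matching:
            transmitted[edge] = transmitted.get(edge, 0.0) + alpha
    return sum(min(t, demand.get(e, 0.0)) for e, t in transmitted.items())

def is_feasible(schedule, window, delta):
    """Feasible if the durations plus one switching cost delta per chosen
    configuration fit inside the window W."""
    return sum(alpha + delta for _, alpha in schedule) <= window
```

For example, with demands d_00 = 2 and d_11 = 5, transmitting the matching {(0,0), (1,1)} for 3 time units yields min(3, 2) + min(3, 5) = 5 units of data, and with delta = 1 it fits a window of W = 4.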
One interesting point to note is that this objective, viewed as a set function over all these configurations, is actually a monotone submodular function. So what is known about this problem? First of all, Li and Hamdi proved that this problem is NP-hard. Actually, they proved that a different problem is NP-hard: minimizing the total time needed in order to satisfy all the demands. But once that problem is NP-hard, it immediately implies that our maximization problem is NP-hard as well; otherwise, if ours were in P, we could solve that problem too. So we know that we probably cannot get an exact solution. More recently, Bojja Venkatakrishnan, Alizadeh, and Viswanath made several interesting observations about the maximization problem. The first observation, as some of you might have guessed already, is that this problem is a special case of, or is captured by, what is called the submodular knapsack problem. In the submodular knapsack problem, we're given a universe of elements, a monotone submodular function over the elements, and a size for each element, and the goal is to pack elements whose total size doesn't go beyond the size of the knapsack while maximizing the submodular function. In our case, all these configurations are the elements. So this is the first observation they had. The second observation is that if all the demands are very small, and by small I mean at most, say, an epsilon fraction of the entire interval, then the greedy algorithm gives you almost a one minus one over e approximation. This is kind of interesting, because the greedy algorithm actually doesn't work for the general submodular knapsack problem; but here they had the extra assumption that the demands are very, very small. Okay? Another thing that will be important to today's talk is what is known in general for the submodular knapsack problem.
So this problem, to some extent, has been resolved. All four algorithms written here essentially give you a tight one minus one over e approximation. There's a classic work of Khuller, Moss, and Naor, which was extended shortly afterwards by Sviridenko, showing that if you enumerate over at most three elements of the ground set and then extend greedily, you get a tight approximation of one minus one over e. A later line of work, and there are more works, but these are the more well-known ones in this area, by Badanidiyuru and Vondrák and by Ene and Nguyen, aims to find fast algorithms that also achieve the tight one minus one over e approximation. The problem with all of these, at least in our setting, is that all of these algorithms, including the fast ones, require some sort of enumeration over elements of the ground set; in other words, they need to know at least one or two elements that are in the optimal solution. So what can we show about the problem? We can show that for every constant epsilon, there's a polynomial-time algorithm that gives a one minus one over e minus epsilon approximation for the circuit switch scheduling problem. For now, we're only dealing with the offline case. So what is our approach? First, let's start with what is hard about using the known techniques. The first difficulty I mentioned is that if you view this as submodular knapsack, then the ground set has exponential size: even if you enumerate over all the durations, the number of matchings in a complete bipartite graph is exponential. So all of these algorithmic techniques fail, because guessing or enumerating even one or two elements of the optimal solution cannot be done in polynomial time.
The second approach: someone might say, okay, this is submodular knapsack, but we have a very specific submodular function, not a general one. The function per edge is just truncated linear, right? For every edge we have a linear function, truncated at the demand. So maybe we can write an LP for this directly. But there are at least two natural ways of formulating linear programming relaxations for the offline problem, and both have an unbounded integrality gap. The reason is that both relaxations can cheat by smearing the solution and effectively ignoring delta, the switching cost; without the switching cost, if delta is zero, the problem is easy. The first relaxation finds a distribution over matchings for every time step. The other is more directly related to the knapsack view: we choose configurations that satisfy the knapsack constraint, meaning the sum of alpha plus delta over the chosen configurations is at most W. But both of these fail. So what we do instead is look at how large the switching cost is. If the switching cost is small, say at most an epsilon fraction of the entire window, we can show that the simple greedy can be analyzed in a better way than it usually is, and we get the required approximation. If the switching cost is large, more than an epsilon fraction of the window, then the first thing to note is that the number of matchings the optimal solution chooses is at most one over epsilon, and epsilon is fixed, so there is a constant number of matchings. This alone is not enough, because even knowing that the optimal solution has a constant number of matchings, we still cannot enumerate over them; there's an exponential number of matchings. But instead, we show that you can enumerate over the durations of the matchings of the configurations.
And once you know the durations, the problem becomes very easy to tackle. Okay, so I'll show you these two ingredients. What happens when the switching cost is small, say at most an epsilon fraction of the entire time window? I said that greedy can be analyzed better in this case, so what is greedy here? We start with an empty solution, no pairs chosen. As long as we have not exceeded the time window, we choose the best element, where an element is a pair of a matching and a duration. How much do we gain if we choose (M, alpha)? For every edge in the matching, we gain the duration, but we need to cap it at how much demand still remains for that edge; we cannot exceed the total demand. So if r_ij is what you might call the residual demand, how much unsatisfied demand there still is for edge ij, then min(alpha, r_ij) is how much we gain for that edge, and we divide the total gain by the size in the knapsack, in other words the duration alpha plus delta. Okay, so this is what the greedy algorithm means for circuit switch scheduling. And it was shown by Bojja Venkatakrishnan et al., in the same work, that finding the best configuration is easy: it can be solved exactly, and it just reduces to maximum weight matching. So the question is, are we done? We have greedy, it's easy to define. The answer is no: greedy is not well defined yet. The question is what happens when the last configuration we choose, call it M_r with duration alpha_r, exceeds W. Okay, so let's see how that can happen. Say we chose r minus one configurations so far, and now we choose a configuration where, as you can see, alpha_r plus delta exceeds W. In this case, we use another property of the problem, a very specific one: in essence, we can inflate or deflate elements. This is not a general knapsack problem.
So in this case we just deflate the duration of the r-th matching: once we add the switching cost delta, we end exactly at W. Okay, the only other thing that can happen is that even after deflating the r-th matching we still exceed W, and there's no way to fix this, because delta on its own is enough to violate the length of the time interval. In this case, we just throw away the r-th element. And this throwing away of elements is exactly what ruins the greedy algorithm for the general knapsack problem, but we still do it anyway. So what can we prove? For abbreviation, instead of writing out that expression, I'll write f(S) for the value of a schedule S. What we can prove is that the value of the greedy algorithm is at least one minus one over e minus, think of this as an order-epsilon factor, times the optimal solution. And how do we prove this? We prove it using what you might call resource augmentation, or rather by looking at optimal solutions that are allowed a smaller time interval. The first step is to show that greedy is very good compared to the optimal solution on a smaller time window, smaller by an additive delta: we remove one delta from the allowed time and look at the optimal solution for the reduced interval; against that, greedy works very well and gives the one minus one over e. Call that reduced optimum OPT-tilde. The question is how good OPT-tilde is, and you can prove that OPT-tilde is almost as good as the optimal solution. The way we prove this actually works not only for optimal solutions: take any schedule S for the window W, and let S-tilde be its truncation to the reduced interval; then you don't lose more than a factor of one minus two delta over W. So if delta is small, this is close to one.
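The greedy just described can be sketched as follows, under simplifying assumptions: the instance is an n-by-n demand matrix, candidate durations come from a small finite set, and the best configuration is found by brute force over all permutations. The talk's actual oracle is a maximum-weight matching computation, and all function names here are hypothetical.

```python
# Sketch of the small-delta greedy: repeatedly pick the configuration
# (matching, alpha) maximizing capped gain per unit of window used,
# deflating or dropping the last configuration to respect the window.
from itertools import permutations

def best_configuration(residual, durations, delta):
    """Brute-force stand-in for the max-weight-matching oracle."""
    n = len(residual)
    best = (None, None, 0.0)                      # (matching, alpha, ratio)
    for alpha in durations:
        for perm in permutations(range(n)):       # perm[i] = receiver of sender i
            gain = sum(min(alpha, residual[i][perm[i]]) for i in range(n))
            ratio = gain / (alpha + delta)        # gain per unit of window used
            if ratio > best[2]:
                best = (perm, alpha, ratio)
    return best[0], best[1]

def greedy(demand, window, delta, durations):
    residual = [row[:] for row in demand]
    schedule, used = [], 0.0
    while True:
        matching, alpha = best_configuration(residual, durations, delta)
        if matching is None:                      # no configuration gains anything
            break
        if used + alpha + delta > window:
            alpha = window - used - delta         # deflate the last configuration
            if alpha <= 0:                        # delta alone overflows: drop it
                break
        schedule.append((matching, alpha))
        used += alpha + delta
        for i, j in enumerate(matching):
            residual[i][j] = max(0.0, residual[i][j] - alpha)
        if used >= window:
            break
    return schedule
```

On the instance with demands [[2, 0], [0, 2]], window 5, delta 1, and candidate durations {1, 2}, this picks the identity matching for 2 time units and stops once all demand is satisfied.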
This is what happens. So when you plug these two things together, you get this result, and this takes care of the first case, where the switching cost is small. What happens when the switching cost is large? In this case, since k, the number of configurations in the optimal solution, is at most one over epsilon, which is a constant, we can enumerate over the durations. So say the optimal durations are alpha-star one up to alpha-star k. How do we handle the fact that we know the durations but have no idea what the matchings are? Now that we know the durations, I'll show that it's very easy to write a linear programming relaxation, and then we just use randomized rounding. What the relaxation does is assign to every duration, instead of a single matching, a distribution over matchings. So what is the relaxation? Let l index the durations, that is, the configurations of the optimal solution. For the l-th duration, we have variables forming something that is at most a distribution; if the sum is strictly smaller than one, think of the remainder, one minus the sum, as returning the empty matching. So there's a distribution for every duration alpha-star l. Now we just need to express the contribution of an edge ij, from sender i to receiver j, with respect to these distributions. First, it cannot exceed the demand. Second, it cannot exceed the sum, over all durations and all matchings containing the edge ij, of the probability of choosing that matching times the actual duration, which we guessed and is now a constant. This is why we can formulate this as a linear programming relaxation: the durations are known, not variables. So this is it, a very simple relaxation: we maximize the total contribution of all the edges.
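Written out, the relaxation just described looks roughly as follows, with x_{M,l} the probability of picking matching M in slot l, c_{ij} the contribution of edge ij, and the guessed durations alpha-star as constants (notation here is mine, reconstructed from the description):

```latex
\begin{align*}
\max \; & \sum_{i \in A,\, j \in B} c_{ij} \\
\text{s.t.} \; & c_{ij} \le d_{ij} && \forall\, i, j \\
& c_{ij} \le \sum_{\ell=1}^{k} \alpha^{*}_{\ell} \sum_{M \,:\, (i,j) \in M} x_{M,\ell} && \forall\, i, j \\
& \sum_{M} x_{M,\ell} \le 1 && \forall\, \ell = 1, \dots, k \\
& x_{M,\ell} \ge 0 && \forall\, M, \ell
\end{align*}
```

Note that the durations multiply the variables only as fixed coefficients, which is exactly why guessing them first makes the program linear.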
One thing that is left before we get to the rounding is how to solve this, because there's an exponential number of variables. What you do is write the dual and observe that the dual separation oracle is exactly maximum weight matching, so the LP can be solved precisely using, for example, the ellipsoid algorithm. Once we have the distributions for every duration alpha-star l from this relaxation, we use randomized rounding, which means that for every duration, we pick matching M with exactly the probability the distribution from the relaxation dictates. Okay, so how do we analyze this? We need to lower bound the expected contribution of edge ij. If X_{M,l} is the indicator that matching M was chosen for duration l, then we sum, over all durations and all matchings containing the edge, the indicator times the duration. This is the total transmission time over edge ij, a random variable, and we cap it at the demand. The issue here is that we want the expectation of a minimum, and the min function is concave, so if we just swap the min and the expectation, Jensen's inequality works in the opposite direction from what we want. So the question is, how do we analyze this? You can analyze it yourself, it's a very nice exercise, but at some point we figured out that this was already done; we're not sure this is even the first place it was done, but it gives you exactly what you want. You have a very specific concave function, and you want to lower bound its expectation, as opposed to upper bounding it. What do I mean by that? I just want to switch the expectation and the min function. This, for example, was done by Andelman and Mansour, in a paper about an AGT problem.
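The switch of expectation and min can be sanity-checked numerically. The following small Monte Carlo experiment (not a proof) illustrates the bound E[min(B, sum_i a_i X_i)] >= (1 - 1/e) * min(B, E[sum_i a_i X_i]) for independent indicators X_i, non-negative coefficients a_i, and cap B; the instance values are arbitrary examples.

```python
# Monte Carlo illustration of lower-bounding E[min(B, sum a_i X_i)]
# by (1 - 1/e) * min(B, E[sum a_i X_i]) for independent indicators X_i.
import math
import random

def empirical_lhs(probs, coeffs, cap, trials=20000, seed=0):
    """Estimate E[min(cap, sum_i a_i X_i)] where X_i ~ Bernoulli(p_i)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = sum(a for p, a in zip(probs, coeffs) if rng.random() < p)
        total += min(cap, s)
    return total / trials

probs, coeffs, cap = [0.5, 0.3, 0.8], [1.0, 2.0, 1.5], 2.0
lhs = empirical_lhs(probs, coeffs, cap)
rhs = (1 - 1 / math.e) * min(cap, sum(p * a for p, a in zip(probs, coeffs)))
# empirically lhs exceeds rhs on this instance, with room to spare
```

Swapping min and expectation turns the random capped sum into the deterministic quantity min(B, sum_i a_i p_i), which is exactly the form the LP value takes, at a loss of only 1 - 1/e.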
So if you have n independent indicators, a capping parameter, capital B, and non-negative coefficients, everything here is non-negative, and you want to lower bound the expectation of a linear function of the indicators capped at B, all you want to do is swap the order of the min and the expectation, and you can do this at a loss of one minus one over e. This is a very nice exercise; for example, it was done there. Once you have that, we're done: we show that even in the case where the switching cost is large, we get an approximation of one minus one over e. Okay, so this handles the offline case. The title of the talk also promised something about the online case, so let's see what online circuit switch scheduling means. At every time t, a new demand matrix, call it D^t, is revealed and added to all the unsatisfied demand we had before. Once this demand matrix is revealed, we may choose a matching and transmit over it for some period of time, unless we're already inside a delta interval during which nothing can be transmitted because we just switched matchings, okay? For simplicity of presentation, and this can be done without loss of generality, let's assume that time is discrete, one, two, three, up to some capital T, and that the demands are integral, okay? If the demands are integral, we can view the demand matrix as a bipartite multi-graph. So let's see what that means; as a thought experiment, it's easiest to see in the online case with no switching cost, okay? So these are the senders and these are the receivers, as before, and say the first demand matrix we get is this one. For example, you see that the demand from this sender to this receiver is three, because there are three parallel edges.
And here, for example, from this sender to this receiver there's a demand of two, here a demand of one, and there are also demands of zero, et cetera. So we view the demand matrix as a bipartite multi-graph. First of all, we add it to the unsatisfied demand so far; there are no prior demands, so this is what we have. Intuitively, if there's no switching cost, say delta is zero, the best thing to do seems to be to choose the largest matching, right? So we choose the largest matching, say these red edges, and because time is discrete, we transmit over this matching for a single step, so we remove one edge; every edge in the matching is removed. What happens at t equals two? There's a new demand matrix, D^2, say it looks like this. We add it to the unsatisfied demand, this is what we have so far, and again we choose a matching, say the red edges, and because time is discrete, we remove the chosen edges; then there's D^3, we add it, choose a matching, and so on. This is intuitively what you should do when there's no switching cost, and what you can actually prove is that with no switching cost, this greedy online algorithm gives a competitive ratio of one half, okay? The proof, which I won't go into in detail, relies on the fact that the offline version with no switching cost is easy: you can solve it exactly. So the question is, at least with the way we prove things, it seems like half is a bound we cannot beat. This raises the question: is there a connection between the online problem and the offline problem? What we can actually prove is that you can reduce the online problem to the offline problem, even in a black-box manner. So what does that mean?
It means that if you have a beta approximation for the offline problem, and you choose some integer k, which needs to be at least three, then there is an online algorithm that gives you an approximation like this. The point is that this algorithm uses a window slightly larger than the W allowed: we deviate by an additive k times delta. One important thing to note is that this reduction is polynomial time, so if the beta approximation runs in polynomial time, you actually get a polynomial-time online algorithm. This is not automatic, because sometimes in the online setting you just want to handle unknown input and you don't care about the running time of the algorithm. So if you have a polynomial-time offline algorithm, you get a polynomial-time online algorithm with this guarantee. Another thing to note: what happens if we don't care about the running time? Offline, we can always solve the problem in exponential time, so beta equals one, right? And if beta equals one, you get something very close to one half, essentially the same half we got when there was no switching cost. If instead you plug in the result we have for the offline case, you get a competitive ratio of roughly e minus one over two e minus one; there is some epsilon loss, and we additively deviate from W by time of order delta over epsilon. Another thing we can show, and it's not very hard, addresses the objection that this is cheating, that we gave the online algorithm more power with a larger window: this is unavoidable. There is no non-bicriteria online algorithm with a finite competitive ratio. So if you want any finite competitive ratio, you must deviate from the time window W, okay? I don't know whether there's a lower bound on the size of the deviation, but there must be some deviation in the window size, otherwise there's no finite competitive ratio.
So this, in some sense, justifies the bicriteria guarantee. Okay, so a few interesting open questions. There are more questions than these, but I think these two are very interesting. The first relates to the hardness of the offline problem. I mentioned that submodular knapsack has a tight one minus one over e approximation, tight in the sense that there's a matching hardness result. Our problem is in some sense a special case of that, right? There's a specific submodular function, and we can also inflate and deflate the elements. We don't know whether the hardness result for submodular knapsack applies to our special case. One issue that makes this question interesting is that hardness for submodular optimization problems is usually information-theoretic, via value-oracle lower bounds, whereas our special case is given explicitly, so maybe one needs to translate this into some complexity-theoretic hardness; it's not clear. These are, in some sense, two incompatible things. The second interesting question: it seems that even with beta equals one, at least with our method, we cannot break the one-half barrier for the competitive ratio. I think it would be very interesting to show that there's a better algorithm, which probably would not reduce the online problem to the offline one, and which gains by achieving a ratio better than one half. So, thank you. [Question from the audience: what if delta were a function of how different the two matchings are from each other?] Actually, that's a good question; I don't know. The reason delta is fixed is, I didn't go over all the previous works, but there are also practical works that deal with this, from people who actually work on data centers, people who publish in SIGCOMM and the like. For them, delta is fixed, and they have physical reasons for that. But from a theoretical point of view it's very interesting. Maybe at some point I should talk with those people and see whether it makes sense in terms of the application.
But I have no idea what happens when the switching time depends on, let's say, some distance measure between the two matchings. Thank you.