I think we're good. Welcome everyone to this next edition of TCS+. We're very happy to have Eric Balkanski from Harvard today. Before I introduce Eric, let me first thank the organizers who are helping out behind the scenes to bring you TCS+: Clément Canonne, Anindya De, who's with us today, Gautam Kamath, Ilya Razenshteyn, and Oded Regev. So let me continue and go around the table so that everyone who's here today can say hi. First of all, we have Andreas with the group from the Hasso Plattner Institute. So welcome everyone. And Anindya is joining us from Northwestern. Then Santiago is joining us from EAFIT. Welcome. Then we have Ben Lee from Caltech, and finally a nice group from Toronto enjoying lunch over there, I guess. So bon appétit. All right. Let me also just say that there's going to be one more TCS+ this semester: two weeks from now, Julia Chuzhoy from TTIC will be our last speaker for the fall. And so today we're happy to have Eric Balkanski give the talk. Eric is a PhD student with Yaron Singer at Harvard, and he's done work generally in machine learning and algorithms, including a series of works on the sample complexity of optimization algorithms. Today he's going to tell us about the adaptive complexity of maximizing a submodular function. So thanks, Eric. Yeah. Hi everyone. Thank you for the introduction and the invitation to speak at TCS+. So yes, I'm going to talk about the adaptive complexity of maximizing a submodular function. This is joint work with my advisor Yaron Singer. Throughout the talk, feel free to interrupt me at any time with questions; I'm really happy to take any. So the standard algorithm for submodular optimization is a simple greedy algorithm, and I would like to start this talk by illustrating this algorithm with the example of maximum coverage, which is a special case of submodular maximization.
So an instance of maximum coverage can be represented as a bipartite graph with top nodes and bottom nodes, and the goal is to pick k top nodes, for some integer k, that cover the largest number of bottom nodes. So for example, this node covers one bottom node, and these two nodes cover three bottom nodes. What the greedy algorithm does is maintain a current solution, and at every iteration it adds to the solution the one element with the largest marginal contribution to the solution. In the case of maximum coverage, that means that at every iteration, the greedy algorithm adds the one top node that covers the largest number of bottom nodes not yet covered. So for example, with this maximum coverage instance, the greedy algorithm would first pick this one node that covers five bottom nodes, then this second node that covers three new nodes, and then finally this node that covers one new node. The greedy algorithm has strong performance guarantees, but I want to argue that one main drawback of the greedy algorithm is that it is highly sequential. So just a question: for someone who hasn't seen this problem before, what's the guarantee here? Yeah, so I will mention that in a moment, but maximum coverage is a special case of submodular optimization, and we know that the greedy algorithm obtains a 1 - 1/e approximation for maximizing a submodular function under a cardinality constraint k. Okay, thanks. Yeah. So we know it has good guarantees: a 1 - 1/e multiplicative approximation compared to the optimal solution. But I want to argue that its main drawback is that it is highly sequential. The one main takeaway of this illustration is to observe that at every iteration, the node that was picked crucially depended on the node that was picked at the previous iteration.
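The greedy rule just described can be sketched in a few lines. This is an illustrative implementation, not code from the talk; each top node is represented as the set of bottom nodes it covers.

```python
def greedy_max_coverage(sets, k):
    """Greedy for maximum coverage: k rounds, each adding the set
    with the largest marginal contribution (newly covered nodes)."""
    covered = set()
    solution = []
    for _ in range(k):
        # marginal contribution of set i = bottom nodes it covers
        # that are not yet covered by the current solution
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        solution.append(best)
        covered |= sets[best]
    return solution, covered

# Small example: the first pick covers five new nodes,
# later picks cover fewer and fewer.
tops = [{1, 2, 3, 4, 5}, {5, 6, 7}, {7, 8}]
sol, cov = greedy_max_coverage(tops, 2)
```

Note that each of the k picks depends on the previous ones, which is exactly the sequentiality the talk is about: this loop takes k adaptive rounds.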
And so that's why we say that greedy has k adaptive steps for picking the k best elements. Okay. And so in this talk, I'm going to be interested in this tradeoff between rounds of adaptivity and the performance of an algorithm, which is going to be measured by its multiplicative approximation compared to the optimal solution. Okay. For the greedy algorithm, we just saw that it is k-adaptive when we have a cardinality constraint k, so for picking the k best elements. And k can be linear in n, the size of the ground set of elements. Again, it's well known that the greedy algorithm obtains a 1 - 1/e approximation for maximizing a monotone submodular function under a cardinality constraint, and we also know that this approximation is the best possible for any polynomial-time algorithm. Okay. And so now let me make this concept of adaptivity more formal. We're going to say that an algorithm is r-adaptive if it makes r sequential rounds, where in each round the algorithm may evaluate multiple queries in parallel. Okay. What this means is that we're going to have oracle access to some function f that we wish to optimize. For the purposes of this talk, f is just a set function: given some set of elements, the oracle returns the value of this set. The algorithm then proceeds in rounds, where in each round it may query, let's say, at most polynomially many different sets. These are simultaneous queries that are evaluated in parallel by the value oracle, and the oracle returns to the algorithm the values of these sets. Then, based on the values of these sets, the algorithm can adaptively query a new batch of sets: it again asks for polynomially many sets and the oracle returns their values. And the algorithm proceeds in rounds like this.
An algorithm is r-adaptive if it makes r rounds of queries. Okay. So just one more question: is the value of a set the number of elements that the set covers, or the number of elements that haven't been covered yet that the set covers? Yeah, good question. We're going to consider a general function f, but for the case of maximum coverage, a set here will be a collection of top nodes, and the function will return just the number of bottom nodes that are covered by this set. Okay, that answers the question. Okay. And so this concept of adaptivity is important because it corresponds to the parallel runtime of an algorithm when the algorithm can evaluate multiple queries in parallel, so when it can execute multiple function evaluations in parallel. So Eric, you're not putting any constraints on the number of queries made in one particular round? Good question. Yeah, so there's also this question of how many queries are needed. The main question I'm going to be interested in is adaptivity versus the approximation guarantee, so I'm just going to allow polynomially many queries per round. But it's true that there's also a question of how many queries are made in every round; for now, we're just going to allow polynomially many per round. Good. And so now we're going to say that the adaptive complexity of a problem is the minimum number of rounds r such that there exists an r-adaptive algorithm that obtains a constant factor approximation. So what we're saying is that we want to obtain a constant factor approximation, and we want to understand how many rounds we need to obtain it. Okay. So a recent line of work related to adaptivity is on optimization from samples, which is at the intersection of learning and optimization. Here the high-level motivation is that we want to optimize some function, but the true underlying function is unknown.
Instead, we only have access to some erroneous function that has been learned from sample data, and what we would like to know is what kind of optimization guarantees we can have when we optimize such an erroneous function learned from data. Okay. To answer this question, we have the optimization from samples model, which is as follows. The underlying function f that we want to optimize is unknown, and the input is going to be the same as in learning: sample data. So what is a sample? A sample is a set S drawn from a distribution, together with its value according to the function. We're given polynomially many sets drawn i.i.d. from some distribution D, similarly as in PAC and PMAC learning. The output is then the same as in optimization: we want to find a set S that maximizes the function under some constraint. Okay, so this is optimization from samples. The main result for this model is actually an impossibility result: it says that it's actually impossible to get a good approximation for submodular maximization when given samples drawn from a distribution. To make this a bit more formal, it says that there is no algorithm that can obtain an approximation better than n^{-1/4}, whereas we would like to get a constant factor approximation. This is for maximizing a monotone submodular function under a cardinality constraint when given polynomially many samples drawn from any distribution. Okay, the quantifiers are a little bit ambiguous. What do you mean by "from any distribution"? Do you mean that for any fixed distribution that is known to the algorithm, the algorithm does not exist, or that for every algorithm there's a distribution? Yeah. What it means is that I'm going to let you pick a distribution of your choice.
And for any distribution that you might pick, it's impossible to optimize the family of submodular functions for this distribution. But the algorithm can depend on the distribution? So the algorithm will be given samples from this distribution; does it know the distribution? Yeah, it can know the distribution. Good. And we also have similar impossibility results for the case of maximum coverage, which I mentioned at the beginning and which is a special case of submodular optimization, as well as for convex optimization and for submodular minimization. And these results hold even for functions that are actually PAC learnable. So a first implication of this result is that there are functions that are learnable given sample data, and optimizable when given the exact function, but that cannot be optimized when given polynomially many samples. This should be a bit surprising: we would expect that a function that is both learnable and optimizable should be optimizable from samples. Okay. The second implication is that, in general, we can get no desirable guarantee for optimization of problems in P when the objective is learned from polynomially many samples from a distribution, even when this function is PAC or PMAC learnable. Okay, so these are strong impossibility results, and I'd like to argue that the main reason for these impossibility results is that these optimization from samples algorithms are non-adaptive algorithms. What I mean by that is that the algorithm only receives as input polynomially many non-adaptive samples. The algorithm can look at these samples and learn something about the function, but it cannot then adaptively ask for additional information about the function; it cannot query additional sets based on the values of the samples it has observed. Okay.
So if we come back to our original question about the relationship between adaptivity and approximation, the results about optimization from samples imply the following: we cannot get a constant factor approximation in one round, so with a non-adaptive algorithm. These impossibility results for algorithms that are given as input polynomially many samples also extend to any algorithm that can query polynomially many non-adaptive sets in one round, so to any non-adaptive algorithm. Okay. So we know we can get a constant approximation with linearly many rounds, and we know that it's impossible in one round. But until very recently, we didn't know anything about what happens between these two points. In particular, every other known constant factor approximation for maximizing a monotone submodular function under any constraint had at least linear adaptivity. Okay. And so this is the main question we're going to ask: what is the adaptive complexity of maximizing a submodular function? How many rounds do we need to obtain a constant factor approximation for submodular optimization? Okay. So before I dive into the main result, are there any questions about the main question and the setup? Okay. Good. So I'm going to present the main result. Actually, sorry, I do have one question. I'm just slightly confused as to when it is that you have enough information to solve the problem. I'm still thinking about the original coverage instance, with the top sets or whatever it was that you had. So just to make sure: when you make a query, you query sets at the top, and you're returned the number of elements at the bottom that they cover, but not which elements they cover. So if you query all possible sets, that's enough to solve the problem? Good. So yeah, one thing that we could do is just query all of the sets in one round in parallel, and then we could just pick the best one.
But that would be inefficient because there are exponentially many top sets; we're going to restrict to efficient algorithms and allow at most polynomially many queries per round. Yeah. So on this previous slide that said there exists no algorithm, you meant there exists no efficient algorithm? That's exactly right. Yeah. Okay. So, no efficient algorithm. And here, yeah, we want to ask this for the case where we allow polynomially many queries per round. Good. Okay. So the main result. The main result is that we show that the adaptive complexity of maximizing a monotone submodular function under a cardinality constraint is log n, up to lower-order terms. This main result consists of two major parts. The first part is a new algorithm that obtains a constant factor approximation in O(log n) rounds. The second part is a hardness result that shows that it's impossible to obtain this in fewer rounds: it says that it's impossible to get even a 1/log n approximation in log n / log log n rounds. Okay. To make these results more formal, we first show that there exists an O(log n)-adaptive algorithm that obtains, with high probability, an approximation arbitrarily close to 1/3. And we also show that there is no (log n / log log n)-adaptive algorithm that obtains, even with low probability, a 1/log n approximation. Okay. The main implication, the main takeaway of this result, I think, is that this algorithm gives us an exponential improvement in adaptivity, so in parallel runtime, over any previous constant factor approximation algorithm for submodular maximization. Okay. So if we're in a setting where we can execute function evaluations in parallel, then we can now do submodular optimization exponentially faster.
Importantly, the algorithm relies on a new technique that we're going to call adaptive sampling. So let me tell you a bit more about this new technique. If we come back to the impossibility result I previously mentioned, what this result said is that it's impossible to get better than an n^{-1/4} approximation given polynomially many non-adaptive samples drawn from any distribution. Okay. But given this impossibility result, similarly as in active learning, one might ask what happens if I can get multiple batches of samples: get one batch of samples from some distribution, then let the algorithm design a new distribution, and then get another batch of samples. Okay. And that's exactly what the algorithm is going to do. What the adaptive sampling technique does is that at every round, based on the previous samples, so the values of the previously observed sets, we update to a new distribution and sample sets from this distribution. So we're going to have one distribution at every round. And I'm going to denote the marginal probability of each element a1 through an being in a set drawn from this distribution by p1 through pn. Okay. And the main idea here is that if we look at the space of potential solutions, in the first round we don't have any information, so the best thing we can do is just evenly sample the space of feasible solutions. Here you can imagine that this red cross is the optimal solution. Then, based on these samples, we can slowly zoom in towards regions that have high-valued sets. And so at every iteration, we can have a distribution that's a bit more precise and slowly zooms in on the optimal solution. Okay.
And so, phrasing this previous result on adaptivity in terms of sampling, what it's saying is that we roughly need log n batches of adaptive samples to approximately optimize submodular functions. Okay. So this is going to be the outline for the remainder of the talk. I'm going to spend most of the talk presenting this adaptive sampling algorithm that obtains a constant factor approximation in O(log n) rounds. Then I will mention the almost tight lower bound. Then I will show some experiments demonstrating that adaptive sampling is a technique that can also be used in practice. And finally, there have been several recent results about adaptivity and submodular optimization that I will mention at the end of the talk. Okay. But before I move on, are there any questions about the main result? Okay. Good. So I'm going to move on to present the algorithm. I haven't yet formally defined submodular functions, so let me start by doing this. A set function f over a ground set N of elements is submodular if it satisfies the property of diminishing returns. Sorry, there's a quick question: Andreas is asking if the algorithm that you're going to give is for the unconstrained problem, or whether there's a cardinality constraint. Good question. It's going to be for the case of a cardinality constraint; this main result is for cardinality constraints. And at the end of the talk, I will mention recent results for both the unconstrained problem and different types of constraints as well. Thanks. Yeah. Good. So submodular functions are the functions that satisfy this diminishing returns property. What this property says is that the marginal contribution of an element a to a set S, so the value added by an element a when I add it to a set S, has to be diminishing. What that means is that as the set S grows, the contributions decrease.
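In symbols (standard notation, consistent with the definition just given): a function $f : 2^N \to \mathbb{R}$ is submodular when, for all sets $S \subseteq T \subseteq N$ and every element $a \notin T$,

```latex
f(S \cup \{a\}) - f(S) \;\ge\; f(T \cup \{a\}) - f(T),
```

and it is monotone when $f(S) \le f(T)$ for all $S \subseteq T$. The left-hand side is the marginal contribution of $a$ to the smaller set $S$; diminishing returns says this contribution can only shrink as the set grows.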
Okay, so these are submodular functions. The canonical problem in submodular optimization is maximizing a monotone submodular function under a matroid constraint M. That's the general setting where we know how to obtain good approximations. As I mentioned, the main result is for a cardinality constraint, which is the special case of a uniform matroid, and I will come back to the general case of matroids at the end of the talk. And even though not much was known about adaptivity for this problem of submodular maximization, that's in contrast to a line of work on distributed submodular optimization in MapReduce-style settings. There has been a lot of recent work on this, and there the motivation is related to, but different from, adaptivity: whereas adaptivity addresses the challenge of sequentiality, the distributed setting addresses the issue of the function and its elements being too large to fit on a single machine, so that they have to be distributed over multiple machines. Okay. So these are submodular functions, and we're going to be looking at this problem of monotone submodular maximization. As I mentioned, the algorithm relies on this idea of adaptive sampling, and in particular it's going to depend on two simple sampling primitives. I'm going to start by presenting the sampling primitives at a high level, and then I will give more formal descriptions of them. Okay. So what are these two sampling primitives at a high level? First, we're going to have down-sampling, which does the following. The distribution at the first round includes each element in a set drawn from the distribution with equal probability, because we don't have any information.
Then, given the first batch of samples, down-sampling identifies bad elements: these are elements that we want to discard and ignore in the future. So we set their sampling probability in future distributions to zero; we ignore them and don't sample them anymore. And so now we have a new distribution, and we may sample again and find additional bad elements that we will discard in the future. Down-sampling iterates like this, discarding some bad elements at every round. With this approach, we can get an algorithm with O(log n) adaptivity, but it's impossible to get better than a 1/log n approximation. Okay. So we're happy with the adaptivity that this approach gives us, but not with the approximation. Then, on the other hand, there's up-sampling, which can really be thought of as the opposite of down-sampling: at every iteration, instead of finding bad elements, we identify good elements. These good elements are added to a current solution and included in all of the samples in the future. So we first find some good elements, and then again, at every iteration, we find some new good elements that we add to the current solution, and so on. In particular, if you look at the special case of up-sampling where at every round we find the one best element, then that's exactly the greedy algorithm: at every round, find the best element and add it to the current solution. So it's possible to get a constant approximation with this approach, but getting a constant approximation requires linear adaptivity. So with down-sampling we were happy with the adaptivity but unhappy with the approximation, and with up-sampling we were happy with the approximation but unhappy with the adaptivity.
And so the main idea behind the main algorithm, the adaptive sampling algorithm, is that it appropriately combines these two sampling primitives, down-sampling and up-sampling. At every iteration, depending on the context, it's either going to up-sample and add good elements to the solution, or it's going to down-sample and discard elements from further consideration. By combining these two primitives, we can get the best of both worlds: logarithmic adaptivity and a constant approximation. So now let me make all of this a bit more precise, starting with a precise definition of the down-sampling primitive. We're going to have some threshold T and a constant C that's a little bit larger than one. And we're going to have a set X of the elements that have not yet been discarded, called the surviving elements; initially this is the ground set, so all of the elements. And the distribution D that we have at every iteration is the uniform distribution over all subsets of surviving elements of size K. Here again, K is the cardinality constraint: we want to find a solution of size K. So we're going to have this distribution, and then we sample from the distribution to identify the bad elements that we're going to remove. What are these bad elements? They are elements that have a low marginal contribution to a random set drawn from D. Precisely, these are all of the elements whose marginal contribution to a random set drawn from D is at most the threshold T divided by K, since we're sampling sets of size K, relaxed a little bit by the constant C, so at most C times T over K. And we do that until either there are fewer than K surviving elements, or a random set from the distribution has value at least T.
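A minimal sketch of the down-sampling primitive just described, assuming a value-oracle function `f` on sets; the sample count and the empirical averaging are illustrative choices (the idealized primitive uses exact expectations):

```python
import random

def down_sample(f, ground, k, T, c, num_samples=100):
    """Down-sampling sketch: keep a set X of surviving elements and
    repeatedly discard those whose (estimated) marginal contribution
    to a random size-k subset of X is at most c*T/k.  Stop when a
    random set has value at least T, or fewer than k elements survive."""
    X = list(ground)
    while len(X) > k:
        samples = [random.sample(X, k) for _ in range(num_samples)]
        # if a random set already has high value, return it
        for R in samples:
            if f(R) >= T:
                return R
        def est_marginal(a):
            # empirical average of f_R(a) = f(R + a) - f(R - a)
            return sum(f(set(R) | {a}) - f(set(R) - {a})
                       for R in samples) / num_samples
        X = [a for a in X if est_marginal(a) > c * T / k]
    return X
```

With a threshold T that no random set can reach, every element here counts as "bad" and is discarded; with a generous enough T, the first sampled set is returned immediately.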
Once we have a random set that has high value, we just return this random set. Okay. So this was down-sampling. We want to show two things about down-sampling: its adaptivity and its approximation. The main benefit of down-sampling is its logarithmic adaptivity. Showing this relies on a main lemma, which says that at every round, the algorithm discards a constant fraction of the surviving elements. If at every round I can discard a constant fraction of the surviving elements, then after logarithmically many rounds I will have at most K elements remaining, and the algorithm will terminate. So what are the main steps in the proof of this main lemma? It exploits the fact that at every iteration, we know that a random set has value at most T; otherwise the algorithm would have returned a random set. What this low value of a random set implies, by exploiting submodularity, is that on average, a surviving element has contribution at most T over K to a random set. This is because a random set has K elements and we know it has value at most T, so on average the elements can have contribution at most T over K. And why is this useful? We know that, by definition, the algorithm discards elements that have contribution at most C times T over K to a random set, for some constant C. And if on average the elements have contribution at most T over K, then there has to be at least a constant fraction of them that have contribution at most C times T over K, if C is a constant greater than one. Sorry, just a quick question. Why does that not discard a lot of elements? I would have expected that this T over K on the last line would be the actual value that you get on the previous line, instead of just an upper bound.
So why couldn't every element A satisfy the last constraint? Good question. This slide is about showing that we have good adaptivity. If every element were below the threshold, then we would just remove all of the elements, be done in one round, and be happy. On the next slide I will explain why we also get a good approximation, so why we don't remove all of the elements. Right now, I just want to argue that we remove sufficiently many elements at every round. Does that answer your question? Oh, it'll be answered on the next slide, I think. Okay. Yeah, that's right. There's a trade-off: we want to remove a lot of elements to have low adaptivity, but we also don't want to remove too many elements, because then we would end up with a bad solution. Here I'm arguing that we remove sufficiently many elements to get the right adaptivity, and next I'm going to argue why we can also get a good approximation. Okay. The main lemma for showing that we get a good approximation is that at every round, if we look at the optimal solution O, so the set of size K that has the highest value, and we look at the optimal elements that were discarded in this round, then we can bound the value of these optimal discarded elements. This bound is one plus the constant C, times the threshold T. What this is saying is that if we pick a threshold T that is at most roughly OPT over log N, then after logarithmically many rounds this lemma guarantees that there are some optimal surviving elements that still have high value, meaning that after logarithmically many rounds there will still be some optimal elements remaining with high value; these will be returned by the algorithm, and we will get that value. The point is that the value we lose from the optimal elements at every round of discarding is bounded. Okay.
So what are the main steps, at a high level, of this main lemma? First, by submodularity, we can bound the value of the optimal discarded elements by the expected value of a random set plus the sum of the contributions of these optimal discarded elements to a random set. Okay. Now we can bound the value of a random set by the threshold T; again, this is by the algorithm, because if a random set had value more than T, the algorithm would return this set. Then we want to bound the contributions of the discarded optimal elements. We know that the algorithm discards elements that have low contributions, so contribution at most C times T over K. And we know that the optimal set is a set of size K, which implies that at most K optimal elements are discarded in any round. So if there are at most K such elements in every round, the sum of the contributions of these optimal discarded elements is at most C times T. Okay. And so now we've bounded each of these terms, and we get this (1 + C) times T bound. As I mentioned before, for this to imply a good approximation with the surviving elements at the end, we have to pick T to be roughly OPT over log N, where OPT is the value of the optimal solution. But if we pick the threshold T to be OPT over log N, then whenever a random set has value at least T, it is returned, so we would only get a 1/log N approximation with the random set that is returned. This is why this algorithm gets a 1/log N approximation. Okay. And Eric, when you say log N, do you actually mean log N, or do you mean log K? On the previous slide, I think it was log K, the number of iterations that you needed, and now... So this is actually going to be log N, because we're going to be removing a constant fraction. So K is the number of elements we want to pick, and the approximation is 1/log N. But the adaptivity, is that not log K? No, it's going to be log N, because we have a ground set of N elements.
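The chain of bounds just described can be written compactly (using $f_R(a) = f(R \cup \{a\}) - f(R)$ for the marginal contribution, and $O_{\mathrm{disc}}$ for the optimal elements discarded in a round):

```latex
f(O_{\mathrm{disc}})
  \;\le\; \mathbb{E}_{R \sim D}\!\big[f(R)\big]
        + \sum_{a \in O_{\mathrm{disc}}} \mathbb{E}_{R \sim D}\!\big[f_R(a)\big]
  \;\le\; T + |O_{\mathrm{disc}}| \cdot \frac{cT}{k}
  \;\le\; (1 + c)\,T,
```

where the first inequality is by submodularity, the second uses that a random set with value above $T$ would have been returned and that discarded elements have marginal contribution at most $cT/k$, and the last uses $|O_{\mathrm{disc}}| \le k$.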
And K is the number of elements we want to pick, and we're going to be removing a constant fraction of elements from the ground set, which initially has N elements. Okay. Thanks. Yeah. So this was down-sampling. Now let me define up-sampling more formally. Up-sampling, instead of maintaining a set of surviving elements that may be discarded, maintains a current solution S to which we add elements. Up-sampling proceeds in R rounds, for some number R, and here the distribution is as follows: it's the uniform distribution over subsets, of size K over R, of the elements that have not been added to the current solution. So we're not sampling sets of size K anymore, but sets of size K over R. What we do is add a random subset of high value to the current solution at every round. Which random subset? We sample multiple sets, and we pick the sample that has the highest marginal contribution to the current solution. So over all of the samples, we pick the one with the highest contribution to the solution and add it to the solution S. Okay. At each of the R rounds, we add K over R elements to the solution, so we finish with K elements and return the solution S. And again, if R equals K, we have K rounds and at every round we pick the one best element; this is the greedy algorithm. So in fact, if you want to get a constant approximation with this algorithm, you need linear adaptivity. Okay. So this is up-sampling. Now let me present the main algorithm: adaptive sampling, which combines these two sampling primitives.
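The up-sampling primitive can be sketched similarly; again `f` is an assumed value oracle, and the number of samples per round is an illustrative parameter:

```python
import random

def up_sample(f, ground, k, r, num_samples=50):
    """Up-sampling sketch: r rounds; in each round, sample subsets of
    size k // r from the elements not yet in the solution, and add the
    sampled subset with the largest marginal contribution to S.
    With r = k this is exactly the greedy algorithm."""
    S = set()
    remaining = set(ground)
    block = k // r
    for _ in range(r):
        candidates = [random.sample(sorted(remaining), block)
                      for _ in range(num_samples)]
        best = max(candidates, key=lambda R: f(S | set(R)) - f(S))
        S |= set(best)
        remaining -= set(best)
    return S
```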
So we're going to have both the set X of surviving elements that have not yet been discarded and the set S, which is the current solution. And now the distribution is going to be the uniform distribution over sets of size k over R of surviving elements that have not been added to the current solution. Okay. And so now, as I mentioned, depending on the context, we're either going to up-sample or down-sample. So what is the context? We're going to look at the marginal contribution of a random set from the distribution to the current solution. If a random set has high contribution to the current solution, we're going to up-sample; otherwise we're going to down-sample. And we do that until we have a current solution S of size k, and then we return the current solution. Okay. So why does this condition for up-sampling and down-sampling intuitively make sense? If I'm in an iteration where a random set has high value, then I want to add elements to my solution, because I know I can find a random set of high value. On the other hand, if I'm in an iteration where a random set has low value, then it must be the case that there are lots of elements with low marginal contribution to this set; otherwise a random set would not have low value. And so then I want to be able to discard a large number of elements. Okay. And there are two other minor points I want to mention about the algorithm. What I've presented is an idealized version, because the threshold T depends on OPT, as I mentioned, which is the optimal value, and we don't know the optimal value. So one thing we can do is guess OPT: we can pick log n geometrically increasing guesses for OPT, and then run the algorithm in parallel with each of the different guesses. That won't increase the adaptivity, because all of these instances are running in parallel.
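Putting the two primitives together, the idealized adaptive-sampling loop can be sketched as follows, assuming the threshold T is given (in practice it is guessed in parallel, as just described). This is my simplified rendering, not the paper's exact pseudocode: expectations are estimated naively by averaging m samples, and the up-sampling condition T / r is an illustrative choice.

```python
import random

def adaptive_sampling(f, ground, k, T, c=1.0, r=20, m=50):
    """Idealized adaptive-sampling sketch: up-sample when a random block
    has high marginal contribution to S, down-sample otherwise."""
    X = list(ground)          # surviving elements
    S = set()                 # current solution
    block = max(1, k // r)    # sets of size k/r are sampled each round
    while len(S) < k:
        rest = [x for x in X if x not in S]
        if len(rest) <= block:
            S |= set(rest)    # degenerate case: take everything left
            break
        samples = [set(random.sample(rest, block)) for _ in range(m)]
        avg = sum(f(S | R) - f(S) for R in samples) / m
        if avg >= T / r:
            # high contribution: up-sample, add the best sampled block
            S |= max(samples, key=lambda R: f(S | R) - f(S))
        else:
            # low contribution: down-sample, discard elements whose
            # estimated marginal contribution to S plus a random block is low
            def est(x):
                return sum(f(S | R | {x}) - f(S | R) for R in samples) / m
            X = [x for x in X if x in S or est(x) >= c * T / k]
    return set(list(S)[:k])   # trim in case the last block overshot k
```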
And we know that at least one of the guesses will be arbitrarily close to OPT, so at least one of the instances of the algorithm will work well. Then we can just return the best solution across all of these instances, and this is guaranteed to be a good solution. So that's the first of the two idealized assumptions for the algorithm. The second one is about the expectations. We can't evaluate these expectations exactly, but one thing we can do is just use sampling. With polynomially many samples from the distribution per iteration, we are able to estimate all of the expectations we need up to an arbitrarily good accuracy. Okay, good. And so now let me give a high-level overview of how the analysis of the main algorithm goes. So first we have the adaptivity: we want to show log n adaptivity. We have two types of iterations: each iteration is either a down-sampling iteration or an up-sampling iteration. For the down-sampling iterations, we can argue, similarly as for down-sampling alone, that we remove a constant fraction of the surviving elements at every round. So there are at most logarithmically many down-sampling iterations until at most k elements survive. Then for the up-sampling iterations, we're adding k over R elements to the solution at every round, and we're going to pick R to be logarithmic in n. So we're adding k over log n elements at every round, and after logarithmically many up-sampling iterations, we will have a solution of size k. So since there are at most logarithmically many iterations of down-sampling and at most logarithmically many iterations of up-sampling, together we have at most logarithmically many iterations for adaptive sampling. So that's for the adaptivity. Next, we have the approximation.
Now, the main idea for the approximation is to use a similar approach as for down-sampling, where we bound the value of the optimal elements that are discarded at every iteration. But now we're going to be able to obtain a much better bound on the value of the discarded optimal elements. And what's the main reason we get a much better bound? The main reason is that now we're sampling sets of size k over R, so k over log n, which is much smaller than the sets of size k that we were sampling for down-sampling. And so, very intuitively, if we're sampling much smaller sets, the overlap between a random set and the optimal elements is much smaller, and so we preserve much more of the value of the optimal elements. It's really this idea of combining up-sampling and down-sampling that allows us to obtain a constant factor approximation. So that was the end of the overview of the analysis of the algorithm. Just to recap, the main result obtained by this algorithm is that, for the problem of maximizing a monotone submodular function under a cardinality constraint, there is an O(log n)-adaptive algorithm that obtains, with high probability, an approximation arbitrarily close to one third. Okay. So before I move on to the lower bound, are there any questions about the algorithm, the analysis, or the result it obtains? So do you care about making this better for small values of k? If k is small, you can get... I mean, if k is really small, then you can just do it in one or a very small number of rounds, right? Yes, exactly. So if k is tiny, you can just brute-force over all of the sets. But there's still a small range of k where you cannot brute-force over all sets, and where the adaptivity might be a bit smaller than log n. And that's an open question; it's not clear what to do in that case.
We're going to see the hardness result, but it doesn't rule out the same theorem with log k instead of log n. Yeah, the hardness result is going to be meaningful for the case where k is greater than log n. k is greater than log n, or k is greater than... okay, well, let's just see it. Sorry, let's just see it. So the hardness result: what the hardness result shows is that if we want to get a constant factor approximation, we cannot do that in much fewer than log n rounds, basically. So for the problem of maximizing a monotone submodular function under a cardinality constraint, there is no o(log n over log log n)-adaptive algorithm that obtains, even with low probability, better than a 1 over log n approximation. Okay. So if I come back to the question: if I understand correctly, the question asks what happens when k is small, so maybe k is on the order of log n or a bit smaller, and whether it's then possible to get better adaptivity. And, yeah, again, if k is very small, we can just brute-force. So I think the interesting case is really when k is much bigger than log n; there's only a small window of values of k that remains open. Okay. So if you look at just how it depends on n, we cannot get better than this log n over log log n adaptivity. And so how do we show this hardness result? We're going to construct some hard functions, and these hard functions are defined in terms of a partition of the ground set into layers. I'm going to denote these layers L0 through LR, and L star. And the main lemma we're going to show is that it's impossible for any algorithm to learn which elements are in layer Li before round i. So in particular, this implies that after all rounds of adaptivity, the algorithm won't know which elements are in LR and L star, so it won't be able to distinguish these two layers, LR and L star.
And that will be problematic, because the optimal solution will be L star, so the algorithm won't be able to find a good solution. And to argue that the algorithm cannot learn which elements are in Li before round i, we use a round elimination technique, similar to what is done in communication complexity. Okay. So, just in the interest of time, this was a very high-level overview of the hardness result. There are many more details, and I encourage you to look at the paper if you want to learn more about it. Okay. Are there any questions about the hardness result? Okay. So if there are no questions, I can move on to the experiments. And again, I think the main benefit of this new algorithm is that it gives an algorithm whose parallel runtime is exponentially faster than any previous constant factor approximation for submodular optimization. Here we just want to convince you that this is not just a theoretical algorithm, but that it can also be used in practice and obtain good performance. And so we have done several experiments in different papers, but we have also started a collaboration with a biology lab at Harvard, and I want to show you preliminary results from this collaboration. So what's the setting here? In this lab, they have really large collections of gene sequences, and what they would like to do is cluster these gene sequences for DNA analysis. One approach to clustering is exemplar-based clustering, which formulates the clustering problem as a submodular objective. And so this is what we're going to look at. So first, I'm showing you the performance of greedy. What I'm measuring here is a run of greedy where we want to pick 50 elements, and at every iteration of greedy, I'm plotting its current objective value, the value of the current solution at that iteration.
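For context, exemplar-based clustering is usually cast as submodular maximization along the following lines. This is the standard formulation from the literature, built around a distant "phantom" exemplar e0; I cannot confirm it is the exact objective used in this collaboration.

```python
def exemplar_objective(points, dist, e0):
    """Standard exemplar-clustering objective: f(S) = L({e0}) - L(S + {e0}),
    where L(S) is the total distance from each point to its nearest
    exemplar in S, and e0 is a distant phantom exemplar. Under this
    formulation f is monotone submodular, so greedy-style algorithms apply."""
    def L(S):
        return sum(min(dist(p, s) for s in S) for p in points)
    base = L([e0])
    def f(S):
        return base - L(list(S) + [e0])
    return f
```

For example, with points on a line and dist the absolute difference, picking an exemplar inside a cluster sharply reduces the total distance, so f rewards sets of exemplars that cover the clusters.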
So at the end, it obtains close to 80 in objective value, and here I'm just plotting how the value of the current solution increases at every round of the algorithm. And now, if I show you the same plot but for adaptive sampling, what you observe is that adaptive sampling gets near-identical performance to the greedy algorithm, but it's really able to get this in a much, much smaller number of rounds. Okay. So that was just one experiment I wanted to show you; we have some others in the papers, and I encourage you to take a look if you want to see more experiments. Okay. And so now I would like to mention some recent results. Since this first result, there have been several other new results on adaptivity in submodular optimization. I think the first and most natural question is that the previous algorithm I mentioned gets a one-third approximation, and we know that the best approximation obtainable for submodular optimization is 1 minus 1 over e. So it's very natural to ask: is it actually possible to get this optimal 1 minus 1 over e approximation in logarithmically many rounds? To get the best of both worlds in terms of approximation and adaptivity? And the answer to that is yes. There were two independent papers at SODA this year that showed that, for the problem of maximizing a monotone submodular function under a cardinality constraint, there is an O(log n)-adaptive algorithm that obtains, with high probability, an approximation arbitrarily close to 1 minus 1 over e. So what this implies is that we can actually get this exponential speedup in parallel running time with only an arbitrarily small sacrifice in the approximation that's achievable in polynomial time. Okay. So we really get the best of both worlds: the adaptivity goes from linear to logarithmic while still maintaining the optimal 1 minus 1 over e approximation.
And so, some of the main ideas that go into this algorithm: it also uses this idea of adaptive sampling, but with some important differences. Two of those are: first, the elements that are discarded are not discarded permanently; sometimes discarded elements are brought back to life. And the second difference is an adaptive thresholding technique: instead of a fixed threshold T, there are thresholds that vary and adapt at each round. Okay. So that's for the optimal approximation with log n adaptivity. Now, as I previously mentioned, arguably the most general setting for which we know a constant approximation for submodular optimization is matroid constraints instead of cardinality constraints. And very recently, there have been three independent papers that obtained this 1 minus 1 over e approximation for matroid constraints, at the cost of another log n factor in the adaptivity: these are O(log squared n)-adaptive algorithms that obtain, with high probability, a 1 minus 1 over e minus epsilon approximation. Okay. And so our algorithm for this departs from the idea of adaptive sampling, where at every round we sample a large number of random sets. Instead of sampling random sets, it samples a single random sequence. And doing this sequencing, considering elements in a certain order, is crucial for the new algorithm for matroid constraints. Intuitively, it's very hard to generate random feasible sets when we have a complex matroid constraint, which is not uniform anymore. And so, with this random sequence, we're now able to navigate randomly through the matroid. Okay. So that's for the case of matroid constraints. There are a lot of other results.
So the first one is better approximation guarantees for submodular functions that have the property of bounded curvature. Another one: there were some questions earlier about how many queries we allow in every round, and I mentioned that, in general, we allow polynomially many queries per round. For the algorithm I presented, if you work out the concentration bounds for how many samples you need to get good estimates, you need n times k squared queries. This query complexity has since been improved, first to quasi-linear and then to exactly linear, which is optimal. What this says is that you can actually get the best approximation, the best adaptivity, and the best query complexity all together. Packing constraints have also been studied in the adaptive complexity model. There have also been multiple approximations obtained for non-monotone, instead of monotone, submodular maximization under a cardinality constraint. We can also consider the unconstrained case, as was asked at the beginning of the talk. In the unconstrained case, what's interesting is non-monotone functions. So for unconstrained non-monotone submodular maximization, it was recently shown that one can get a one-half minus epsilon approximation with constant adaptivity. Here the lower bound does not hold, because it was for cardinality constraints. And for unconstrained non-monotone maximization, we know that the best achievable approximation is one half. So what's interesting here is that they get an optimal approximation with only constant adaptivity. And then, finally, there has also recently been a lower bound showing an interesting trade-off between the number of queries and the adaptivity for submodular minimization. In particular, it shows a lower bound of roughly n squared over r to the fifth queries for any r-adaptive algorithm for submodular minimization.
So what this is saying is that if we want an algorithm for submodular minimization with sub-quadratic running time, then we need at least polynomial adaptivity. So these are all the very recent results on adaptivity in submodular optimization. Just to conclude: our main result is that, for the problem of maximizing a monotone submodular function under a cardinality constraint, roughly log n rounds of queries, or samples, are necessary and sufficient to obtain a constant approximation. The main implication is that, when we allow for parallelization, so when function evaluations can be executed in parallel, we get an algorithm that achieves a constant factor approximation exponentially faster than any previous constant factor approximation for submodular maximization. And this relies on a new algorithmic technique that we call adaptive sampling. Okay, that's it. Thank you. Thanks. So, if anyone wants to ask a question, you can speak up or type it. Okay, maybe just while you think about it, I have one question, which is, maybe you said this, but in terms of just brute sample complexity: greedy, I can compute, greedy probably makes n queries per iteration, times the number of iterations. In your case, you have to estimate all these expectations. It seems like it wouldn't be too high in the end, but how does it scale? Do you know? Yeah, so if you want to compare the query complexity of our algorithm to greedy: exactly as you mentioned, the greedy algorithm needs to compute all of the marginal contributions of the elements at every round. So that's n queries per round, and k rounds, so n times k total query complexity. For our algorithm, with the standard concentration bounds for the expectations, we need n times k squared queries.
So we lose a factor of k compared to greedy, with a naive analysis using standard concentration bounds. But there are two things. First, in the experiments, we observed that we don't really need that many queries: only a very small number of queries per element was sufficient for the algorithm to perform well. And second, there have been recent results with a more sophisticated concentration analysis and some new algorithms showing that you can actually get a linear number of queries, which is even better than the greedy algorithm. Thanks. Other questions? If not, we can stay and hang out a little bit offline. So thanks to everyone for joining in. I remind you that two weeks from now we'll have Julia Chuzhoy for the last talk of the fall, and today this was Eric Balkanski. So thanks, Eric, and we'll take it offline. Thank you.