Let me just introduce our first speaker. Nisarg is somebody who knows this place very well. He did his BTech in CSE from IIT Bombay in 2011, where he was also the President of India Gold Medallist, which means he had the highest GPA in his batch at the time of graduation. He went on to do his PhD at CMU, where he won the Victor Lesser Distinguished Dissertation Award. His work spans many fields, but one of the interesting things about it is that he has also put it into applications. There are at least two applications that I know of, Spliddit and RoboVote, where his work is being used every day by people to divide things and to figure out better ways to vote. He's an assistant professor at the University of Toronto, and I'm very glad that he could join us today.

All right. So the topic that I'm going to talk about is the communication-distortion trade-off in voting. As you can see from the title, this is going to be about communication complexity and its application to voting theory. This is based on joint work with a few of my collaborators: Debmalya Mandal, who is a postdoc at the Columbia Data Science Institute; Ariel Procaccia, who is a professor at Carnegie Mellon and is about to move to Harvard; and David Woodruff, who is also a professor at Carnegie Mellon.

So the topic of this talk is voting. As many of you know, the roots of voting go back to the rise of democracy in ancient Greece. But voting has also been studied formally for many centuries, starting from the work of the philosopher Ramon Llull, whose lost manuscripts were rediscovered in 2001; he has since been credited with first discovering some of the popular voting methods, such as the Borda count and the Condorcet method, for those of you who might know them. From there it runs all the way to the work of the Marquis de Condorcet in the late 18th century, and to the celebrated work of the Nobel Prize-winning Kenneth Arrow in the 20th century. Across these centuries' worth of research on voting, there was one model that was pretty common, and I would even go as far as to say that it is the canonical model of voting — my apologies to any approval or range voting fans in the audience. This is the model known as ranked voting. In this model, you have a bunch of voters, represented by these black silhouettes here, and these voters have ranked preferences over a set of candidates. Here we have the red candidate, the blue candidate, and the green candidate, and each voter has a ranked preference over them. So for example, voter 1 prefers the red candidate to the green candidate to the blue candidate, whereas voter 2 has a slightly different preference: she might prefer the blue candidate to the green to the red. When you actually conduct voting, all these voters simply report their ranked preferences, and these preferences are aggregated by a voting rule F, which aggregates the rankings to select a single candidate as the winner of the election. So this is the model that people studied for many centuries. And one question that voting theorists tried really hard to answer is: how do we compare two different voting rules that aggregate these rankings into a single winning candidate?
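To keep the objects concrete: a voting rule F in this ranked model is just a function from a profile of rankings to a winner. Here is a minimal, purely illustrative sketch using the Borda count (one of the rules attributed to Llull above); the profile below is made up.

```python
from collections import defaultdict

def borda_winner(rankings):
    """Aggregate ranked ballots with the Borda count.

    rankings: list of ballots; each ballot is a list of candidates
    ordered from most to least preferred.
    """
    m = len(rankings[0])
    scores = defaultdict(int)
    for ballot in rankings:
        for position, candidate in enumerate(ballot):
            scores[candidate] += (m - 1) - position  # top spot gets m-1 points
    return max(scores, key=scores.get)

# Hypothetical profile loosely mirroring the example: voter 1 prefers
# red > green > blue, voter 2 prefers blue > green > red, plus a third voter.
profile = [
    ["red", "green", "blue"],
    ["blue", "green", "red"],
    ["red", "blue", "green"],
]
print(borda_winner(profile))  # -> "red"
```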
And one method that received a lot of attention in the literature is the axiomatic method, where we try to define a few natural axioms that we would want our voting rules to satisfy, and then search for voting rules that lie in the intersection of all of these axioms, meaning they satisfy all of them at the same time. That was the hope, but many of the celebrated results in this line of work are actually impossibility results: they show that for many natural combinations of these axioms, there isn't any voting rule that satisfies all of them simultaneously. So this method does not give us a unique voting rule to use in practical situations. It doesn't answer the question: if I have these ranked preferences, what is the best way to aggregate them into a single winning candidate?

Now, with the rise of computer science's influence on voting theory, and with the rise of computational social choice, another approach to voting was picking up steam. This approach is called implicit utilitarian voting. Let's go back to this picture where we have ranked preferences that a voting rule aggregates into a winning candidate. This approach says that the voters don't really have ranked preferences underneath; they actually have numerical utilities for the different candidates underneath, but they are still going to report ranked preferences. Whenever I mention this, people immediately start asking: what do these utilities even mean? We don't have any money in the system in this context, so what is even the scale of these utilities? Why is this 7, 2, and 1 instead of, say, 700, 200, and 100? And indeed, there is no absolute scale for the utilities in this moneyless system. What these numbers really represent is the relative intensity of a voter's preference for one candidate over another. So for example, voter 2 here is relatively indifferent between the green and the red candidate (it shows as orange here; on my slide it's red), but slightly prefers the blue candidate over the green candidate. Whereas voter 1 substantially prefers the green candidate over the blue, and even more so the red over the green. These numbers are really just trying to represent the intensity of the preferences. You can think of them as ratios if you prefer, or I can just normalize each voter's values to sum to 1 to make sure they are on the same scale. You can think of giving a dollar to each voter and asking them to divide this dollar among the candidates in proportion to how much they like them.

As you can see, the voting rule is still the same: it still aggregates the rankings into a single alternative. But now that we have this underlying numerical information, we can ask how good the voting rule is. So we need an objective measure that the voting rule should in principle maximize. One measure that is very common — though you don't always have to go by this measure — is the social welfare, which is the sum of the utilities that a candidate gives to the different voters. So if you look at the sum of the utilities for the red candidate, that would be 1.5; the sum for the blue candidate here is 1; and for the green candidate, it's 0.5. Our voting rule doesn't see this information, but it makes some decision.
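As a tiny illustration of that welfare computation: the per-voter utilities below are made up (the talk only fixes the totals), each voter's values are normalized to sum to 1, and they are chosen so the column sums come out to roughly 1.5, 1.0, and 0.5 as in the example.

```python
# Hypothetical normalized utilities: each voter's values sum to 1,
# and the column sums match the example's welfare of 1.5, 1.0, 0.5.
utilities = {
    "voter 1": {"red": 0.7, "blue": 0.1, "green": 0.2},
    "voter 2": {"red": 0.3, "blue": 0.4, "green": 0.3},
    "voter 3": {"red": 0.5, "blue": 0.5, "green": 0.0},
}

candidates = ["red", "blue", "green"]
welfare = {c: sum(v[c] for v in utilities.values()) for c in candidates}
print(welfare)                        # red ~1.5, blue ~1.0, green ~0.5 (up to float rounding)
print(max(welfare, key=welfare.get))  # 'red' maximizes social welfare
```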
And then, for that decision, you can ask: what is the ratio between the highest social welfare that I could have guaranteed and the social welfare that my voting rule actually generates? In this context, that ratio is 1.5 divided by 1. But of course, this is for this particular set of numbers. Our voting rule doesn't actually see these numbers, so any numbers that are consistent with these rankings could potentially be the underlying numerical preferences of the voters. So what we do is look at all possible sets of numbers that the voters could have that would generate these rankings, and across all of those, we look at the worst-case ratio between the optimal social welfare and the social welfare that our voting rule guarantees. This is what is known as the distortion in this line of work.

So this approach gives a different way of viewing what a voting rule is trying to do, and it does require you to make a few subjective assumptions. It requires you to assume that the voters do have these numerical utilities underneath their ranked preferences. It requires you to assume that you are trying to maximize some objective function, for example the social welfare. And it requires you to assume that, when you cannot maximize this social welfare function because you don't know the utilities, you want to go for the best worst-case approximation ratio, which is a very standard concept in theoretical computer science. But once you make these three assumptions, the nice thing about this approach is that it gives you a single, uniquely optimal voting rule to use: the voting rule that minimizes this so-called distortion on every possible ranked preference profile that it sees. So this is a relatively recent development in voting theory.

And while this was going on, there was another development, which is that people were starting to look at applications of voting beyond just political elections, beyond the standard scenario where you have m candidates and you want to select one of them. In the previous setting, did you assume that the number of candidates is bounded somehow? No — there are just some n voters and some m candidates, and each voter has a numerical utility function over the candidates. And at least three of them? Yes; if there are only two candidates, then the majority rule is optimal, so it only becomes interesting for three or more.

So as I was saying, people also started looking at applications of voting in other contexts, and one such context is participatory budgeting. This is a very popular paradigm; for those of you who may not have heard of it, I'll give a brief overview. It's a paradigm whereby a city sets aside some portion of the public budget to fund public projects. If the residents want a certain park to be built in a neighborhood, then this money can be used for that purpose. The key distinction is that we want to allocate this money based on the preferences of the residents, rather than having city officials make the decisions by themselves. This paradigm has already been used to allocate hundreds of millions of dollars of public money in North America alone, and it is a worldwide phenomenon. So for example, you have this stylized city here.
And the city has three different projects. You could build a skyscraper; this project has a cost of four in some units, and there is a voter who has utility six for this particular project. There is another project, repairing the roads, which has a small cost of one and a smaller utility of two for this voter. And a third project, building a park, which has cost three and utility three.

One thing people realized when looking at this context is that there isn't just a single kind of question — there isn't a unique, canonical question — that we might want to ask voters to get their preferences as input. For example, you can do the standard thing we have been doing, which is to ask voters to rank the different projects by their utility: ignore the cost of the projects and just rank them by utility. In that case, this particular voter would say that he prefers building the skyscraper to building the park to repairing the roads. But as you can see, this is really more like a multi-agent knapsack problem, so taking inspiration from the optimal knapsack solution, you might instead ask voters to rank the projects by their bang for the buck, that is, value divided by cost. If the voter does this, he would provide a different ranking: repairing the roads has a ratio of 2 to 1, which is better than the ratio for the skyscraper, which is 1.5, which is better than the ratio for the park, which is 1. (And yes, the utilities differ across voters, but the costs are the same for everyone.) Another approach, which came up in the work of Ashish Goel's group at Stanford, is knapsack voting: you show the total budget to each voter and ask the voter to solve her own knapsack problem — if you had control over the entire budget, which subset of projects would you execute? In this case, the voter would weigh implementing the skyscraper against repairing the roads and building the park at the same time, and since 6 is more than 2 plus 3, he would actually vote for building the skyscraper. Yet another input format, which came up in some of my own work with collaborators, is threshold approval voting, where you ask voters, for example, to approve all the projects that they like at a level of at least 10% of their total happiness for all the projects — if you had to divide a dollar among all the projects, select all the projects that would receive at least 10 cents. In this case, if the threshold turns out to be, let's say, 3 in this particular normalization, then the voter would approve the skyscraper and the park.

So the important takeaway here is that in this context there isn't a single canonical question that you might want to ask; there are all these different questions that make sense, and we wanted to see how you compare these different input formats. One nice thing that the implicit utilitarian voting approach gives us is that it not only allows us to compare different voting rules that all take ranked preferences as input and output a single candidate, but it also allows us to compare two voting rules that take different kinds of preferences as input.
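To make these four input formats concrete, here is a small sketch that computes each of them for the single voter in the example (costs 4, 1, 3 and utilities 6, 2, 3). The budget of 4 is the value implied by the example (skyscraper versus roads plus park), and the approval threshold of 3 is the "let's say 3" from the talk; everything else is just illustrative.

```python
from itertools import combinations

costs = {"skyscraper": 4, "roads": 1, "park": 3}
utility = {"skyscraper": 6, "roads": 2, "park": 3}   # this one voter's utilities
budget = 4          # implied by the example: skyscraper (4) vs roads + park (1 + 3)
threshold = 3       # the "let's say 3" threshold from the example

# 1) Ranking by value: sort projects by utility alone.
rank_by_value = sorted(utility, key=utility.get, reverse=True)

# 2) Ranking by value for money: sort by utility / cost.
rank_by_ratio = sorted(utility, key=lambda p: utility[p] / costs[p], reverse=True)

# 3) Knapsack vote: the voter's own best feasible subset of projects.
def knapsack_vote():
    best, best_value = set(), 0
    projects = list(costs)
    for r in range(len(projects) + 1):
        for subset in combinations(projects, r):
            if sum(costs[p] for p in subset) <= budget:
                value = sum(utility[p] for p in subset)
                if value > best_value:
                    best, best_value = set(subset), value
    return best

# 4) Threshold approval vote: approve every project valued at least the threshold.
threshold_approval = {p for p in utility if utility[p] >= threshold}

print(rank_by_value)       # ['skyscraper', 'park', 'roads']
print(rank_by_ratio)       # ['roads', 'skyscraper', 'park']   (2.0 > 1.5 > 1.0)
print(knapsack_vote())     # {'skyscraper'}, since 6 > 2 + 3
print(threshold_approval)  # {'skyscraper', 'park'}
```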
So you can compare a voting rule that takes rankings by value as input against a voting rule that takes knapsack votes as input. In fact, you can go further: you can assign a distortion to each input format. You can say that if I have rankings by value, I'm going to aggregate them using the best possible aggregation rule that minimizes my distortion, and I'm going to look at the smallest distortion that I can achieve; here that turns out to be roughly square root of m, up to log factors. You can similarly assign a distortion to ranking by value for money, which is also about square root of m. You can show that knapsack voting has exponential distortion and that threshold approval voting has roughly logarithmic distortion. So you can essentially assign a single number to each input format that, in some sense, conveys how much information responses in that format contain that is useful for maximizing social welfare.

So this is good. But comparing these input formats on just this one number each doesn't really make sense, because if you only wanted to minimize distortion, you would just ask voters to report their exact utility functions, and then you would get distortion one. The reason we are not doing that is because there is an implicit notion of cognitive burden: we think that asking voters to report these intensities of preference directly is cognitively very burdensome. So of course, we want to compare these different input methods not only based on the distortion they help achieve, but also based on the cognitive burden they impose. And this is not an easy thing to measure — the cognitive burden imposed on voters by a certain kind of question — and it certainly requires many different perspectives: the psychological perspective, the sociological perspective. We are not going to answer that question completely in the work I'm going to describe. What we did here is take a very, very crude measure of cognitive burden, which is the number of bits that a voter needs to communicate to the voting rule to convey her preferences in that format. So certainly, as I said, the number of bits is not the right measure of cognitive burden; it's just a first step. But if you go by this measure, then you can ask a very interesting conceptual question: if I am willing to ask each voter for just k bits of information about her valuation function, how much can I minimize my distortion? And what is the right set of questions to ask subject to this budget of k bits — what does the optimal ballot look like? Any questions so far on the conceptual framework?

You don't consider incentive compatibility, right? So the voters report exactly what your designed ballot asks of them? Right, that's right. In this work, I'm not going to look at incentive compatibility; I'm not going to look at strategic manipulation. It turns out that this approach of implicit utilitarian voting actually blends very well with incentive compatibility. Umang actually has some work looking at strategy-proof voting rules, and it turns out that you don't lose very much distortion if you impose strategy-proofness as well. Of course, we haven't looked at it in this framework; it has been looked at in the original standard framework where the reported preferences are ranked preferences.
But that's a very interesting direction for the future. So at this point, just to formalize the model a bit more: we have a set of voters that I'm going to denote by capital N, with the individual voters numbered 1 through small n. We have a set of alternatives denoted by capital A, and there are m alternatives. Each voter, as I mentioned, has a valuation function that assigns a non-negative real value to each alternative, and I'm going to normalize all the voter valuations so that they sum to 1. So this v_i is essentially a vector in m-dimensional space whose entries add up to 1; it lies in the m-simplex. Given these valuations, you can define the social welfare of each alternative, which is just the sum of the values that it gives to the different voters.

One nice thing that comes up when we think of the voting rule as designing the ballot and designing the way to aggregate responses to that ballot is that the voting rule is not just one function anymore; it's a combination of two functions. There is an elicitation rule, which maps every possible valuation from the m-simplex to some finite response space — just think of R as some finite set. This function essentially tells each voter: if this is your valuation function, then this is the response you should provide to me. The number of bits that a voter needs to communicate to provide this response is the log of the size of this space. And then the aggregation rule takes the responses it receives from all the voters and selects a single alternative as the winner. One key thing to note is that the ballot is the same for all the voters: once you decide how voters should respond, every voter is answering the same ballot. You could think about asking different questions to different voters, and it turns out that you can drive the distortion down pretty quickly if you are willing to do that, but this is still the realistic domain where you want all the voters to answer the same ballot. I've defined these rules to be deterministic here, but you can also allow each rule to be randomized: you can pick the mapping from the m-simplex to the finite response space at random, or you can have the aggregation rule return a distribution over alternatives.

So do the voters themselves have access to random coins? No. In our framework, we don't look at voters who have access to random coins. What we do is that if you want to randomize your elicitation rule, you can pick this function at random before you send it to the voters — so you pick a random ballot and then you send this ballot to all the voters, and the voters' responses to it are deterministic. And then you can aggregate these responses, perhaps in a randomized fashion, where instead of outputting a single alternative, you output a distribution over alternatives. But does the communication cost include conveying the ballot to the voters? Right — this doesn't count the cost of conveying the ballot to the voters. This is inspired by the picture where the ballot is printed and already sitting right there in front of the voter. There is, of course, a cost to training voters to understand what the ballot is asking of them.
And so we also did some human subject experiments where we looked at both the time it takes voters to understand a ballot and the time it takes them to answer it; I'll talk about that a little bit at the end. This is a single-round process, isn't it? You don't want to do this over multiple rounds, where you ask for something and then ask again? Right. We are essentially restricting ourselves to conducting the election in the standard way, where everyone answers the ballot in one round and then the responses are aggregated. Even if you do it in multiple rounds, as long as you don't rely on the responses of one voter to choose the question you ask another voter, you can simulate the multi-round process in a single round by asking for the entire tree of possible responses. But yes, there is a nice question as to what you can do with adaptive elicitation, which we leave for future work. Just to check, one thing you can't do is a randomized threshold — the mechanism has to define the threshold it asks the voters about? You can do that. What you can't do is use a different threshold for each voter. But you can still pick the threshold at random and then convey the same threshold to everyone; that is then what the ballot looks like. All right, great questions.

So going back to this framework, what this looks like now is that the voters have these utility functions; they come to this voting method; they see the ballot, which tells them how to convert their numerical valuation function into a response; and then these responses are aggregated by an aggregation rule to select a single winning candidate. And we are going to measure a voting rule on two different aspects. We look at the communication complexity of the voting rule, which is the expected number of bits that a voter needs to send — the expectation only matters if you randomize your elicitation rule; if you randomize the ballot design, this is an expectation, otherwise it's deterministic. And the distortion is the notion I already mentioned: the worst case, over all possible valuations the voters could have, of the ratio between the maximum social welfare you could achieve if you knew the valuations and picked the right alternative, and the expected social welfare achieved by the winning candidate selected by your voting rule; this expectation is over the randomization in the aggregation rule.

So what we are interested in is the Pareto frontier between these two metrics. We ask questions like: if I limit the communication complexity to k bits, what is the minimum distortion I can get? Or, if I want distortion at most d, what is the minimum communication complexity I need? And in this work, we identify this Pareto frontier exactly. Just to give a bit of context: when we started this project, there were some rules in the literature for which these trade-offs were known. For example, if you ask people to communicate rankings, that takes order m log m bits, and it was known that with deterministic aggregation you can get theta of m squared distortion.
If you do randomized aggregation, you can get roughly theta of square root of m distortion; all of these tildes hide logarithmic or sub-logarithmic factors. And if you do threshold approval votes, the communication complexity is m, because you are approving or disapproving each alternative — it's an m-bit elicitation method — and with randomized aggregation it allows you to get log squared m distortion. So it turns out that when you optimize your ballot design, you can actually Pareto dominate all of these known trade-offs: you can get voting rules with better communication complexity and better distortion, even with deterministic aggregation, than both rankings and threshold approval votes, and other known methods as well.

So let me now talk a little bit about the techniques and the results. Here I want to focus not so much on the results as on the techniques we are using, but I'm still going to describe, a bit informally, some of the voting rules behind our upper bounds. When you want to pick your ballot deterministically, we propose this pref-threshold voting rule, which intuitively does something very simple. Say my goal is to achieve distortion d. Then I ask each voter i to report her top (order m over d) most favorite alternatives, and also to give me some estimate of her value for each of these alternatives — not the exact value, but rather a value picked from one of, say, logarithmically many buckets. You can think of the voter indicating whether her value for each of these alternatives is very low, low, high, or very high. The aggregation rule then sums up these estimates to form an estimated social welfare for each alternative and simply picks an alternative that maximizes the estimated social welfare. I'm not going into the details of how you do this — it is very simple, and you can look at the paper — but you can show that the communication complexity is roughly order m over d, because reporting this set of m/d alternatives takes roughly m/d bits up to log factors, and picking the right one of the log m buckets for each also takes roughly m/d bits up to log factors. And you can show that the distortion is order d. (I'll sketch this kind of rule concretely a bit later.) Why do you say that — with respect to all valuations? With respect to all, yes.

All right, so now it turns out that when you try to extend this very simple, very intuitive technique to randomized elicitation, it does not work well. It gives a bound of something like m over d squared on the communication complexity needed for distortion d, but not the optimal bound. To derive the optimal bound, we had to make a connection to the literature on sketching and sampling in streaming algorithms. This is from theoretical computer science, where the idea is that you are going to see a stream of updates, and you want to process all of them without storing the full information. Think of receiving a vector x, but receiving it in parts: you receive vectors x_1, x_2, ..., x_n, and x is just the sum of these vectors.
Now, at each point, when you receive x_1, x_2, and so on, you could just store the sum so far. But if the vector is m-dimensional, then you are using space proportional to m, whereas these algorithms try to use sub-linear space: by storing only a sub-linear amount of information, they want to perform some operation at the end on the entire vector x. This could be frequency estimation, this could be sampling an element at random, things like that. And there is a notion of perfect Lp samplers in this line of work, defined by one of my collaborators, David Woodruff, and his collaborator in 2010. The idea of a perfect Lp sampler is that once it stops processing all the updates, it should output, with probability at least 1 minus delta, a random coordinate j-hat of this vector, such that each coordinate j is sampled with probability proportional to the p-th power of the value of x at that coordinate. Imagine that x_1 through x_n together record the frequencies of these m different elements; then at the end, you want to sample each element with probability proportional to the p-th power of the number of times that element has appeared in the stream — and of course, you want to do that without actually storing those counts, because you don't want to use too much space. And here we are going to allow some small additive error in the sampling probabilities.

The way we connect this to voting is that you can view the x_i's as the different valuations v_1 through v_n, and then the vector x just becomes the social welfare function. At first, this kind of sampling seems unrelated to the task at hand, which is to select the coordinate with the highest value in this social welfare function; instead, we are selecting each coordinate at random with probability proportional to the p-th power of its value. But it turns out that if you can sample an alternative with probability proportional to the square of its social welfare, then you can use that to derive the optimal voting rule. In particular, we use the perfect L2 sampler derived by Jayaram and Woodruff in 2018, I believe. This L2 sampler, in addition to meeting the definition above, also uses so-called linear sketches, which means that if I take a small representation — a sketch — of each of the x_i's, I can just add up those small representations to get a small representation of x; that is what "linear" means. And in addition, it not only produces a random coordinate j-hat, it also produces an estimate of x at j-hat, which is very useful in our reduction: it is going to produce an estimate of the social welfare of the random alternative that it returns.

So let's look, at a high level, at how our voting rule uses this perfect Lp sampler. I have described this more or less in a black-box manner; due to some subtle points we cannot actually use the sampler as a black box and have to unroll it within our voting rule, but I'm not going to go over that. The elicitation rule is very simple. Each voter i first rounds her numerical valuation function to some discrete point.
So essentially, you take each value for each alternative and round it to the nearest multiple of delta, where delta is some small constant. The voter then takes this rounded valuation and sends many independent sketches of it — in particular, we ask for order m over d cubed sketches of this rounded valuation. These are linear sketches, so the aggregation rule can add them up across all the voters and obtain order m over d cubed sketches of the social welfare function. Then it chooses the best among the different alternatives returned by the different samplers, where "best" is judged by the estimates produced by the samplers themselves: each sampler gives us a random alternative, sampled with probability proportional to the square of its welfare, together with an estimate of the social welfare of that alternative. We pick the best of these, and we can show that this gives us order d distortion. And because of the small space required for each sketch, you only need roughly m over d cubed bits of communication. So this is a randomized elicitation, where the randomness lies in choosing the different sketch functions: you send these sketch functions to the voters, they compute the sketches of their rounded valuations, and they send those back to you.

So here we are not counting the randomness complexity. As I said, the sketches are something like m-by-polylog(m) matrices, so you still need polynomially many bits of randomness. But yes, that's a very interesting point — how many bits of randomness are required to actually generate this ballot — and that's something we don't take into account in this framework. Not so far — the delta is just a small constant, something like 1 over 128, just small enough that everything works out and you get the order d distortion at the end; you can look at the paper for the exact constant. Just to check: so the mechanism picks a bunch of random sketch functions, these are put in the ballot, everybody sees them, and then each voter generates her sketches and sends them back? Right, yeah. And the other direction — the lower bound — is that a sketching lower bound? I'm going to say something about that in a few slides.

So far I have shown you that with deterministic elicitation you can get distortion d with m over d communication complexity, and with randomized elicitation you can get the same thing with a much smaller number of bits, only m over d cubed. I did not mention anything about whether the aggregation rule is deterministic or randomized; it turns out that you can achieve both of these with a deterministic aggregation rule, but I'm going to show matching lower bounds that hold even for randomized aggregation. So once you fix whether your elicitation rule is deterministic or randomized, whether you do deterministic or randomized aggregation doesn't really change things. So let me say a few things about the matching lower bounds, and here we make connections to the communication complexity literature within theoretical computer science. In particular, we look at the multi-party set disjointness problem. The problem is as follows.
You have a universe of m elements, and there are t players; each player privately holds a subset of this universe, and the players are trying to solve some joint problem on their private inputs. In this particular case, they are trying to figure out whether their sets have a common intersection or not. Typically these problems are studied under some kind of promise: some third entity promises you that the only instances you will ever see are of two kinds. You can have a "no" instance, where all the sets are pairwise disjoint, or a "yes" instance, where there is one element that appears in every set and, apart from that element, the sets are pairwise disjoint. You will never see a third kind of instance — or if you do, you can output anything on it; it doesn't matter. The question is: knowing you will only see one of these two types of instances, how many bits do the players need to exchange to figure out which type they are in? Of course they could just declare their entire sets, but you want to do something smaller than that. The way this is usually analyzed is that each player has her set of elements and writes bits, according to some protocol, on a shared blackboard; each player sees what the other players write, and when the protocol says it is her turn, she writes whatever the protocol asks her to write. At the end, they need to determine whether they are in a yes instance or a no instance. The communication complexity here is the total number of bits that must be written on the shared blackboard before the players can figure out the right answer. So here it's the total number of bits, whereas in our voting problem we care about the number of bits communicated by each voter — a slight difference, but it doesn't hurt us too much.

One thing we had to do is define a slight variant of this problem, a fixed-size set disjointness problem, where you further require that each player holds a set of a fixed size, say s. Then we use a known lower bound on the communication complexity of disjointness with m elements and t players: in 2009, a couple of independent papers established that under any randomized protocol that outputs the right answer with high probability, the players must exchange omega of m over t bits. Using that, we showed that when you impose the fixed size s, as long as s is smaller than m over t, any protocol requires omega of s bits exchanged by the players in total — a refined version of that result. We then use this result in turn to reduce to the voting problem: essentially, we show that if you have a voting rule that achieves distortion d with fewer than m over d squared bits per voter in the case of deterministic elicitation, or fewer than m over d cubed bits per voter in the case of randomized elicitation, then you can construct a protocol for this fixed-size set disjointness problem that solves it with fewer than s bits, and you derive a contradiction. Just a quick question — let me check what I'm missing.
Earlier there was a question about adaptivity of the communication, and you said it's an open question; but looking at this result, it seems to me that you have already resolved it. Almost. You are right that these lower bounds actually allow many rounds of communication — these results are pretty general. And when you look at our reduction, it works for any voting protocol where — it's a bit difficult to explain — essentially, I am only allowed to ask different questions to different voters once they have already given different responses in the past. When I start out, I must ask all the voters the same question; then some voters give response one and some give response two. Among all the voters who gave response two, I cannot ask different questions — I must ask them the same question — but I may ask them a different question than I ask the people who gave response one. As long as you only distinguish between voters when they give different responses, the reduction works and these bounds still hold. But if you are willing to ask different questions to different voters from the very start, then the reduction doesn't work. So there is a slight technical open question of how to make the reduction work even in that case.

Another interesting thing is that one of these bounds is not tight. The randomized one is: for randomized elicitation, our upper bound was m over d cubed, and that is exactly what the lower bound says. But for deterministic elicitation, the bound is a bit loose, and to get a tight bound we had to define a different kind of promise, where in the yes instance you essentially know much less structure about the instance — you are only told that there is some element that appears in at least some gamma fraction of the players' sets. For this promise, we showed that any deterministic protocol that solves the question needs m bits of communication between the players, which is an improvement over the omega of s, which was the same as omega of m over t, from before; we improve it by a factor of t. And that in turn improves our communication lower bound for voting to m over d. So this is the summary: to get distortion d, the best you can do with deterministic elicitation is to exchange m over d bits, and with randomized elicitation, m over d cubed bits.

The key moral of the story is that there is a nice bridge between the literature on voting, the literature on communication complexity, and the literature on streaming algorithms. With communication complexity, it's a two-directional bridge: not only were we able to use results from communication complexity, but deriving new results on voting required deriving new results in communication complexity — introducing this fixed-size set disjointness problem and this substantial-intersection promise. With streaming algorithms, so far we have only used results from streaming algorithms to derive voting rules; as Sid mentioned, it remains to be seen whether you can use voting rules to derive new streaming algorithms as well.
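To make the upper-bound side of these trade-offs concrete, here is a minimal sketch in the spirit of the pref-threshold rule described earlier: each voter reports roughly her top m/d alternatives with a coarse, bucketed estimate of each value, and the aggregation rule sums the estimates and picks the alternative with the highest estimated welfare. The power-of-two bucketing and the toy numbers below are simplified placeholders, not the exact construction from the paper.

```python
import math

def elicit(valuation, m, d):
    """Voter-side elicitation: report top ~m/d alternatives with bucketed values.

    valuation: dict alternative -> value, assumed normalized to sum to 1.
    Returns a dict alternative -> value rounded down to a power of 2.
    """
    k = max(1, math.ceil(m / d))
    top = sorted(valuation, key=valuation.get, reverse=True)[:k]
    response = {}
    for a in top:
        if valuation[a] > 0:
            # Bucket the value to the nearest lower power of 2; with values in
            # (0, 1] only logarithmically many buckets matter.
            response[a] = 2.0 ** math.floor(math.log2(valuation[a]))
    return response

def aggregate(responses, alternatives):
    """Sum the bucketed estimates into an estimated welfare and pick the max."""
    est = {a: 0.0 for a in alternatives}
    for resp in responses:
        for a, v in resp.items():
            est[a] += v
    return max(est, key=est.get)

# Toy usage with 4 alternatives, 3 voters, target distortion parameter d = 2.
alts = ["a", "b", "c", "d"]
voters = [
    {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1},
    {"a": 0.4, "b": 0.4, "c": 0.1, "d": 0.1},
    {"a": 0.1, "b": 0.6, "c": 0.2, "d": 0.1},
]
responses = [elicit(v, m=len(alts), d=2) for v in voters]
print(aggregate(responses, alts))  # -> 'b', the estimated-welfare maximizer
```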
I just want to mention a couple of things about a generalization we also studied, where instead of selecting one alternative you want to select a committee of k alternatives — you want to elect a committee of candidates, which is a very well-studied problem in voting. One thing you need to define is how a voter values a committee of candidates. We have this valuation function that gives a voter's utility for each individual candidate, and there are multiple ways to extend it to committees. One is to take the sum of the voter's values for the candidates in the committee; it turns out this has a very simple reduction to the single-winner problem, because the selection of candidates becomes almost independent — you just want to select the k candidates with the highest social welfare. The more interesting variant is where a voter's value for a committee is the maximum value she has for any candidate in it. Essentially, the voter looks for the candidate in the committee that best represents her and derives her value from that representative. This makes the problem more like a coverage problem: if you have already selected one candidate that represents this subset of voters, you then want to select another candidate that represents another subset of voters and gives them high value, so that all voters end up with high value. The social welfare of a committee is again just the sum of the values that the committee provides to the different voters.

So here the question is again: how much communication do we need from the voters to get distortion d? Before I mention the results, one thing that comes up is that it is not clear a priori whether this problem gets easier or harder as k increases. Certainly, if you can select more candidates, you can give higher welfare to everyone — take your previous committee, add another candidate, and everyone's value can only increase. But on the other hand, the optimal committee of k candidates also sets a higher bar as k grows, because the optimal committee also gives higher value to all the voters. So when you are competing against the optimum, it isn't clear a priori whether the approximation ratio goes up or down as k increases. It turns out from our analysis that the problem essentially does get easier with k: if you want to select a committee of size k, then with deterministic elicitation you need m over kd bits from each voter, and with randomized elicitation you need m over kd cubed bits from each voter. This again uses more connections to streaming algorithms and communication complexity lower bounds that I won't elaborate on, but both bounds decrease linearly as k grows.

With that, I want to say a few words about the directions in which this kind of work takes us in the future. The first direction is that we certainly need better models of cognitive burden; as I said, the number of bits that I need to ask from each voter is a very crude measure.
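As a small illustration of the max-based committee valuation described above, here is a sketch that computes committee welfare and runs a simple greedy heuristic; the greedy step only illustrates the coverage flavor of the problem and is not the communication-efficient rule from the talk, and the candidates and utilities are made up.

```python
def committee_welfare(committee, valuations):
    """Social welfare of a committee when each voter's value for the
    committee is her maximum value over the candidates in it."""
    return sum(max(v[a] for a in committee) for v in valuations)

def greedy_committee(candidates, valuations, k):
    """Greedily add the candidate giving the largest marginal welfare gain.

    Illustrative only: the classic heuristic for coverage-style objectives,
    not the rule analyzed in the talk.
    """
    chosen = []
    while len(chosen) < k:
        best = max(
            (a for a in candidates if a not in chosen),
            key=lambda a: committee_welfare(chosen + [a], valuations),
        )
        chosen.append(best)
    return chosen

# Toy example: two "camps" of voters are best served by different candidates.
cands = ["left", "centre", "right"]
vals = [
    {"left": 0.8, "centre": 0.2, "right": 0.0},
    {"left": 0.7, "centre": 0.3, "right": 0.0},
    {"left": 0.0, "centre": 0.3, "right": 0.7},
]
committee = greedy_committee(cands, vals, k=2)
print(committee, committee_welfare(committee, vals))  # ['left', 'right'], welfare ~2.2
```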
So, with another set of collaborators, we also ran some preliminary human subject experiments to measure the cognitive burden imposed by different kinds of input formats in a participatory budgeting domain. One thing we found is that the cognitive difficulty does roughly track the number of bits: for example, ranking by value and ranking by value for money, which carry m log m bits, were very difficult for the voters; threshold approval and knapsack votes, which carry m bits, were both somewhat easier; and k-approval votes, which are even simpler, were the easiest for the voters to answer. Of course, it doesn't match exactly: ranking by value and ranking by value for money both contain m log m bits of information, but voters found it much harder to rank projects by value for money than just by value.

Another thing we need to at least keep in mind is the intangible effects of ballot design. The goal of the ballot is often not only to get the highest possible welfare. We were talking to the Barcelona participatory budgeting team about deploying some of our algorithms in practice in Barcelona's PB, and one thing they mentioned is that knapsack votes, which they have been using, might be bad in terms of generating high social welfare, but they are good in another respect: they help voters understand the limitations imposed by the budget. If you ask voters to simply approve all the projects they like above a certain level, they will probably select maybe 15 or 20 projects, and when the city in the end executes only three of them, the voters will be frustrated, because they will think the city is being very inefficient. But with knapsack votes, in a very nice interface, they can start putting projects in, watch the budget fill up immediately, and understand that given this budget it just isn't possible to execute more than three or four projects at once. These kinds of effects we should also take into consideration when talking to practitioners.

Another direction — something I've talked to many people about and something we really need — is to get out of this very restricted view of voting where we collect all the preferences, aggregate them, and output a winner, because this is not how voting works in the real world. For example, in participatory budgeting there is the final voting stage, but a lot happens before it: there is a stage where the residents are invited to submit project proposals, then a couple of stages of community discussion where these projects are filtered, and only then the final voting stage. How these earlier stages are executed definitely impacts the social welfare generated in the end, so you want an all-encompassing analysis of the whole process. And another thing is that we should extend this to different models of democracy, like liquid democracy and representative democracy.
With some other collaborators, we have a 2019 AAAI paper where we compare the primary-based election system — where, as in the US, instead of all the candidates competing in a single election, each party holds an internal primary and elects one candidate who goes into the general election, and voters then vote only over those candidates in the general election — against having a single direct election, again using this distortion framework. Finally, there is a need for voting theorists to go out, talk to people who are doing some kind of voting in the real world, and deploy some of these mechanisms. Ashish Goel's team at Stanford has done some pretty fantastic work deploying participatory budgeting approaches in cities like Cambridge, New York, and San Francisco. And with my collaborator Ariel Procaccia, we have this website, RoboVote, where we deploy some of these distortion-based approaches to voting and let people make real-life decisions based on them. Thank you.

Yes, there's maybe a minute or so for more questions. Yes, sir. If I look at the L2 sampler — or going beyond 2 — does it give us results beyond social welfare? Right, so we did try to think about this. Initially we were looking at L1 samplers, where you just sample an alternative with probability proportional to its social welfare, and it turns out that this is not enough — the L1 sampler does not give you enough information, while the L2 sampler does. We didn't look at p greater than 2. Also, the results known for perfect L2 samplers are much stronger than those known for p greater than 2, because p equal to 2 turns out to be a very special case, which is not a surprise. So we just use this, but it's an interesting question whether you can use Lp samplers with p greater than 2 for this purpose as well.

So you have bounds for k-selection — but the more general thing would be participatory budgeting, right? Right. I didn't mention this, but it turns out there is also an easy reduction from participatory budgeting to k-selection, where you essentially bucket the projects based on their cost. You create exponentially growing cost buckets, and within each bucket, where costs differ by a factor of at most, say, 2, there is an obvious number k of alternatives to select, and it gives you the right approximation. So there is a reduction from participatory budgeting to k-selection, where you can use any k-selection rule and get essentially the same distortion in the participatory budgeting context, up to some extra logarithmic factors.

As for the social welfare objective: there is no deep reason for choosing it; it's just one of the very standard choices in the literature. In fact, for voting, I would even say that the social welfare function is probably not a very good choice, because it leans towards satisfying the majority — you always want to elect candidates that satisfy the majority. For k equal to 1, this is not so much of a problem, because you cannot satisfy the majority and the minority at the same time anyway.
For k of two or more, this is the very reason we went for the max formulation of a voter's value for a committee: when you select the optimal committee of, say, size 2, you want one candidate that appeases this half of the population and another candidate that appeases that half of the population. Even if this side has 51 voters and that side has 49, you don't want two candidates that both appease the 51. So you can kind of fix this unfairness of the social welfare function by using the max formulation. But again, what these objectives mean more fundamentally, more in principle, is left to be explored.

How dependent are these techniques on the form of the welfare function? They are quite dependent, in the sense that, for example, the sketching algorithms we use are defined for updates that sum up — we have this x that is formed as x_1 plus x_2 plus ... plus x_n — and they are defined for that very specific setting. So if you have a different kind of welfare function, I'm not sure these sketching algorithms would go through, unless it admits a transformation. For example, if you look at something like Nash welfare, you can take the log of all the utilities and then add them up — the log transformation brings you back to an additive objective — and then you can apply these algorithms, and it might work out. But something that doesn't have a transformation to an additive welfare might be more difficult.
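As a tiny, purely illustrative check of that last remark, with made-up, strictly positive utilities: maximizing the Nash welfare (the product of voter utilities) over alternatives gives the same winner as maximizing the sum of log-utilities, which is the additive form the machinery above works with.

```python
import math

# Made-up strictly positive utilities: utilities[i][a] is voter i's value for a.
utilities = [
    {"x": 0.6, "y": 0.3, "z": 0.1},
    {"x": 0.2, "y": 0.5, "z": 0.3},
    {"x": 0.3, "y": 0.3, "z": 0.4},
]
alternatives = ["x", "y", "z"]

def nash_welfare(a):
    prod = 1.0
    for v in utilities:
        prod *= v[a]
    return prod

def log_nash_welfare(a):
    # The log transform turns the product into a sum of log-utilities.
    return sum(math.log(v[a]) for v in utilities)

best_by_product = max(alternatives, key=nash_welfare)
best_by_log_sum = max(alternatives, key=log_nash_welfare)
assert best_by_product == best_by_log_sum  # the argmax is preserved
print(best_by_product)
```

Thank you.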