It's great to be here, and I hope folks can hear me. I'm going to talk about something that I've been thinking about a lot, on and off. I don't have any great solutions, so this is more of a challenge talk. At least the first 5 or 10 minutes are a set of challenges that I hope people will pick up. The title is: What do we do when participation is unrepresentative? Maybe I'll start with a little bit about who I am, since I've always been blockchain adjacent in some sense but never really part of the community, so folks here might not know me. My research is at the intersection of economics, computer science, and operations research. In practice, that means things I've worked on in the past include participatory budgeting, algorithmic pricing, and rating systems. Today I'll talk about a crowdsourcing system, but one that I really think of as a democratic system. For example, in participatory budgeting, I think back in 2016 or 2017, we showed how you can allocate money via what we, I think, viewed at that time as a local quadratic funding mechanism, in which people make local changes to a candidate solution, and hopefully you can mimic gradient descent to converge to an optimal societal budget. That's all a long way of saying that I've never directly worked on blockchain-type things, but I've always been adjacent. So hopefully we can take some challenges from the things I do think about a lot more and apply them to the things that you all are building.
What energizes me about the conversations I've had yesterday and today is really the folks who are building things and just trying out mechanisms that people in my community usually only think about theoretically or prove things about. Okay, so that was a long-winded introduction to myself, so let's get to the problem that I want to talk about. In classic voting and social choice theory, if you just think about the 101 of how we do majority rule, the goal is almost always to correctly aggregate the preferences of those who show up, right? And clearly that's not the whole problem. The problem is that not everyone votes. So this is a graph of the percentage of the adult population in the United States that voted in each presidential election, going from 1789 to 2016. What does this graph show? Clearly, for a long time, a lot of people were ineligible: they were oppressed and not able to vote, not given the vote. So this graph shows a long, maybe a very long and winding, path to the US becoming more of a democracy. And, especially this week, it might show that we're not there yet. So clearly not everyone is eligible, but even amongst the eligible, not everyone votes, depending on whether it's a presidential election, an off-cycle election, what have you. This problem is everywhere, and it's especially true for fancy mechanisms, new technologies, and so on. For example, in grad school I was part of a lab where my advisor had built a participatory budgeting platform, and this platform was being used to run PB elections. I think at this point, we've run well over 100 elections in different US cities.
And one thing that always bothered us was: who are the people voting in a random off-cycle Tuesday election, in this new fancy mechanism? Even if we prove all the theorems we want about how our mechanism is good and converges amongst those who are voting, what about those who are not voting, or are not able to vote, for whatever reason? What I'm going to claim is that this is something you can't just set aside. This is actually a bad thing, and I hope the governance protocols that y'all are building treat this as a first-order problem. If you actually want the outcome to reflect your community well, you can't just make arguments about why it works amongst those who are voting or funding or contributing money. You need to tackle this problem. Okay, so clearly this is not new. I'm not the first one to raise it; this is a very long-standing problem that a lot of people have talked about. So what are the ways that we traditionally tackle this problem in standard mechanisms and standard society? Perhaps the most canonical way, which has many different implementations, is reweighting votes by observable characteristics. And now y'all are thinking this is crazy, clearly we don't do this in US democracy. But we do, right? In the US House of Representatives, the seat count per state is based on the total population. It's not based on the voting population, it's not based on the number of citizens, it's not based on those who showed up at the last poll. It's based on population.
And the reason for that... I don't want to say the reason, because I don't want to make claims about what the founders thought, and I don't really care. But one argument to defend it is the claim that people who are spatially close to each other share interests, so the citizens who vote can represent the interests of their spatial neighbors, even if those neighbors cannot vote or do not vote. Now clearly that's not always true, for example for incarcerated populations, often in rural areas: are their interests being represented? But that's the claim, right? We're upweighting the votes of citizens in proportion to the non-citizen population. You can also think about reweighting not just spatially but by other observables, and we see this in various other applications: race, gender, party. Maybe outside of democracy, but in trying to predict elections, this is what polling companies do, right? They call a bunch of people, they know their sample is not representative of the full population, and they reweight by observables. There are fancy ways to do this in various settings. I really like this paper, not by me, "Fair Algorithms for Selecting Citizens' Assemblies." They're in a deliberative democracy setting, where a lot of organizations essentially convene groups of people to talk through issues, and normally they write reports and produce outcomes. The goal is to try to capture what people would believe after, say, three intense days of deliberation. So not necessarily what the population believes going in, but what it would believe after being educated, being exposed to each other's views, and so on.
One big problem that many organizations face here is that participation is, again, not representative. You can cut it on any dimension: the people who volunteer to participate are not representative of the population they care about. So the paper gives some nice ways to select the actual participants from the volunteers such that each individual person still has a pretty good shot of being selected, but the overall composition of who's selected is representative. There's also all this work on liquid democracy, where maybe one way to solve this is you just ask people to delegate their votes to other people. So I don't want to vote, or I can't vote, or whatever, but I choose who gets to vote in my stead. And then there's the stuff that I've been adjacent to, where I've read some things about decentralized society. I might be wrong about what I'm about to say, but my impression is that there they've built in, or proposed, various ways to do this reweighting based essentially on how correlated you are with other people in your community. Okay. So this is all to say that there's a whole bunch of ways people have tried that, I think, roughly fall under the banner of reweighting observations. This might be the first thing that you should try thinking about, and I really hope some of you implement it in your various voting, funding, and governance schemes. But I don't think this is enough.
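To make "reweighting by observables" concrete, here is a minimal Python sketch of the simplest version, post-stratification on a single group label. It is an illustration only: the function name `poststratify`, the group labels, and the vote data are all hypothetical, and it assumes the target population shares are known and every group has at least one participant.

```python
from collections import Counter

def poststratify(responses, population_shares):
    """Weighted mean support, reweighting each group to its population share.

    responses: list of (group, vote) pairs, vote is 1 (support) or 0 (oppose).
    population_shares: dict mapping group -> share of the target population
    (assumed known; every group must appear at least once in responses).
    """
    sample_counts = Counter(group for group, _ in responses)
    n = len(responses)
    weighted_sum = 0.0
    weight_total = 0.0
    for group, vote in responses:
        sample_share = sample_counts[group] / n
        # Groups that are over-represented among voters get weight < 1.
        weight = population_shares[group] / sample_share
        weighted_sum += weight * vote
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical data: group "a" is half the community but 8 of 10 voters.
responses = [("a", 1)] * 6 + [("a", 0)] * 2 + [("b", 0)] * 2
population_shares = {"a": 0.5, "b": 0.5}

raw = sum(vote for _, vote in responses) / len(responses)
weighted = poststratify(responses, population_shares)
print(f"raw support: {raw:.3f}, reweighted support: {weighted:.3f}")
```

In this toy data the raw tally shows majority support, while the reweighted tally does not, because the two voters from group "b" stand in for half the community. That is exactly the kind of leverage that makes the choice of groups so consequential.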
And the reason I don't think this is enough is that there are a lot of implicit assumptions here. If you're on Twitter, there was a back and forth, which I didn't want to screenshot and share, between some of the folks in DeSoc and others about: is this kind of weighting even the right thing to do, how do you find the right people to weight, do you do it implicitly, and so on. So what do all of these existing approaches assume? They're good on one end, in that they recognize that voting is not representative. But they do assume that we know what representative means, right? That we have some sense of the target population we care to represent, whose preferences we want our outputs to correctly aggregate. They then assume that there are enough people like those who don't vote that, even if not everyone shows up, the people who do show up can be weighted in a way such that you can mimic the true population. And perhaps the hardest part, the challenging part in general, is that they assume we can meaningfully identify who those people are, right? By meaningfully identify, I mean not just, oh, y'all match on observables, race and ethnicity and so on, but that we can agree that these are the features that deserve weighting up, weighting down, and so on. I think this is where a lot of the difficulty in coding up any governance mechanism that does this will be, right? Clearly in the US we face a lot of challenges with this. Gerrymandering is a problem of exactly this kind, right?
It's a way for political parties to decide who to upweight in order to maximize their own outcomes. And then there's the assumption that there's some mechanism by which you could actually make those people's opinions more important. What I want to say is that the challenge is that I don't think these assumptions often hold in a lot of the governance mechanisms that you might be building. Especially the third one, but I think all of them are challenging. I don't know what the solutions are, and I hope maybe someone smarter than me will try to think through what the right governance mechanisms are when these assumptions don't hold. So here are some fun, or maybe the opposite of fun, examples, just as an illustration. In election polling, many of you might have read that in 2016 the major polls were off, certainly on the binary outcome of the US presidential election, but in a continuous sense they were off by, I think, around three-ish points on what the overall vote would be. Why did that happen? They were already doing this weighting. What they did was call a bunch of people or run online polls, get a bunch of responses, and record observable characteristics of those who responded: primarily race, gender, sometimes party identification depending on the state. And then they tried to rebalance the people who answered to match what they guessed would be the voting population. The big problem in 2016 was that they didn't reweight by education. It turns out that education was correlated with who answered the polls, so in that sense with who showed up, and it was also correlated with people's preferences in the election. So that was a setting in which reweighting did not work. Okay, so 2020 came around. All the pollsters recognized that failure and started weighting by education.
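The 2016 failure mode described above can be reproduced in a toy Python sketch: if a trait drives both who responds and how they vote, reweighting on a different observable leaves the bias untouched. Everything here is hypothetical (the helper names, the response counts, the 50/50 population shares); it is not a model of any real poll.

```python
from collections import Counter

def weighted_support(respondents, key, population_shares):
    """Post-stratified support, weighting on the attribute named by `key`."""
    counts = Counter(r[key] for r in respondents)
    n = len(respondents)
    num = den = 0.0
    for r in respondents:
        w = population_shares[r[key]] / (counts[r[key]] / n)
        num += w * r["vote"]
        den += w
    return num / den

def make_cell(edu, n_support, n_oppose):
    """Respondents for one education level, split evenly by gender."""
    people = []
    for vote, count in ((1, n_support), (0, n_oppose)):
        for i in range(count):
            gender = "f" if i % 2 == 0 else "m"
            people.append({"edu": edu, "gender": gender, "vote": vote})
    return people

# Hypothetical population: half college, half not; true support is
# 60% among college and 40% among non-college, so 50% overall.
# College respondents answer twice as often, so they are 200 of 300.
respondents = make_cell("college", 120, 80) + make_cell("no_college", 40, 60)

raw = sum(r["vote"] for r in respondents) / len(respondents)
by_gender = weighted_support(respondents, "gender", {"f": 0.5, "m": 0.5})
by_edu = weighted_support(respondents, "edu", {"college": 0.5, "no_college": 0.5})
print(f"raw {raw:.3f}, gender-weighted {by_gender:.3f}, education-weighted {by_edu:.3f}")
```

Weighting on gender changes nothing here because gender is unrelated to both response and preference, while weighting on education recovers the true 50%. The catch, as the talk goes on to say, is that you only know which variable to weight on after the fact.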
It turns out that in 2020 that wasn't enough, because even after controlling for education there was still this residual: people who supported Trump were less likely to answer the polls. So all the reweighting was another failure of identifying who, amongst the answering population, represents the true population you care about. I just wanted this as an example that this might be iterative. If you're implementing these reweighting techniques, you might want to be iterative, but you can't always fight the last election, right? And similarly, in standard voting, deciding who to upweight might be problematic. Okay, so these are all things that I've been thinking about at a high level. What I want to spend maybe the last 10 minutes on is an example of a system that none of you might even think of as a democratic system, just to show how common these problems of unrepresentative participation are. They don't just apply to voting and governance mechanisms, but also to all sorts of crowdfunding, crowdsourcing, and learning mechanisms, and I think some of these problems show up in quite subtle ways. I also want to do this because it was really my entryway to thinking about what context-specific solutions to some of the challenges I talked about might look like. So what's the context? The context is equity in resident crowdsourcing. This is joint work with my PhD student, Jalu. And what's the background? For those of you in New York, you might know about the 311 system, which is a number that you can call to complain to the government about things, right?
You call 911 because you need an ambulance right now; you call 311 because there's a pothole, or a tree falling down on a power line, graffiti, noise complaints: complaints to the local government about things. And this is a huge system. New York received about 2.7 million requests last year. It's easy to be cynical and think that you complain to the local government and nothing happens, but there are entire government agencies organized around resolving these complaints. For example, we're partnering with the parks department in New York City. The parks department is responsible for maintaining about 700,000 street trees. These are trees lining the streets of New York, not counting trees in parks. So that's about 700,000 trees even if you don't count Central Park and the many other amazing parks we have. This is virtually impossible for the agency to monitor in any real-time sense with employees, right? No one's being funded at that scale. So what they rely on is people calling in to complain about problems with their trees. What's the pipeline? An incident occurs at some point, and then hopefully, and this is what I'm going to view as participation here, at some point someone is going to call in a report, right? They're going to say, hey, this tree has a broken branch, it's about to hit a power line or about to fall on a person or whatever. The subunit of the parks department we're working with receives about 85,000 reports a year. And then this is an intensive process. More than half the time, almost two thirds of the time, the agency sends out a forester to look at the tree in question and determine what's going on. And about half the time they do that, they actually schedule a maintenance crew to come in and fix the issue, right? So this is a really capital-intensive, labor-intensive process. And this is how allocation of government services happens.
What we try to understand here is reporting behavior, right? In what circumstances does an incident actually generate a report, and how long does it take? That's important because this is exactly the question of participation, right? If one neighborhood systematically underreports compared to another neighborhood, even when those neighborhoods are facing similar issues, then the neighborhood that reports more is going to systematically get better government services, right? So this is a system where we're already doing the weighting by geography, right? I never have to report if I have an amazing neighbor who walks around every day and sends in all the reports they want, right? But what we're worried about here is that even that's not enough, that there are just neighborhoods that systematically underreport. And you can tell all sorts of causal stories about why that might be true: trust in government due to historical services, access to technology, time, education, right? You can tell all sorts of stories about why some neighborhoods might be the squeaky wheel that gets the grease and others aren't. So we wanted, first of all, to measure this, because, and I think this holds in your applications too, measuring it is the first step toward resolving it, right? We can think about all sorts of fun downstream interventions to correct for underreporting, and this is what we're working on now, but you have to know what the disparities are first. So what's the statistical challenge? The statistical challenge in our problem is how to distinguish between underreporting and a neighborhood truly having fewer problems, right? By definition, we don't observe data on missing reports, right? The entire reason these systems exist is that we don't know the ground truth.
And just like this, in your systems it might be easy to say, and we say this in American politics all the time, that those who don't vote are maybe apathetic or don't actually care about the outcomes. But that's almost never true, right? The question is: is it that they don't care, that they're truly facing fewer problems, or is there just some other reporting hurdle or voting hurdle for participation? I'm not sure I need to walk through this example, but very quickly: two neighborhoods exist, one with 10 incidents and the other with five. The one with five reports all of its incidents, and the other one reports just half. From the eyes of the system, these two neighborhoods look exactly the same, right? We only observe five reported incidents per neighborhood. How do we statistically distinguish these things? This is, I think, a benchmark problem that's fundamental in a lot of settings, even outside of participation. It's perhaps the hardest thing when trying to do statistical analyses of inequitable policing. A lot of really good economics work is about getting around this problem when showing that policing is inequitable. I'm going to skip through the methods a little bit, just because maybe that's not what y'all are interested in. But as a few-sentence summary, the classic approach is to go out and walk the streets and get a snapshot, uncensored view, right? So go out and figure out some alternate way to get ground truth, and then just compare the number of reports to this true ground truth. You often don't have this, right? Especially in this setting, it's really expensive. I'm just a poor professor, hopefully someone will fund this, and I can't hire a bunch of people to walk around the streets.
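The two-neighborhood example above is easy to write down in Python, and doing so shows why report counts alone cannot separate underreporting from having fewer problems. The function names are hypothetical, and the "audit" stands in for the walk-the-streets ground truth that is usually too expensive to collect.

```python
def observed_reports(true_incidents, reporting_rate):
    """Number of incidents the system sees, given the share that get reported."""
    return true_incidents * reporting_rate

def audited_reporting_rate(reports, audited_incidents):
    """Reporting rate recovered by comparing reports to a ground-truth audit."""
    if audited_incidents <= 0:
        raise ValueError("audit must find at least one incident")
    return reports / audited_incidents

# Neighborhood A: 10 incidents, half get reported.
# Neighborhood B: 5 incidents, all get reported.
neighborhood_a = observed_reports(10, 0.5)
neighborhood_b = observed_reports(5, 1.0)

# From the system's point of view the two are indistinguishable.
same_in_data = neighborhood_a == neighborhood_b

# Only with ground truth (here, the true incident counts) do they differ.
rate_a = audited_reporting_rate(neighborhood_a, 10)
rate_b = audited_reporting_rate(neighborhood_b, 5)
print(same_in_data, rate_a, rate_b)
```

Any pair of incident counts and reporting rates with the same product yields the same observed data, so some extra information or assumption is needed to tell them apart.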
And so the statistics question we wanted to answer here was where there are ways to measure this under-reporting without actually going out and walking the streets. I'm gonna skip how we did it just because I only have a few minutes left. I was always planning on skipping this. There's FunMap, you know, happy to talk about this offline for those of you who like FunMap. But sort of the, sort of maybe the upshot is what we ended up measuring was on average how long does it take for the first report to come in after the incident happens, right? So what's the delay between reporting between the incident happening and the report? And the claim is that's like sort of if that differs by neighborhood, that's gonna be a bad thing. Okay, so I'm gonna skip the methods and go through sort of the results. And maybe the first result is just like the good news, which is that the system is mostly efficient, that things that are hazardous get reported faster, things that are maybe less dangerous, like roots cracking sidewalks, things maybe overgrown, things that need to be pruned, like things that are less hazardous get reported slower. So this is all good news. It's also verification of the method. But then the bad news, which is on the same order that hazardous and less hazardous things are differentiated, we have vastly different reporting rates by neighborhood, even after conditioning out incident characteristics, right? So for those of you familiar with New York, so here in this map, the darker areas are where we have lower reporting rates and the lighter areas are where we have higher reporting rates. And for whatever reason, sort of the Columbia area and a little bit north of it was just regardless of whatever robustness we did, always came out as the highest reporting rates. Okay, anyway, sort of what we found is that reporting rates are super variable by neighborhood, even condition on incident characteristics. 
In the highest-reporting neighborhoods, you might observe a report three times faster than you would in the slowest-reporting neighborhoods. I'm not going to show the regressions, but for those of you familiar with New York, I don't need to tell you that the colors on this map correlate with basically any socioeconomic thing you care about, right? Education, race, income, population density certainly. So this is a bad thing, right? It took a lot of work, but in this system we were able to identify in what ways participation was not representative. And this slide is just contextualizing some of these things: in Manhattan, on average, we're seeing reports about two times faster than on average in Queens. The reason I went through this work is that it took us a lot of effort, even in a system that seems perfect for representation, in the sense that it doesn't require individual representation. All it required was some notion of groups, and we had a very natural notion of the group, right? Spatial neighbors, and that was reasonable. This was almost the best-case scenario, I think, for participation being representative, but it wasn't. It dramatically isn't. If I ask who here has used the 311 system, my guess is maybe three of you, and one person has used it a hundred times and no one else has used it even once. That's my guess, right? So what can we do about this? This is what we're actively working on now, and in general I don't think there are easy answers. So I'm going to end by just asking you, in your governance work: who is the community, and whose opinions do you care about? Whose voices are you missing? Can you add them to your system?
If you can't, are there ways to identify and upweight similar voices? Probably not. And if not, is voting as governance even a good mechanism? That's all I had.