Welcome Matt Stempeck and Micah Sifry, who are going to talk to us about the problem with impact measurement in civic tech.

Wonderful. Thank you, everyone. Thank you for being awake despite the time zones. This is a talk about the problem with impact measurement in civic tech. It's going to build a lot on what Rosie and Duncan said, and if there are any points of conflict between the two presentations, I would defer to theirs. My name is Matt Stempeck. I've been working on, building, and studying civic tech for about 12 years now, at places like MIT, Microsoft, and Hillary for America. And I did this research together with Micah Sifry, the gentleman in the blue blazer, who is the co-founder of Personal Democracy Forum and Civic Hall. Combined, we've been looking at this space and at impact measurement for a long time, which is why we needed to retitle this presentation "The 10 Problems, Plural, with Civic Tech Measurement." We've read a lot of impact reports and annual reports over the years, and today we'd like to share some of the common recurring issues we've seen that make the kind of research we do here at TICTeC very difficult.

So I'll go through these, obviously, but these are 10 common issues we've found, with apologies to Rebecca, because some of them mix opinions in with facts. They're grounded in evidence and small samples of anecdotes we've picked up, but they come up repeatedly and haven't, we feel, been adequately addressed in the space. And whether you're an independent researcher, staff trying to measure impact, or a funder looking to invest wisely, these probably apply to you.

So, ten seconds on how impact measurement could work in an ideal world. We start with the outcomes: why your group exists in the first place. Then we work backwards to identify metrics, stories, or accounts that are valid milestones showing progress toward those outcomes. And the third step is the TICTeC sweet spot: can we, or ideally someone else, do the research that proves or disproves a causal link between our work and our technology and the effect we've observed in society?

For example, the Center for Humanitarian Data, whose mission is to make crisis data more immediately available to crisis responders, set three measurements as proxies for that goal: increase the speed from data collection to publication, increase the number and quality of the Center's connections in the humanitarian space, and increase the use of their Humanitarian Data Exchange, all of them leading toward that mission of getting the data used.

But that last link, the causal one, turns out to be the really hard part of this work. As came up earlier today, most groups don't have in-house researchers trained in these methodologies, and there's a lot of pressure, subconscious and otherwise, to take credit for forward progress without really interrogating its origin. Even in an ideal situation, research requires resources, which is why a lot of the impact work we see was directly funded by a funder who provided those resources, and it goes away when the resources do.
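To make that three-step structure a bit more concrete, here is a minimal sketch of how a group might write it down, using the Center for Humanitarian Data's proxies as the example. The class and field names are purely illustrative, not anything the Center actually uses.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class TheoryOfChange:
        mission: str                      # the outcome the group exists for
        proxy_metrics: List[str] = field(default_factory=list)  # measurable milestones toward it
        causal_question: str = ""         # the research that would prove or disprove the link

    chd = TheoryOfChange(
        mission="Crisis data is immediately available to crisis responders",
        proxy_metrics=[
            "Speed from data collection to publication",
            "Number and quality of connections in the humanitarian space",
            "Use of the Humanitarian Data Exchange",
        ],
        causal_question="Did faster, more widely used data actually change responder decisions?",
    )
    print(chd.mission, *chd.proxy_metrics, chd.causal_question, sep="\n")

The point of writing it down this way is simply that the proxy metrics and the causal question are separate fields: hitting the proxies is not the same as answering the question.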
So let's get into the reasons, or the problems. The first is that systematic measurement of civic tech's impact is difficult because there's a lot of variation in the kinds of metrics we report and collect. Even groups working on the same issues will report and collect different metrics from each other, and may even brand their impact around a particular set of metrics, which suggests that some of the impact measurement we look at is a fairly subjective game despite all the numbers involved. This is research from Network Impact, who did interviews with groups like Code for America, Living Cities, and Community PlanIt at the Emerson Engagement Lab, and they found that even within the same civic technology project, teams collected and measured different metrics at different points in the product's lifespan. Early on, when you're building your product, you're really focused on getting users to sign up and use it, whereas later in the product's lifespan you're looking at general usage patterns to see where you should invest your development resources. So if you're familiar with the lean startup methodology, this maps nicely onto it, but in terms of civic tech impact.

And then, not only do groups prioritize different impact measurements over time, they share them at very irregular intervals. If we're trying to do longitudinal analysis of a variable over time, it's very difficult as an independent researcher to get that data. Sometimes groups share a lot of information, like when they're looking for media attention or fundraising, or when things are going really well. And then they might dry up and stop sharing information so regularly when they don't have to, or when things aren't going as well, which is a shame because that's exactly the time we want to learn from what's happening. So, just from the slides here, and please correct me if I'm wrong, I could not find Code for America reports from 2015, 2016, and 2017, even though they had really nice reports in 2013 and 2014. I know they collect great data internally and share it with others, like funders, but if you're trying to do that work from outside the group, even over a five-year span, to show their impact, it's going to be really hard. DoSomething.org, on the other hand, reports quarterly metrics and stories, sometimes biannually. And then there are a lot of groups somewhere in the middle of qualitative and quantitative: Change.org, Care2, Avaaz. They'll share a top-line metric and then a lot of stories, and you sort of fill in the blanks with those stories and take them as universal victories. We'll get back to qualitative reporting.

Okay, but we do have one common metric in this space, which is scale: how many people did you reach? And this one's easier, because if it's a big number, you hear about it. The juggernauts, like Avaaz and Care2, put it right on their home pages: 42 million email members, 8 billion signatures. But we don't hear about it from most civic tech products. So can we assume it's not a big number of users? Probably. Micah's going to cover this in greater depth in his failures-of-civic-tech talk in the next session, but the majority of civic tech products we've evaluated reach well under 20,000 users. And even well-funded projects like Jumo and EveryBlock reached a very small percentage of Americans. The reason I put the US population and the global population here is that when we're talking about civic tech, we're actually talking in terms of reaching the public at large, at least within our own country.
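Just to spell out the arithmetic behind these reach percentages, here is a minimal sketch. The population figures are approximate, and the user counts are only illustrative (the 20,000-user threshold and the 42-million-member figure mentioned above), not any one product's reported reach.

    US_POPULATION = 330_000_000        # approximate
    WORLD_POPULATION = 7_700_000_000   # approximate

    def reach_pct(users: int, population: int) -> float:
        """Share of a population reached, as a percentage."""
        return 100.0 * users / population

    # A typical civic tech product (well under 20,000 users) vs. an email-list juggernaut.
    print(f"{reach_pct(20_000, US_POPULATION):.4f}%")         # ~0.0061% of the US
    print(f"{reach_pct(42_000_000, WORLD_POPULATION):.2f}%")  # ~0.55% of the world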
And I like to measure what percentage of that public we're actually reaching, compared to more universal experiences like sports or food or music. And the answer is: not well. We're not reaching that many people, or that big a portion of the world, even with our most successful examples. That's one reason I love the work the civic team at Facebook is doing: by embedding civic engagement in products people already use, you tend to reach a very large number of people. And just in case you thought I was letting myself off the hook: if everyone I knew read my tweets, which they definitely don't, I would be reaching 0.0000008%. So that product's not going so well either.

And then this one: if you've ever led an organization, you already know this, but you have different constituencies and they want different things. When it comes to impact measurement, that means the different constituencies may want to see completely different metrics and results before they consider you effective, and you can't ignore one or the other. Your actual theory of change might say we need more cosponsors, and that will help us pass legislation, and that's how our group achieves its goal. But funders might require you to score and track your media appearances. The public might want compelling social media updates and to see you responding to what's in the news that week. So you have to execute on some of each of these, but optimizing for one mode of impact might actually depress another. One organization was really proud of how much its APIs were being used by developers, but their funders wanted to see them making progress in media hits and coverage. And you can imagine, if you're on the tech team there and you can only build one thing, you might build one thing to optimize developer API keys and another product entirely to get media headlines.

Then, even when we do adequately measure the outcomes, we often fail to measure them in relation to the resources that were invested in the work. And this has a lot of consequences for our space. Resources might mean grant money; resources can also take the form of public attention, especially if one group dominates the public's finite attention span for an issue. To pick on an example that's a little older, let's take Jumo. Jumo was going to be the one single central platform for nonprofits online, except for all the other single central platforms for nonprofits online. With Jumo, the inputs included $3.5 million in grant funding, a high-profile launch, individual contributions from thousands of nonprofits, and a million people creating profiles on the platform. In under a year, Jumo folded. Its value was appraised at $62,000, and it was sold for that much to GOOD, along with five MacBook Pros. So in addition to the $3.5 million invested, all that community effort went nowhere.

But let's look at the inverse: groups that receive very little in terms of inputs and punch well above their weight in terms of outcomes. I give you three. GovTrack.us serves 7 to 10 million visitors a year with information about legislation and the federal government, and they receive $0 from outside groups; they're basically bootstrapped. This example is from Micah: Chris Messina was the first person to use the hashtag on Twitter, a social-practice innovation that took him zero resources, and Twitter supported that practice with relatively few resources.
And if you think about all the hashtag-based movements we've seen this century, that's had untold value. And then the 92nd Street Y started Giving Tuesday, and they marketed that idea for pennies on the dollar compared to the hundreds of millions of dollars that Giving Tuesday has raised since then. An aside here is that funders might be more willing to invest in open brands like Giving Tuesday if we kept better track of outcomes relative to the actual inputs.

Another somewhat abstract issue, but one that still affects impact measurement in civic tech, is that our perception of a group's impact is inextricably tied to the macro environment in which it operates. What I mean by this is that we inadvertently penalize groups working in very difficult contexts, and we tend to ascribe a lot of prowess to groups that had an easier journey, and sometimes just lucky timing. Countable is one of many, many tools in civic tech that let you contact your representative and weigh in on legislation. They were really smart to build a mobile-first, user-friendly product, but do we call them the most effective civic tech group of all time because they launched right when Trump was elected and millions of Americans suddenly re-engaged in politics? And on the inverse of that, do we remember to sustain the groups doing the really hard work, knowing that their impacts can be really hard to tease out and perceive in an environment that just doesn't support it?

So, quantitative metrics. It's really nice to have numbers in impact metrics, but anyone who's worked in a large corporation might tell you that they avoid being measured and having their salary tied to key performance indicators, and that's because metrics rarely keep up with the world outside. We might have insane success on our key performance indicators, and it won't matter in terms of our actual overall objective and reason for existing. So here's a personal failure as my contribution. In 2016, I worked on the digital campaign for Hillary for America in Brooklyn. And by measurable impact, we did amazingly. We broke every record on voter registration. Volunteers sent 35 million text messages to other voters on a new platform for peer-to-peer text messaging, all targeted, actionable messages. We were endorsed by almost every single newspaper in the United States of America, which is unheard of. But you already know that we failed our one key objective, the whole reason for us existing, even though all of our empirical data showed us winning. So specific metrics can easily miss the forest for the trees.

And as I looked at all these impact reports and reflected on impact metrics, I started to really fall in love with narratives and case studies, because they can embed the true driver of an outcome. Jonathan spoke of this earlier with the phrase "causal stories." Nicole at The Engine Room, in their work helping other groups create impact, has referred to this as the "action chain to impact," and I really like that phrase too for these linear narratives. ProPublica was funded to write an analysis of their impact, and it's a very qualitative journey because they do investigative journalism. They can count how many times a story did really well and created change in the world, but the change itself is very qualitative, even though it's deep impact. But there are issues with qualitative reporting too, which is bias in the reporting.
Often, case studies and qualitative reports are commissioned by funders who want to see good news because they're funding the group. Or they're written by peers who have a social tie to you. Or the group itself wrote the study, with survey data that came from its most active, enthusiastic users. Even academic articles I've read on issues like, say, participatory budgeting, just to pick one off the top of my head, are often written by champions of that trend or technology inside academia. And that's fine if we just want to spread the word about the group or the technology. But it's less good for getting us to the heart of TICTeC, which is knowing empirically whether we're actually having a positive impact on the goals we say we're pursuing. So I personally would like to see more independent authors, more balanced reporting, and just more recognition of setbacks and failures if we're going to learn from them.

Number nine: causality is really hard to prove in a societal context. You know, mySociety has done great research on citizen reporting to government, and the Facebook and Google civic teams have done really strong controlled experiments. But there are a lot of cases where we'll never be able to know the true impact. One example is the Sunlight Foundation, which has received millions in funding over the years, but which also identified that the US federal government's own expenditure tracking was off by $1.3 trillion. And some argue that that discovery led directly to the passage of the DATA Act, which standardized government spending reporting in the US, and that alone will prevent untold amounts of fraud and wasteful spending. So there are a lot of counterfactuals and hypotheticals that we'll never be able to put a number to, even though there's clearly value there.

And lastly, and this has already come up today, the unintended consequences of our work could end up being more impactful than our work itself, which is really daunting to think about and sit with when we're trying so hard in this work. Tom Steinberg talks about 19th-century medicine, where most of a century of medical progress was lost to the idea that we were helping others when actually we were harming them. And in our own space: does transparency of government operations produce more cynicism than civic engagement? Does opening up property data in a distressed city like Detroit actually empower land speculators rather than residents, as Jessica McKenzie found in Civicist? Does connecting the whole world to one another produce an exploitable fear of the other rather than community? So we need to consider the byproducts of our work on a historic timeline. The point is that we don't really know most of the time, but we'd better consider it, which is why you're all here. Let's skip that.

And yeah, this work is the reason we're revamping our Civic Tech Field Guide. It started as a massive inventory of civic tech tools; it's a big spreadsheet. But what Micah and I are working on now is whether we can add more qualitative and impact reporting to that guide. We'd love to invite you to contribute your research to it, so that when someone's talking about a given civic tech trend or technology, they can find where the impact measurement has already happened and learn about your work. So thank you, and I really look forward to learning from you all this week.