For those of you just joining the room, welcome. We're talking about cities and data, and up next is Kate Rabinowitz. She's the founder of DataLens DC, where she analyzes and visualizes data to tell stories about Washington, DC. She also works with clients to bring data to life through data science and visualization. She's the co-captain of Code for DC, a volunteer civic hacking group, the co-founder of Tech Lady Hackathon, an event series promoting women in DC tech, and she's a recipient of the 2016 Technologist of the Year Award and the 2017 DCFemTech Data Scientist Award. So she's pretty awesome, and this is gonna be a fun talk. So welcome, Kate Rabinowitz.
Thank you. Hey, everyone. We're gonna be talking about open data and cities today. So let's jump right in, starting with a definition slide: what is open data? Open data is data that is freely and ideally easily accessible, with little to no restriction on how you use the data, what you do with it, and if and how you go on to share it. Specifically today, we're gonna be talking about government open data, and it's important to remember that as citizens, we are owners of the government, and as such, we have a right to the data that it creates in almost all circumstances. And this is important because government data is very valuable in terms of social and economic impact. It is awesome, especially when we look at what it can do for cities. With open data for cities, we can answer questions about our cities, we can spark conversations, we can uncover and solve puzzles, we can create change. Let's go through a few use cases to see what that looks like in practice.
So these are a series of maps that I created for my city, Washington, DC, and they look at how different neighborhoods in the District have changed across the past decade by race, income, and age. The darker the blue, the whiter, wealthier, and younger that neighborhood has become.
These characteristics are often associated with gentrification. When we bring data to these discussions, we can get a better idea of what gentrification looks like and where it is happening. Within these three maps, there's one neighborhood that is dark blue across every single map. I'm gonna move, I apologize. This neighborhood is Navy Yard, and in the past decade, Navy Yard has had a huge influx of new residents, generally whiter and wealthier than long-term residents. It has had a lot of new construction, new apartment buildings, even a new baseball stadium. With open data, we can identify these kinds of hotspots of gentrification, see what it looks like, and start to think about how we can support the communities that are affected by it.
Open data can also help us understand a social issue within our city from a different perspective. So this is the Million Dollar Blocks project in Chicago from DataMade. What they're doing here is looking at crime in the city of Chicago from a different perspective. Typically when we look at crime, we think about a map that has points on it showing where a crime has occurred. And this kind of perpetuates a notion of crime as an individual and isolated act. What DataMade has done here is visualize how much money the government is spending per block in Chicago to incarcerate residents who would otherwise have lived on that block. So we're flipping the perspective here and, in doing so, getting a bit closer to the institutional causes and societal costs of incarceration. We can start to see these heavy concentrations of where that money is being spent, the lack of people in that neighborhood as a result of incarceration, and on some blocks, literally a million dollars being spent. And we can think about that in juxtaposition to how much the government is spending on social services within the neighborhood. Do you have a question? Sorry, I should put a space there. Okay.
So open data can also help improve communities. This is an example, Living Lots NYC from 596 Acres. They've used open data to identify vacant lots within New York City. There are hundreds of vacant lots in New York City, and they were able to identify the vacant lots and then who owns them. They created resources for community leaders to then work towards converting each lot into a park or a community garden or whatever the community and the lot owner came to an agreement on. I think at this point, there have been about 80 acres converted, which for New York City is actually a lot of space. Using open data, they were able to systematically identify an issue at the city level and then bring that information and those resources to the community level to help fix that problem.
Open data can also help hold government accountable. Ideally government is always working in our best interests. Transparency and open data are really helpful in providing checks to make sure that is the case. I Quant NY was playing around with parking ticket data and found that the city police department had been systematically ticketing legally parked cars to the tune of millions of dollars of fines a year. At one point there was a parking ticket category that was totally legitimate, and then the rule changed. It was no longer a violation, but the police continued to hand those tickets out. And in this case, once this became public, the police department did remedy this, at least going forward: they are not handing out these parking tickets anymore.
So these are a handful of examples of what we can use open data for in our cities, and a next question might be, where can I find this open data? There are a few different spots. The main website for federal data is data.gov. That's gonna have your headline economic, social, business, and demographic data. There are nearly 200,000 data sets on data.gov.
Your city also collects a lot of data. And if you live in a major or a mid-size and sometimes even a small city, at this point it likely has an open data portal. There will be a wide range of data on that portal, in some cases from property tax data to public transportation data to crime data to demographic data.
And all right, I'm sorry. The examples that I gave you, I think, are really excellent use cases for what individuals can do with open data. And I think that's awesome and exciting. But when I start to talk about open data in practice, I have to be real with you guys. In practice, working with open data is kind of terrible. And this is an emotional journey that I've been on. I think many open data practitioners feel the same way. It is weird to say this at a pew right now. But the promise of open data is really awesome. In practice, working with open data has many, many challenges.
Some of you were maybe rolling your eyes when I was saying that data.gov has nearly 200,000 data sets, and rightfully so, because the number of data sets is a terrible metric for the success of open data. Let's use data.gov as an example. Of the nearly 200,000 data sets, two-thirds are geospatial. The most common tag on data.gov is oceans. So about a third of the data sets on data.gov are about oceans. When I go to my open data portal for information about my country, I typically mean inside the country and not the ocean surrounding the country. So when you go to an open data portal and you see that headline number, it might not necessarily reflect what you would consider high-value data sets.
It also might not be in a format that is particularly friendly. This is less true at the city open data portal level, but for the federal government, PDF is number two in terms of data format type. PDFs are where data goes to die. The friendliest of data formats, CSV, is not even in the top five.
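A ranking of formats like the one just described is something you can check for yourself once you have a catalog export in hand. What follows is a minimal Python sketch, not something shown in the talk. The record shape (a `resources` list whose entries carry a `format` field) and the sample records are assumptions invented for illustration; they are not data.gov's actual contents.

```python
from collections import Counter


def tally_formats(datasets):
    """Count resource formats across a list of dataset records."""
    counts = Counter()
    for dataset in datasets:
        for resource in dataset.get("resources", []):
            # Normalize case and whitespace so "pdf" and "PDF " count together;
            # missing or blank formats are grouped under UNKNOWN.
            fmt = (resource.get("format") or "").strip().upper() or "UNKNOWN"
            counts[fmt] += 1
    return counts


# Hypothetical sample records, invented for illustration only.
sample = [
    {"title": "Coastal survey", "resources": [{"format": "pdf"}, {"format": "HTML"}]},
    {"title": "Moving violations", "resources": [{"format": "csv"}]},
    {"title": "Tide tables", "resources": [{"format": "PDF"}, {"format": ""}]},
]

format_counts = tally_formats(sample)
for fmt, n in format_counts.most_common():
    print(fmt, n)
```

Counting resources by format rather than counting data sets is one way to look past the headline number, though the result still depends on how the portal chose to label its files.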
Number of data sets is also a bad metric because it can so easily be manipulated. I do not think this was their intention when they did this, but if I look at what I would consider one data set on the DC Open Government Portal, moving violations, it is actually dozens of data sets because they have broken it into monthly moving violation data sets. So when you see that top headline number of open data sets, it is often a bit more deflating as you peel back the layers. It might not be the data that you were hoping for. It might not be in the easiest format. So if you are using or thinking about using open data for a city, you should have FOIA as part of your toolbox, and you should also consider advocating for more open data, because there is a lot of work to be done.
But let's say you have your open data through the open data portal. That is success number one. Working with that data is going to be challenging because open data can be very messy. The way that we're talking about using open data today was, in many cases, not the original reason that it was collected. So it can lack the infrastructure, or it has until very recently, for accurate, complete, and clean open data reporting. There are still city agencies that collect this data by fax. In many cases, the system where they hold the data is entirely different from the system that is their open data portal. So translating between these two things can be harder and more manual than you would like. Open data almost always has more free-form text than you want it to. And this goes back to data analysis not being the original reason for collecting something like business permits. Data standards are not particularly common. On the upside, if you are working with open data for your city and you clean it, you might be the first person to ever get a clean perspective on that data. In cleaning data though, we make decisions.
And this leads to my next point, which is that open data, even when clean, is far from fact, because so many opinions go into how any data gets made. Data is shaped by the people around it. We should consider it just as biased and imperfect as the people who create it. Open data is sometimes talked about as a kind of absolute truth or a kind of catch-all solution. It is not; it is a means. And when we talk about and use open data, we have to put it into the context of the environment that it's operating in. So open data is complicated, and let's talk a little bit about what that looks like.
When you're working with open data, or really any data, there are always some questions and checks you should be doing. The first is asking who collected this data and how they collected it. This is kind of an involved example here, so let's take a sec to walk through it. In DC, as with many cities, there is a Vision Zero initiative to make streets safer for everyone. There is an app in DC and many other cities where citizens can report issues that they have, whether it's a cracked sidewalk or a blocked bike lane or a speeding car. This in particular is pedestrian issues. So that's what we see on the left, and what we see on the right are crashes that involve pedestrians. Neither is a perfect proxy for pedestrian safety, but the problem with the one on the left, which is data collected by people logging issues into their apps, is that you are subsetting who is going to take part from the start. Not only does a person need a phone or an internet connection or an app, that person has to know about the initiative, take the time to actually care about the initiative, and think that their government is going to effectively respond to their issue. And this subset is typically wealthier than the general population.
So when you're collecting data through a website or an app, you're not going to hear everyone's voices, and it's important to consider who you're hearing more of and who you're hearing less of. Here I have circled one particular neighborhood because it's where I live, Capitol Hill, for comparison. On the left you'll see complaints, and if you look at complaints surrounding pedestrian issues, you would think that Capitol Hill was a pedestrian death zone because there is just so much heat going on there. I live there and I can tell you that it is a walker's paradise. I can also tell you that Capitol Hill's PTAs are probably the most involved of anywhere in the city. So when you ask people their opinion through something like an app, you're going to get an opinion that they think a car was speeding, but you're also going to get something like how involved that neighborhood would be in a PTA meeting. Because when we look conversely at pedestrian-involved crashes on the right, we see that Capitol Hill actually doesn't seem that bad. So it's important to put your data in context in terms of who's collecting it and how they are collecting it.
Once that data is collected, there can be a number of decisions made about what to do with the data. And often when you're working with open data, you're going to find anomalies. This is a line chart of when homes were built in Washington, DC, based on open data. So if I was just running with this, I'd be like, wow, 1900 was a boom year. Can I commission a historical study on what happened at the turn of the century in our nation's capital? But that's not at all what happened. What happened is that when the city doesn't know the year that a home was built, but it's a pretty old home, they just tag it as 1900. So this is an example of an anomaly in the data being entirely by design and not reflective of reality. Policy also lives in your data. So here's another anomaly.
This is parking tickets in DC, specifically parking tickets related to inspection stickers: your inspection has expired, or you failed to report for an inspection. DC is very small and there's a lot of interaction between DC, Maryland, and Virginia. So looking at this graph, I'm like, okay, DC and Virginia have roughly the same levels of parking tickets, that makes sense, but what's up with Maryland? Once again, if I was gonna run with this, I'd be like, those Maryland drivers are fantastic. Let me commission a study, let me data science how everyone can be a Maryland driver. Maryland drivers are not fantastic. What's actually happening here is that in Maryland, you only get an inspection once and then you never need an inspection again. Because of that, you can never be ticketed for having an expired inspection sticker. So policy lives in your data, and when you're working with your data, you have to consider the legal and environmental factors around it.
Data also lives in society, and it's really important when working with data that you put it into that context. This is the Racial Dot Map out of the University of Virginia, and essentially every dot corresponds to a person, colored by race. This is looking at Detroit; you can also look at it nationally. Looking at this map, there are some unnatural things going on: that line of separation up top, and then this weird loop-in of blue. We see that it's highly segregated, and if you were to leave it at that and just let the reader come to their own conclusions, in my opinion, you're not doing enough to put the data in context. There are very specific reasons why this looks the way it does. There were policy decisions made, there was redlining, there were laws, and when you visualize data like this, especially when there are people behind that data, it's important to put that data into the context of society, the laws, and the decisions that we've made.
So open data is awesome, at least the promise of it is.
In practice, working with it is kind of terrible at times, but hopefully we can use this data, putting it in context and working with it responsibly, to make better use of our cities. So let's do all the things with all the data for all of our cities, but actually don't do that at all. Only do ethical things, be mindful of the flaws in the data, and put it into context. So all the things, but only the good things, please. Thank you. Thank you.
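As a follow-up to the year-built example from the talk, a placeholder value like 1900 can often be spotted before anyone draws a chart. This is a minimal Python sketch under stated assumptions, not something shown in the talk: the function name `suspicious_years`, the `ratio` threshold of 10, and the sample data are all invented for illustration. It flags any year whose count is far above the median yearly count.

```python
from collections import Counter
from statistics import median


def suspicious_years(years, ratio=10):
    """Return years whose count is at least `ratio` times the median yearly count."""
    counts = Counter(years)
    typical = median(counts.values())
    return sorted(y for y, n in counts.items() if n >= ratio * typical)


# Invented data: a steady three homes per year, plus a pile-up at 1900
# standing in for records where the true year was unknown.
year_built = [1900] * 50
for year in range(1890, 1930):
    year_built += [year] * 3

flagged = suspicious_years(year_built)
print(flagged)
```

A flagged year is only a prompt to go ask the agency how the field is filled in. The check cannot tell a placeholder from a genuine boom year on its own.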