 Hi, everybody. Thanks for coming to my talk on government data speed bumps. I'm Kathleen. I'm a librarian. I'm an open data librarian for the Washington State Library, which is a government library in the Pacific Northwest of the United States. My job is to help libraries use open data in a variety of ways. I also answer reference questions from the public that concern open data and a lot of the time I work with the state's open government data collection on the official open government data portal, data.wa.gov, and I work with the open data program manager on that. I originally had the slide at the end, but then after I heard the keynote on collective creation yesterday, I decided to move it up to the front because everything I'm going to say in the next 15 minutes reflects conversations and work with all of these people and a lot of day to day problem solving with government agencies and members of the public on open data. And I appreciate it and hope I represent it fairly in the next few minutes. So I'm an open data librarian. I've given lots of talks on open data and how great it is. And I usually put the benefits in these three categories. So transparency and accountability of government itself, better government services. Usually this means government is sharing data with other agencies. They're able to streamline and optimize service. And then innovation. And depending on whose paper or framework you're looking at, this will be called different things. It might be called economy, business, entrepreneurship, creation, citizen services. I'm going to use innovation sort of broadly to mean basically any kind of new service or tool you can create with open government data. And a really common example of innovation is a mobile app that is drawing government data. And this is the mobile app I've used the most in my presentations, One Bus Away, which is an app that draws on standardized public transit data. 
And I can look at my phone and see how long I have to wait for my bus. So here's the dream, all the benefits. And these absolutely do happen all the time. I believe in them. And here's some of the realities that I've encountered day to day as an open data librarian. So if you find this a very noisy slide, that's the point. This is a data conference. So if you're skimming this list, you're probably looking and seeing like, yeah, it looks like a list of things that happen when you're working with someone else's data. It's a short talk. I'm not going to tick through this list. I'm going to go through three cases that have come up in the past year in my work. And I'll let you tick through the list and listen for it. So in other words, I'm going to let you play bingo. So if you go to that URL, it will take you right to an online bingo card. You don't have to enter your name or anything about it. It'll just let you mark spaces. If you're not familiar with the game of bingo, very simple, you're given a grid. And if you don't want to use the online link, we do have some paper copies in the back. And Sarah has them. So the grid is like five rows by five columns. Each grid has a number or an item or a topic. And when you hear that, you mark it down. And if you get five in a row, you win. And you yell bingo, right. So as I talk, listen for this. And we'll practice a little bit with the first example. So here we go. Here's a simple example, and it involves libraries. So this is a map created by the US federal government. And it shows the quality of internet service or the lack of quality of internet service throughout the United States. So like lots of countries, during the pandemic, we found out that a lot of places in the United States have really poor or no internet service. And now the country is spending billions of dollars to fill the gap. So this is really important for libraries. Libraries have a special role in internet service. 
They provide internet service at their buildings. And they also advocate for their communities to get internet service because it's access to information. We're a state library. We help libraries do planning for things like internet needs. And we thought, this is so great. The federal government has already pulled together all of the major sources that we would have pulled on our own. And not only that, they've provided the underlying data. That is definitely not always the case with maps and dashboards. Sometimes you have these nifty interactive tools, and you click around, and then you go to look for the underlying data, and you can't get to it. But not here. Those links are county data and then smaller and smaller areas. And we thought, they've provided the data. We can hook into this, create our own little app where a library can enter its name, and then just get the data for its community, its service area, and then see some of the neighborhoods that have the poorest service and might need more support. And we worked with students at the University of Washington and started to dig into this data. And right away, we ran into one of the most common speed bumps, no API endpoint for this data. It's really, really common. In fact, at the data portal, data.wa.gov, one of the things we tell agencies is if you put the data on our portal, you'll automatically get an API endpoint. That's one of the big benefits. But lots of government data exists as a spreadsheet posted on a website. And if it's reliable, accurate data, and it answers questions that users care about, they're happy with it. And they call me when it's not posted or updated frequently. They like the data. And if users are interested, they can write script or find another way to get programmatic access. So that wasn't a dead end. What was a dead end was that once we dug into the data, we realized that the most comprehensive data was still at the county level. So we really needed smaller and smaller areas. 
And once we got into smaller areas, there were fewer data sources. So it didn't work out for us. On the other hand, highlight was, again, unlike lots of data, open public government data, this has pretty good documentation. And if you read through it, the documentation actually says, this may not be as granular as what you need. It also includes an email. And when our students emailed humans to get some answers, humans wrote back to them and helped them with their questions. So just to practice, anybody find things for their bingo card? Yes. Yes. Anybody want to volunteer who wants to be brave? Yes. I also meant doesn't answer your question. Yes. Good. All right. You're on your way. OK. Next example, this is an organization that used lots of different government data for a variety of reasons. So a couple of special things that make government data cool. Similar kinds of government entities collect similar kinds of data. So all cities and towns, all counties, all US states collect similar kinds of data. All public libraries collect similar kinds of data. All law enforcement agencies, all departments of transportation. So you have lots of comparables. Other thing, government collects similar kinds of data over time. So every week, every quarter, every election, every year, every 10 years. So for getting a comprehensive view and for getting a longitudinal view, government data is pretty valuable. And that, of course, is helpful if you do planning and track trends, which is what this organization does. They pull lots and lots of government data. They generate their own analysis, and they create their own indices and reports. And this is meant to form the foundation for sustainable, equitable development in the Seattle metropolitan region. I got an email from one of the staff members saying, hey, I'm missing this education data for the year 2021. 2020, 2021. That turned out to be pretty simple. The pandemic had disrupted data collection. 
They had chosen not to publish for that year. But I took the opportunity to talk to the staff some more about their experience with government data. And of course, it was full of reality checks. They said, yeah, it is great that every county collects this data or that data, but guess what? They use different terms. It's structured differently. If you went to the session yesterday on police misconduct and migratory patterns, they had a perfect slide for this, showing all kinds of reports in all different formats from various police departments. There's a lot of reconciling. And that's if the data is open in the first place. Lower capacity communities may not even be able to publish their data. And yes, it's great that the government publishes data year to year or week to week or quarter to quarter. But sometimes the data changes in between installments. They add columns, they delete columns, column names change, the methods behind the columns change. If you're writing a script, which the staff member does, to grab that data at regular intervals, and it's pointing to the wrong thing or to something that's changed places, then the script isn't gonna work. So to make a long story short, the staff said, listen, we're trying to make something new with the data. We expect to be doing some cleaning and reconciling, but especially with that longitudinal data, it sure is helpful when somebody documents the changes, puts it in a place where I can find it, and has an email if I have questions. So in the case of the missing education data, the agency did a great job explaining why that data had not been published for that year. It's just that that document wasn't easy to find from the places where the data was kept, like the website or the portal. So last example, this is a case of something that was really a temporary problem and just opened our eyes to wider reliability issues. So some of the best data we have on the Open Government Data Portal is about licensing.
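[Editor's aside, not part of the talk: the script problem described above, where columns are added, deleted, or renamed between installments, can be caught early by checking the header before processing. The following is a minimal sketch under stated assumptions. The column names and the sample CSV are hypothetical and do not come from any dataset mentioned in the talk.]

```python
import csv
import io

# Hypothetical list of columns the script was originally written against.
EXPECTED_COLUMNS = ["district", "school_year", "enrollment"]


def check_header(csv_text, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names for a CSV's header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    missing = [c for c in expected if c not in header]
    unexpected = [c for c in header if c not in expected]
    return missing, unexpected


# Hypothetical new installment in which a column was renamed and one was added.
sample = "district,year,enrollment,notes\nA,2021,100,\n"

missing, unexpected = check_header(sample)
if missing or unexpected:
    # Stop and alert a person instead of silently reading the wrong column.
    print("Schema changed; missing:", missing, "new:", unexpected)
```

[A check like this does not fix the change; it only turns a silent failure into a visible one, so that the user knows to look for the publisher's change notes or to send an email.]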
The government issues licenses for certain professions and certain kinds of work. We got an email from a city that runs an application drawing on some of this government data and it was starting to break, it was starting to glitch. So Kathy, the Open Data Program Manager, and I got into it and two things came up right away. One was that there appeared to be some confusion on the part of the customer about what the unique identifier was. So this is confusing, this is all business data, every business has a business identifier, but businesses can have multiple licenses. It was the license identifier that they needed to be looking at and that the app needed to be pointing to. And before this, we had never asked agencies when they filled out their metadata to explicitly explain what the unique identifier was. That's a lot of what I do. We've created metadata guidance, it's on the portal and it's always kind of emerging. Now we ask agencies to do that. The other thing that came up is once we realized that it was the license that was unique, we rolled up the data and found a handful of records that were duplicates, and that wasn't supposed to be happening. And we noticed there were things like this: in the address, one was capitalized and one wasn't, or the directional was in a different place. That was just a temporary glitch, but because we were trying to figure it out, we started asking more questions about the process. How does the data get in? Did you fill out a form and then a staff member typed it in, or does it come from another system? And then at what point does the system align that to one standard? This is your address, this is the only name we use. Because if the system doesn't do that, then it makes it harder for the user to know they're looking at the right thing. And this comes up with so much government data because it all starts with a form.
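[Editor's aside, not part of the talk: the roll-up described above, grouping records by license identifier and then comparing addresses that differ only in capitalization or in where the directional sits, can be sketched as follows. The field names, sample records, and the crude token-sorting comparison are all assumptions made for illustration. This is not the process the portal or the agency actually uses.]

```python
import re
from collections import defaultdict


def address_key(address):
    """Crude comparison key: uppercase, drop punctuation, ignore token order."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    return tuple(sorted(cleaned.split()))


def find_duplicate_licenses(records, key="license_id"):
    """Group records by the unique identifier; keep ids that appear more than once."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Hypothetical records; "license_id" and "address" are assumed field names.
records = [
    {"license_id": "L-1", "address": "123 N Main St"},
    {"license_id": "L-1", "address": "123 main st. n"},
    {"license_id": "L-2", "address": "9 Pine Ave"},
]

for license_id, recs in find_duplicate_licenses(records).items():
    same_place = len({address_key(r["address"]) for r in recs}) == 1
    print(license_id, "appears", len(recs), "times; same address:", same_place)
```

[Sorting tokens is deliberately simple and would wrongly match some genuinely different addresses, so it is a way to flag records for a person to review, not a way to merge them automatically.]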
It starts with legal cannabis data, it starts with public campaign finance data, licensing data, and the user is left sometimes wondering, are these the same businesses? Is this the same lab test? Is this the same doctor? So before this case, we never asked as much intentionally about the process and now we do. And the agency updated its metadata right away and by sharing this process helped us to improve our guidance. So innovation is not a snap. Even this app reflects years of work developing a public transit standard. But you can do things to make things go faster and more smoothly. And most of that boils down to one big idea, which is that publishers need to work with data users as they open data. Ideally, at every part of the process, like, which data should we open? I don't know, what are people asking about? Okay, we know what we're gonna open, which columns, how do we explain them? Work with the developers while you do that, they'll tell you. What if we include an email? And then another idea I got from a developer at the Department of Natural Resources is to create a space for developers, for users, similar to the way Stack Overflow creates a space. This allows publishers to see some of the issues without compromising the identity of users, which of course is the whole point of open data. People should be able to use the data without compromising their identity or purpose. And if you use data, email. I've had people email me and I've had agencies fix things in an afternoon. It's not always that fast, but speaking up does help. Thank you. Anybody get bingo? Not close? Well, all right. We can celebrate your effort anyway. Thank you. Thanks so much, Kathleen Sullivan. Do we have questions in the room? Just raise your hand and I'll get to you with a microphone. Yes. So one thing that I'm seeing in this conference in January is that the US has a lot of brilliant people working with data, but still does not have a federal law to get data.
So do you have any initiative to work with the government to have something like we have in Brazil, where we have a law where we can ask for any data from the government and they are, with a few exceptions, obligated to give the data? Are you working on something like this? We were just talking about this at lunch, weren't we? So no, we don't have a comprehensive law. There is something coming kind of down the pike at the federal level, but we were talking about this because we have mixed feelings about it. And this makes me think of the keynote yesterday, Karthik's keynote, where he was talking about kind of building buy-in for new practices, and that when you make people do things, they'll do it grudgingly, and grudging open data is often not as good as the open data you get when you've built better processes that are easier for people and actually reward them and are really built around customer service and less contentious interaction with the public. Do you have a favorite example of, say, a dashboard that shows the methods behind the graphs or indicators that it's showing? Shows the methods. Or links to the data set. I'm thinking of dashboards that bring multiple data sets in, maybe do aggregation. It's a problem I'm actively trying to solve right now, so curious for your suggestions. I asked somebody for an example this morning in another session and I was like, wait, I can never think of an example.
So I can tell you that I do like the dashboards that our elections division and Secretary of State do, because they widen the number of people who can participate in the data. And I think in the last year I've been thinking about dashboards and visualizations a lot more that way, because during COVID we saw so many dashboards that were fun at first and then became kind of walls where you couldn't get to the underlying data. But they can engage more users than would have just looked at a spreadsheet. Our elections data shows ballot returns in certain voter demographics and it's really well done, and it's a great way for anyone who doesn't feel data savvy to explore the data, and the underlying data is right there, and the elections data throughout the state tends to be pretty good. Yes. You can probably hear me anyway. But it's good for the recording. I didn't use a free, so I didn't win bingo. If you did say free, I won, hooray. All right, woo! Wait, wait, we get to celebrate. It's the only thing I won all week. Anyway, I empathize. I mean, everything you've said up there: I work for a government agency that produces lots of open data, and every one of those things in the bingo card was true for our organization as well. We're trying to get to a point, I guess, just to answer some of the question there, where all our data visualizations have access to the underlying data that exists, and I think that's important, obviously, for people to be able to not just see the data in the graph or chart, but actually dig into the data set it comes from as well. What sort of user research have you done that enables you to go back with the evidence that you need to support the point of this open data to the people who are going to invest in it, right? What user research have you done so far?
We just finished one of our first formal, we didn't finish it, the research firm that we hired, finished the first collection of usability testing, but that was mainly for things like navigation and how people search. Overall, not just at the Washington State Open Data Program because we talked to other public open data programs and it seems like everybody's thinking about the same things at the same time. You know, the end user has often just felt kind of invisible so trying to raise the profile of the user all together is really just kind of one of our bigger quests for the next couple of years. So in concrete things like creating developer space or community space where users can connect and we can kind of understand more about what's going on, reference questions help, that's more contact. So it's just, there's just a lot of work even before we get to testing and I remember you mentioned testing in your session earlier. Just to raise the profile and make sure that they aren't invisible. We don't have any more time for any other questions. Unfortunately, big round of applause again for Kathleen Sullivan.