All right, so hi CSVConf. It is so wonderful to be part of this, and Andrew and I are delighted to talk to you about this tricky little problem of whether, when, and how you close data that's been opened.

I am the open data librarian for the Washington State Library, and very recently it became part of my job to help curate and manage data.wa.gov, which is our state government open data portal. You see a screenshot of it there on the right. Data.wa.gov is about a decade old. It is one of the more seasoned government data portals in the United States, and for most of that decade the emphasis has understandably been on growth: getting more of our state agencies to establish an open data plan, which they are all supposed to do, identify public interest data, and publish more of that data on the portal.

This has all happened in a very decentralized process. The Office of the Chief Information Officer in Washington State provides the infrastructure, this open data platform, and a little bit of support, but basically it is left to the state agencies themselves to select the data, prepare it, publish it, and hopefully maintain it. And a lot about that decentralized process has worked really well. So here we are in 2020, and we have dozens of state agencies publishing hundreds of data sets on the portal. We have been talking to them, and it looks like use is just going to accelerate. You can go to that portal and find wonderful public interest data on campaign contributions, consumer complaints to the Attorney General's office, salmon counts, and other environmental data. A lot of our agencies do a beautiful job of carefully choosing public interest, user-centered data and preparing and documenting it really well, so that when it is published it is really usable. However, not all of the data falls into that category.
So along with that beautifully prepared data, we also have a few data sets that are clearly identified as practice data sets, but they're still there. And we have a number of data sets that I think of as little jigsaw puzzle pieces: a little bit of budget information for one agency in 2013, or some test results from one location in 2015. It's never updated, it doesn't look like anybody uses it after maybe the first month, and it's just sort of living there. This makes the collection of data on the portal sort of noisy, and noise is a problem because it makes it harder for the user to find what she wants, and it also lowers confidence in the portal overall. We want people to go to the portal and have confidence that they're going to find something good there.

So enter the state library. Luckily for the state library, and libraries generally, the person who for many years spearheaded open data in Washington State is Will Saunders, and Will has worked extensively with librarians and library staff over the years. About a year and a half ago, he and OCIO started talking to the state library about a partnership where the library would provide some curation services for the data on the state portal. This is doing some pretty straightforward library stuff with the collection of data there. First, the library would come in and help to prune some of the so-called low-quality data. Then it would clean up what was left (fill in the data dictionaries, maybe add some categories, make things a little easier to find) and generally support data quality, circulation, and usability going forward. The state library loved that, a partnership was forged, and this is now half of my job. But when we were getting started, we wanted to have a baseline assessment, and that was a good job for a summer internship.
We turned to the Open Data Literacy project at the University of Washington, where I had been an intern when I was in library school, and ODL luckily connected us with Andrew, who came in to conduct this initial assessment. So we got Andrew all settled, and he dug in and started assessing the data. But very quickly, as he was looking at the data and talking with Will, they realized that there was a little hitch: there isn't actually a clear path for retiring or removing data once it's been opened.

In a library, selection (adding content) and deselection (weeding content) have a single basis, and that's the collection development policy. Ideally a library has a policy that's written down, and it says: this is our community, these are the needs that we believe our community has, and in order to meet those needs we are going to collect these sorts of things and not those sorts of things. So as the library adds things or takes things away, there is this document that provides some clear, transparent criteria, and it's there to engender confidence and trust in the institution.

But remember, data.wa.gov, like all of the government data portals that I'm aware of in the United States, operates in a decentralized fashion. The agencies themselves are the ones doing the selecting, for a variety of reasons, and they're the ones publishing and maintaining the data, and technically they can also just delete their data. The screenshot you see there is my activity log when I log in as an administrator, and data owners can just delete their assets. Now, in practice they don't just yank data that the public's been using. Most of what's deleted is a draft or something that was never in the public eye. But this seemed to be a hole; it didn't really seem to be in step with the spirit of open data: transparency and accountability.
And so Andrew and I talked about looking around to see if we could find some policies or procedures that would be a good foundation for something we could use in Washington State. So this is what we did. We decided we would look at other state government data portals, as many as we could find. And then we also looked at other domains that do selecting and weeding. Besides libraries, that would be archives, depositories, scholarly repositories, museums, that sort of thing. And now I'm going to turn it over to Andrew, and he's going to tell you a little bit about what we found.

Can you stop sharing your screen, Kathleen?

Yeah.

Okay. So I'll pick up where Kathleen left off, and I think my screen is now being shared properly. Can everybody see that? Okay, I'm assuming everybody can see this. So we...

Yeah, looking great.

Great, thanks. So we looked at all 50 states and we found 91 open government data sites. 49 of those were geospatial sites; 13 were transparency sites, which publish things like financial figures or meeting minutes; 24 sites made tabular data available, like data.wa.gov; and five were other open government sites that didn't really fit in any of the other categories. We also sent 53 emails to contact links on a set of those sites, asking whether they had a data removal or retirement policy for their portal, and we got 29 responses.

I assume many people have seen open government data portals, but just in case, this is one of them, for Oklahoma. It's based on the CKAN software. You can see a list of data sets and descriptions, and on the left there are some facets to help someone browse. It's the same with data.wa.gov, which is on the Socrata platform. So these were the sites that Kathleen and I were looking at, and we were trying to find contacts and find policies and procedures.
Of those 91 sites, only six sites in six states had any type of policy that we could find. Five of those listed criteria for removal, and that's that little word cloud there. Only "replaced by a new version" appeared in two policies; everything else was unique. You can see it's mostly about whether data is obsolete or has been replaced by a new version, and sometimes whether there's an inaccuracy or it's private data. Five states also had some type of published procedure for how to deal with removing data sets. Four labeled retired data sets, and three had a waiting period, where a data set might get labeled and then 30 or some number of days later it would be unpublished from the site.

Now, there is one really excellent example of a policy and procedure around removing open government data, and that's on the New York City open data portal, which was out of our scope (the city portal, not the state portal), but it's a great example. You can Google this and find it; all that documentation is there. They have a data set that lists all the removed data sets and why they were removed, and there's this nice flowchart that describes what might happen to a data set: maybe it's considered for archives, maybe it goes somewhere else. But a lot of state portals probably don't have the resources to implement this kind of thing, so it's nice to know it's out there, and maybe there are some useful aspects of it that can be used in state portals.

We did find evidence of data sets being removed, retired, or archived, both in our searching and through our correspondence with portal managers. The majority of portal managers we heard from mentioned that they've thought about this, that they're working on it, that they're interested in this. Some managers even mentioned, yeah, we've removed data before.
On the screen you see examples of data sets that have been labeled archived. This is a great way to let a user know that maybe there's newer, better data available, especially if the archived version points to that new data. But it has to be done well; it has to be well documented. The lower right one, "beaches archive data", is confusing: is it an archived data set, or is it a data set of old beach data that is maybe updated regularly? So it needs to be done with care. And I should say I have personally tried to find a data set I'd seen before and it was gone, so I have personal experience of this.

If a data set is removed, a user might want to contact the portal or figure out whether they can still get that data, and we noticed that portals can be very faceless. It can be very difficult to figure out who is behind them.

This is your five-minute warning.

Okay, got it. We have some examples here from our correspondence. One portal had a data set owner who called themselves the "open portal kung fu master", which is kind of cool, but it's also hard to tell who you're emailing there. And then Kathleen got this really nice email, and it was just signed "portal administrator". Not all portals are like this, though. Some portals have really great contact and documentation information, and it really builds trust with the user: if you're looking for something, you know where to go. This is Delaware, Maryland, and DC, and there are definitely other examples with great contact and documentation.

Like Kathleen mentioned, we looked at several domains for guidance on what to do with government data that might be removed from the portal. We considered archives, we talked to archivists, and we thought about retention schedules for government documents. We looked at research data libraries and the data curation life cycle. And we also considered just public libraries: can open government data just be weeded like you would with a library collection? We've settled on the view that it doesn't really fit in any
one category, although any given data set might be appropriate for the archive, or there might be some retention rules around it. But data sets seem to fit best between research data libraries and public libraries. Research data libraries have really stringent appraisal and selection guidelines, and then there are deaccessioning guidelines, and it takes a lot of work and training to do that right. Those data sets aren't necessarily updated all the time. Open government data, though, is all about being updated, current, and relevant, so it didn't seem to fit quite right in that bucket. Public libraries seem like a possibly good analogy, but when you weed a book from a public library, a user or a patron can get that book elsewhere: interlibrary loan, the publisher. An open government data set, though, might be the only copy, so removing it makes it much more difficult for the user to find. Archives are typically focused on records that are original and need to be preserved for the long term, and some open data may fall under that umbrella. It's the same with record retention: it's mostly focused on records or maybe some types of publications. But open government data, again, is just a copy of data that's sitting somewhere in a state agency, so most likely archives and record retention rules are applied to the original data, not necessarily the open government data version. So with that, I'm just going to keep transitioning slides, but Kathleen is going to talk about what this means for data.wa.gov.

Right. So I'm continuing to be the curator, and as the curator I don't order people around, but I make recommendations to OCIO and to our open data advisory group. So I'm going to recommend that we have a written data retirement policy; identify some of those criteria that we found elsewhere; have a clear procedure with some notice, so that we don't yank data that somebody was using; and then have some clear labeling to make a record of
what's been removed and why. It's also good to go back to the beginning and raise the conversation about selection. Again, this is a decentralized environment, and we're not going to take that away from agencies, but it's great to have this conversation in that spirit of collection development, which is just: who's our community, who are our users, why do we publish open data, what are we trying to do? Publishing open data is a lot of work, so it's good to have these touchstones that help us remember why we're doing it. Of course, we were talking about lower-quality data that we want to remove, but thinking about that whole continuum is a great reminder, at the 10-year mark, to talk to our wonderful digital archives in Washington State about long-term preservation and historic value. And finally, we don't want to be one of those faceless data portals. We want humans on data.wa.gov; humans are on data.wa.gov. If anything has come up in the past couple of days, it's these wonderful presentations reminding us that data is not this anonymous, antiseptic, neutral, magic thing. It is a human endeavor with lots of human work and human ideas and goals and interests and values, and we do a service to the public by reminding them of that. We can do it really easily by just having my name on the portal for help, and also some "who we are" documentation. So that's where we're going, that's what we found, and we're happy to answer any questions if we have time.

Thank you very much, both. That was really fantastic. We have got time for a very quick question, and it's a question from Sarah. It says: how has this research influenced your approach to curating Washington's open data platform?

Well, certainly the recommendations that we just talked about. And I think that it's also helped me to focus on how we're going to do this in a collaborative way, working with our publishers, many of whom, as I said, have been laboring in the vineyard and
doing just wonderful user-centered work. So it's a great opportunity. You know, it's what librarians do: we circulate, we find things and make exhibits, and we promote really good quality material, and we have plenty of that. So it's nice to be in a position to raise their profile.

Brilliant. Okay, so we're exactly on time. Andrew, Kathleen, thank you very much for that, and thank you to Erica and Andres as well. It was another fantastic session. We have one more session for the day, I believe, which will be followed by a drinks Zoom session, so if everybody would like to jump to those things. Thank you all for coming.