All right, well, thank you all for joining this session. I'm Phil Ashlock, and I'm going to talk about the state of open government data infrastructure, obviously a little bit from my own biased perspective, but I think it'll be useful for a lot of folks here. First, a little about myself. I'm coming to you from sunny Washington, DC. I actually have a beautiful view from my apartment, so welcome to my apartment. I wish I could show it to you, but I can't move the camera around. If you've never been to DC, hopefully you'll get a chance sometime. Sorry you can't all be here in person, but maybe next time. I joined the federal government in 2012, and I'm currently the director of the data and analytics portfolio, which sits in GSA's Technology Transformation Services division. That probably needs a bit more explanation than my title. For those who aren't familiar, GSA is an agency that mostly serves other agencies: we manage public buildings and government-wide contracting, and we provide technology services to other agencies as well as directly to the public, including sites like USA.gov and Data.gov. Data.gov is where I've spent most of my time, and I continue to support a lot of work there. Our team also works closely with the Office of the Federal Chief Information Officer on federal IT and data policy, which is part of the White House Office of Management and Budget. One colleague there in particular, Rebecca Williams, will be giving a talk later today as well. As for what I care about: open government, open data, and open standards. That's why I'm here. And in the public sector context, I really like to emphasize the idea of infrastructure, civic infrastructure, especially in the 21st century.
I think it's important to help elevate these more abstract, invisible parts of infrastructure that we don't see as readily as traditional infrastructure like roads, bridges, and public transportation systems, but that are equally, if not more, important in many ways, in terms of helping to shape them, being involved with them, maintaining them, and making sure they serve the public good. So that's the idea of infrastructure behind my talk. But before talking about the current state, I think it's good to provide some historical context. If we're talking about machine-readable open government data, we have to start with the machine-readable part, which actually dates back to an invention here in Washington, D.C., in the 1890s. Herman Hollerith created the tabulating machine to process the 1890 census, and that was really the birth of machine-readable data processing. The factory down in Georgetown in D.C. still exists, if you're ever here and want to see the birthplace of machine-readable data. The open part, in the modern era, really comes with the Freedom of Information Act in 1966, which still serves as the bedrock for open data and open access to information in the federal government here in the U.S., and many other countries and state governments have similar laws as well. Then of course we start to get some of this data on the web in the 1990s, and it's around 2009 and thereafter that we start getting these big data catalogs. Data.gov actually launched almost exactly 11 years ago in 2009, followed by data.gov.uk, many other countries, and state and local data catalogs, and our data policy incrementally developed from there. And at least here in the federal government, we've come to a new era with policy that sets a strong foundation for this work moving forward.
The Foundations for Evidence-Based Policymaking Act came into effect last year, and there's a lot of work on implementing it, along with the Federal Data Strategy, which sets a strong foundation for data management in the government, really for the next decade or so. I'm going to get into some of these things at the bottom of this slide, the current state and the future, but I wanted to provide that historical context first. My main focus has been Data.gov, so I wanted to give a quick overview of that. Like I mentioned, it launched 11 years ago and has gone through a couple of iterations; it looks like this right now, and I'm sure you're familiar with it. A few quick stats: it serves as the US federal government's open data catalog, but we also try to think of it as a national catalog in the sense that it incorporates some state and local government data catalogs on a voluntary basis. Right now about 85% of our datasets are from the federal government and 15% from state and local governments. It includes comprehensive enterprise data inventories of metadata from federal agencies, and it even includes information about some non-public datasets, which are not available to download but whose descriptive information is public. We're currently operating under a new statutory requirement from the Evidence Act, Title II of which is called the OPEN Government Data Act. That builds a more permanent foundation for the work we're doing, but it also expands the scope of the agencies covered: it went from just the two dozen or so primary federal agencies to all the independent and other agencies, so about 100 additional agencies are now more clearly covered by the new law.
So we've gotten a stronger, bigger, more permanent mandate moving forward. But what happened in 2013, with the executive order that first set the architecture for how we operate, is that it changed Data.gov and the publishing model for the catalog to be more decentralized, a federated model where each federal agency publishes a metadata inventory using a standard metadata schema, published as a data.json file. Data.gov then acts more like an aggregator, in the same way that other entities are able to, including those doing similar types of aggregation today like Google Dataset Search. This decentralized model has also helped us scale and evolve. Some details about the federated harvesting model: we're not hosting the data; we're really just providing card catalog entries of metadata describing each dataset, including the URLs to download the data where it's available for download. We pull that metadata from the data.json files at each agency, usually on a daily basis. So it's all decentralized, and people aren't editing the metadata directly on Data.gov; it comes from their own metadata files. Because of this decentralized model, we also try to provide quality checks that do automated analysis of the metadata at each agency, a sort of automated QA. With something so decentralized, it's helpful to provide lots of checks and feedback, so we have dashboards and tools that help agencies validate and publish their data.json files. To give you a bit more detail on the recent policy that underpins all this: like I said, the current era started with the 2013 executive order, and that really set the architecture for how metadata is published and managed.
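To make the harvesting model concrete, here's a minimal sketch, not Data.gov's actual harvester, of what pulling one agency's inventory might look like, assuming the catalog follows the DCAT-US layout of a top-level `dataset` array with `distribution` entries. The URL is illustrative.

```python
import json
from urllib.request import urlopen

def summarize_catalog(catalog: dict) -> list:
    """Extract the card-catalog basics from a parsed data.json catalog:
    each dataset's title, access level, and any download URLs."""
    summaries = []
    for ds in catalog.get("dataset", []):
        summaries.append({
            "title": ds.get("title"),
            "accessLevel": ds.get("accessLevel"),
            "downloadURLs": [
                dist["downloadURL"]
                for dist in ds.get("distribution", [])
                if "downloadURL" in dist
            ],
        })
    return summaries

# A daily harvest run would fetch each agency's inventory, e.g.:
# catalog = json.load(urlopen("https://www.example.gov/data.json"))  # illustrative URL
# for entry in summarize_catalog(catalog):
#     print(entry["title"], entry["accessLevel"], len(entry["downloadURLs"]))
```

Note that a non-public dataset can still appear in the inventory with its `accessLevel` set accordingly and simply no `downloadURL`, which is how metadata about non-public assets ends up in the catalog.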
It also set a really broad, comprehensive scope for the metadata that agencies were tasked with including. It wasn't just public datasets; it was really about all of their data assets, with a consistent way of classifying the access level of each one and basic details about each. That policy was then passed into law by Congress, through a very strongly bipartisan law that itself came out of a bipartisan commission of experts. That expands the scope in terms of agencies covered and makes this permanent through legislation, as opposed to just an executive order. Then we also have the Federal Data Strategy, which sets a framework of fundamental principles and practices, with year-by-year action plans, to build a long-term foundation and capacity for better data management across the government. It's looking at roughly a 10-year vision, but doing so through year-by-year action plans. We're in the first year's action plan, and there are quite a lot of actions in it that are probably of interest to you; I'll refer to a couple of them later. As for Data.gov's specific responsibilities in the new law: it doubles down on this role of managing the federal data catalog, and it also tasks us with managing a repository of resources, which we currently have at resources.data.gov. There's an interim site there that's about to relaunch in a couple of weeks, but it's meant to provide a repository of tools, best practices, and schema standards to support the data catalog and related work.
As I mentioned, you can read up on the Federal Data Strategy at strategy.data.gov. It's a pretty comprehensive framework implemented through these yearly action plans, and I've highlighted a couple of the actions that relate to the work we're doing with Data.gov and some of the metadata standards I'll be getting into a little more. So really the bulk of what I wanted to focus on is the state and future of the metadata standards we're working with here. From the 2013 policy we had the Project Open Data website, which provided the technical guidance and documented the schema for implementing this decentralized, federated model, and that has evolved over time. As part of the new law and the updates we're making to the metadata schema, these are being renamed: the Project Open Data site is moving over to resources.data.gov, and what we used to call the Project Open Data metadata schema we're now starting to refer to as DCAT-US, because it's based on the international DCAT standard. As I said before, we also have this new site; this is actually a screenshot of a staging site, we haven't relaunched yet, but that should be coming in the next couple of weeks. As for DCAT, I'm sure many of you are familiar with it: it's the Data Catalog Vocabulary, a metadata standard used for a lot of data catalogs. It incorporates existing vocabularies, standard metadata concepts from things like Dublin Core, and it has itself evolved over the years; the first incarnation was finalized in 2013 or 2014. Because of that timing, our 2013-era policy was actually finalized a little bit before the W3C, as the standards body, had finalized the DCAT specification, so things were a little out of sync between when we published and when the final version of DCAT was published.
So we had a public engagement process through GitHub, both to get feedback on issues with version 1.0 of our metadata schema and to look at things that had been updated in the latest version of DCAT, so that we could align with it. Over the course of six months or so, we went through a revision process with a lot of public input and input from agencies to produce version 1.1 of the schema, and we also got support from those involved with DCAT. Folks from the W3C like Phil Archer helped us finalize the JSON-LD version of our metadata schema to ensure it was really compatible with DCAT, using the proper namespaces and so on. This is the page that documents the current spec, version 1.1, which we're still using, although a revision process will be coming soon. For those who aren't familiar, here's a quick overview of the spec's data model. There are a couple of fields highlighted here that are custom, unique fields for the federal government, which are documented here, but I won't go into detail. DCAT is actually pretty widely used in different shapes and forms around the world. Not only is it being used by all these federal agencies as part of this policy, but because it's based on an international standard, we encourage state and local governments to use it too. Even though they're not bound by the same policy, on a voluntary basis they can incorporate their metadata into the data catalog as long as they meet some minimum requirements from this DCAT-based schema, and quite a number of state and local governments are implementing it as well. And other national governments too: you'll sometimes even see references to our data.json convention and the Project Open Data fields from folks like data.gov.uk.
The European Union member states have worked on an application profile for DCAT in the EU, which has also just gone through a revision process along with the update to DCAT, the W3C international version. And there's a schema.org Dataset schema, itself based on DCAT, which is used by Google and others working with schema.org. That's been the basis for things like Google's Dataset Search, and in an announcement from a couple of months ago they noted that schema.org is widely used by governments in their data catalogs; at least when that was published, the US had the largest number of entries, with about 2 million. DCAT has been going through a revision process. To be honest, I haven't been as closely involved with it as I wish I were, but I think it's pretty stable now, although I see that the editor's draft date continues to be updated; I haven't tracked the latest changes, but this date is from just about a week ago. I also want to highlight some fields in our metadata schema that I'd like to dig into a little more. I won't go into too much detail, but these were an attempt to be a little future-proof and to build some extensibility into the metadata schema, letting people reference lower-level metadata, like a data dictionary, a schema, or a data standard that's much more specific to that dataset. The DCAT metadata fields we're using are pretty high level, and they don't really get into the richness you might want to provide with a fully documented schema or data dictionary. So these fields are ways to reference that, or to reference a specific data standard that a dataset might be using.
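As an illustration, here's what a single dataset entry in a DCAT-US data.json file can look like, including the extensibility fields just described: `describedBy` (a URL to a data dictionary or schema), `describedByType` (its media type), and `conformsTo` (a URI for a data standard the dataset follows). All names and URLs below are made up, and `bureauCode` and `programCode` stand in for the federal-specific fields mentioned earlier.

```python
# One dataset entry following the DCAT-US (Project Open Data) v1.1 layout.
# All values are illustrative, not a real agency record.
dataset_entry = {
    "@type": "dcat:Dataset",
    "title": "Example Inventory Dataset",
    "description": "An illustrative dataset entry.",
    "identifier": "https://www.example.gov/data/example-dataset",
    "accessLevel": "public",
    "modified": "2020-01-15",
    "keyword": ["example"],
    "publisher": {"@type": "org:Organization", "name": "Example Agency"},
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Open Data Contact",
        "hasEmail": "mailto:open-data@example.gov",
    },
    "bureauCode": ["000:00"],    # federal-specific field (illustrative value)
    "programCode": ["000:000"],  # federal-specific field (illustrative value)
    # Extensibility fields pointing at richer, lower-level metadata:
    "describedBy": "https://www.example.gov/data/example-dictionary.json",
    "describedByType": "application/schema+json",
    "conformsTo": "https://www.example.gov/standards/example-standard",
    "distribution": [{
        "@type": "dcat:Distribution",
        "mediaType": "text/csv",
        "downloadURL": "https://www.example.gov/data/example.csv",
    }],
}
```

The high-level fields describe the dataset for the catalog, while `describedBy` and `conformsTo` let a harvester or user follow links down to the dataset's own schema or standard.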
To be honest, these fields aren't very widely used, but it's something I'd like to explore further, to see what we can do to better leverage that capability. And speaking of the future, I have a lot of thoughts about what more we can do, and a big part of my motivation for being here is to get input from all of you on the future of this work. It's actually part of our mandate to engage with the public and have a voluntary, standards-based process for developing things like this. So, some thoughts on where this is going. We are going to launch a revision process for the metadata schema. I'd actually hoped to have that launched and under way by the time of this talk, but we're a little behind schedule; I'll point you to ways to get engaged as we kick it off. We want to incorporate updates from DCAT 2.0 and some new requirements from the Evidence Act. We also want to consider ways to better integrate or harmonize with other standards. There are the geospatial metadata standards we already work with quite a bit, like ISO 19115, which has had some updates. There's everything schema.org is doing. There's NIEM, the National Information Exchange Model, which is sort of like a schema.org for government. There's the Dataset Publishing Language, which is used more for statistical data, as well as SDMX for statistical data, and then Tabular Data Packages and CSV on the Web, which I'm sure this community is familiar with. I think there are opportunities for us to better leverage and incorporate those. Beyond that, I think there are a lot of opportunities to do more with things like digital object identifiers, maybe even make them requirements, as well as other opportunities to make better use of unique identifiers across our metadata and to have a better ability to generate dataset citations.
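As a small example of what generating dataset citations from catalog metadata could look like, here's a sketch that assembles a human-readable citation string from common DCAT-US fields. The format is my own invention for illustration, not an official one, and the record is made up.

```python
def cite(dataset: dict) -> str:
    """Build a simple human-readable citation from common DCAT-US
    fields (publisher, modified, title, identifier)."""
    publisher = dataset.get("publisher", {}).get("name", "Unknown publisher")
    year = dataset.get("modified", "n.d.")[:4]  # e.g. "2020-01-15" -> "2020"
    title = dataset.get("title", "Untitled dataset")
    identifier = dataset.get("identifier", "")
    return f"{publisher} ({year}). {title}. {identifier}".strip()

example = {
    "title": "Example Inventory Dataset",
    "modified": "2020-01-15",
    "identifier": "https://www.example.gov/data/example-dataset",
    "publisher": {"@type": "org:Organization", "name": "Example Agency"},
}
print(cite(example))
# Example Agency (2020). Example Inventory Dataset. https://www.example.gov/data/example-dataset
```

If the `identifier` were a DOI, the same approach would yield a persistent, resolvable citation, which is part of the appeal of requiring persistent identifiers in the metadata.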
There are also more emerging ideas, like applying distributed version control concepts to data, things like that. And then there are domain-specific data standards that could be better leveraged or aggregated, since we're providing a national catalog that could aggregate common data standards across local governments, which actually brings me to another point. Action 20 in the Federal Data Strategy is to develop a data standards repository. This will likely be part of the resources.data.gov site; we're expected to have an initial version developed over the course of this year, and we expect to properly launch that development process in the next month or so. There are some existing resources that are similar, like datastandards.directory or fairsharing.org, which are directories of data standards, some of them for the public sector. We're looking to build something like this that's more tailored to the federal government, but this is something we'll be looking for input from the public on, in terms of not reinventing the wheel and what's most useful to document about the data standards that are currently in use or could be developed. I know I probably didn't leave a ton of time for Q&A, but again, my point was really to open up the floor for input on the work that's going on, let you all know the background and context, and encourage you to participate. One thing is that we are going to be doing this revision to the metadata schema; we haven't fully kicked off that process, but it will be done through GitHub issues, like the previous revisions.
I've highlighted issue 630, and that GitHub repo is one to follow in particular, along with the other URLs and things to reference here. I'll certainly be available in the conference Slack as much as I can over the next two days, maybe a little more so tomorrow. There's a Q&A channel there that I'll be keeping tabs on; I may not be able to respond to all your questions in real time, but we'll certainly try to get back to everything as much as we can. So I don't know if it's worth trying to open up the floor for questions right now, or if we're out of time. Well, no, thank you very much for that, Phil. I've been checking some of the questions, and we don't have a lot of time, we do need to move on, but I think one of the main questions here is this: you've really highlighted that the US federal government has been very involved in building foundational technologies and setting trends in this space for a long time, and there was a set of questions around the role of GSA and how there are long-lived projects that span administrations within the US federal government. How do Data.gov and the kinds of initiatives you're working on relate to the ebbs and flows of politics throughout the US government? So I don't know if you want to just comment a little about the way these projects work within the US federal government, real quick. Yeah, I mean, I don't know that there's any one consistent rule that applies universally. Sometimes initiatives can come and go with different administrations. But there's really been a lot of opportunity in the consistency of this work, particularly with the legislation passing.
Having this as part of legislation, especially something that was very bipartisan, helps ensure that it will extend and continue across multiple administrations. That really changes the perspective, not only from the outside, but also in how folks approach the policy internally and how things are funded and staffed. That's really the thing that makes the biggest difference in thinking about this as long-term infrastructure and not just an initiative of one administration. Yeah, well, thank you. Thanks for the talk, and also thank you for the work that you and your colleagues do; you're unsung heroes in the data world. So thank you very much. Well, thanks for having me. I'm glad to be part of this event.