Fantastic. Good afternoon. I hope you all had a wonderful lunch. My name is Eugene Huang and I'm with the White House Office of Science and Technology Policy. I am not Andrew McLaughlin. For those of you who were expecting to see Andrew, he sends his regrets. He is unfortunately on his way to Europe at this particular point in time. We're going to be talking today about primary legal materials and innovation, and we have actually split this panel into two mini panels. The first will discuss and demonstrate some of the innovations going on in the primary legal materials space, and then we'll follow that up with a panel discussion. So I'm going to turn this over first to the GovPulse.us team, Andrew and Bob, who are sitting to my right and who will be giving us a demo of GovPulse.
Hi. My name is Andrew Carpeneau. I'm here with Bob Burback from GovPulse.us, and we just wanted to walk you through what we built and what we think is interesting. When we started this project about eight months ago, we barely knew what the Federal Register was. We come at this as technologists, and we wanted to re-envision what the Federal Register was. There was a Sunlight contest, Apps for America 2, and we were selected as a finalist and built this for that contest. We wanted to make all the rich, important information that's in the Federal Register, the daily rulemaking and the administrative workings of the executive branch, more accessible to the public. So we started with the bulk data, originally just the MODS files, which are the metadata around these documents, and then spidered the GPO sites to get the full ASCII text of these articles. And then, as we'll see, we were able to do more when the bulk XML feeds were made available. We also added value by sending things out to external APIs and marking up the data in richer ways.
Our code is completely open source, we're built on open source technologies, and everything we're doing is hosted in the cloud. So it's really sort of a new approach to these things. The bulk XML was released in late 2009. We'll hear more from FedThread, which was, I think, among the first to take advantage of it. It contains the full text of each article, along with markup of its structure and meaning, so you've got all the headers, the tables, and indications of where images should be. With this, we were able to do a variety of different things. We were able to put the actual tables into the documents. We were able to link up the footnotes and have them work in a standard footnote way. And we were then able to take the official PDFs that the GPO provides for these documents, pull them apart, pull the images out of them, and insert them back into the HTML versions. So you now have an HTML version of a Federal Register article that has all the information the PDF does, but in an easier-to-use format. Our daily process is to download the bulk text and XML each day, process them, and insert them into our database. Right now the GPO has two primary ways of sending us this information, the raw metadata and the full text, and we have to merge those two together. We then scan for references to a variety of sources, the Federal Register, the CFR, the US Code, these sorts of things, and hyperlink those references to the particular section in the associated document. We also use this to do citation analysis of the Federal Register itself, to get a sense of, say, what's the most cited article in the last year by a given agency, because that one is likely to be important. Or, show me this proposed rule. Okay, now when I'm viewing it, I can see the final rules based on it, because they refer back to it via the citation. So we do the reverse citation as well.
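To make the citation scanning and reverse citation ideas concrete, here is a minimal sketch in Python. It is not the GovPulse code (which the speakers say is Ruby on Rails); the regular expression covers only the simple "74 FR 12345" form, and the `/citation/...` URL scheme, the function names, and the sample article ids are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Simplified pattern for citations such as "74 FR 12345" (volume, page).
# Real citations have more variants; this is an illustrative assumption.
FR_CITE = re.compile(r"\b(\d{1,3}) FR (\d{1,6})\b")


def extract_citations(text):
    """Return (volume, page) pairs for every Federal Register citation in text."""
    return [(int(volume), int(page)) for volume, page in FR_CITE.findall(text)]


def link_citations(text):
    """Wrap each citation in a hyperlink. The URL scheme is hypothetical."""
    def to_link(match):
        volume, page = match.group(1), match.group(2)
        return f'<a href="/citation/{volume}/{page}">{match.group(0)}</a>'
    return FR_CITE.sub(to_link, text)


def build_reverse_index(articles):
    """Map each cited (volume, page) to the set of article ids that cite it."""
    cited_by = defaultdict(set)
    for article_id, text in articles.items():
        for citation in extract_citations(text):
            cited_by[citation].add(article_id)
    return cited_by


def most_cited(cited_by, top=3):
    """Rank citations by how many distinct articles refer to them."""
    return sorted(cited_by.items(), key=lambda item: (-len(item[1]), item[0]))[:top]


# Hypothetical sample articles keyed by made-up ids.
articles = {
    "final-rule": "This final rule adopts the proposal published at 74 FR 12345.",
    "notice": "See 74 FR 12345 and 73 FR 200 for background.",
}
index = build_reverse_index(articles)
print(link_citations(articles["final-rule"]))
print(most_cited(index, top=1))
```

The reverse index is what would let a proposed rule's page list the later documents that cite it: looking up the rule's own volume and page in `cited_by` returns the ids of everything that refers back to it.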
And then we also spider the regulations.gov site to be able to link directly to the spot where you leave your official comments. We also send our data off to Yahoo! Placemaker for location extraction. This is useful because whenever a location is mentioned in the Federal Register, we're able to put that on a map, and you can say, show me all the Federal Register articles that mention a place within 200 miles of this place that I like, or show me all the things about fishing that mention something near Bethesda, Maryland. And you can then see the regulations that apply near you.
Great. So I'm going to go quickly through the next part. We've briefly covered these, and we're going to talk about what we added to the data once we got it. Then I'm going to jump over to the website and show you a few examples. As Andrew mentioned, we send the full text off to Yahoo! Placemaker. We basically give them a block of text and they send us back entries that say, we think these places were mentioned in the text, with a certain level of confidence. It's not 100% accurate, but it gives us the ability to do some of that geolocation, to let you see where things are and what government agencies are working near you, and to get a sense that, oh, the EPA actually does stuff down the street from me. It's not this agency off on the other side of the country (we're from San Francisco, so other side of the country) that we don't understand or know anything about. We've seen a lot of really good response to that. We also link references to the Federal Register, the Code of Federal Regulations, and the US Code. In the upcoming new Federal Register site, we're also doing patent information.
So when a patent is mentioned in a Federal Register article, we go in, grab that citation, and link you off to it, so you can see, oh, this regulation is about this patent, and see what that patent is very easily. I'll show you an example of that in just a moment. Citation analysis: as Andrew mentioned, we go both ways, how often this article is cited and how often this article cites other articles. So you can really begin to get a sense that none of these regulations actually stands alone. They all live in an ecosystem and affect each other. We feel it's really important for people to understand what that context is, and that these aren't just made up out of the sky, which to the public can often seem to be the case. There really is a process here.
Related to that, in the new Federal Register site we're working on embedding the Unified Agenda and the regulatory timelines that the OMB provides, so that you can see that this proposed rule is at this step in the larger process.
Exactly. And a little backstory on that: we are working with the OFR and the GPO to build out the new Federal Register site that you've probably heard a little bit about, taking this code base and using it as the code base moving forward. That's going to be coming up pretty soon, and I think it's going to be talked about in a moment here. And it is open source. It's entirely public. Everybody can see how this works and really go down to the transparency level of the code itself. So, official comment submission. We actually spider regulations.gov to figure out where their form lives. We don't submit the form for you, but we get you directly to that form, to make it very easy for you to say, I care about this regulation, I want to comment on it; one click and you're there. One of the features we've actually just added in the last week: a number of articles will talk about endangered species.
And they'll say, we're proposing to set regulations around this area for the endangered species they're talking about. Often there are coordinates in there. We're now parsing those coordinates out and turning them into KML, which is a standard format that any number of systems can import. So you can download that, and we also visualize it on a map you can zoom around. And sharing via social networks: we want people to get out there and talk about the things they're passionate about. And exploring by agency and topic: visualizations of things like how active is this agency, what sorts of entries are they most likely to put out, and how often do they put those out? And then again, locations for where this agency is active.
So I'm going to jump over to the site. This is the homepage of GovPulse. You'll see on the left-hand side there, one of the things we're able to do because the XML is structured is pull out when comment periods are opening and when they close. So we're trying to expose that to the public to say, hey, you may be interested in these things, or, hey, these things are closing soon, you should probably comment on them if you're interested, and bubble that up to the surface. The geolocation on the side here: we're actually viewing Washington at the moment. So one of the things we can do is say, oh, well, there are some entries mentioning Anacostia, Washington. Let me go over there and see what's going on, because I live there. Then some of the visualizations down here on the right. If you're familiar with Tufte and his explanation of sparklines, we're using those; they let you see a lot of information in a small space, to see which agencies are active. Then you may say, well, let me go see what the Department of Agriculture is doing, because they seem to be very active.
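The "show me entries that mention a place near me" lookup can be sketched as a great-circle distance filter over the places an extraction service returned for each entry. This is only an illustration under stated assumptions: the entry structure, the 0-to-1 confidence scale, the `min_confidence` threshold, and the sample titles are hypothetical and are not the Yahoo! Placemaker response format or the GovPulse schema.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def entries_near(entries, lat, lon, radius_miles, min_confidence=0.5):
    """Return titles of entries that mention a confident-enough place within the radius."""
    matches = []
    for entry in entries:
        for place in entry["places"]:
            close = miles_between(lat, lon, place["lat"], place["lon"]) <= radius_miles
            if close and place["confidence"] >= min_confidence:
                matches.append(entry["title"])
                break  # one matching place is enough for this entry
    return matches


# Hypothetical entries with made-up titles and extraction results.
sample_entries = [
    {"title": "Fishing notice",
     "places": [{"name": "Bethesda, MD", "lat": 38.9847, "lon": -77.0947, "confidence": 0.9}]},
    {"title": "Harbor rule",
     "places": [{"name": "San Francisco, CA", "lat": 37.7749, "lon": -122.4194, "confidence": 0.9}]},
]

# Entries mentioning a place within 200 miles of downtown Washington, D.C.
print(entries_near(sample_entries, 38.9072, -77.0369, 200))
```

Filtering on confidence reflects the point made in the talk that extraction is not 100% accurate; a production system would more likely push the distance test into a spatial index or database query than loop in application code.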
So I've actually opened a bunch of tabs here so that we don't have to worry about the internet. This again is Washington, D.C. We can come in and see, oh, this place has some entries mentioning it. Here we can actually see three of the entries about Cheltenham, Maryland. So again, you can quickly get in and see what things affect you. Here's one of the Endangered and Threatened Wildlife entries that I mentioned. You'll see a number of things here as I come down. We've generated a table of contents from the Federal Register document itself, so you can quickly navigate around within these hundred-page documents. We've taken the tables, so we can jump down to a table. This isn't a particularly complicated table, but we've tried to go in and reformat them to make them a little friendlier and easier to use. A lot of the tables in these documents become very complex, and they're very hard to read as ASCII text. So one of the things the bulk XML allows us to do is really build those in a way that you can get to and read. Again, Andrew mentioned the graphics. The bulk XML encodes where these graphics are located, so we can go into the PDFs, get them, bring them back in, and embed them into the document. I believe we're the only place online doing this at the moment, combined with the article. I could be wrong on that. So I'm going to click on one of these here. Here's one of the images that was embedded in the PDF, about this area that's going to be sanctioned for wildlife. And what we've done up above is this: here's the original coordinate list. Lots of coordinates, not that easy to really do anything with. But here's your map. Here's where it is. We've got a smaller map on the side here that gives you a little bit of context around that. And then you can just click here and download the KML if you want to do something with this.
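As a rough sketch of the coordinate-list-to-KML step, the following Python turns a list of points into a KML polygon. The input format is an assumption: it presumes the coordinates have already been reduced to decimal-degree "latitude, longitude" pairs, whereas the notation actually used in a given Federal Register document may differ and need conversion first. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Assumed input: decimal-degree "lat, lon" pairs separated by semicolons.
PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")


def parse_pairs(text):
    """Read (lat, lon) tuples out of a coordinate list in the assumed format."""
    return [(float(lat), float(lon)) for lat, lon in PAIR.findall(text)]


def polygon_kml(name, points):
    """Build a KML document containing one polygon placemark."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # a KML linear ring must end where it starts

    ET.register_namespace("", KML_NS)  # serialize with KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    polygon = ET.SubElement(placemark, f"{{{KML_NS}}}Polygon")
    boundary = ET.SubElement(polygon, f"{{{KML_NS}}}outerBoundaryIs")
    linear_ring = ET.SubElement(boundary, f"{{{KML_NS}}}LinearRing")
    coordinates = ET.SubElement(linear_ring, f"{{{KML_NS}}}coordinates")
    # KML lists longitude first, then latitude.
    coordinates.text = " ".join(f"{lon},{lat}" for lat, lon in ring)
    return ET.tostring(kml, encoding="unicode")


# Hypothetical boundary for illustration only.
points = parse_pairs("38.90, -77.03; 38.95, -77.03; 38.95, -77.10")
print(polygon_kml("Example habitat area", points))
```

The two details most likely to trip up a converter like this are the axis order (KML wants longitude before latitude) and closing the ring, which is why both are handled explicitly.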
And so we really want to find ways to take what's in the Federal Register, and in any of these legal documents, and allow the public to take them and re-envision them. Who knows what someone's going to do with KML and the sorts of things they'll be able to build around that. So part of what we've tried to build into the site is the ability to take this data and do something new with it that even we haven't thought of. Actually, what I wanted to show here is this article. Right article? No. So one of the things we do deep inside these articles is link to their predecessors or to later articles. If it's a proposed rule, we'll link to the final rule when it comes out. And we've embedded those links in the text. You can see we linked to the US Code here. One of the articles was actually a revision. I'm going to stay on this article. So here are the citations on the side. You can see this article mentions these other citations, and there are actually 28 more that we could go and see. So you really start to get that sense of context. And then here's an agency page that we wanted to show. You can see this agency has released articles about all these different locations in the US. We've got a graph on the side about the different types of rules they propose, and then topics. So you can see the Department of the Interior has a lot of things around reporting and recordkeeping. They also have a lot of things around imports. So you can click on that, go in, and begin to see what agencies are doing and build that understanding. So again, we're all open source, Ruby on Rails. We use 40-plus open source libraries and technologies. Every single part of our stack is open source, which has allowed us to build fast and cheaply. We don't have to reinvent the wheel. We get to use these incredible technologies that other people have built and opened up.
And we've actually built a lot of technologies along the way and opened those back out. The whole parser for the points is open source and available for people to use. We're hosted on Amazon. We use the cloud. The cloud is cheap. It's powerful. There's our contact information. We're definitely around and would love to answer questions and hear your thoughts.
Great. Andrew and Bob, thanks so much for that presentation.