So, the fourth of our beta sprints to be presented: John Butler and Wendy Lougee will describe "Government Publications: Enhanced Access and Discovery through Open Linked Data and Crowdsourcing." I don't know if it's possible for one of the presenters at some point to give us a sentence or two on open linked data. It's been referred to many times through the course of the day, but I don't know. Wendy, is that something, while this is being set up, you'd be willing to do?

Okay, I was kind of... Well, we're going to move to more of a conceptual model here, very targeted in the area of government publications, from a content and services point of view. This is a collaborative project that is being put forth by the University of Minnesota, by HathiTrust (and we have some HathiTrust representation here in the audience), and by the Committee on Institutional Cooperation, the CIC. We have Mark Sandler, the executive director of the CIC, with us, and my colleague Wendy Lougee. So this, as I said, is a flatter presentation, particularly following the Harvard presentation. Great job, guys.

So the idea here is to create a critical mass of content valued and in demand by the user community, and to build context-sensitive and needed services around it. And we do this by proposing innovative and scalable approaches, using HathiTrust as a real-life, at-scale test bed. As the largest publisher of information in the world, the federal government provides key information resources that encompass all aspects of U.S. history and culture, and the development of better access and discovery processes enhances how citizens work with their government. Because the information resides primarily in the realm of the public domain,
we see obvious opportunities for exploiting this information. We anticipate that government information will be a significant part of the DPLA.

So I'm going to start here with content, the amassing of large-scale content. Since 2009 we have had a comprehensive approach to amassing these documents, using a framework of digitization facilitated by Google and others, the accessing and archiving of this content by HathiTrust, and corresponding print management strategies bringing together regional depositories. And just to note that we have the opportunity, through the Google work here, to share these digital files. The status of this is that we're seeking a potential digitization target of 1.5 to 2 million of these print documents, and thus far, we think, 300,000 are in HathiTrust, digitized by the CIC, and an additional 200,000-plus by other Google partners. We recognize the challenges here: the accurate identification of these government publications, the sourcing of the documents and their corresponding metadata, and the enhancing of this digital corpus with useful services, and that's largely what this project is about.

I want to note too that HathiTrust held its first constitutional convention this past weekend, or the weekend before, and one of the ballots that passed was to initiate a planning process for a business model and operational plans for the creation of a complete corpus of government publications and new forms of access. So that fits really quite well with this proposal.

So, moving on to these services, contextualized services around government publications. We've been working with a community of government publication librarians to identify what would be the most valued services, and they really fall into two areas. One is to deal with the problem of U.S. government department and agency name changes, as users encounter that confusion and chaos in their searching and navigation, and the second really is to provide some sense-making in
terms of legislative histories; that is, how do we link these documents together in some kind of genealogical fashion?

So we propose to use concepts of linked data to deal with both, but primarily with the agency name changes. In this example you see some of the variants of the U.S. Coast Guard, including the Life-Saving Service, a preceding name, and the Revenue Cutter Service, and onward. So when a user searches for U.S. Coast Guard, are they aware of these variant changes over time? Probably not. And so we're seeking to use linked data and RDF triples that really move us away from our metadata record-based concept as the sole means of search and retrieval, toward identifying and leveraging the relationships between one data store of bibliographic data and existing external or internal data stores where name authority changes are managed. And there are a growing number of data-aware stores out there, including the Library of Congress Name Authority File, the Virtual International Authority File, WorldCat Identities, and so forth. These are just waiting to be tapped in order to create on-the-fly relationships between documents and information objects that reside in one place and those in another.

And here's a little bit of flow, how we might imagine this working with HathiTrust: moving from a catalog heading, using an RDF triple that relates one entity, an object in HathiTrust or an information point, to where it exists in another data store, and returning a related name, a related piece of information. Based on the request, it could be presented back to the user on the fly. It could also be used to enhance records or enhance queries that the user would be conducting within the search engine.

And here's an example of how that might actually play out in a user interface: searching for U.S. Coast Guard, touching these other data-aware services that deal with the variant names, preceding and following names, and, either by explicit or implicit
inclusion, these alternate names could be embedded in the search without the user having awareness of those variations.

The other objective of our project is to make sense out of legislative histories. And this is probably the least ugly slide that I could find. We hear a lot about sausage making with legislative histories, and it really is the case here. This is a daunting challenge as we look at the relationships again. And the key to this project is building relationships between information objects, documents and such. How do we do that with legislative histories? And here's a simplified version of that. Hard to read here, but again, we see this as a combination of algorithmic approaches (clustering of documents based on legislative history, public law numbers, and so forth) and interventions by the professional government publications community, as well as possibly end users under certain editorial controls. And those are the interventions there: tagging, or perhaps even the creation of these RDF triples by users, and the keyword search that would help us stack these documents up in some kind of related history.

So just to wrap up here: again, we're talking about the continued amassing of government publications content, on which we can build interesting and useful services. And finally, again, this is the front slide here. Our proposal was submitted as a short video, and the URL is there. I don't know if it's up on the blog site, John. I know we put it up on the wiki site as well, so that's another source for that. Thank you.

Excellent, John. Thank you. I have a hunch that my friend Carl Malamud would like to say something, so I'm just going to send the mic in his direction.

I have a question. A wonderful beta sprint. You say you're authorized to share those government documents with the government, with the FTCS Government Printing Office.
So, two questions. One, is GPO at all interested in getting access? And can you share those documents with other groups, such as Public Resource or Internet Archive or, you know, commercial operations or whatever?

An office that's been expressed... To that second question, and Google's willingness to share with other parties: really, the discussion we had with Google was specific to GPO. And, you know, I guess we could always go back and talk about that. I think it would just be, you know, what's the nature of the vision for which someone wants to use these documents. So, you know, I wouldn't foreclose it, but our specific discussion with them was about GPO.

Okay, other comments on this? Yes, sir.

As the mic goes along, I would say that the GPO would be a wonderful partner in this effort overall as well. I'm not sure if there is someone from the GPO here, but we certainly welcome participation.

My name is August Imholtz. I'm a retired vice president of Readex, a digital company, and my responsibility was in government documents. I just want to comment a little bit on the difficulties that Mark and his colleagues have alluded to. Consider that congressional committees are either standing committees or special committees. The standing committees are fairly long-term, and there are seldom more than a hundred of them. They're not eternal, even though some senators would like to think they are. But the special committees are a completely different thing, either select committees or special committees. There have been over 6,000 of them in American history so far, and the names of these special or select committees often do not appear on the reports or documents themselves. They'll have some kind of a name, but it'll be a periphrasis, like "the committee of that part of the President's message of the third instant." Now, the real name is locatable in the House or Senate journal for that session, but it doesn't appear in the document
itself. So it's not something that ever could be extracted by full-text search; it requires research, either through cataloging or some other form of intense research, to get it straight. So all of this is by way of saying, perhaps at too much length, that it can be very, very complicated.

Thank you. Acknowledged. Awesome. I think we're going to move on, but any response to this, or do we just say yes?

I think we recognize the complexity of these problems, and in fact that's what's inspiring us to try to chip away at it with some scalable methodologies here. But fully acknowledged: these are very complex information structures, if they exist at all.

Fantastic team. Thank you so much.
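
Editorial sketch, not part of the talk: the name-change lookup described in the presentation (follow linked-data relationships from a catalog heading to its preceding and following names, then fold those names into the user's query) might look roughly like the following. This is a minimal illustration under stated assumptions: the triples are held in a plain in-memory list rather than fetched from an authority service, the predicate name `precededBy` is hypothetical, and the function names are invented for this example. It is not the project's actual implementation.

```python
from collections import defaultdict

# Hypothetical predicate name; a real authority file would define its own vocabulary.
PRECEDED_BY = "precededBy"

# Illustrative (subject, predicate, object) triples for the Coast Guard example.
TRIPLES = [
    ("United States Coast Guard", PRECEDED_BY, "United States Revenue Cutter Service"),
    ("United States Coast Guard", PRECEDED_BY, "United States Life-Saving Service"),
]


def related_names(heading, triples):
    """Return the heading plus every name linked to it, following
    relationships in both directions (earlier and later names)."""
    neighbors = defaultdict(set)
    for subject, _predicate, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)

    seen = {heading}
    stack = [heading]
    while stack:
        current = stack.pop()
        for name in neighbors[current]:
            if name not in seen:
                seen.add(name)
                stack.append(name)
    return sorted(seen)


def expand_query(heading, triples):
    """Build an OR query over all variant names, so the user does not
    need to know the agency's earlier or later names."""
    return " OR ".join(f'"{name}"' for name in related_names(heading, triples))


if __name__ == "__main__":
    print(expand_query("United States Coast Guard", TRIPLES))
```

Because links are followed in both directions, a search that starts from an earlier name such as the Life-Saving Service also picks up the Coast Guard and the Revenue Cutter Service. Whether the expansion is shown to the user or applied silently corresponds to the explicit versus implicit inclusion mentioned in the talk.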