Alright, well it's four o'clock and we're trying to cram two hour-long talks into one hour, so I should probably get started here. Tom Cramer and I will be talking about the Linked Data for Libraries (LD4L) project, giving you an update and a little hint of what's to come. For those of you not familiar with the project, Linked Data for Libraries is a partnership of Harvard, Cornell, and Stanford funded by the Mellon Foundation. We've been working together to assemble ontologies and data sources to provide relationships, metadata, and broad context for scholarly information resources, and by that we mean anything you would catalog in your library or archive or wherever. It takes advantage of existing work from both the VIVO project and the Hydra partnership. So the vision, small vision: create a linked open data standard to exchange all that libraries know about their resources, not merely bibliographic data but any other kinds of information that libraries, librarians, and potentially scholars add to those resources. We'll start with an overview. The specific goals of the project: we really wanted to free information from existing library system silos. Libraries have been doing this for a very long time, and they've come up with a very specific set of standards that work well for them but don't necessarily play as well with others out on the web. So we wanted to free that information, provide context, and enhance discovery. We wanted to leverage usage information about the resources, so that we can take advantage of it for discovery and context; link bibliographic data about the resources with things like academic profile systems, with more information about researchers and authors and what they are doing, and with other external linked data sources; assemble, and where needed create, a flexible, extensible linked data ontology to capture all of this information about the resources and then turn around and make it available; and demonstrate combining and reconciling the assembled linked data across our three institutions. Our working assumptions: we are trying to do this conversion and relation work at scale, with about 13.6 million catalog records from Harvard and roughly 8 million bib records from each of Stanford and Cornell. When you translate this into linked data triples, it turns out to be in the billions. It's a lot. We're trying to understand the pipelines and workflows we will need to do this. And finally, we're looking to build useful value-added services on top of the assembled triples, though we're not doing a lot of that in this first version of the project. So here's a summary of the kinds of data sources we have. We have bibliographic data: MARC records, some MODS, EAD, and others. We have person data from, let's see, Community Academic Profiles, that's what CAP stands for at Stanford, and then we have CBO and Faculty Finder at Harvard, ORCID, ResearcherID, ISNI. And we have usage data. This is the stuff that really doesn't get incorporated very much into existing catalog records. It can include curation data: was the thing selected for an exhibit? Is it mentioned in a research guide? Was it included in a syllabus? And then more arbitrary tagging. So, use cases. We started our two-year project by really trying to develop some clear use cases for faculty and students. We started out with a set of 42 raw use cases, and we narrowed those down to 12 refined use cases in six clusters. Essentially, these are all about combining data.
So: clusters where we're combining bibliographic data with curation data, that exhibit data and the other things I was talking about; where we're combining bibliographic and person data; where we're leveraging data we're connecting to from outside, linking the references in our catalogs to outside resources and taking advantage of that kind of data; where we're leveraging the deeper graph, now that this is a network of interconnected entities, to see whether we can use those connections to improve discovery and understanding; where we're leveraging usage data; and where we're building cross-site services. I'm just going to go through a couple of the use cases and what we've done in terms of initial pilots and demo implementations, to show you the flavor of what we're attempting to accomplish here. Use case 1.1 was: build a virtual collection. We want to allow librarians and patrons to create and share virtual collections, essentially adding organization to a set of resources by tagging and annotating those resources. We did initial implementations at Cornell and Stanford, and I'm going to walk through the Cornell example. Here's our Blacklight catalog search with a little bit of augmentation on it. I'm going to create a virtual collection here, an archery collection. I'll go ahead and search the Cornell catalog, find a particular archery scholarly information resource that I want to add to that collection, go to the pull-down here, and add it to the archery collection out of my own personal set of collections. So now, here we go: I've got the archery collection with one item in it so far. But I'm not limited to the Cornell catalog. I can go off to Stanford SearchWorks, search for archery there, and pick out a resource that looks interesting to me. I grab the URL out of that and go back to my collection, and, as you can see under archery, there's a little thing that says "add an external resource." If I click on that, I get a new item form where I can drop in the URL. And now, behind the scenes, I actually grab the MARCXML out of the Stanford catalog, translate it to RDF, and build this as a new linked data representation, with all of its metadata, in my virtual collection. I can also go out to a native RDF system like VIVO, which has a set of publications, pick up an archery publication there, again grab the URL, add it into my archery virtual collection, grab the RDF, populate the information, and continue to build out this RDF network and this virtual collection. So I can really transparently pull together information and collections from a whole bunch of different sources on the net, in this standard linked data format. A second use case: tag scholarly information resources to support reuse. We want a way to create and manage online collections and to support more automation and batch processing. A really simple example here: in my virtual collection I can do the annotation again, add a text note, add a folksonomy tag, or other things. Now let's look at another use case. I want to see and search on works by people, to discover more works and to better understand the context for the people, the authors or other people, associated with the resources. The goal: link catalog search results to researcher networking systems to provide more information.
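To make the "add an external resource" step from the virtual collection demo concrete, here is a minimal sketch of what such a fetch-and-convert might look like in Python, using pymarc and rdflib. The MARCXML URL pattern, the item URI, and the field-to-property mapping are illustrative assumptions, not the project's actual implementation.

```python
import io
import urllib.request

from pymarc import parse_xml_to_array      # pip install pymarc (v5+: title etc. are properties)
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

BIBO = Namespace("http://purl.org/ontology/bibo/")

def add_external_resource(collection: Graph, marcxml_url: str, item_uri: str) -> None:
    """Fetch MARCXML from a remote catalog and add an RDF description of it."""
    with urllib.request.urlopen(marcxml_url) as resp:
        record = parse_xml_to_array(io.BytesIO(resp.read()))[0]
    item = URIRef(item_uri)
    collection.add((item, RDF.type, BIBO.Document))
    if record.title:
        collection.add((item, DCTERMS.title, Literal(record.title)))
    if record.author:
        collection.add((item, DCTERMS.creator, Literal(record.author)))

archery = Graph()
add_external_resource(
    archery,
    "https://searchworks.stanford.edu/view/12345.marcxml",  # hypothetical endpoint
    "https://example.org/collections/archery/item/1",       # our local item URI
)
print(archery.serialize(format="turtle"))
```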
So what are we doing here? We're adding VIVO URIs to MARC records for thesis advisors; this is the project we did at Cornell. We're adding links to the VIVO records, linking back to the faculty works and their students' theses. Our metadata specialists went through adding thesis advisors to the MARC records in the 700 field, using NetIDs from the graduate school, a standard identifier for faculty at our institution. The advisors are then looked up against our VIVO instance, our faculty profiling system, to get the URIs for the faculty members, and now we can build the links from VIVO to the catalog and potentially vice versa. So here, in the VIVO profile of a thesis advisor, I can link to the set of theses in the catalog and in our institutional repository that he supervised. And as I discover those theses in the catalog, I can link back to the faculty advisor's VIVO record, get his research context, understand more about the faculty member, and potentially take advantage of that information down the line to improve discovery in searches in our catalog. The next example I'm going to talk about is identifying related works. We wanted to find additional resources beyond those directly related to any single work. As an implementation here at Cornell, we explored modeling non-MARC metadata from the Cornell hip hop flyer collection using LinkedBrainz, and I'll explain what that means. We have a set of about 500 hip hop flyers as part of our hip hop collection. These are particularly interesting because each one describes an event, which will typically have from one to twenty hip hop musicians presenting at it; there's a location, there's a date, there's all sorts of useful information here. Our pilot is linking this hip hop flyer metadata into LinkedBrainz, the linked data version of MusicBrainz, which is a database of information about music and musicians. We tested our Linked Data for Libraries BibFrame approach to describing the flyers, which were originally cataloged in Shared Shelf. We created BibFrame Work subclasses, and we used URIs for performers to recursively discover relationships to other entities via dates, events, venues, and other kinds of information. So here, for example, Afrika Bambaataa is a hip hop musician who is mentioned on several of the flyers, and here's his information in MusicBrainz. I can now connect a particular performance event on a flyer, maybe a CD release party, with the information in MusicBrainz. The flyer itself isn't going to say what the particular tracks on the CD release were, but from MusicBrainz I can get that information and infer what was probably presented and performed at that particular event. So we can make these kinds of connections. Some takeaways: we were able to map large parts of our metadata over to RDF, using a bunch of different ontologies, to discover more relationships to more entities. It was largely predicated on manual workflows for the pre-processing and the URI lookups, and on software for RDF creation that is still unstable. And really, in order to build on this, we need more URIs, both to link to and to link from, to build up a richer set of context. So that's a quick overview of some of the use cases. Now I'm going to hand it off to Tom. Well done; we're doing okay on time, so let's talk about the ontology. Take it away.

Dean never really seems panicked, which is one of the nice things about working with him. And one of the nice things about going first: he always takes less time, but I always take more time, so it works out.
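An aside on the flyer pilot above: the project's performer lookups were largely manual, but here is a rough sketch of what an automated version of that reconciliation could look like, using the musicbrainzngs client for the MusicBrainz web service. The score threshold, the flyer event URI, and the schema.org event modeling are assumptions for illustration.

```python
from typing import Optional

import musicbrainzngs                      # pip install musicbrainzngs
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

SCHEMA = Namespace("http://schema.org/")

# MusicBrainz requires a meaningful user agent for its web service.
musicbrainzngs.set_useragent("flyer-reconciler", "0.1", "metadata@example.edu")

def reconcile_performer(name: str, min_score: int = 95) -> Optional[URIRef]:
    """Return a MusicBrainz artist URI for a confident name match, else None."""
    result = musicbrainzngs.search_artists(artist=name, limit=3)
    for artist in result["artist-list"]:
        if int(artist["ext:score"]) >= min_score:
            return URIRef("http://musicbrainz.org/artist/" + artist["id"])
    return None

g = Graph()
event = URIRef("https://example.org/flyers/0042#event")  # hypothetical flyer event
g.add((event, RDF.type, SCHEMA.MusicEvent))
for name in ["Afrika Bambaataa"]:
    uri = reconcile_performer(name)
    if uri is not None:                    # only assert links we are confident in
        g.add((event, SCHEMA.performer, uri))
        g.add((uri, RDFS.label, Literal(name)))
print(g.serialize(format="turtle"))
```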
So Dean described some of the novel data entry that the team undertook over the course of the two-year project, and he showed earlier the three big pools we were dealing with: bibliographic data, person data, and curation or usage data. Theoretically, they're all equal, but because we work in libraries, they're actually rather unbalanced, and the big 800-pound gorilla is bibliographic data. When we started Linked Data for Libraries, it was expressly not a BibFrame project, and that lasted for about four months. What we found is that the mass of bibliographic data and the issues around BibFrame were so significant, in particular around the MARC records, of which we had eight to 15 million depending on the institution, that we really couldn't escape; it was sort of Jupiter drawing us in. All three institutions actually had pretty good pools of person data, from things like the profiles applications and then external data sources like ORCIDs, ISNIs, or VIAF. And the usage data is a great idea, but we don't really have examples; in fact, the examples that Dean showed, around building virtual collections and tagging items, are where we spent a lot of the time, because it was sort of a green field, it was easier to experiment there, and the tool chain lent itself to that. So really, we had an 800-pound gorilla, a chimpanzee, and a pygmy marmoset in terms of our capacity. As it turns out, one of the outputs of the project, as Dean mentioned, is an LD4L ontology, because, as we said, we don't live in a world of just bibliographic resources. We are these richly interrelated institutions where we know a lot about not only the works, but also authorities and people and topics and research output, and one of the basic things we were trying to do was relate all of those things together. The core of it, because of the mass, was BibFrame. How many people, who doesn't? No, I can't ask that, because you'd all identify yourselves. How many people know what BibFrame is, and roughly its current state? Ha-ha. You thought you knew the current state, and what I'm doing is a clever advertisement for the session tomorrow afternoon at 2 o'clock, because my colleague Rob Sanderson is going to be talking about some recommendations to substantially revise BibFrame, largely based on the experiences throughout this project as we sat down and really tried to model with it. Our colleagues at the Library of Congress were actually quite interested in this project as we were going through, looking at it as "an RDF sanity check" on BibFrame, I think, is what Sally McCallum said, I don't know if Sally is here, and also seeing how it worked with other parts of this greater pie. So here's the BibFrame model pre-Rob, and you have to go to the session tomorrow to see what it looks like post-Rob. But what we ended up producing, and this is actually checked into GitHub, is an LD4L ontology that relates not only the bibliographic resources, which are tagged here with the bf: BibFrame namespace, but also resources and predicates from other vocabularies.
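As a toy sketch of what "BibFrame plus other vocabularies in one graph" looks like, not the actual LD4L ontology, here is a tiny example mixing the BibFrame 1.x namespace with FOAF. The URIs, the title, and the property choices are made up for illustration.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF, RDFS

# BibFrame 1.x namespace as used at the time; property choices are illustrative.
BF = Namespace("http://bibframe.org/vocab/")

g = Graph()
work = URIRef("https://example.org/work/archery-manual")    # hypothetical URIs
person = URIRef("https://example.org/person/jane-scholar")

g.add((work, RDF.type, BF.Work))                # bibliographic side: BibFrame
g.add((work, RDFS.label, Literal("A Manual of Archery")))
g.add((work, BF.creator, person))

g.add((person, RDF.type, FOAF.Person))          # person side: another vocabulary
g.add((person, FOAF.name, Literal("Jane Scholar")))

print(g.serialize(format="turtle"))
```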
And I won't try to talk through it all. One of the decisions we did have to make, because we knew BibFrame was a work in progress, notwithstanding the recommendations we felt would be made, and that Rob put forward to the Library of Congress, which has begun to entertain and incorporate some of that feedback in the ongoing list of revised BibFrame proposals it's putting out: we took BibFrame as it existed in about May of 2015. 14? 14? No. Well, yeah, we don't remember, because we're old and other people did the work. So that ontology is probably a reference, but not one to use until the new BibFrame ontology comes out and some of the dust settles. But we do feel it was a concrete step forward. And while Dean described some of the novel data entry, with things like hip hop flyers linking to MusicBrainz, what we found is that we had to spend a lot of time just converting the mass of our records from MARC into linked data. This is actually a pretty cleaned-up process flow, but we did spend a good chunk of person-months establishing this flow at each of the three institutions. That's taking MARC records, converting them to MARCXML, pre-processing the XML to get normalized MARC, running that through the LC BibFrame converter to get BibFrame RDF, post-processing that to do some cleanup, and then actually producing linked data. Which, as we found out, was not really linked data; it was just RDF, because of the way the LC BibFrame converter works: we didn't have the lookups to external authorities, and in fact it wasn't even internally coherent or internally linked. So we uncovered a ton of opportunities for improvement through this. I think I've alluded to some of the future processing challenges. One is to actually do lookups and incorporate authorities as part of that conversion process, either at runtime or after the fact with ongoing reconciliation and enrichment. We also focused primarily on MARC, but we have things other than MARC; it's just that MARC, for all of its warts and all of the opportunities to extend and enhance it, actually has the best toolkit around it. Linking to external entities is a big and largely unaddressed challenge: it simply won't scale, when you're talking about millions of records, to have individual metadata librarians doing lookups to things like MusicBrainz or DBpedia. And then finally, we want to really link into the rest of the linked data world. I think what we found through this process is that the partnership across our three institutions, doing it at the same time, was remarkably rich. We consciously made very different infrastructure and technology decisions, because we had different technological capacities and interests, but comparing notes as we went, and then trying to link it together at the end, has been enormously instructive. I think we have actually made some good progress, but clearly, establishing a cooperative cataloging environment is still a major to-do for the community.
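Here is a condensed sketch of that MARC-to-BibFrame flow, assuming pymarc for the MARC-to-MARCXML step. The LC converter step is shown as an opaque subprocess with a placeholder command line, since the real tool is an XQuery application whose invocation varies by setup and is not specified here.

```python
import subprocess

from pymarc import MARCReader, record_to_xml   # pip install pymarc
from rdflib import Graph

def marc_to_marcxml(marc_path: str, xml_path: str) -> None:
    """Steps 1-2: read binary MARC, write a MARCXML collection.
    Field-level normalization hooks would go inside the loop."""
    with open(marc_path, "rb") as fh, open(xml_path, "wb") as out:
        out.write(b'<collection xmlns="http://www.loc.gov/MARC21/slim">')
        for record in MARCReader(fh):
            xml = record_to_xml(record)
            out.write(xml if isinstance(xml, bytes) else xml.encode("utf-8"))
        out.write(b"</collection>")

def run_lc_converter(xml_path: str, rdf_path: str) -> None:
    """Step 3: hand normalized MARCXML to the LC BibFrame converter.
    The command here is a placeholder, not the converter's real interface."""
    subprocess.run(["marc2bibframe", xml_path, rdf_path], check=True)

def post_process(rdf_path: str) -> Graph:
    """Step 4: load the converter output for cleanup and reconciliation passes."""
    g = Graph()
    g.parse(rdf_path)   # rdflib guesses the serialization from the extension
    return g

marc_to_marcxml("catalog.mrc", "catalog.xml")
run_lc_converter("catalog.xml", "catalog.rdf")
print(f"{len(post_process('catalog.rdf'))} triples produced")
```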
Speaking of which, we were interested in having not just a Linked Data for Libraries project that worked for our three institutions, but one that also reflected and aligned with the state of the art and with the community's understanding of where things stand. So in February of 2015, we convened a meeting at Stanford with everyone we could think of who would say yes to an invitation and who would fit into the room, to tell us: what were we doing, what did it look like, how did it relate to what they were doing, where did we have things wrong, where did we have things right, and what should we do next. We focused on the set of use cases that Dean just covered to ground the work: this is what we're trying to do, and by extension, this is what we're not trying to do. We reviewed the ontology, and then we really talked about the technology and the prototypes. The workshop results are published in the LD4L wiki, which lives at the DuraSpace wiki, and they are linked from ld4l.org if you're interested in seeing them. Now, we had our list of topics that we wanted to talk about, but you get 50 ontologists and linked data experts together and the conversation goes where it will. What we ended up doing was talking a lot about the curation of linked data: how do you get a meaningful and useful set of linked data pools? We spent a lot of time talking about techniques and technology; I think there's general agreement that this is a field rich with opportunity, and fairly immature. We also spent a surprising amount of time talking about the why of linked data, and I think it reflects the maturity of the practitioners who happened to be there that no one took it for granted that linked data was a good idea unless it solved something useful; there was general agreement that identifying what those useful bits are is also a work in progress. And finally, there was a lot of discussion about who, and I'll talk about that in a second. Some of the big recommendations that came out of the workshop: within the library, we need to actually use the linked data that we produce; producing linked data but not making use of it is, by definition, not useful. We must create applications that let people do things they couldn't do before, and it's not enough to talk about linked data; we have to talk about the value or the services it enables. In terms of specific tactics: when we make local, original assertions, we should use local URIs, not simply link to a global URI when one exists. And really, it is up to us to create a critical mass of linked data, a scalable critical mass, in order to derive the benefit. Going back to our monkey theme, I think what we found is that there's more work in front of us than behind us. It is an immature space, and one of the major assertions was that it's probably wrong to think about linked data for libraries; what we need to be thinking about is linked data for everyone. That means putting library data on the web so that it's not only fed by the entire web but used by the entire web. So that wasn't really heartening: we're halfway through the project, and our convened experts say, no, no, no, your scope is too small; solve it for everyone. Poor pygmy marmoset. As to our current capacity, there are a lot of open questions. We don't have this nailed, and these were some of the people who, if you thought anyone had it nailed, would have been in the room. So, one: what is the business case, and what is the value proposition, for linked data?
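Backing up for a moment to the "use local URIs" tactic: here is a minimal sketch of what that means in practice, minting an institutional URI, attaching our statements to it, and linking out to the global identifier with owl:sameAs rather than asserting directly against the global URI. The namespace, the person, and the identifier values are all placeholders.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, OWL, RDF

LOCAL = Namespace("https://linkeddata.library.example.edu/entity/")

g = Graph()
person = LOCAL["person/n0001"]              # a URI we mint and control: we can
g.add((person, RDF.type, FOAF.Person))      # persist it, correct it, deprecate it
g.add((person, FOAF.name, Literal("Jane Scholar")))

# Link out to the global identifiers rather than writing our local
# statements directly against them (the IDs below are placeholders).
g.add((person, OWL.sameAs, URIRef("http://viaf.org/viaf/0000000000")))
g.add((person, OWL.sameAs, URIRef("https://orcid.org/0000-0000-0000-0000")))

print(g.serialize(format="turtle"))
```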
I think our use cases were enormously helpful here, and others had additional use cases; a lot of them had to do with visibility on the web, increased traffic, and search engine optimization. But clearly, understanding why you would use linked data is going to be instrumental. There was a lot of discussion about being fit for purpose, and if linked data is to be fit for purpose, you have to know what the purpose is. Two: our ontology, still a work in progress. The wonderful thing about ontologies is that there are so many to choose from. That's not exactly how the saying goes, but it fits here. Three is the community, the question of who "we" is. There was a lot of talk about whether, if libraries solve this by themselves and we're not engaging the publishers, or if it's just the humanists and we're not engaging the scientists, we are really doing ourselves a service or a disservice. At the same time, you can't boil the ocean. So where do you draw the lines, and who convenes these things? There was a lot of talk of, well, we should LODLAM this; LODLAM was used as a verb. And Jon Voss, who is one of the LODLAM conveners, was just aghast. He's like: there isn't a LODLAM; it's a hashtag. It's not SPECTRE; it doesn't underlie every single linked data project there is. It's a group of people who get together once in a while and talk. And that really highlighted, I think, that there is a vacuum in terms of who is defining, and who is convening, linked data practice within our sphere and relating us to other spheres. And then, tools and infrastructure: they suck. How many people have been in the triple store conversation? There are actually two of them. One: what do you mean by triple store, something that stores triples or something that does SPARQL? The other: does anyone have a triple store that actually runs fast? I've been in both of those conversations about 15 times over the last two years; one of them is conversation number 37, the other is number 38, and I still haven't heard good answers to either. They're major to-dos. Things like converters and validators as well, and editors, and stores; there's just a whole set of infrastructure to be built. So, in terms of challenges: ontology and vocabulary mapping, where there's more work to do but we are making progress, and the way to make progress is to actually demonstrate concrete value and services, not just to abstract everything out into triples and put it someplace in the cloud. We need a major tool-building effort, a Marshall Plan for linked data tools. And we need to spend a lot of time developing the community: the knowledge experts together with the technologists, hooking in not just libraries but the other LAMs, industry, and the internet at large. So that was the workshop. We're in month 22 of 24 as we speak, so we're in the home stretch, the final push. What's coming up next? One of the things the partners realized is that it's great to think about novel data entry or conversion, but we're also producing new metadata as we go at all of our institutions, and the first rule of holes is that if you want to stop going down, you stop digging. So how do we convert our metadata production from traditional bibliographic metadata to linked data? One of the outcomes of this is that, I think, we're up to six partners at this point who will, somewhat like the LD4L partners, spend the next couple of years actually defining the tools and the infrastructure to do original and copy cataloging in systems that look a lot like our ILSes.
This won't be in production, I think, for any of the six partners. "Production" is one of those words you have to be very careful with. When you talk to a technical services person, like Phil Schreur, my counterpart at Stanford, production means making metadata; when I say production, I mean a server that doesn't fall over. So we try not to say that word. It's metadata creation, but it won't be in production, running 24/7; it'll be kind of a shadow system. But this is one of the big areas. It will be focused largely on BibFrame, though there is some digital object creation, at Stanford at least, that is not using MARC, so we'll have to get BibFrame for MODS or an alternative schema. And it's really about understanding the end-to-end workflow and the issues around creating new metadata. The other project, which maybe Dean would like to speak to, is LD4L 2, the son of LD4L. Son of? So, we're really trying to create a serious set of tools, which we're piloting now, with the goal that they can be put into production within sort of a three-to-five-year timeframe: linked data creation and editing tools, starting with Vitro and eagle-i; some Hydra tool integration, building on the existing tools and making them more linked-data compatible; and some initial experiments in discovery, visualization, and network analysis. How can we actually take advantage of this information in ways that will help scholars and people looking for resources at our libraries? We need tools and strategies, as we've already discussed, for resolution and reconciliation. We need to worry about persisting our URIs. And we still have a bunch of conversion issues: a better MARC-to-BibFrame converter, particularly for BibFrame 2.0, and perhaps for geospatial and moving image metadata; and in some cases we're going to need to go from BibFrame back to MARC, so we can have sort of a core MARC record for existing ILS systems. So, a bunch of approaches like that. The goal is a common language across all of us, to avoid getting stuck in the Tower of Babel. We are now actually pretty well out of time, so I will say "questions" now, and we'll take a couple while we're transitioning to Charles.

Maybe I'll segue. We've just seen two presentations, by Cramer and, obviously, by Krafft, and they were both excellent. How do I segue between what we've just seen and what we're going to see now? I could say that I'm going to represent the A in GLAM, i.e. the only vowel in GLAM. Or I could point to this glass and say that Tom, I think, fairly stressed the top of it, the half-empty part; I'm going to stress, well, not stress, suggest, that maybe the glass is being filled. But I think his point is fair: things are still shifting. Let's see, though, where they might shift in future. The University of Chicago Library Digital Repository is a preservation repository consisting mainly of the digital component of the university archives. It contains a variety of cultural heritage object types, both born-digital and retrospectively digitized. The structure of these objects ranges from the relatively complex to the relatively simple. The size of the repository is now over 50 terabytes, and it continues to grow rapidly. Two problems present themselves: first, how to manage the materials; second, how to describe them. Management follows traditional archival practice: first we transfer, then we accession, then we process.
While digital materials form an increasingly large proportion of what we archive, many digital collections consist of both analog and digital components, and these must be managed and described together. The Digital Library Development Center automated processes for the Special Collections Research Center to continue transferring and accessioning the digital components of these materials. Transferring means moving something from your place to my place, where your place is your place and my place is the archive. By the way, my written remarks don't duplicate what's on the slides, so the slides contain more pertinent information at times. Basic information about what is transferred is recorded during this step. Following traditional archival practice, all deposits must belong to a collection. As part of accessioning, we generate technical metadata and perform format migration as needed. All of this information is recorded in a relational database, which we use to generate everything you will see in the succeeding slides. That's the something old. Archivists mean something very specific by "processing": the end result is the production of a finding aid. We shall help automate this step as well, but before we do that, we will turn our attention to how to describe these materials in their variety. First, though, we will appropriate the term "processing" and map it onto the OAIS Reference Model, returning to its archival use at the end. The OAIS Reference Model describes a submission information package, or SIP, as well as a dissemination information package, or DIP, which is the information package transferred from the archive in response to a request by a consumer. We create SIPs as linked data. DIPs are produced in response to a query addressed to an RDF triple store, which is a database for linked data. Our DIPs, therefore, precisely conform to the definition of an information package as something transferred from the archive in response to a request by a consumer. They are lightweight, easy to transport using standard tools, and robust. How are we going to describe the materials in our repository, in all their variety and complexity, so that we can create our SIPs and DIPs? In looking at possible solutions, an obvious candidate presented itself: the Europeana Data Model. While EDM was designed to model cultural heritage objects put into a web-facing repository, it seems reasonable to test its suitability for a repository from which one pulls cultural heritage objects out in order to build web-facing applications. EDM extends OAI-ORE, which defines standards for the description and exchange of aggregations of web resources. If a resource is a compound structure, then the description of that resource must say something about the structure as well. This the Europeana Data Model does. We shall apply the model recursively, to the issues of a serial title and to the pages of the issues. The wager is that if we can successfully model a relatively complex object, we should be able to model a relatively simple one as well. The core of EDM is shown in this diagram. The provided cultural heritage object, or ProvidedCHO, is the intellectual object one is describing, in this case an issue of a magazine and, as we shall see shortly, a page from that magazine as well. The web resource is how that intellectual object is represented on the web, for example as a PDF file or a website. The aggregation describes the pieces associated with the intellectual object, so that one can recreate its representation in all of its forms.
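As a concrete rendering of this diagram, here is a sketch of the EDM core for one issue, written with rdflib; all the URIs are hypothetical, and the page-level pattern at the end anticipates the next few slides.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, DCTERMS, RDF

EDM = Namespace("http://www.europeana.eu/schemas/edm/")
ORE = Namespace("http://www.openarchives.org/ore/terms/")
BASE = "https://repository.lib.example.edu/campus-pubs/record-v01/"

g = Graph()

# The intellectual object: an issue of the magazine.
issue = URIRef(BASE + "cho")
g.add((issue, RDF.type, EDM.ProvidedCHO))
g.add((issue, DC.title, Literal("The University of Chicago Record, vol. 1")))
g.add((issue, DC.description, URIRef(BASE + "ocr.txt")))   # link to issue-level OCR

# Its web manifestations.
pdf = URIRef(BASE + "issue.pdf")
site = URIRef("https://campub.lib.example.edu/")
for wr in (pdf, site):
    g.add((wr, RDF.type, EDM.WebResource))

# The aggregation pulls the pieces together.
agg = URIRef(BASE + "aggregation")
g.add((agg, RDF.type, ORE.Aggregation))
g.add((agg, EDM.aggregatedCHO, issue))
g.add((agg, EDM.isShownBy, pdf))        # EDM requires one described web resource
g.add((agg, EDM.hasView, site))         # and permits more

# One level down: page objects, modeled the same way and ordered in sequence.
pages = []
for n in (1, 2):
    page = URIRef(BASE + f"page/{n:04d}/cho")
    g.add((page, RDF.type, EDM.ProvidedCHO))
    g.add((issue, DCTERMS.hasPart, page))
    g.add((page, DC.description, URIRef(BASE + f"page/{n:04d}/ocr.xml")))
    pages.append(page)
g.add((pages[1], DC.title, Literal("page one")))           # the cover gets no title
g.add((pages[1], EDM.isNextInSequence, pages[0]))

print(g.serialize(format="turtle"))
```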
The proxy is descriptive metadata about the object. It is optional in EDM because one can embed Dublin Core descriptive metadata in the ProvidedCHO itself. So one might have a MARC record, or a TEI record, or both. The proxy concept allows one to treat descriptive metadata as first-class objects in the repository. EDM reuses proxies and aggregations as defined in the OAI-ORE specification. Europeana mandates either title or description; here we use both. The description will be a link to a file of OCR for the issue. This intellectual object has these constituent intellectual objects, its pages. We shall represent this object on the web by means of two web resources, one a website and the other a PDF file, shown here. EDM requires that one web resource be described, though more are possible. We record all required PREMIS metadata for the PDF resource; here we show a few elements. Linked data allows the use of elements of other data models. The aggregation pulls the various pieces together. The proxy is a link to descriptive metadata; this example, by way of illustration, is a MARC record describing an object in another collection. We can associate more than one proxy with an intellectual object. To recapitulate: Europeana also models agent, place, time span, and concept, which we are not illustrating. Modeling the page object uses the same model one level down in the intellectual object stack, except that in this case we do not have deposited metadata for each page, so we dispense with a proxy. The model allows us to do this, since the proxy is optional. The first page object again uses the description element to link to its associated OCR, which for a page object is structured as a file of XML: words are accompanied by coordinates, to allow highlighting in a hit list. The second page object also has accompanying OCR, but in addition to a description, it also has a title. While the title of the issue might be The University of Chicago Record, the title of the second page object is "page one." The first page object in this case is the cover, which has no identifying characteristic at the level of the page. Note also that EDM lets us model sequences of objects: here, page object two is next in sequence after page object one. Our scanning software provides us with technical metadata for TIFF images, and we record PREMIS metadata here as well; we're just showing one example. We can use the model at the page object level to collocate master file and derivative, with the page as the intellectual object. This is a screenshot from our campus publications website. The website contains links to the OCR and the PDF for the issue, both of which we've shown you. In this example, we search for my surname. A hit list is returned with the search term in context. If you pick the first result and go to page two, "Blair" is highlighted on the page. Since none of you can see this from where you're sitting, take the first hit and make it larger: note the bounding box around the name. How does this work? It works by generating DIPs from the RDF triple store by means of SPARQL queries; SPARQL is the query language defined for RDF triple stores, or databases of linked data, as we said. Following standard archival practice, all accessions must belong to collections, as we also said. We map collections to named graphs. We scope queries by means of the named graph of the collection and by means of the type of the triples we care about.
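Here is a sketch of such a scoped query using rdflib. The graph URI is a stand-in, and the technical-metadata predicates are borrowed from the W3C EXIF vocabulary for illustration, since the talk does not name the repository's actual terms.

```python
from rdflib import Dataset

ds = Dataset()
ds.parse("repository.trig", format="trig")   # quads: triples plus their named graphs

DIP_QUERY = """
PREFIX edm:  <http://www.europeana.eu/schemas/edm/>
PREFIX exif: <http://www.w3.org/2003/12/exif/ns#>

SELECT ?image ?width ?height
WHERE {
  # Scope by the collection's named graph and by the type of triple we want.
  GRAPH <https://repository.lib.example.edu/graph/campus-pubs> {
    ?image a edm:WebResource ;
           exif:width  ?width ;
           exif:height ?height .
  }
}
"""

for image, width, height in ds.query(DIP_QUERY):
    print(image, width, height)
```

The "masters added since my last request" query described a little later has the same shape, with a PREMIS event date-time triple and a FILTER on that date added inside the GRAPH block.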
In this case, we're limiting the search to only those assertions in the campus publications collection which are about web resources. We can request DIPs as XML or JSON. While we reference the persistent URI of the image in this response, what we really care about are its width and height; the URI is also an actionable URL, should we care to retrieve the image from the repository for any reason. We need the width and height information for the TIFF image because the bounding box coordinates in the structured OCR are stated with reference to it, not with reference to the derivative image used in the interface, which here is the Internet Archive BookReader. To build the campus publications interface, we write a number of distinct queries to produce the DIPs corresponding to the different pieces we need for the interface, such as the OCR, the PDF files, derivative images, descriptive metadata, the dimensions of the TIFF master files, and so on. The analogy is: when you order a computer, you get one big box, but inside it are smaller boxes, one for the system unit, one for the keyboard, one for the mouse, and so on. Suppose now I want the digital master files for scores added to the repository for a music collection after the last time I made the request. This is a real use case for one of our researchers. There are several ways of doing this, and here we show one. We use the data model we've been illustrating to link scores at one level to the associated digital master files at a lower level, and we use PREMIS event date-time metadata to filter the result set so that only master files added after the date of the last request are retrieved. DIPs are purpose-built just in time, but the queries generating them can be reused, just in case. For Aristotle, virtue consists in the mean between doing too little and doing too much, a value which persisted through the 17th century in parts of Europe. Ours may be considered an Aristotelian approach to packaging repository objects for dissemination according to the OAIS Reference Model, neither going too far nor falling short, which is not something I expect any of you thought you would hear at CNI. Now, while we've automated the archivist's work of transferring and accessioning digital collections, or the digital components of hybrid collections, we have not yet helped them by leveraging the accessioning database to produce inventories of processed collections, to which they will add descriptive front matter. Once both they and we have done our work, we can create EAD-encoded finding aids for these collections. Having done that, we will use the EAD to produce linked data according to the Europeana Data Model. How do we know we can do this? The literature tells us how. Here are two articles, one a master's thesis, describing the solution, one of which is explicitly entitled "Conversion of EAD into EDM Linked Data." That particular article has a mistake in it. It's a small mistake, but just in case you read it and compare the first and last diagrams, where the arrow points in two different directions: the second one is the wrong one. Thinking about how to characterize the solution being presented here, several analogies came to mind. One is handing someone a canvas, brushes, and paints, and telling them to model whatever it is they want. Our community used to build software that way, and we still do on occasion, but it is usually unwise and unnecessary. Modern software is complex, and we can often leverage the work of others.
But the other extreme is handing someone the box containing Mr. Potato Head. What you get is a plastic potato with holes in it and a variety of customization options for the eyes, mouth, nose, and so on. However, regardless of how much customization you do, when all is said and done, what you have is Mr. Potato Head. Between these two extremes is Lego City. There are lots of kits for Lego City; what I'm showing here is a firehouse, but there is a police station, a construction site, demolition vehicles, and so on and so forth. Once you get the hang of building with the instructions that come with the kits, you can venture out on your own. One day I came home and my younger son said, look dad, an electricity truck. I have no idea what use case an electricity truck serves in the mind of a five-year-old, and I don't have to know. What I do know is that his electricity truck plays very well with whatever it is he and his older brother build using parts from Lego City. Linked data allows us to build something according to someone else's pattern. It also allows us to build our own electricity trucks using stock components. And even if we for some reason decide to manufacture our own components, we can do so in an interoperable manner. The EDM object model is built using parts from OAI-ORE. DPLA's sourceResource is defined as a subclass of Europeana's ProvidedCHO. The DPLA data model reuses parts of OAI-ORE and EDM. If we are not careful, we can use linked data technology to build silos. But if we are careful, we don't have to build silos simply to address particular needs. Just as with the electricity truck, purpose-built does not have to mean siloed. Finally, what does it take to do this? Two core FTE do the heavy moving and lifting: one in the Special Collections Research Center, one in the Digital Library Development Center. An archivist establishes the collections and directs the work of our digital accessioning specialist, because our philosophy is to embed specialized expertise in departments that can provide a broader context for the activity. That position has a dotted-line reporting relationship to me, to provide technical supervision and to ensure that the policies established for the digital preservation program are being implemented. A programmer analyst, reporting directly to me, built the accessions database and automates the production of SIPs according to specifications I wrote using the Europeana Data Model. We rely on our system administrators to maintain the repository infrastructure, both hardware and software, as they do for all of the many systems for which we are responsible. Since we're not heavily staffed as a library, this approach has proved to be lightweight and robust, not just technically but with respect to the staffing required as well.