Welcome to the session on linked data in libraries. My name is Philip Schreur, and I'm the head of the Metadata Department at Stanford University. I will be speaking first on libraries and linked data, focusing on MARC records, the good, the bad, and the ugly, and why this transformation to linked data is important and why it is so difficult to do.

A revolution is on the horizon, one that is potentially as world-altering as the development of the web. And like most truly transformative revolutions, it is driven by a simple concept: linked data. Linked data has the potential to change every aspect of our world of information creation and exchange. And as primary purveyors of information, libraries should be at the nexus of this revolution. Every aspect of our world will be dramatically altered as basic tenets of what we collect, how we collect, how we organize, and how we provide information are questioned and rethought.

Much has been said about linked data, its ties to the Semantic Web, and its application for libraries. But what is it exactly, and how does it work? Linked data has so much potential because it is designed to work with the web. And as more of our professional lives move to the cloud, the way this information is stored there becomes more and more important.

The four tenets of linked data are quite simple. First, use URIs, or Uniform Resource Identifiers, to name things on the web. Second, use HTTP URIs so that someone can look them up. Third, have the information provided by the link be useful. And fourth, provide links to other URIs so that people can discover related information.

Linked data is expressed using the Resource Description Framework, or RDF. Any expression in RDF is composed of a collection of triples, each triple having a subject, a predicate, and an object. This simple structure allows anyone to make assertions about anything. For instance, and this one is apt for Baltimore: the subject "The Raven" has the predicate "has author" and the object "Edgar Allan Poe." Ideally, both subject and object would be represented by URIs, and the statement itself expressed using an XML-based syntax.

The advantage of using URIs is that much more accurate matching can be made. There may be many variations in the spelling of Edgar Allan Poe: "Poe, Edgar Allan, 1809-1849," "Edgar Allen Poe," "E. A. Poe," not to mention all the typos and other variations. Is it "-an" or "-en"? Is it one "l" or two? Any machine matching by character string is very problematic. By linking to a URI for Poe's authority record in the Library of Congress Name Authority File, however, the link is explicit. And by recording this information in RDF, applications can exchange information on the web without loss of meaning. As RDF is a common language, information expressed in it can be used by many applications, and applications can be developed to take advantage of this growing pool of data.

The strength of this model is that it allows anyone to make assertions about anything. What is equally powerful is that any two expressions may be linked together, and through this process an immensely rich web of data is created. It is true that there is no requirement that these statements be true. So, for instance, "The Raven" has author "Philip Schreur" is an equally valid statement in RDF. But it is equally true that anyone may correct invalid statements, and in this way, through an iterative cycle of data use and correction, the web of data becomes richer and more reliable. It's crowdsourcing at a truly international level.
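As a concrete illustration, here is a minimal sketch of that single triple in Python using the rdflib library. The work URI is an invented placeholder, and the Library of Congress URI is assumed to be Poe's LCNAF identifier:

```python
from rdflib import Graph, Literal, Namespace, URIRef

# Illustrative namespace for the work; a real data set would mint
# stable HTTP URIs of its own.
EX = Namespace("http://example.org/work/")
DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
raven = EX["the-raven"]

# One triple: subject "The Raven", predicate "creator", object Poe.
# Pointing at a URI (assumed here to be Poe's LCNAF record) makes the
# match explicit, however his name is spelled in any one record.
poe = URIRef("http://id.loc.gov/authorities/names/n79029745")
g.add((raven, DCT.creator, poe))

# Anyone can add further assertions about the same subject.
g.add((raven, DCT.title, Literal("The Raven")))

print(g.serialize(format="turtle"))
```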
Since the days of the card catalog, our focus has been on bibliographic records. These discrete bundles of information supply metadata about resources in our collections. Their record structure is carefully controlled, and access points, such as names, subjects, or series, come from recognized thesauri and carefully curated authority files. With our transition to online catalogs, made possible through the development of MARC, our focus remains on bibliographic records. The information they contain is fractured into various fields and subfields and stored in relational databases, where it can be associated and maintained.

This fixation on bibliographic records, though, has its drawbacks. First, many institutions prefer their own particular version of a bibliographic record. Even though OCLC might espouse the use of the master record in its database, libraries are free to alter and enhance the copy of that record in their local database. Corrections to perceived errors in others' cataloging, missing data elements, and local practices all can be incorporated into a local version of the record that meets local users' needs. Large numbers of staff are dedicated to this work at enormous cost. As the number of records grows, so does the cost of attempting to maintain them all.

Second, these bibliographic records are stored in relational databases, which are by definition closed systems. In order for a patron to discover a resource in an online catalog, a bibliographic record for it must be present in the system. The downside to this arrangement was driven home to me by Michael Keller, University Librarian at Stanford, in the 1990s. At that time I was head of the catalog department, and Mike asked me to catalog the web. I've puzzled over this question for more than a decade now. The question itself was very perceptive, but far ahead of its time. Our patrons were increasingly interested in what was on the web, and it was natural for us to provide access to it. However, our mechanism for providing access was both too expensive and too restrictive to provide access to a nearly infinite number of resources. Within a world of limited staffing and records in relational databases, consistent access to the web of data is an impossibility.

Linked data, however, is not focused on bibliographic records but on individual statements of fact. There are no discrete records to be maintained in a local ILS, no master records in a worldwide database, simply massive collections of triples in triple stores. By bypassing the a priori need for a record, linked data frees us from the cycle of record creation, maintenance, and deletion. Valuable staff time can be freed from these activities, and the confines of the relational database can be broken.

But is linked data really the solution? From June 27th to July 1st, 2011, Stanford University hosted a group of librarians and technologists to examine the use of linked data in the academic environment. The hope was that in the short week allotted to us we could both confront the challenge of planning a multinational, multi-institutional discovery environment and lay the groundwork for its development. One of the most interesting products of the workshop was a series of value statements as to why the linked data approach is worth pursuing. First, linked open data puts information where people are looking for it: on the web. Second, linked data can greatly expand the discoverability of our content.
Third, linked data opens opportunities for creative innovation in digital scholarship and participation. Fourth, through worldwide crowdsourcing, linked data allows for open, continuous improvement of the data. Fifth, linked open data creates a store of machine-actionable data on which services can be built. Sixth, linked data can help break down the tyranny of the domain silos that we've been hearing about at this conference today. And last, linked data can provide direct access to data in ways that are not currently possible, by reaching into the documents themselves.

As people shift to the web as the first point of discovery, it's important for library resources to be represented there. Although our card catalogs may appear on the web, any semantic meaning embedded in the MARC coding is lost; for the most part, the data in them becomes blocks of text. Through the use of RDF, however, important information encoded by the MARC tags can be translated into triples that carry semantic meaning for machine processing. And each element in the triple can be recorded as a URI that links these data points to matching data points within the web of data. By intelligent conversion of our library MARC records to machine-resolvable RDF triples, the semantic meaning in the records is preserved. By moving these statements to the web, the data becomes a vital structural part of the Semantic Web. In the world of linked data, these MARC records are a prime preliminary source of information. All of the effort that catalogers have put into controlled subject access, controlled names, classification, and consistent description has made them extremely desirable.

As any library foray into linked data must begin with its collection of MARC records, it's worthwhile taking a look at the format itself. Take as an example a typical MARC record for a sound recording, taken from our discovery environment, SearchWorks. The record is quite impressive. It gives a description of the medium, the contents, the years of performance, controlled subject headings, and analytical entries for all the individual musical works it contains, and it displays the information in a structure easily digestible by the eye. It's simple for anyone glancing at this record to see that it represents a recording of Fritz Kreisler performing a selection of violin music. The musical works are clearly articulated, and responsibilities are clear from glancing at the record as a whole.

But what would a machine make of this record? Much of the semantic meaning in this example can only be derived from the bibliographic record as a whole. The human eye can easily see that the main entry is Fritz Kreisler and that he is a violinist; that the piece by Joseph Sulzer is for violin and piano; that if you liked this type of music you could follow the subject heading "Violin and piano music"; and that the Mozart violin concerto was accompanied by the London Symphony Orchestra, conducted by Sir Landon Ronald. This dependence on the complete bibliographic record for semantic meaning is a holdover from the card catalog days. The MARC format allowed these records to be transformed into electronic documents and shared internationally. But they are still bibliographic records, and to be understood they must be evaluated as a whole. Individual statements such as the participant note "Fritz Kreisler, violin, with various accompaniments" or the event note "Recorded 1904 to 1924" are meaningless taken out of context.
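To make the machine's-eye view concrete, here is a minimal sketch using the Python pymarc library (the pymarc 5 Subfield API is assumed, and the field tags and contents are reconstructed from the example above, not taken from the actual record):

```python
from pymarc import Record, Field, Subfield  # assumes the pymarc 5 API

# Rebuild two of the notes from the Kreisler record as MARC fields.
record = Record()
record.add_field(
    Field(tag="511", subfields=[
        Subfield(code="a", value="Fritz Kreisler, violin ; with various accompaniments."),
    ]),
    Field(tag="518", subfields=[
        Subfield(code="a", value="Recorded 1904-1924."),
    ]),
)

# To a program, each note is just an opaque block of text. Nothing here
# says who Kreisler is, that "violin" is his instrument, or how these
# notes relate to the works listed elsewhere in the record.
for field in record.get_fields("511", "518"):
    print(field.tag, "->", repr(field.value()))
```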
RDF, however, is a series of independent statements meant to be understood by a machine. The conversion of MARC to RDF has to overcome two great obstacles: the first is the concept of the bibliographic record, and the second is the inability of the MARC communication format to clearly convey semantic meaning.

It's often difficult for us to realize how much information our minds supply. From the author field we see Fritz Kreisler is listed as a creator. From the participant note we see that he is a violinist. From the contributor fields we see the recording includes Efrem Zimbalist; from the contents note we see that he is also a violinist. From the contents note we see that Kreisler performs a piece by Tchaikovsky, Chanson sans paroles, that was originally for piano. From the included-works note we see that this piece is from Tchaikovsky's Souvenir de Hapsal. From the subject fields we see that the correct LCSH subject heading for this work is "Violin and piano music, Arranged." But there is nothing in the bibliographic record itself that links these bits of information together. It is the human mind that makes these logical associations.

The MARC format itself was created to clearly communicate the information encoded in our card catalogs, and in this it has been very successful. Although it perpetuates the concept of the bibliographic record, it very clearly articulates and differentiates all the elements in the record. The MARC format is used, however, almost exclusively by the library community, and much of its semantic meaning is lost to machine understanding. In the semantic world of linked data, MARC records themselves are inarticulate.

The shift to the web as a primary source of information is unarguable. And as it is impossible for us to encompass the entirety of the web in our library catalogs, our catalogs must move intelligently to the web. Our millions of bibliographic records, and the resources they represent, are one of the truly great treasures we have to offer the web of data. The care with which we have created, maintained, and enhanced them over time has made them a primary focus of the Semantic Web. But the way the data has been recorded in MARC prevents any intelligent automated manipulation or linking. Although a daunting challenge, this conversion of our bibliographic records from MARC to linked open data will become one of the most powerful drivers in the transformation to the Semantic Web, placing our data and resources where people are actually looking for them and tying them intelligently to the wealth of the web.

Good morning. My name is Jennifer Bowen. I have two different roles at the University of Rochester. One is that I'm Assistant Dean for Information Management Services, and also, for the last several years, I've been one of the leads on the eXtensible Catalog project, where we've been developing software. And I have to say that when we started out to develop the eXtensible Catalog, we had no inkling that we would have anything to do with linked data. In fact, I don't think I'd heard of linked data at that point. But what we found is that there seems to be a very good fit between the software that we've developed and linked data. So I'm going to be talking about linked data today and some of the lessons that we've learned with the eXtensible Catalog that I think will be useful as we go forward to do exactly what Phil has described: to take our MARC data and convert it to linked data.
And I'll also talk about some of the things that we've learned from the user research that we've done for the eXtensible Catalog.

So I'm going to be addressing four different questions today. If we're talking about linked data for libraries: why should we do it? Who should do it? How can we get started? And what are some of the possible outcomes? And I'm going to relate all of these to the eXtensible Catalog.

Phil talked quite a bit about why we should do it, and I'm going to reiterate some of the things that he said, but I'd like my remarks to be in the context of some of the user research that we've done as part of the eXtensible Catalog and at the University of Rochester. If we're going to do linked data, we need to know why it's important for our users that we do it. We want to know what our users need, and whether linked data can help us provide what they need. And then there's an interesting question here, since we're all asking what the role of libraries will be in the future: I think linked data can give us some ideas about where we could go with that.

We do a lot at the University of Rochester with studying scholars. This is a book that came out a few years ago, edited by Susan Gibbons and Nancy Fried Foster, Studying Students, where we studied undergraduate students. We had a follow-up book that came out about a year ago called Scholarly Practice, Participatory Design and the eXtensible Catalog, co-edited by people from several of the different XC user research partners that contributed to the project.

From the work that we've done with the eXtensible Catalog, I'd like to show you very quickly some of our findings. Some of them reflect things that libraries are already doing, and some of them, I think, give us ideas of what scholars need and where libraries may be able to have a role in the future. With that in mind, here are some of our findings.

The first one is that scholars want to read everything on the topic that they're researching. Well, we sort of know that, and we've been trying to assist them in doing that for a long time. Scholars want to be in the middle of everything they need, all organized so it's findable and usable. Again, that's something that libraries have tried to provide. Scholars want their research to be findable and usable by others. We've tried to do that as much as possible, through institutional repositories and so on. In this particular slide, I'm showing a researcher page that we've developed for our institutional repository to try to bring scholars more visibility there. So libraries have started to do some work in this area. Scholars want to connect to people whose work is interesting and useful to them. This is something where libraries haven't really done a whole lot, but again, it's something we knew, and it's something that the user research confirmed. Scholars don't care what the technology is as long as it helps them to do their work. Put another way, they want the technology to be in the background; they don't want it to get in the way of what they're trying to do. They want to just do their work.

But one of the major findings is that we're seeing yet another shift in how people seek and use information.
We've known that scholars bypass library discovery systems; they go right to Google, right to search engines. But we're finding that in addition to doing that, it gets even worse. They're not even just going right to Google; they're also doing work in their own applications. These could be web-based applications, mobile apps, social networks, whatever. So they're not only bypassing us for the search engines, they're bypassing us for other apps that provide what we are not able to provide for them. It's another layer on top of the bad news that we had before.

Here's a quote from my colleague Nancy Fried Foster, who is the director of all of the user research at the U of R: "Even scholars who continue to use library finding tools are turning to new applications to aggregate and analyze information in ways that extend their scholarship beyond what manual searching and analyzing allows." So even the ones that are still using our applications are not really getting what they need from them; they're going elsewhere. That's the bad news.

What we need to do to address this, I think, is to make our resources discoverable in those apps that they're already using. We need to make our resources discoverable through Google and Google Scholar, and we know that, but also through mobile apps and social media. And here's where linked data has a real opportunity for us.

One of the problems with linked data, I think, is that it's all backend stuff. It's how we represent the data, how we do the metadata; we put it in triples and things. And it's very hard to envision how that could help our users. So I've come up with an example: a hypothetical app that we could develop to help our users. I'm going to walk through it and then talk about how linked data might relate to it.

My example uses a cemetery that's right next to the River Campus at the University of Rochester, Mount Hope Cemetery. It's actually the first municipal rural cemetery in the country, founded in 1838. Some of the residents, I guess, of the cemetery are very well-known people, and there are some very famous and beautiful monuments there. In fact, this cemetery is the subject of a course taught at the U of R called Speaking Stones; a professor of religion, Emil Homerin, has students go out, choose a monument, and then research the person who's buried in that particular grave site.

Here's another little view, in Google Maps or Google Earth. On the left-hand side is the U of R campus; you can see the buildings there. And on the right-hand side, you can see the winding pathways of the cemetery. It's right next door. So we envision an app where someone is walking through the cemetery with a GPS-enabled mobile device, and they come across a grave site, in this case, Susan B. Anthony's. They take their device and click, and the device knows exactly where they are, and the app takes them directly to materials in our special collections on Susan B. Anthony. We're pushing the information about our local collections right out there to our users.

So this is sort of a cool idea. Have we built this app? Well, no, we haven't. We could build this app. And what would it take to build it? It would take a lot of dedicated programming.
We could do this with a lot of hard coding. But what if, instead of doing it the way we would do it now, we had this, and here I'm quoting from one of the linked open data value statements that Phil mentioned earlier: a store of machine-actionable data on which we could build improved services? We could do this so much more easily. The data would be out there, and application developers could just grab it and use it in their applications. So this is really the vision of where I think we want to go with linked data.

Now I'm going to change gears and talk about who should create library linked data. And I'll give you my opinion right off: as many libraries as possible should be creating linked data. This actually came up in the last session I attended this morning on the DPLA, where there was a question at the end about, well, isn't there a problem with duplication of data? Who should be doing this? Should we expose this, should we expose that? So I'm going to give you my rationale for why I think as many libraries as possible ought to be doing this.

First of all, as Phil explained very well, linked data is a whole new paradigm. We're used to thinking of our metadata as records: we create the record, and then it moves over here, it gets updated, it gets moved over there, it gets augmented, it gets deleted. Records, records, records moving through. We're now talking about statements: just a statement, an RDF triple that makes one assertion about a resource. And those are going to live out there somehow and get updated and all of that. It's a really, really different way to think about data. What I've seen so far is that library metadata people, catalogers, have a hard time grasping that. And my belief is that what we really need is a way to get hands-on experience with this, to really grasp what this means for the library world and what the potential is.

Reason number two: to serve the unique needs of local users. The Mount Hope Cemetery example is an example of that. Rather than having one entity create linked data that the whole world is going to use, everybody has data about their local resources, they have institutional repository data, they have users with their own needs. We have this class where we have students doing research on cemeteries; we could create things for that if we have our own data out there.

Reason number three: we need to encourage our vendors to implement linked data. And the first part of that is educating them about linked data. I'm sorry to say I had a rather unnerving experience with this at Midwinter. I was on a panel with some of our vendor colleagues, and we were asked specifically to talk about what our various software applications did regarding linked data. And one of our well-respected, very well-known vendor reps stood up and said, yes, our software does linked data: we link directly from the URL to the full text. Not understanding that that's not what we're talking about at all. One of the unfortunate things is that the term "linked data" is so generic it could mean just about anything, so there's a huge education need there. Now, I wonder if this vendor was maybe engaging in some selective ignorance, because their system didn't do linked data and that way it sort of looked good. But anyway, that's another topic.
We need to know what linked data is and why it's important, so that we can press our vendors to move forward with it.

And finally, we should create linked data because of the new opportunities it provides. These can be new roles for library expertise that we can offer to our scholars by having linked data as a tool that we use. Again, quoting the linked open data value statements as Phil did: opportunities for creative innovation in digital scholarship and participation. That's where we want to go.

So how can we get started? Well, the first thing we need is some kind of a tool to create linked data. We need to take all of our legacy data, our MARC data, and create linked data out of it. And what I'd like to talk about for the rest of my presentation is how we think the eXtensible Catalog could play a part in that.

So what is the eXtensible Catalog? It is open-source software we've been developing over the last few years. We provide both a discovery system and a set of tools, and the tools allow us to manage metadata and also to build applications on top of it. It's really the tools part that I'm going to talk more about today. We have four toolkits that are available at our website, extensiblecatalog.org. Our major funding has come from the Mellon Foundation, and we have also received considerable funding from several sponsors and partners: the CARLI consortium in Illinois, Kyushu University in Japan, UNC Charlotte, and of course the University of Rochester.

This is a really exciting time for XC, because we've been developing the software for several years, working to create a very robust system, and we have our first XC sites up and running right now. We have a demo site, which you can get to from the XC website, and that's what you're seeing here; you can take a look at it and play with it. This is our discovery interface. We have our first production site up at Kyushu University in Japan. They call it their Cute catalog, and they have both a Japanese and an English version of the interface. It's built on top of their ILS data as an alternative to their ILS. As another example, a Tufts University developer is working on using some of our toolkits as the back end for the Perseus Digital Library. So we have some non-MARC digital library usage going on right now. Those are the first implementations, and we hope that you'll start to see a whole lot more implementations of XC. It's been a long time coming, so we're really excited about that.

So let me talk about XC and its potential for linked data. What we offer right now with XC is a platform for experimentation with metadata, metadata transformation, and reuse. And potentially we could do the same for linked data. We're not doing linked data right now, but this is where we see the potential going forward. This whole idea of experimentation is something I really want to emphasize, because if you have gone to any other presentations on linked data and heard people talk about it, you realize that we really don't know how to do this yet. This is a whole new area that we're embarking on, and we need to be able to do something, try it out, see what works, and if it doesn't work, start over and try it a different way. So XC could be a platform for creating linked data. We already do bulk conversion of existing library metadata.
We have done a ton of work with MARC data, MARC holdings data, MARC bibliographic data, trying to get as much value as we can out of our bibliographic records. So it stands to reason that that would be the groundwork for producing linked data. We can synchronize the data conversion among existing systems. And again, this is a risk-free way to experiment with the data: we can put the data through XC, do a conversion, and if we look at the result and think, you know, this isn't really what we want, we throw it away and start over. We haven't done anything to harm the original metadata, and we can experiment. We also have the potential to make the linked data available to developers in multiple ways and multiple formats, which we think will be very important as we go forward.

So here's the architecture picture; I have to show the architecture picture. The four blue boxes are the four toolkits that we've developed for XC. They are all available right now. You can see it's a very modular system, a very modular architecture, and we think that's one of the reasons why it's going to be really useful. Down at the bottom, in green, are the repositories where you would originally get your metadata. I'm not going to go through this in detail, but the red arrows coming out of the green boxes are the metadata; that's the pipeline of metadata coming through the various toolkits and ending up in the user interface. The orange arrows represent circulation data going back the other way, so we're just going to concentrate on the red arrows. That's how the whole system looks. As far as linked data is concerned, it's these two components that are really of interest. The one on the left is the Drupal toolkit, which is where our front-end user interface lives. The MST is the Metadata Services Toolkit, which is where all of the bulk processing takes place in our system.

The title of my talk was that XC is linked-data-ready software, and I think there are two aspects to that. One is that the software itself is linked data ready, and I'll show you how the software could produce linked data. And then I'll talk about how the data itself that we produce is linked data ready.

We've identified three different ways that we could provide linked data using the software that we've developed, with some additional software development. One way would be to produce harvestable RDF/XML record sets; we could do that as output from the MST by developing an extra service. A second way would be to provide a SPARQL endpoint, which would enable developers, or other applications in particular, to query for information on demand. Both of these options, the SPARQL endpoint and the RDF/XML output, would be developed on top of the MST, which is where we do all the bulk metadata conversion. We also have a third option: Drupal 7, the next version of Drupal, enables the inclusion of RDFa, which is another format for linked data, right in the website, right in the user interface. So that would be yet another way we could produce linked data, right in the user interface. Again, we think there are a lot of different ways we could do this, and producing linked data in more than one way would be useful; we'll find out exactly what application developers need and for what purpose. And again, if we don't like it, we can go back and create different data.
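As a concrete, and entirely hypothetical, illustration of the SPARQL-endpoint option, here is how a developer, say, one building the cemetery app, might query such a service from Python using the SPARQLWrapper library. The endpoint URL, the person URI, and the choice of Dublin Core properties are all invented for this sketch:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical XC SPARQL endpoint; no such service exists yet.
sparql = SPARQLWrapper("http://example.org/xc/sparql")

# Find titles of special-collections resources about a person of
# interest. The subject URI is an illustrative placeholder.
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?resource ?title
    WHERE {
        ?resource dcterms:subject <http://example.org/person/susan-b-anthony> ;
                  dcterms:title ?title .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["resource"]["value"], "-", binding["title"]["value"])
```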
So I've talked about how the software itself is linked data ready. Let me shift gears and talk about how the library metadata itself that we're producing is linked data ready.

Well, the first thing we've done is that we have an underlying schema that we use in the system. When we created that schema, we were very careful to ensure that we used only registered element sets from registered vocabularies. Doing this will really facilitate conversion of this internal schema into RDF triples, because every one of the elements that we're using in the schema already has its own URI.

Let me show you. This is a diagram of an RDF triple, similar to what Phil showed you earlier: on the left is the subject, in the middle the predicate, and on the right-hand side the object. In this particular example, this resource has the subject "Poets, American." And there's a box here around the predicate, because when I'm talking about the schema elements, I'm really talking about the predicate. The assertion that we're making about each resource, this one has this subject, is something that comes from a registered element set. In XC, we're using all the Dublin Core terms, and we are using a subset of RDA elements and role designators. Those are all registered in the Open Metadata Registry, so we can just grab them and use them, and they're all ready for linked data. In a few cases we had to create a few of our own properties, and those are being registered in the Open Metadata Registry as well.

The second thing that we think is very useful for making metadata linked data ready is that we are incorporating the FRBR data model, the FRBR Group 1 entities; if you're familiar with FRBR at all, that's Work, Expression, Manifestation. Converting MARC data to FRBR entities is the first thing we do, and we think this is a way to produce more meaningful linked data than we might be able to produce without the FRBR part. Now, the reason we're doing the FRBR part is because we have that as the underlying structure behind our discovery interface; we were doing that anyway. And we started to ask: is this a good thing for linked data or not?

So this is what we're doing: we take a MARCXML bibliographic record and parse it into three linked records, with links between them, so we have a Work, an Expression, and a Manifestation. The value added by the FRBR work is the URI for the resource itself: the left side of the RDF triple is where we see the value. If we were going to create an RDF triple about a MARC record and the data in it without using FRBR, it might look something like this: this MARC bibliographic record, this record number, has author J. K. Rowling. So we're actually making statements about a MARC record, which, as we're moving away from MARC, maybe we don't want to do. But by using the parsing that we're already doing to create the FRBR entities, we can instead make statements about a FRBR entity: this Work has creator J. K. Rowling; this Expression has language English; this Manifestation has an ISBN, and this is the ISBN number.
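Here is a rough sketch, in Python with rdflib, of what those three FRBR-centered statements could look like as triples. The entity URIs, the ISBN property, and the linking properties between Work, Expression, and Manifestation are illustrative placeholders, not XC's actual schema or the registered RDA elements:

```python
from rdflib import Graph, Literal, Namespace, URIRef

# Hypothetical URIs for the three FRBR entities parsed out of one
# MARCXML record; the namespace and entity paths are invented here.
EX = Namespace("http://example.org/xc/")
DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
work = EX["work/1"]
expression = EX["expression/1"]
manifestation = EX["manifestation/1"]

# Statements about FRBR entities rather than about a MARC record.
g.add((work, DCT.creator, Literal("Rowling, J. K.")))  # ideally an authority-file URI
g.add((expression, DCT.language,
       URIRef("http://id.loc.gov/vocabulary/iso639-2/eng")))
g.add((manifestation, EX.isbn, Literal("9780747532699")))  # illustrative property and value

# Links between the three parsed records. These property names are
# placeholders, not the registered RDA elements.
g.add((expression, EX.realizes, work))
g.add((manifestation, EX.embodies, expression))

print(g.serialize(format="turtle"))
```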
So in our linked data statements we're no longer making assertions about MARC records, but about something that the library community has said is very important to it, and that's the FRBR model. And of course, FRBR is the underlying data model for RDA, the new library content standard for metadata. So this seems like something that would be valuable for the library community to look into.

Why should we do this with the FRBR part? Again, our user research has shown that users want to see the relationships between resources. This is why we built a metadata schema on top of FRBR: because it's important. And since we're already doing it with XC, we can find out whether this really is important or not. It's yet another way that we can look at the value of the FRBR model and evaluate whether it's useful. We also believe that we don't have to be wedded to it; we could use other data models with XC to produce linked data as well. But since we're already doing FRBR, it seems like a really interesting opportunity.

So what might be some of the possible outcomes of doing linked data? I'm going to revisit some of the user research findings that I mentioned at the beginning: connecting scholars with others, making scholars' work findable, and developing technology that helps scholars to do their work. I've developed a couple of use cases that relate to these, which I think we could support with linked data, and I've tried to put each of them in the form of an RDF statement. These are just to give some ideas of where we might be going.

Some ideas of tools we could develop: we could allow scholars to create linked data as part of the work that they're already doing, so that they wouldn't have to say, oh, now I have to learn how to create linked data. We could develop a tool such that linked data just gets added to the linked data cloud as a result of the work they're already doing. Citation relationships, I think, are a very important area we could explore here, because citations are all relationships: I cited this work; he cited me. Creating and managing vocabularies: scholars do this from time to time, and librarians are really good at it, so there's a natural area where we could offer added value to our scholars. And enabling experts to augment metadata that we've already created with information they know about a resource, and to have that information then be expressed as linked data.

Here are some of my examples. "My thesis is based on this data set." There we have something that could be expressed as a triple. The thesis may be in your institutional repository; if you could create a linked data statement that links out to the data set, then suddenly you have that relationship documented, and anybody else that uses that data set can take advantage of it as linked data. "These photographs are all of the same person." Again, we're making a statement that this is the same as this. This is the kind of thing that researchers are doing all the time, and it can be stated as linked data and be out there in the data cloud as something they contribute as part of their research.
"These other researchers are citing my research." Here we've turned it around: rather than what's cited in a particular article, what a researcher or scholar really cares about is who is citing my work, so that I can get credit for it and demonstrate my value. And again, I think there are some linked data possibilities there.

So, just to reiterate how XC can facilitate linked data: we have open-source software and risk-free experimentation with MARC, which is obviously a good place to start because we have this huge investment in MARC data, and also other library metadata, and we can manipulate that data. We have the potential for bulk conversion to linked data, and we have three different ways we could do that with some additional development. We've already done a lot of the heavy lifting on the metadata conversion, all the backend processing of the MARC data, and with some additional development on top of that, we think we could do any one of those three ways, or all three. We also think that XC would be a really useful platform for developing some of those linked data applications and tools such as I described, which really could start to address some of the needs of scholars.

As for the next steps for XC and linked data: we are seeking funding for the development of the linked data work, and when we develop software, we always do user research at the same time, so we would do both together. And we would love to have libraries partner and participate with us in that effort. Phil, thank you very much.

And I think we have about 10 minutes; we can take questions for both of us.

My name is Veena Jartra, and my question is for Jennifer. I really enjoyed your presentation and how you described linked data, and as you know, we've been working with FRBR and RDA for a long, long time. There's a problem that comes up when you take these bibliographic records and convert them to triples: you get a whole volume of data. The content becomes larger and larger as you go. In one of your examples you said, expression ID has language English. But given that there are, say, 150 languages in the world, we could potentially have 150 assertions. Instead of making a direct link like you did, expression ID has language English, we could simply say expression ID has assertion ID, where it's one of 150 assertion IDs, which compacts the data but adds a level of indirection, correct? Is that advisable or not, is what I'm going to ask you.

I honestly don't know. I think one of the things we don't know with linked data is whether the volume of data is going to be an issue when we start putting it into triple stores. So I guess my answer is, I really don't know. I think it's a very interesting question, and I'd be happy to talk more about it with you. You know, I hesitate to answer because I'm not one of the people who is a real expert on linked data syntax, and what's legal and what's not, and whether that somehow breaks something. So I'm hesitant to respond to that. But I'll just reiterate my point that we really need to experiment. We need to try various things and find out whether what you're suggesting works better, and whether the duplication of data and the volume of data end up not being a problem in a totally linked data world. I think that's really an open question. If anybody else would like to answer that, not me.
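For readers trying to picture the two shapes being debated here, a rough rdflib sketch follows. The entity URIs and the hasAssertion property are hypothetical; both forms are syntactically legal RDF:

```python
from rdflib import Graph, Namespace, URIRef

EX = Namespace("http://example.org/xc/")   # hypothetical entity URIs
DCT = Namespace("http://purl.org/dc/terms/")
ENG = URIRef("http://id.loc.gov/vocabulary/iso639-2/eng")

g = Graph()
expr = EX["expression/1"]

# Direct form: every expression links straight to a shared language URI.
g.add((expr, DCT.language, ENG))

# Indirect form from the question: link to one of a small pool of shared
# "assertion" nodes, which itself carries the language statement.
lang_eng = EX["assertion/language-eng"]    # one of ~150 reusable nodes
g.add((expr, EX.hasAssertion, lang_eng))   # hypothetical property
g.add((lang_eng, DCT.language, ENG))

print(g.serialize(format="turtle"))
```

The trade-off is exactly the one raised in the question: the indirect form deduplicates the statements, at the cost of an extra join when querying.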
But I have another question. Your example showed a predicate that was in a local Rochester namespace. In addition to just having RDF, don't we need centralized sources of ontologies and predicates? And how do you match the information within the MARC record to a particular vocabulary source as the link? For your Edgar Allan Poe example, is it necessarily the Library of Congress Name Authority File, or are there other name authorities? How do you map what's in the MARC records in your catalog to a specific instance?

We absolutely need to do all of that, and that's where a lot of the work is going to need to be done to make sure that we do this right. There are a few vocabularies out there that we can try to use right away, but certainly there are others where we need to figure out how we're going to do that, and how we're going to map across vocabularies from one to the other. I think the predicates we're using are from the registered schemas. I think what you were referring to was maybe the subject part, which refers to a URI generated by the XC software. Is that what you were looking at? That's the Rochester part of that.

Other questions or comments?

I would add one other comment to that. In the work that we've been thinking about with linked data, there are two rather different thoughts about that idea of authority control and how it works. In the work that we do now, we think a lot about authority files and having that central point to be able to link to. And there may be multiple different authority files, there could be MIMAS, there could be LCNAF, there could be a number, and then we could link those points together and make this sort of web of interconnections. But there are other, automated means of doing that linking. There are services like sameAs that, by machine processing, will look at these names and say this is most likely the same person with 80% probability, and will link them up. I was speaking with some developers at IBM a couple of years ago, and they said they could process the information on the web and, with an 80% guaranteed rate, link up the names that were the same in an automated way. So there are two different approaches in the library world: our desire to create those authority records as linking points, and then the side that says we can do it by automated means with a high percentage of accuracy. Which is the best approach to take? There are definitely those two different camps of people thinking about this. As the head of a metadata department, I do like authority records, but I'm thinking their use may shift: not so much as that anchor point, for which of course they're very useful, but because of the additional information that can be put in them, which could become very useful for identifying who a person is. It's a place to put all that information and make it very useful.

Question and a comment; I'll state the question first. Is MARC to linked open data a good idea that's being explored at this point in time, or are there action items that will allow Stanford or other participating schools to move forward on this with some vigor?
And quickly, my comment: I understand that billion-triple stores have already been successful, and trillion-triple stores will probably be right behind, and of course it won't be long before we have a trillion bytes of disk for a dollar, so I'm not worried about needing to compactify the data.

When I think about linked data in the billions and trillions of triples, to me it gets back to that idea people always talk about: the killer app, the one everyone always says needs to be developed. In the world that we have right now, we have, to me, a very small world. We have our bibliographic records, and we can't possibly create catalog records for everything we'd like to contain; the difficulty is that the world is too small. Now at Stanford we're thinking about all the other things the university produces in its day-to-day activities. We've got reading lists, we've got data sets; there's all sorts of valuable research that goes on, far too much for us to create a catalog record for. So we want to try to capture all that information, and instead we're faced with a world where there's too much information. How can we possibly make some sort of creative or useful world out of all those billions of triples? We're going to need to find that app, where the downside is not too little information but too much. How do we sift through all of it to display the useful links to people without having them be overwhelmed by the amount of data that's out there? That is going to be one of the real challenges: the manipulation of those triples to present that information in a reasonable way.

Just to follow up on the first part of what you asked: can we go ahead now to work on MARC? Yes, and a lot of different places are experimenting with that in a lot of different ways. With the eXtensible Catalog, what we want to do is enable more people to do that experimentation. But yes, I think the time is now to really see what we can do with it.

Another question. Yeah, Patrick O'Brien, University of Utah. I had a question about the toolset or development tools that you have available now. Particularly, what can we convert the MARC record to, in terms of ontologies that are available, and what tools are available to improve the quality of the data? For example, de-duping, or determining the meaning of a particular element within the data. An example is time period: was the creation date the date it was published, or the date being written about? It might be an article about the Civil War that was written now, or something written during the Civil War period about something in the future. How do we go about determining the meaning of the data?

Well, I think that in the library world, the way we've done that is through content standards, where we have defined what the meaning of that data is. Unfortunately, our tradition of doing that has not been very machine-actionable; they've been text documents. What we're looking at now is a real transition period, where we're converting from this MARC/AACR2 world to an RDA world. One of my other hats used to be working on RDA development, and when we started working on RDA, again, we didn't know about linked data. We didn't really know the value of registering the data elements.
We weren't thinking in terms of a data dictionary, where you would have a registry of metadata elements, and that registry would tell you exactly what a data element means and what it refers to. But luckily, that work has gone on with RDA, so there is a registry now where you can look things up. Once we make the shift from our old standard to our newer standard, we'll be able to do that. And then it's a matter of taking all of our legacy data and mapping it in ways that make sense.

I guess my question is... It's kind of a mess, but we'll get there. Okay, so we can't use the tools to do that yet? I think we will be able to with RDA data, because the registry is there and we do have some RDA data that's now being produced. Unfortunately, the RDA data is in MARC. So we're trying to provide other opportunities for having RDA in other schemas that will make it more usable. I think we're in a kind of messy transitional period, but we've got some of the pieces in place to go where we need to go.

Anyone else? Thank you all. We've got a couple of minutes; we can hang around if you have other questions. Thank you.