I mean, Apple says it's four o'clock. It must be true. And I'm Karen Smith-Yoshimura from the OCLC Research office based in San Mateo, California. And in this session, I'm going to give you what amounts to at least a partial view of the landscape of linked data implementations: the who, the what, and the why. Now, what we started out with was questioning by some OCLC Research Library Partnership metadata managers, two years ago, who knew about a number of linked data projects and felt relatively familiar with them, but felt that there must be more out there. There must be other linked data projects that we should be aware of, but aren't. So that was the genesis of the International Linked Data Survey for Implementers. Some of those focus group members collaborated with me and some other OCLC colleagues to design a survey instrument, and some implementers of linked data actually beta-tested that survey instrument. We then conducted it between 7 July and 15 August 2014. The results of the 2014 survey were reported in a series of HangingTogether blog posts: who's doing it and why; the second one was some examples; a third one was on the why and what of what institutions are consuming; another one was about why and what institutions were publishing; there was another one on technical details, and then one on advice from the implementers. And really the only criticism I received about any of these posts and results was: you're missing some of the biggest players in linked data implementation. You don't have the national libraries of Germany and France in your results. And I'm going, but they didn't respond to the survey. I can't show results from people who didn't respond. So in the end, what we did was repeat the survey in 2015 and make a concerted effort to actually get those national libraries to respond. And we also changed the time frame to avoid that European summer vacation period, so the 2015 survey ran from June 1st through July 31st, 2015.
And as a result, we did get more responses. Now these are institutional responses from those that have implemented or are implementing one or more linked data projects or services that either consume linked data, publish linked data, or both. In the 2014 survey, 48 institutions responded; in the 2015 survey, 71. Of those, 29 responded to both. So altogether we have a pool of 90 institutions that have implemented one or more linked data projects or services, some of whom are in this room — you know who you are. This is the geographic breakdown of all 90 of those responding institutions. So yes, the United States, we got the most: that's 39 responses, which amounts to 43%. But 43% is still less than half, right? More than half of the responses came from outside the United States. Actually, it was sort of neck and neck between Spain and the UK — who's going to come in second place? — but Spain beat out the UK by one. Then the Netherlands, Norway, and then a whole number with just one or two responding institutions. This is the bar graph comparing, by type, the institutions that responded to the 2014 survey versus the 2015 survey. That second bar from the left clearly shows that we were successful in getting far more national libraries to respond to the 2015 survey than the previous year. So that's 14 national libraries, also an increase in the network category and overall. So if we just look by type for the 2015 survey, you'll see 31% academic libraries. The speaker notes actually list these by category, because you might not agree with some of my categorization, but I think the national libraries are clear. I'll read them off just so you know. They included the Real Academia Nacional de Medicina of Spain — so a national library of medicine for Spain — the Bibliothèque nationale de France, the British Library, the German National Library, the KB in the Netherlands, the Library of Congress, the National Library of Malaysia, the National Library of Medicine, the National Library of Portugal, the National Library of Spain, the National Library of Sweden, the National Library of Wales, and the National Library of Hungary — I can't pronounce the Hungarian name. The network category is a little tricky, because these networks are meant to be multiple institutions all sharing the same linked data services and database, and often they are actually hosted by an academic library or an academic institution. So you have to look at it and ask: is this an academic or is it a network? Among the networks there's ABES in France, BIBSYS in Norway, the Consorci de Serveis Universitaris de Catalunya, the Digital Public Library of America, the Europeana Foundation, the institution behind swissbib in Switzerland, the North Rhine-Westphalian Library Service Center in Germany, OCLC, RERO — the library network of Western Switzerland — and The European Library. So if you take just those three top categories, that alone is 65% of all the responding linked data implementation institutions, and that will affect the kinds of other responses you see later in the survey results. So how long have the linked data projects been in production? Well, interestingly enough, the number of projects and services reported in production more than doubled compared to 2014. Of course the overall numbers are bigger. The numbers at the bottom are the numbers of projects — remember, one or more, so some organizations reported five or six different projects or services. So from the 71 that responded, we had 112 projects or services described, compared to 76 the previous year. And basically two thirds of the projects are in production, and a good third of those have been in production for more than two years.
So we're talking about, in linked data terms, fairly mature projects or services. Okay, let's move on to how linked data is used. The interesting thing here, I thought, was that in both surveys most of those projects and services both consume and publish linked data. The number that only publish is relatively few. And although the number of responses varies between 2015 and 2014, the ranking of the reasons for publishing linked data is basically the same. The first one being: expose our data to a larger audience on the web. And those of you who have implemented or are implementing linked data — for the survey responses that resonate, you can nod your head and say, yeah, that's why we do it too. Demonstrate what could be done with data sets as linked data. The perennial "we heard about linked data and wanted to try it out by exposing our data as linked data." And then: see if publishing linked data would improve our search engine optimization. But by far the number one reason in both sets of survey responses was that exposure to the larger audience on the web. So here are the types of data, from the 2015 results, that are being published. Now, given that, as we've seen, most of the respondents are libraries — academic libraries, national libraries, libraries affiliated with networks of libraries — it should come as no surprise that the main types of data being published by that group of respondents are bibliographic data, authority file data, and, a close third, descriptive metadata. Other types of data are also being published, but those are the three big ones. So I'm going to show some examples of those linked data projects that are in production. Caveat: we're talking about 75 of those projects, which seems like a really good pool to choose from, except they're linked data projects, so they're designed for machines to read and not humans like me.
So the ones I'm showing are ones that had something human readable, so we could all sort of understand what was going on. But I did try to choose examples showing different types of things. So the first one here is the North Rhine-Westphalian Library Service Center. It's based in Germany, so of course the labels themselves are in German, and they started in March 2010. This is from the description they included in their response: it covers the Cologne-based service center's libraries in North Rhine-Westphalia and the Rhineland-Palatinate, and it's an open data initiative — the first for German institutions to release library catalog data into the public domain. Then in November 2013, they launched a Linked Open Data API service, which you see there: lobid. The API provides access to different kinds of data: bibliographic data from their union catalog of 20 million records, authority data from the German integrated authority file, and address data on libraries and related institutions. The interesting thing to note about this one is that it's one of the largest of the published linked data sets reported in the survey, with one to five billion triples. So it's established, has been going for a long time, multiple institutions. The next one I thought I'd show is one closer to home: North Carolina State University's organization name linked data. This is the one that shows CNI, the Coalition for Networked Information, and it also has other linked data sources and variant names, with a link to the website. Where possible, acquisitions and discovery staff created links to descriptions of the same organization in other linked data sources, including the Virtual International Authority File, the Library of Congress name authority file, DBpedia, Freebase, and the International Standard Name Identifier. So this is all an attempt to link together all of the institutions and vendors that the NCSU acquisitions staff have to deal with.
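The NCSU approach described above — minting a local URI for an organization and asserting that it describes the same entity as records in VIAF, the LC name authority file, and other sources — boils down to publishing owl:sameAs triples. Here is a minimal Python sketch of what such an entry could look like in N-Triples; the local URI and the external identifiers are made-up placeholders, not NCSU's actual data or code:

```python
# Sketch: express an organization's identity links as owl:sameAs triples,
# serialized as N-Triples. All URIs below are illustrative placeholders.

SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"
PREF_LABEL = "<http://www.w3.org/2004/02/skos/core#prefLabel>"

def org_triples(local_uri, label, external_uris):
    """Build N-Triples lines linking a local organization URI to
    descriptions of the same organization in external linked data sources."""
    triples = ['<%s> %s "%s" .' % (local_uri, PREF_LABEL, label)]
    for ext in external_uris:
        triples.append("<%s> %s <%s> ." % (local_uri, SAME_AS, ext))
    return triples

lines = org_triples(
    "http://example.org/org/123",          # hypothetical local URI
    "Coalition for Networked Information",
    [
        "http://viaf.org/viaf/000000000",  # placeholder VIAF identifier
        "http://id.loc.gov/authorities/names/n00000000",  # placeholder LC NAF identifier
    ],
)
print("\n".join(lines))
```

A consumer encountering any one of those URIs can then follow the sameAs links to pull in variant names or other data about the same organization from the linked sources.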
Springer is the only publisher to respond to our survey, so I thought that was interesting in and of itself. The description for its linked data project was to make data about scientific conferences available as Linked Open Data. The availability of such a data set will contribute to the broader goals of publishing scholarly data as Linked Open Data: accessible science — data about publications, authors, topics, and conferences should be easy to explore — and transparent science — data on the productivity and impact of authors, research institutions, and conferences should be open and easy to analyze. So this is the result of a search for a conference proceedings, which is part of its conference series; it also shows the type of the data set, and it has the rights visible there as under a CC0 license. So that was interesting from a publisher perspective. Now the British Library — this was one of the very first to publish its entire national bibliography as linked data. This is the result of a search. They consider it successful in that it has been selected for the UK National Information Infrastructure, and its data modeling has been very influential on a number of other linked data projects. Besides what you see here, what I thought was interesting is that it includes links both to the ISNI, the International Standard Name Identifier, as its authority, and to the VIAF, the Virtual International Authority File, identifier, with a sameAs relationship. And at the bottom of the screen, it actually includes the SPARQL query that was used to retrieve that result, which users can then modify as they like and rerun. So that's the British National Bibliography. I included the URLs for each of these sites. This next one is from the National Diet Library, which actually reported five different linked data projects and services. This is one of them.
For those who don't read Japanese, here's the English. They publish bibliographic data as linked data — that's one project. Another project publishes their authority data as linked data. This one brings together the pictures, sounds, videos, maps, and other web resources all related to the Great East Japan Earthquake of 2011. These are all sources that they themselves converted to linked data, and then they bring it together and it's also available as linked data. That's the National Diet Library. The British Museum is one of the few museums to respond to the survey both times. This is from their Semantic Web Collection Online, and its goal is to join and relate to a growing body of linked data published by other organizations around the world interested in promoting accessibility and collaboration. So for each of the objects in their collection, it has both identifiers and links to the topics and the ontologies that describe the person or the thing. Then we have one of those projects I call scholarly, because it's on a theme and involves multiple institutions. In this case, it's the Muninn Project — I don't know how you pronounce it — a multi-disciplinary, multinational academic research project investigating millions of records pertaining to the First World War in archives around the world. Their aim is to take archives of digitized documents, extract the written data using massive amounts of computing power, and turn the resulting information into structured databases, and these databases will then support further research in a number of different areas. So it's a niche: the First World War. I thought it was interesting because they do model events in this case, as well as what the Great War was, and those come directly from the project's linked data. Now, another project that probably some of you have heard of before is from the Pratt Institute. It's called Linked Jazz.
So that's a project that both applies linked data technologies to digital heritage materials and explores the implications of linked data for the user experience. It exposes relationships between the musicians, but then enables jazz enthusiasts to make more connections between musicians they know — so they're actually using crowdsourcing to help establish relationships. The other interesting thing about this project, I thought, is that it generates new triples from the content of interview transcripts. In other words, they are generating triples from the contents of these interview transcripts rather than trying to convert existing metadata about the musicians. They took transcripts of various interviews from multiple institutions — like the Rutgers Institute of Jazz Studies archives, the Smithsonian Jazz Oral Histories, the Hamilton College Jazz Archive, UCLA's Central Avenue Sounds, and the University of Michigan's video archive of oral history — and generated all of these triples, and then users like you can go and establish new relationships among the musicians. So that was very interesting. Then there's a whole bunch of questions we asked about barriers. For the barriers to publishing linked data, the number one — no surprise — was the steep learning curve for staff. Not that you would have that, but that's what they answered. Then: inconsistency in legacy data (go figure), selecting appropriate ontologies to represent our data, establishing the links, and little documentation or advice on how to build the systems. So that takes care of the publishing part. You'll notice I changed colors — I'm talking about consuming now. In both the 2014 and 2015 responses, the first two answers were similar in ranking. There's a little bit of difference in the numbers again, and the ranking is slightly different this time, but overall, the same major reasons.
Provide our users with a richer experience, and enhance our own data by consuming data from other sources — those are the primary two reasons. A third: more effective internal metadata management — not so much in 2014, more so in 2015. Greater accuracy and scope in our search results. Again, see if consuming linked data would improve our search engine optimization — whether it would result in our stuff appearing higher in the rankings of a search engine search. Experiment with combining different types of data into a single triple store. And then again the "heard about linked data, wanted to try it out by using linked data sources." The curious will always want to try — yes, it was one of the reasons given, but really at the bottom. The top two reasons: provide users with a richer experience, and enrich our own data by consuming linked data from other sources. Especially if you consider that nobody ever seems to have enough staff or resources to do everything they want: why try to replicate what's been done elsewhere? Take advantage of those sources and incorporate them into your own data. So these are the 2015 results for the linked data sources most consumed. And I found it interesting that towards the middle, still part of the top, was "resources we convert to linked data ourselves." I found that interesting because one piece of advice from implementers in the 2014 survey was: when you publish something, first consume it yourself. If you want to see how well you're publishing linked data, see if you can consume it — see how easy it is. So in fact, that is one of the top resources the respondents said they were consuming. The other interesting thing about the results: VIAF. Given the increase in the number of national libraries responding, VIAF came up number one as the most consumed linked data resource. In the 2014 survey, where we had far fewer national libraries, it was fifth. So again, who responded to the survey affects what the responses look like.
So VIAF number one, DBpedia number two, GeoNames number three, the Library of Congress's id.loc.gov number four. And if you notice, in terms of numbers, they're all pretty close together. Then in the next segment: the Getty's AAT; FAST, the Faceted Application of Subject Terminology, which is derived from the Library of Congress Subject Headings; data.bnf.fr, the Bibliothèque nationale de France's linked data service; and the German National Library's (Deutsche Nationalbibliothek) linked data service. The stars there indicate the sources that also responded to the survey, okay? So we have those who are consuming data that was published by somebody else who also responded to the survey. So what I've done is extract some of those responses into brief profiles of each of those sources. For example, VIAF was the first one listed. For those who don't know, it is a database where we combine multiple name authority files into a single OCLC-hosted name authority service. It gets more than 100,000 requests a day, which is on the higher end of the activity list. The size was 500 million to 1 billion triples. And I put in the sources consumed — the ones in red are the ones that also responded to the survey. So besides GeoNames, it's id.loc.gov, ISNI (the International Standard Name Identifier service), Wikidata, WorldCat.org, and WorldCat Works. Isn't it nice that OCLC consumes the linked data sources it itself publishes? Before we started, somebody asked, "Am I going to understand linked data? I don't know the technical stuff." So I won't go too much into the ontologies here. But you will notice, as we go through the profiles, that there is a lot of overlap among the RDF vocabularies and ontologies used by the most consumed sources. So the Bibliographic Ontology, Dublin Core terms, Friend of a Friend, the Web Ontology Language (OWL), RDF Schema, schema.org, and SKOS are the ones they rattled off.
I can say "they" — even though we're both OCLC, somebody else actually responded to the survey. id.loc.gov, the second one on that list, enables developers to interact with vocabularies and standards promulgated by the Library of Congress as data. More than 100,000 requests a day; size very similar to VIAF, at 100 to 500 million triples. It consumes a slightly different set: AGROVOC, data.bnf.fr (the National Library of France), and the DNB's linked data service. It also consumes itself — id.loc.gov — plus VIAF, Wikidata, WorldCat Works, and resources they convert themselves. In terms of RDF vocabularies and ontologies, here of course is BIBFRAME — and we know the Library of Congress is a leader in BIBFRAME development, so it's nice that they are using it — plus Friend of a Friend, MADS/RDF, RDF Schema, and SKOS. For the Getty's AAT, a structured vocabulary for the generic concepts related to art and architecture: they also get more than 100,000 requests a day, and it's only 10 to 15 million triples. So it's a much smaller database getting about the same amount of activity, which is very high. It doesn't consume any linked data; it's the only one here that just publishes. And in terms of vocabularies, the one thing that's different from what we've seen in the others: it also uses its own local vocabulary, which was not mentioned by the other two. FAST, the Faceted Application of Subject Terminology, applies LC subject headings with a simplified syntax, retaining LCSH's rich vocabulary while making the schema easier to understand, control, apply, and use. Slightly less activity than the others — 10 to 50,000 requests a day — 10 to 50 million triples, and it consumes DBpedia, GeoNames, id.loc.gov, and VIAF. For the RDF vocabularies and ontologies, mostly overlap with the above: schema.org, Dublin Core terms. One difference with this one: it does use the WGS84 geo-positioning vocabulary, which you'd expect, because it also covers geographic names and geographic places.
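To make the consuming side of these profiles concrete: a client of any of these sources typically dereferences a URI, asks for an RDF serialization, and parses the triples it gets back. Below is a minimal Python sketch of the parsing step for N-Triples, one common serialization. The sample data is invented for illustration (the identifiers are not real VIAF or LC records), and a real consumer would use a proper RDF library such as rdflib rather than hand-parsing:

```python
import re

# Very small N-Triples reader: each line is "<s> <p> <o> ." or
# '<s> <p> "literal" .'. Real data needs a full RDF parser (e.g. rdflib);
# this sketch handles only the simple cases, for illustration.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.\s*$')

def parse_ntriples(text):
    """Return (subject, predicate, object) tuples from simple N-Triples text."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        m = TRIPLE.match(line)
        if m:
            s, p, o = m.groups()
            triples.append((s, p, o.strip('<>"')))
    return triples

sample = '''
# Invented sample, shaped like a name-authority description
<http://viaf.org/viaf/999999999> <http://www.w3.org/2004/02/skos/core#prefLabel> "Example, Author" .
<http://viaf.org/viaf/999999999> <http://www.w3.org/2002/07/owl#sameAs> <http://id.loc.gov/authorities/names/n00000000> .
'''

for s, p, o in parse_ntriples(sample):
    print(s, p, o)
```

Once parsed, the consuming institution can match those subjects against its own records and merge in labels or follow the sameAs links — which is exactly where the alignment and disambiguation barriers discussed later come in.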
WorldCat.org: as you know, OCLC has made WorldCat.org bibliographic metadata available as linked data, on an experimental basis, for several years now. We're getting more than 100,000 requests a day. It's the largest of the linked data sets reported, at 15 billion triples. It consumes DBpedia, but also FAST, VIAF, and WorldCat.org itself; in terms of RDF vocabularies: Dublin Core, Friend of a Friend, schema.org, and SKOS. And then we have the Bibliothèque nationale de France, which is making the data produced by the National Library of France more useful on the web. Slightly less activity than the others — 10 to 50,000 requests a day — and a similar size to VIAF and id.loc.gov, at 100 to 500 million triples. They consume a lot of linked data resources, so many that I couldn't fit them all on this list: AGROVOC (like the Library of Congress), data.bnf.fr — in other words, their own data — DBpedia, the German National Library's linked data service, GeoNames, id.loc.gov, ISNI (the International Standard Name Identifier), VIAF, also the National Library of Spain's datos.bne.es, which also responded to the survey, and then others — there were just a lot, but those were the main ones. It also supports a lot more ontologies. So one lesson I'm inferring: the longer you've been around, the more ontologies and vocabularies you both consume and want to expose, and the more you want to support. So not only the Bibliographic Ontology but a biographical ontology; it also supports, among others, ISNI, the Music Ontology, OAI-ORE terms, RDA, and RDF Schema, and, like FAST, the WGS84 geo-positioning vocabulary. So a very rich source, both in and out. The German National Library publishes authority and bibliographic data in RDF to make the data accessible to the Semantic Web community with no need to know library-specific metadata schemas. Similar size to the others: 100 to 500 million triples.
Again, like the Getty, it does not consume any resources, but it does support a number of RDF vocabularies and ontologies, most of which you have already seen on the other lists. The one that's different on this one: ISBD. So, the barriers to consuming linked data. The numbers here are very close, but there is a looming number one: matching, disambiguating, and aligning source data and linked data resources is the number one barrier to consuming linked data, with the mapping of vocabularies number two. Then: what's published as linked data is not always reusable or lacks URIs; lack of authority control; data sets not being updated; the size of RDF dumps; and understanding how the data is structured before using it. Now, this was interesting. We asked outright: so, you've been implementing these projects, some of you for several years — what would you do differently? Now, I was used to people saying, oh, we need more time, we need more resources, we need more staff, we need more support. I was not expecting "we'd do nothing differently." That was the number two response — obviously from the ones who feel they've been most successful. Having more realistic expectations was also on the list, but certainly lower than the others. Time — more time — always the number one, right? We always want more time. Okay, so the advice from the implementers — we did ask for advice, and for some of you who have been around this block, some of this might resonate with you. Focus on what you want to achieve, not the technical stuff. There's a lot of discussion about the technical side — which schema is better, or which language or programming environment — and they said, you know, you've got to focus on what your objective is. And similar to that: build on what you have that others don't. Don't bother publishing it if somebody else has already done something very similar to yours — link up and collaborate with them rather than starting from scratch on your own.
And pick a problem you can solve. All of this should be for a specific objective: what problem are you trying to solve with your linked data project or service? Model data that solves your use cases — you should start with use cases, then figure out how you want to address them. This one came from a couple of the national libraries: consider legal issues from the beginning — copyright and all that kind of stuff — so you don't have to deal with them later. Read as widely as possible and consult community experts. It was noted by several of those who implemented the first projects and services that they didn't have so much literature, and didn't have such a large community, as exists now for people doing or having done linked data. So take advantage of that and consult them as much as possible. Have a good understanding of linked data structure and the available ontologies. Understand your own data — sometimes you'd be surprised how good, or how not so good, your data is. Strive for long-term data reconciliation and consolidation. Involve your institution and community: don't do this as an isolated, one-off project or service. If it's going to be for a purpose, involve and collaborate with whoever your end users are — whether it's your faculty, your students, the public, or other institutions — early and often. Experiment and start small. I'm not sure I agree with that one, because one of our very first linked data experiments was converting WorldCat.org to linked data, and that was not small — that's 15 billion triples. But I'm just conveying the advice; I'm not saying I agree with all of it. And start now — just do it, go for it. Don't hold back: if you're interested in this area, if you have something to contribute to the greater community, just go ahead and do it. And this is a small snippet of a really big, big spreadsheet that has all of the responses from the 2014 survey. That's the URL for it.
And our plan is to include, as another worksheet in the same workbook, the 2015 responses. That way, you don't have to go by what I have presented here. You can do your own filtering, your own analysis on whatever parts of the survey you're most interested in, or you can look at the detailed responses from the institutions you most consider your peers and see how they might resonate with you. So that's coming soon. But anyway, it'll be available for anybody to use as part of that spreadsheet. And that's it, and that's my handle. And now I'm open for questions — that leaves lots of time for questions.