Okay, good afternoon, everyone. Thank you for coming to our session this afternoon. My name is Chloe Lankford, and I'm going to start us off on this whirlwind tour of linked data. Does it work? Maybe not. I may have to come back over here. Sorry. For today's presentation, we're going to run through all of these topics: a little bit about how we got started with linked data, all the way through where we are today with our project, and we're going to show you what I think are some of the most compelling things to see in the world of linked data, how we visualize the data and show it to users.

Just to orient us, we're going to present a definition here that will guide the rest of the presentation. When we talk about linked data, what we're referring to is a set of best practices for publishing and linking data. It has to be machine readable. I get a lot of questions about linked data: what is linked data really all about? How do you define it? One way I've found helpful to think about it is as upgrading our data. Data comes in all different formats. There's a website, 5-star Open Data, where you can look at the diagram. But in a nutshell: on one end of the spectrum, you could have a scan of a spreadsheet with the temperature in St. Louis. That scan gives you that piece of information, the content is available to you, but it's not great data. On the other end of the spectrum, you might take the data out of the spreadsheet altogether. You could put it on the web itself, in a non-proprietary format. You could assign a URI to that piece of data, and then you could link it to other temperatures elsewhere, and that would be the other end of the spectrum.
So we're talking about taking our Dublin Core metadata records that are housed in our digital collections and transforming them from the lower end of that spectrum to the higher end. How did we get started with this? Well, we're a small digital collections department, but we were very interested in what we were hearing at conferences and in the buzz about linked data, so we decided to dig a little deeper. We formed a study group, and after studying the topic we decided: let's try it. And I'd like to emphasize today that we want you to leave this presentation feeling like you can do linked data too. If we can do it, you can do it.

Why would we want to do this in the first place? The main reason is that our data is trapped. It lives inside records, which live inside collections. It's very hard to show users the links, the relationships, that exist between those records and those collections, and across different people's collections. We have to show people those links manually right now, and that just isn't an ideal situation. What we're hoping is that by showing you an alternative through linked data, we can do something different. We can free our metadata from the silos it lives in. We can expose those kinds of relationships. We can link things together seamlessly, and our users can start to discover things in really compelling ways. Not only will they have more precise search results, but we, and they, can repurpose the data in all sorts of very interesting ways.

Sometimes you have to make the case for linked data, the case for a new initiative. So here's the problem we face. Right now we have very rich metadata in our Dublin Core records, and when those records are harvested, that richness is lost. The harvesters may produce linked data, but it doesn't reflect the full richness of our records. So we decided we wanted to try to create linked data that preserved that richness.
We see a lot of theory about linked data in the world, but there isn't a recipe for how to do this. It's a little difficult to know how to get started, and there aren't a lot of projects to reference that tell you how to do it. It also takes a bit of a paradigm shift to think beyond the record, to think in a post-record world where we embrace URIs and different sorts of data representations. But we find this very exciting, even though it's very uncertain.

Now, we're from Las Vegas. We work at the University of Nevada, Las Vegas, so we have a picture of Frank Sinatra for you. This is an example of starting to pick apart a metadata record and turn it into something moving toward linked data. This photograph has lots of rich information in it, and this is just a very small picture of what we might start to extract from it: Frank Sinatra has a profession, entertainer; the photograph was created by the Las Vegas News Bureau. You can see where we're going with this. These are all graphical representations of things we know about the photograph.

But that's just one record among the many collections in our digital collections. Not only do we have collections with Frank Sinatra, we also have showgirl costume designs, we have a casino architecture collection, we have all sorts of collections that connect to each other. Users don't necessarily know that, and they wouldn't necessarily go to each collection to find those relationships on their own. Hidden among all of those connections are relationships between different things. When we started to map out our items and how they might be deconstructed, we found a compelling number of connections that people couldn't find. So we decided to develop a project.
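The graph sketched on the slide can be written down directly as subject-predicate-object triples. Here is a minimal Python sketch of that idea; the URIs and the title literal are illustrative placeholders, not the project's actual identifiers:

```python
# Each fact about the photograph becomes one (subject, predicate, object)
# triple. All URIs below are hypothetical placeholders for illustration.
PHOTO = "http://example.org/photo/sinatra-001"
SINATRA = "http://dbpedia.org/resource/Frank_Sinatra"

triples = [
    (PHOTO, "http://purl.org/dc/terms/title", "Photograph of Frank Sinatra"),
    (PHOTO, "http://purl.org/dc/terms/creator", "Las Vegas News Bureau"),
    (PHOTO, "http://xmlns.com/foaf/0.1/depicts", SINATRA),
    (SINATRA, "http://example.org/vocab/profession", "Entertainer"),
]

def to_ntriples(s, p, o):
    """Serialize one triple as N-Triples: URIs in angle brackets, literals quoted."""
    obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
    return f"<{s}> <{p}> {obj} ."

for t in triples:
    print(to_ntriples(*t))
```

Each statement stands on its own, which is what later lets records from different collections link up through shared URIs.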
The project was to study how feasible it is for us to create a process that would convert our collection records into linked open data, keeping the richness of the Dublin Core metadata that we create, then to learn how to publish it into the linked data cloud, and ultimately to test it with users so they could see this new experience of discoverability. So this is how we started our project. I'm going to hand it over to my colleague Sylvia, also from UNLV, and she's going to talk about implementation.

Okay, so here we are with this idea, and where to start, right? One of the first things we did was a very deep analysis of what was available to get us from our records to linked data. Most of what we found were open source tools, and after researching all this software, we pre-selected the ones we were going to use. What follows is a very high-level simplification of the phases of the process; I'm just going to talk about three of them.

In phase one, we realized we needed to clean our data inside CONTENTdm, our content management system. There were various things we needed to do, and I'm going to talk about them. After cleaning the data, we export it as a spreadsheet and import it into a tool called OpenRefine. It's open source software created to manage data, to transform one kind of data into another. So we import the data into this system, we prepare the data (I'm going to show some examples of this), and we reconcile the data. When we use controlled vocabularies that are well known, like those of the Library of Congress, we can reconcile against them: the Library of Congress nowadays has URIs, unique identifiers, for each of its terms, and we can bring those into our system. And then we generate linked data, that is, triples. Did you talk about triples? Yeah, triples are how we express data in the linked data world.
Then we export that linked data in order to publish it. So the third phase is to import the data and publish it, and for that we use different software: Mulgara or Virtuoso.

Okay, to clean the data within CONTENTdm, what we have done is make the metadata elements common across all our collections, and use well-defined controlled vocabularies. For those terms or names that appear in our photos, or some historic names, that are not in the Library of Congress vocabularies, we create local controlled vocabularies. But we needed to share those controlled vocabularies across all collections, so that we can identify the same person every time with one particular unique identifier.

In phase two, after we cleaned the data, we prepare the data, reconcile it (that is, bring in the URIs), generate triples, and export. We use OpenRefine, which runs as a server and can communicate with other datasets over HTTP, and we can generate the triples we need for linked data using this software. I'm going to show you just a few examples. The first is when we import the spreadsheet from our digital collections into OpenRefine. It appears very much as a spreadsheet, where the columns hold the content, and here are the metadata elements we have. This system has a very nice feature: facets. For example, here I have the genre terms, and on the left side the actual terms appearing in a facet. I'm sure you can't read it, but the terms are grouped together, separated by semicolons. That is a problem for linked data; we need each term by itself. So there is another function in OpenRefine that will split the multiple values: you tell it the values are separated by a character, in this case a semicolon. And then if you do the facet again, the terms will all be separated.
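The split operation described above can be sketched outside OpenRefine too. A minimal Python sketch, assuming a subject cell whose values are separated by semicolons (the column names and values here are made up for illustration):

```python
# OpenRefine's "split multi-valued cells" turns one cell like
# "Casinos; Showgirls; Hotels" into separate values, one per row,
# so that each term can later be reconciled to its own URI.
def split_multivalued(cell, sep=";"):
    return [term.strip() for term in cell.split(sep) if term.strip()]

row = {"identifier": "pho000123", "subject": "Casinos; Showgirls; Hotels"}
terms = split_multivalued(row["subject"])
print(terms)  # ['Casinos', 'Showgirls', 'Hotels']
```

Once each term stands alone, faceting groups identical terms together, which is what makes the reconciliation step possible.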
So it really allows you to do a lot of manipulation and preparation of the data before it is transformed into linked data. The other thing I mentioned is that it can reconcile against some of the controlled vocabularies that are out there, like the Library of Congress authority files. For this we created a reconciliation service: we fill in a form with information about where the system should go to get the unique identifiers of the terms. Once we create the reconciliation service, we start it on a particular column. In this case we're using the genre elements and the TGM, the Thesaurus for Graphic Materials, which is the source we use. So we reconcile against this controlled vocabulary, and here is the result. You can see this indicates a full reconciliation, and we were able to bring the unique identifiers of those terms into our system.

Okay, so now we have cleaned the data and reconciled it, and we need to start transforming it into linked data. But how are we going to do that? If I'm going to create triples, which are subject, predicate, and object, what language am I going to use, what terms am I going to use? This is why we adopted a data model, and it comes from Europeana, a consortium that created a model for generating linked data. With that model we mapped our metadata elements. For example, we know that if we have a title, we want to create a triple using dc:title, the title element defined in the Dublin Core element set. We're not going to go into the details, but the idea is that we adopted a model to drive the transformation from our metadata to linked data. In order for the system to create the linked data, we need to express that mapping in what they call a skeleton, and here it is. This first column is the way we create a unique identifier for each unique material we are describing.
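The "skeleton" idea can be caricatured in a few lines: each spreadsheet column maps to an RDF predicate, and each row's identifier becomes the subject URI. A sketch under assumed column names and a hypothetical base URI, not the project's exact configuration:

```python
# Column-to-predicate mapping, in the spirit of the Europeana-derived model:
# the "title" column becomes a dc:title triple, and so on.
MAPPING = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "description": "http://purl.org/dc/elements/1.1/description",
    "creator": "http://purl.org/dc/elements/1.1/creator",
}
BASE = "http://example.org/item/"  # hypothetical base URI for subjects

def row_to_triples(row):
    """Turn one cleaned spreadsheet row into (subject, predicate, object) triples."""
    subject = BASE + row["identifier"]
    return [(subject, MAPPING[col], value)
            for col, value in row.items()
            if col in MAPPING and value]

row = {"identifier": "pho000123",
       "title": "Showgirl costume design",
       "creator": "Las Vegas News Bureau"}
for s, p, o in row_to_triples(row):
    print(s, p, o)
```

Once the mapping is right, triple generation is mechanical, which is exactly why the speakers say the map "is not difficult to develop" once a model is adopted.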
And here, for example, is the mapping. In our collection we have, for example, a description of the photograph, of the material we are dealing with, and we mapped that to dc:description, the Dublin Core description element. Once we have a model, this mapping is not difficult to develop. As we create the mapping, we can verify the triples that have been created. I know you can't see it, but basically it says: if I'm talking about the description, the identifier of the photograph is the subject, the predicate is dc:description, and then the object is the description itself. That is what is shown here in this triple. The system creates the triples for you, as long as you make the right mapping. Once we create triples for all our metadata, we export those files, which are now linked data.

So we did those two phases, and now we are in the third phase of the process. In the third phase, we import that linked data somewhere, into what is called a triple store, which, obviously, manages triples. Then we publish, and we can query them. So, in phase three, we import the data, publish, and query, and for that we use a system called Mulgara. It's very simple: you just upload the file you created with your linked data. And these are the wonderful triples we created. The only reason I'm showing them to you is to point out that in here, in this triple, we have URIs, unique identifiers, from Europeana, DBpedia, GeoNames, and the Library of Congress. What that reflects in our linked data is that we are already creating links with all those other vocabularies and datasets.

Okay, now we have all those triples. How can we use this? What is the advantage? We are in the exploration phase of the project at this point, but we are going to show you a few visualization tools that we have used, starting with the OpenLink Virtuoso PivotViewer.
Then we will show you RelFinder and Gephi. These are three open-source systems. What is nice about the OpenLink PivotViewer is that it's very good for images: instead of showing the triples with the URIs, it shows images. In order to show the images, we needed to pre-select which images we want to show, and that goes through a SPARQL query. SPARQL is a query language specific to RDF triples. It also allows us to refine what we are seeing through facets. And another interesting thing: once we create a query, we can make the result of that query a collection, a dynamic collection. So let's see how it works. Please keep in mind that this is not for users to do; we would never ask that of them. This is what is behind the scenes: we can create those SPARQL queries for the user.

And let's go to the... They told us we probably would not have a very good internet connection, so we created a few videos to show the work we did with the PivotViewer. Okay, let's go. So this is what is behind the scenes, and what we are going to show is the showgirl costume designs. Here are all the images of the costume designs that were selected, and whenever we are interested in one particular design, we can select it. We'll select this one, and on this side we have all the metadata we created in our collection. They are triples now, but we can still bring them here. Now we are looking at a different costume design, and again, here comes all the information we created in the metadata record. Now we are using the facets on this side. Here we are selecting the creator, and it brings up all the costume designs of that creator; that is the set. Now we select a different creator, and it shows all the designs of that other creator. Now we go to subject and select everything that has hats, and here are the costumes that have hats. And now gloves.
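The pre-selection described above happens through a SPARQL SELECT posted to the triple store. A sketch in Python of the kind of query that picks each item, its title, and its image link; the `edm:isShownBy` predicate is a plausible choice from the Europeana model, but the project's exact query is an assumption here:

```python
# A SPARQL SELECT of the kind used to feed the PivotViewer: one row per item,
# with its title and image URL. Predicates are illustrative, not confirmed.
query = """
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>

SELECT ?item ?title ?image WHERE {
  ?item dc:title ?title ;
        edm:isShownBy ?image .
}
LIMIT 100
"""
# In the real setup this string would be posted to the Mulgara/Virtuoso
# SPARQL endpoint; here we only assemble it.
print(query.strip())
```

Saving such a query as a named, dynamic collection is what lets librarians hide SPARQL from end users while still driving the viewer with it.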
It's a very nice example. And robes. Okay. And again, whenever we select one, we can see all the information right there. The other feature is right at the top: we select creator, and it creates columns with the images for each creator, as we show here. Obviously, we can zoom in so we can see the images better. This was a selection of one particular creator; it undid the columns and left just that creator. And then again, subjects: it creates columns for the subjects, a range of subjects in alphabetical order, because one item can have many subjects. So we can refine and refine to the point where we find something that interests us, like these costume designs that have feathers, and so on. I think that's probably it, right? So this is one of the possible views, and what I think is interesting is that what we have behind it all are those triples, and this is what we can get from them.

The next tool is more of a system that allows us to explore relationships among people and among things. We'll see two examples. This first example comes from an oral history of African Americans in Las Vegas. What we did here is look into the interviews and see which names the interviewees mention, which people they talk about, and what their relationships with those people were. So here, I'm trying to stop it, here we are just adding the names of the people we want to see the relationships among, and finding relationships. Here are Ruby Duncan and Helen Toland, and there are two relationships here. When we click on one or the other, it shows on the left side the information about those people. To see this particular one: there is a relationship, "known by". Let me see. Duncan has a record in the Library of Congress, which we just saw, and this is a relationship that involves someone else: she is a friend of this person, who is a friend of this other person.
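What RelFinder does can be caricatured as path search over the triple graph: find chains of relationship edges connecting two people. A toy Python sketch with made-up intermediate people and predicates in the spirit of the "friend of" / "known by" vocabulary the speakers describe:

```python
from collections import deque

# Toy relationship triples, as might be extracted from oral-history
# interviews. Names and predicates here are illustrative placeholders.
triples = [
    ("RubyDuncan", "friendOf", "PersonX"),
    ("PersonX", "friendOf", "HelenToland"),
    ("RubyDuncan", "colleagueOf", "HelenToland"),
]

def find_paths(triples, start, goal, max_len=3):
    """Breadth-first search for chains of relationships linking two people.
    Paths alternate node, predicate, node, ... like a RelFinder chain."""
    edges = {}
    for s, p, o in triples:          # treat the graph as undirected
        edges.setdefault(s, []).append((p, o))
        edges.setdefault(o, []).append((p, s))
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) // 2 >= max_len:  # limit chain length
            continue
        for p, nxt in edges.get(node, []):
            if nxt not in path[::2]:   # even indices are nodes; avoid cycles
                queue.append(path + [p, nxt])
    return paths

for path in find_paths(triples, "RubyDuncan", "HelenToland"):
    print(" -> ".join(path))
```

This finds both the direct "colleague of" link and the two-hop chain through the intermediate person, which is essentially the picture RelFinder draws.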
So now we added someone else, to look for relationships, and this one is not part of it; it just came along. Okay. Now the system is adding that third person and bringing in all the relationships it found in the collection. It looks like a live animal of some kind, right? The relationships we create are "friend of", "known by reputation", "colleague of"; there is a controlled vocabulary just for relationships. From here we can see that, when this node was clicked, this person was related to many people; in particular this one, Eva, is the person with the most relationships. We understand this is probably not for the regular user, but a researcher looking into this community might find it interesting to discover these relationships. And again, for each node we click, we get information about the person. I think that's it. Okay.

The second example we are going to show here is not so much this type of relationship, where one person knows the other, but a relationship that exists because, for example, they appear in the same photograph. It's a different kind of relationship. So, yeah, we can start. Here is our most famous person, Frank Sinatra, and the relationship between Frank Sinatra and Dean Martin in our collection. Here we go. If you see this number here, it means we brought it from the Library of Congress. Here it's creating the relationships; you can see that what relates them are photographs. Okay. The other interesting thing is that when we click on Frank Sinatra, the information you see here comes from DBpedia, because we made a link to DBpedia, which is the linked data version of Wikipedia. The same with Dean Martin, and Dean Martin also has a connection to the Library of Congress, which means that in the future, when someone else creates data about Dean Martin using the Library of Congress identifier, it will be connected with us.
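The payoff of reusing Library of Congress and DBpedia URIs can be sketched as a simple merge: two datasets that name Dean Martin with the same identifier join automatically, with no crosswalk needed. A minimal Python sketch, with a hypothetical local photo URI and shorthand predicate names:

```python
# Two independent datasets describe Dean Martin. Because both use the same
# DBpedia URI for him, their triples merge into one graph with no extra work.
# The photo URI and shorthand predicates are illustrative placeholders.
DEAN = "http://dbpedia.org/resource/Dean_Martin"

our_triples = [
    ("http://example.org/photo/0042", "foaf:depicts", DEAN),  # our collection
]
their_triples = [
    (DEAN, "rdfs:label", "Dean Martin"),                      # someone else's data
]

graph = set(our_triples) | set(their_triples)

# Everything known about Dean Martin, across both datasets:
about_dean = [t for t in graph if DEAN in (t[0], t[2])]
print(len(about_dean))  # 2
```

This is exactly the "connected with us" effect the speakers describe: anyone who later publishes data using the same identifier links to the collection for free.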
This time we didn't add a person; we included a show connected to Sinatra. So what are the relationships between Frank Sinatra, Dean Martin, and that show? This is how we try to find relationships among things. While between Sinatra and Dean Martin a lot of the relationships are through photographs, here there is a show that links them to it. Unfortunately this screen was too small to show all these relationships. When it says a relationship is through a photograph, you can see the photograph there and see who is who in it. The other thing: when we click on the show, the information that comes up comes from DBpedia. This all works because we use the same URI, the same unique identifier, in DBpedia and in our collection. Showing this to you might be a little confusing; it's just to give you ideas of the kinds of things you can do with linked data. I think we are not even close to exploring all the possibilities, but right away, once we created the triples, we were able to plug them into some of those visualization systems and show these kinds of relationships.

The last tool is Gephi. We have been playing with Gephi; it is very good for showing relationships and has a much more sophisticated interface. We are still working with it to get to a point we want to show, so instead we are showing one example from Linked Jazz. Have you seen this? It is a very nice project using Gephi, about jazz musicians and their interrelationships. Here I am just clicking through their names to show who they are interrelated with. If we click that particular one, Mary Lou, the system rearranges and puts her in the middle as the center, with all the other musicians related to her around her.
So again, selecting this one, this person will be in the middle and her relationships will be shown. This is an option: you could also go through their names and select the one whose relationships you want to see, or you can search. In this case we are searching for other musicians, to see what the relationships are. Anyway, it is very dynamic, and I think it is very interesting. This project is developed by the Pratt Institute in New York, and what I thought was interesting is that besides creating this network of people, they are also asking the crowd to tell them what kind of relationship exists between those musicians, so they can add the specific kinds of relationships that are not shown here.

Okay, so at this point in the project we are exploring those visualization tools. We are transforming our digital collections into linked data; we are evaluating alternative interfaces like the ones I showed; we are increasing the linkage with other datasets, because this is very important so that users can actually go beyond the context that we have; and we are preparing to publish our collection as linked open data. We are planning, this coming fall semester, to start developing an interface that would integrate our data with other datasets holding similar data, and we intend to produce a cost-benefit analysis to inform future plans for the development of our collections.

This is a project implemented and managed by two busy librarians, and we mention this because it's not just for technological people, for people focused on technology. We could do it alongside the mix of all the other tasks we do as librarians. We didn't have any particular model to follow; we experimented as much as possible and evaluated what was available to us. So we
understand that, with interest and motivation, it is a feasible goal to create linked data, and the benefits seem quite interesting; we are still exploring them. Okay, and thank you. If you have any questions...