It's about 10:15, so we'll go ahead and get this started. My name's Christina LeBlanc, and we're from the University of Notre Dame. We're going to talk about a collaboration between the libraries and academic units. I am actually from the academic unit side of this collaboration, from the Center for Civil and Human Rights. So I'm not library, I'm not IT. I'm the research scholar side of this. So I'll have a slightly different perspective, but hopefully this will still be helpful for you to hear about our project. So when I started at the Center for Civil and Human Rights about three and a half years ago, I was charged with a project. What my boss wanted to do was to take documents from Catholic social teaching, which are documents from the Catholic Church about different social issues, and documents of international human rights law. And they wanted to create some sort of database or research tool that would help scholars bridge the gap between these two different fields. They had been approached in the past by various diplomats and audiences asking for help in bridging this gap, and so they saw a need for it. So we looked online at other resources, and there were a few that did something similar to what we were looking to do, but didn't go quite as far as we wanted to go. We looked at the Versioning Machine, and this allows you to compare two different versions of the same text and see how they've changed in different editions. So this was helpful for putting texts side by side, but not exactly where we wanted to go. We also looked at Juxta Commons, and this allows you again to look at different versions of the same text. It does a little bit more analysis and mapping of the text. But when we took our texts from these two different fields and put them into this tool, it highlighted everything, because everything's different. It's not allowing us to compare two different fields.
It's still looking at the same text, basically. So we found one other resource called Constitute, and this site did a lot more of what we were trying to do. This site allows for comparison of constitutions from different countries. And it even allows you to search via a topic list on the side and to get your search results back at the paragraph level. So we were really interested in this database, but it didn't go quite as far as what we wanted to do, because this is still diving into one particular field, one particular discipline, and we wanted to be multidisciplinary. So at this point, we went to the library and said, okay, we wanna create this research tool. We have these texts, we don't know what we're doing. And I was telling John the other day that as a student and as a scholar, I always thought the library was a place that you go to study, that you use the space. I didn't really know the library had other things going on or did other research or other projects. And so we approached the Center for Digital Scholarship at the Hesburgh Libraries and said, we don't know what we're doing, we need some help. We've got these texts and we don't know what to do with them. So John will talk more about the library application, and I'm gonna go a little bit more into the path that the library steered us down and what they recommended that we do. So in trying to create an online tool that was multidisciplinary, we were running into problems with variation in language. We're looking at two different fields that use different language. Oh, I guess when we switched here. So one of the examples that I like to use is the language of remuneration versus wage. And so in the middle pie chart, it's supposed to say that the blue is wage and the green is remuneration. And so if you're a user coming from, let's say, Catholic social teaching, you might put in the term wage.
And as you can see on the pie chart below, most uses of wage are in Catholic social teaching documents, but some documents from human rights law use the term wage rather than remuneration. And vice versa over here, most human rights law documents use the language of remuneration, but some use the language of wage. And so we wanted to be able to return search results to the users without their having to worry about what particular language they were used to using. So we wanted a user coming in and searching either wage or remuneration to get back all of the search results. The other problem we were having trouble with was that we wanted to really narrow in on search results for the users. We wanted to bring back search results at the paragraph level, but we wanted to get back more than what would come up if you just did a standard text search. And so the one particular example we looked at was human development. This is something that both fields are very interested in, but a lot of times the text doesn't actually say human development. So it might say personal growth or enhancing personal worth, [unclear], for example. And so we wanted these paragraphs to come back when a user was interested in human development. Now I've come to realize that for me as a researcher and the team I was working with, this was a very novel concept, because when we go into the other databases and search, they just search the text. So the search comes back with only those texts that say human development. I've come to realize, after diving into the librarian world a little bit, that we were trying to catalog the text. We were trying to apply metadata. We didn't know what any of those words meant. People kept talking about metadata. I was like, I don't know what that is. What do you mean, metadata? What do you mean, cataloging? I don't understand these terms.
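To make the wage versus remuneration and human development examples concrete, here is a minimal sketch of searching through topic tags rather than through the raw text. Everything in it is hypothetical and for illustration only: the `VOCABULARY` and `PARAGRAPHS` data, the paragraph ids, and the `topics_for` and `search` helpers are assumptions, not the project's actual data model or code.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: topic -> terms a user might type.
VOCABULARY = {
    "wages": {"wage", "wages", "remuneration"},
    "human development": {"human development", "personal growth"},
}

# Hypothetical hand-tagged paragraphs: (id, field, text, topics).
PARAGRAPHS = [
    ("cst-1", "Catholic social teaching",
     "A just wage is owed to every worker.", {"wages"}),
    ("hrl-1", "human rights law",
     "Everyone has the right to fair remuneration.", {"wages"}),
    ("cst-2", "Catholic social teaching",
     "Work should enhance personal worth.", {"human development"}),
]


def topics_for(query):
    """Resolve a user's term to every topic whose name or term list matches it."""
    q = query.strip().lower()
    return {topic for topic, terms in VOCABULARY.items()
            if q == topic or q in terms}


def search(query):
    """Return paragraph ids grouped by field, matched on tags, not on text."""
    wanted = topics_for(query)
    results = defaultdict(list)
    for pid, field, _text, topics in PARAGRAPHS:
        if wanted & topics:
            results[field].append(pid)
    return dict(results)
```

In this sketch, `search("wage")` and `search("remuneration")` return the same paragraphs from both fields, and `search("human development")` returns `cst-2` even though that paragraph never uses the phrase, because the match happens on the tag.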
So this was a novel concept for us and very unique for what we thought we were trying to do. There were other things that we were trying to pay attention to. Imagery language: Catholic social teaching has a lot of imagery language, especially talking about child of God and things like that. And we felt that if someone was trying to compare these two fields and bridge them, they might only be interested in those texts that talk about child in the biological sense, because usually if you're dealing with human rights, that's the sense of child you want. So beyond making those distinctions, we were really concerned about document length, wanting to focus in on paragraphs, having different users, and issues with spelling and synonyms. So what we decided to do was to create a topic list. And here's where there's also variation in language. I say topic list every time I present this to the librarians, and they say, you mean controlled vocabulary. No, I mean topic list. No, it's controlled vocabulary. So what we wanted to do with our controlled vocabulary was identify concepts without being stuck to a particular vocabulary within the text. And we wanted to tag documents from both fields relating to similar concepts and ideas. So we wanted to bridge this gap between the two fields and make it multidisciplinary. So the questions were, how do we create or structure the topic list, what terms do we use, and how do we account for different users? How do we make sure that it's broad enough to return search results, but narrow enough that it doesn't bring back everything in the database and that it's actually useful for users? So in actually creating this controlled vocabulary, again, I had the librarians telling me, well, there's already language for this. You need to go to the subject headings, Library of Congress, and I'm getting all of these terms thrown at me and I'm like, I don't know, Library of Congress, subject headings, what are we talking about?
And so eventually I figured out kind of what they were talking about, and I looked at these terms and I said, this isn't helpful. There's only a few terms that relate to the fields we're interested in. They're way too broad. They say human rights, or women, and there's not enough specificity with what we're trying to do. The terms aren't going to allow us to compare and be multidisciplinary. They're very specific to one field or the other. And I said, these aren't going to help. We need something unique for the research that we're doing. So what we did was beyond exploring Library of Congress, we looked at other databases that were focused on just one field or the other and what topics they used, what controlled vocabulary they used. We went back and actually read some of the documents that we were gonna have in the database in view of that vocabulary to see how it related. Developed a draft of our controlled vocabulary and then we sent it out to the field. We sent it out to field experts, research experts that actually did research in this space between the two fields and asked them, are we missing something? Should we add something? Should we take something out? What do you think of our controlled vocabulary? We had a ton of feedback and everyone, I think every area of our controlled vocabulary was touched and changed. And then we worked on actually applying it to the text. Seeking feedback where there were still areas of discrepancy, we weren't sure how to apply it. We weren't sure exactly what they meant. And then had our final topic list or controlled vocabulary. And then are still continually seeking feedback and usage on the vocab. So then in applying it, so we started with just hand tagging. We have over 11,000 paragraphs and over 250 topics. I don't know in the library world the scale of that, but to us that seems like quite a bit and a lot of work. 
And so we kept detailed notes on how we applied the topics and the tagging, but we realized at some point this wasn't going to work. And I sat in a meeting a handful of years ago with about, I don't know, 10 people from the Center for Digital Scholarship and other areas of the library. And John said, well, why don't you use text analysis? Text analysis. My boss and I looked at each other with big question marks on our faces, and we left the meeting. I said, do you know what he was talking about? No. I said, I don't know what he was talking about either. I said, I thought we were just going through and hand tagging and putting it up, and that was the end of it. So come to find out, we did delve into some text analysis, and we started with topic modeling as our next phase in tagging the documents. And so what we did was some topic modeling of our text: we fed our text in and then compared the topics that the topic modeling gave us to actual documents within the text and then to the actual controlled vocabulary from our topic list. And so this allowed us to get a lot more data, to get more robust tagging of the text, and then to get a really good base training set. And so at this point, excuse me, at this point what we have is a base training set, and as we continue to add documents we will use that as our training data and continue to use text analysis to tag further documents as we add them. So the database is called Convocate. We launched it this past April. It is live and functioning and freely open and available to anybody who wants to use it. So I'm gonna walk you through a few screenshots of the actual database and how it works and point out a few other features and ideas, and then I'll let John take over with the library application side. So when you go into the database you do have the option of searching by topic or searching by keyword.
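As a rough illustration of using a hand-tagged training set to suggest tags for newly added paragraphs, here is a minimal sketch. It is not topic modeling and not the project's actual pipeline: it swaps in a much simpler technique, a word-count profile per topic scored by cosine similarity. The `TRAINING` data, the `STOPWORDS` list, the function names, and the threshold values are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "the", "is", "to", "for", "has", "every", "everyone"}

# Hypothetical hand-tagged training set: (paragraph text, topics).
TRAINING = [
    ("A just wage is owed to every worker.", {"wages"}),
    ("Everyone has the right to fair remuneration for work.", {"wages"}),
    ("Every child has the right to education.", {"children"}),
]


def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def train(tagged):
    """Build one word-count profile per topic from (text, topics) pairs."""
    profiles = defaultdict(Counter)
    for text, topics in tagged:
        words = tokenize(text)
        for topic in topics:
            profiles[topic].update(words)
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def suggest_tags(text, profiles, threshold=0.3):
    """Suggest every topic whose profile is similar enough to the paragraph."""
    words = Counter(tokenize(text))
    return sorted(topic for topic, profile in profiles.items()
                  if cosine(words, profile) >= threshold)


PROFILES = train(TRAINING)
```

Lowering `threshold` makes the tagger suggest more topics per paragraph at the cost of more false matches; raising it does the opposite. Suggestions from a sketch like this would still need review by someone who knows the fields.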
And they are linked behind the scenes, so that if you do search by keyword it also searches the metadata and helps bring up those tags. So when you go in here you can choose to search any topic you want or any combination of topics you want. And it brings back your search results, Catholic social teaching documents on one side, international human rights law documents on the other side. One of the topics that I like to point out and use as a demo for searching, I don't know if you can see it checked in the bottom left corner there, is solidarity cooperation. So another thing that we've done with some of the tags is we've combined terms so that users are aware, depending on what field they come from, of what it means. So solidarity is a very Catholic term that's not used a lot in international law documents. Cooperation is used more in international law and isn't quite the same as solidarity, but they're related. So we've tried to also bridge the fields and help users in some of the tags that we've applied to the text. So here you see in gold it gives the title of the actual document, and then you can choose to open it up and see the actual paragraphs that relate to the search you've done. So we really wanted the user to be able to narrow in on the particular paragraph of the text, rather than bringing back the whole text and having you muddle through and try to read a 30-page document or something like that. We really wanted to make this a helpful tool. So on this page what we ideally assume is that users will go through and open the different documents and look at the paragraphs and see what they're interested in and what they're not interested in. And then if you hit this compare button here, it'll tell you that you've added one paragraph to your comparison, and you get that gold color up top. And so the developers who helped create this kind of compare this page to a shopping page.
So basically you're going through shopping through your search results, choosing what you're interested in and sticking it in your comparison cart for later to look at in more depth. So once you choose that compare button and go more in depth into your shopping cart then it gives you all of the documents on the left hand side by the title. And you can choose to open any two documents side by side that you want. At this page you're not constrained to keeping both fields. So if someone is just interested in one field or the other they can choose to just look at human rights documents or just look at Catholic social teaching documents. So on this page what we wanted to do then because a lot of people especially students will look at text, read a snippet and say oh this says this without actually putting it into context of the full document. So what we wanted to do here was help especially the student be forced back into okay let's look at this within the context of the document what comes before and what comes after. What does this paragraph really say rather than just having it isolated. So on this page you open the full text of the document and you can use the drop down navigation buttons here to navigate to the particular text that you had previously selected and then they're highlighted. So that you can again compare side by side but scrolling within the entirety of the text of the document. There's a couple other features I'll just point out. We have a share save feature. So you can save this shopping cart basically and send it to a colleague, send it to a classmate. Professor can send it out to his class as the reading list paying attention to what paragraphs are highlighted. We also have a page at any point you can choose the title of the text and go to the full text of the document. 
On the right hand side you'll see a mini map of the document for easy navigation, and on the left hand side we've listed basically all of the metadata, the topics, the controlled vocabulary that we've applied to that document. And you can scroll up and down and choose any topic that you want, and it will highlight the paragraphs within that text that apply to the topic. Finally, one of the things that we're working on right now is that we want to be able to do some crowdsourcing. So we wanna get continuous feedback from users on whether or not they think this topic applies to this paragraph, whether they agree or disagree. And so this is not currently up and live. It's on a different URL. But there's this option at the bottom here where it shows you what tags have been applied to the paragraph, and you can give each one a thumbs up or a thumbs down. And then all of this data will be collected. We're hoping to do some hackathons and some other events focused on getting feedback on the controlled vocabulary that we've used. One of the things that I've asked of the computer scientists and the people who are running the algorithms in our text analysis is that I wanna throw a wider net rather than a narrower net. I wanna allow the scholar or the user to say, no, I don't think that this actually applies, I'm not interested in it, rather than not getting the search result back and never being able to find it to begin with. So with that, I'm going to pass it over to John to give some more of the library perspective on our project.
My name is John Long. I'm one of the associate university librarians, on the digital side. That's why I'm involved in this project. I would say Christina definitely talked like a librarian, after four years of training and without a degree, but really she understands the problems librarians are facing today, right? So this is kind of a summary slide of what a library would offer in that project.
So I remember before we started the project, our librarians and staff really did not know whether that was a project we should take on or not, simply because that was not something we normally do. And also, to our staff and librarians it appeared like our faculty just wanted to leverage our web developers to design another website. So we sat down and we talked about it. There are two main reasons why our library took this on. The first reason was that it really gave us our first sort of hands-on experience supporting cross-disciplinary research on campus. And I know on every campus right now, cross-disciplinary research is a big topic. And as libraries, as librarians, we don't really have a lot of experience in that area, and that really gave us the opportunity to learn from the academics what their challenges really are and how we can help and support them. And the second one is really simply because what they were asking to do really bleeds over into the core competencies of librarians. So everything on this list we offered during the project. Particularly for some online materials, even with the UN documentation, many of them have really experienced link rot, and our librarians really had to do a lot of curatorial work to find the documents, preserve them, and make sure they're accessible long term for the project, for the scholars. Of course, we did some tool development, and librarians know how to do the search and do the access. But that's really the kind of traditional side of librarianship we offered to this project. But there are some new practices we learned and also put in place for this project. So for scholars, it's very normal and probably sort of common for them to start doing the tagging work and curate their own collections. But from the librarian side, during this process, we really made two big leaps. So we gave away two things, right? So first of all, we gave away the collection development process.
So instead of having a librarian doing the collection development, we actually had the scholars work through the field, find the target literature, and really select the collections. The other one we gave up is really the collection description piece, usually done by our catalogers. The students and scholars really come from the field and give us feedback about the terms, what we should be using, which terms scholars and students feel relate more to their fields, their own lingo and [unclear]. The only thing we kept is really on the curation side. We do the curatorial work, and we provide a lot of consultation to students and scholars on how to get the work done, okay? That's really a big change we learned during our process. So the interesting thing is, when we gave away the librarianship to the field, instead of losing our jobs we actually gained closer collaboration with the academic side, and they see more value in working with librarians. So here's my vision. I do think in every scholar, in every student, there is a librarian inside of them. I mean, seriously, right? So the only difference is you don't know how the librarian should come out or how it should work. So what we should really do is activate that librarianship and help them to learn and grow. And particularly under this current climate, it's very important to have more scholars and students do librarianship, so that they can sustain their own work and help their scholarly work be discovered long term. And the most important thing, I do think, from the library's point of view, is that we had a cross-disciplinary topic model put in place. So I call it a new form of subject heading. So our formal subject heading is very sort of item level.
So it's describing a book or an article, and our new scholarly subject heading is really at the full-text, paragraph level, giving control of search and discovery to the vocabulary, to the term, at the full-text level. That's really, really important for cross-disciplinary research, for the reasons Christina already talked about, because of the differences in the lingo that people are using in different fields. So even within the project, when we were working on this, we started using text mining and text analysis to try to sustain the process. I think for a librarian, for library science, how to scale that process and make the system work for the long term is a very, very critical question. And I was thinking, what if we add another 250 documents or even a thousand documents? So how long is it actually going to take us to go through the tagging process and get the work done, and really have that process scale? Those are really, really hard questions for a librarian at this point in time. But that's a really good question for us to think about: how to tackle those issues and make the process more sustainable. So here we're thinking about our next steps. Really, basically we're asking two questions, right? So can the project provide utilities for other cross-disciplinary research and scholarship? And how can librarians facilitate cross-disciplinary studies? Christina talked about some of the limitations of the current knowledge systems, and we're actually trying to think about this, since we already have a training set of documents in place, and we have scholars and students who already went through it to tell us why this child of God means child of God and the other one is a biological child, right? So there is already intelligence from the expert side put in place in that training set.
So can we actually leverage what we already learned from the human experts and translate that into some sort of an algorithm that can predict, and could also really start parsing the documents and present to scholars, you know, the interrelationship of two discrete disciplines, right? So that's something we're looking to do in the next step, and we're looking for partners. I think this is probably the right audience to talk to about this, if you're really interested in this type of work. We have four years of experience behind us, and we really know some of the scholars and how they work with us to think about how to computationally solve this interdisciplinary research problem in our field. So with that said, I think that's our presentation. Thank you.