Good morning, everyone. I know we are just on top of 11:15. I'm sure there are folks who are going to start coming in. But two minutes ago, things started to get very quiet after the chime, so I thought I would go ahead and get started. My name is Beth Namachchivaya. I am here on behalf of the project team that was funded by the Institute of Museum and Library Services to host a national forum on text data mining research using in-copyright and use-limited text datasets. That forum happened last week in Chicago. And in the room today with us, there are at least a couple of participants who were there at the one-and-a-half-day forum. So I'm going to ask my colleagues: if you have input that you would like to add, please feel free. I would welcome that. So when we proposed this idea for the forum, we identified several stakeholder groups that we felt really needed to come together at this forum to talk about the challenges and also the way forward to facilitate text data mining with in-copyright and licensed text. So our goal was to situate, and I'm going to call it TDM for short, to situate TDM support and education and conversations by academic libraries within a broader landscape. Because we felt that TDM really appeared as though it were a niche activity, something that was done by a really erudite group who were technically savvy and also had the wherewithal to work out not only the technical problems, but also the methods that were necessary to successfully do text data mining and then incorporate that into their research. So we really wanted to situate TDM within a much broader landscape within research libraries. We wanted to articulate points of convergence and divergence among the stakeholder groups, and we wanted to develop a strategy for libraries to expand research data services to include support frameworks for text data mining.
We also wanted to leverage partnerships that libraries could forge, and have been forging, with researchers, professional and scholarly organizations, and the legal community to support more open and accessible TDM. We felt that that was really a necessary part of what we were going to pursue together. So as you can see, the attendees at the meeting represented a really impressive scope and depth of expertise. We brought together researchers who are at the heart of text data mining, librarians, and content providers, and this included both commercial and openly accessible content providers. We brought together legal experts, and that included practitioners, both in libraries and in the broader academic legal networks. And we also brought together legal researchers in these groups. And then we brought together representatives from professional associations, organizations that advocate in the space of research and networked information, like CNI. Professional association members included a representative from the National Academy of Sciences, ARL, ACRL, and BRDI, the Board on Research Data and Information. And all in all, we identified a team of 25 experts in this area who met, as I mentioned, last week in Chicago. So with these questions framing the work that we did, the project team set out throughout the fall and winter of 2017 and 2018. And we worked on two things. One was a scoping literature review that formed the foundation of a pre-forum discussion paper. And we worked simultaneously with the project participants to explore their perspectives on the landscape of TDM. We asked them to develop SWOT analyses. We did individual interviews with them. And just prior to the forum, we asked them to write forum statements. So the forum attendees were really a diverse group. They had a wide range of ideas and opinions. So when we entered the room last week, there was one statement that we all agreed on at the start of the forum.
And that was the fact that copyright law and resource licensing complicate research with text data. So we started from that point of reference and we worked forward. We also wanted to make sure that we framed a definition of TDM. And really, we defined it as computational processes for applying structure to unstructured electronic texts and employing statistical methods to discover new information and reveal patterns in the processed data. These data might include electronic journal articles, newspapers, books, or more informal textual data, such as consumer reviews or blog posts. In scoping the project, we set aside numeric data and non-textual content, such as static images, audio, or video. We got a lot of pushback about that throughout the process of the forum. And this was a really interesting tension that we needed to explore, not only with the attendees, but as we scoped the national forum proposal and worked with IMLS and got feedback from the proposal reviewers, who suggested, "We want you to narrow this. We want you to really focus on text." And then as we got to the forum, folks in the forum said, you know, there really are relationships between what happens in TDM and what happens with mining and analyzing other media. So we understand and we feel that this is really an important backdrop for what happens in this space going forward. So we also struggled with something else. And that was what to call these data. We finally came to terms with this, and we called it use-limited data. You know, coming to terms with the intellectual property dimensions was really difficult for us. In the early proposal development, we referred to these data as IP-restricted. We found that term sort of hindered rather than facilitated our discussion with stakeholders. And as we submitted the proposal, we settled on limited access.
We didn't really feel that encompassed the full spectrum of challenges that scholars face throughout the research lifecycle when they're trying to work with these data. So that's how we came to grips with talking about use-limited data. We feel that that better describes the more restrictive facet of research with these data, which is how they may be used. And it encompasses a spectrum of activities ranging from modes of access to redistribution for validation and reuse. So we did use several methods, as I mentioned earlier. One was the scoping review of the literature. That was a targeted review of scholarship on issues relating to mining texts that are under copyright, subject to licensing agreements, or otherwise restricted due to intellectual property. We looked at works primarily in English from the past 17 years. We focused primarily on the US, but the team also included scholarship that addressed other legal jurisdictions, including Canada, Australia, the UK, and the European Union. We did searches on prominent databases using terms related to law, library and information science, computer science, linguistics, e-science, digital humanities, and computational social science. We also did interviews with each of the forum participants. We reviewed the notes and the interview transcripts and we identified prominent themes. You might say that we did a lot of mining of the input that we got from the participants. We also mined the SWOT analyses that we asked the participants to do. We asked them to look at and articulate very succinctly the strengths, weaknesses, opportunities, and threats in this space. We also asked them to develop a forum statement, a very brief and succinct one-to-two-page statement about what they felt was important that needed to happen in this area to make TDM more accessible.
And another thing, it's not a method, but I feel like I want to call your attention to something that we used to really facilitate conversations and action-focused work at the forum. And that was a framework called Liberating Structures. If you Google Liberating Structures, you'll find a website which, I understand, is drawn very largely from a publication, and it provides some really good ideas for eliciting input from groups and getting groups to interact with each other, but also to focus on the places where you want to go. I want to talk a little more about the SWOT analysis because I lived and breathed it for a couple of weeks all by myself in my apartment. I identified a number of themes coming out of this SWOT analysis, but I did something that I thought was really helpful. Each attendee did a SWOT analysis. I coded those according to these themes, but I also coded them according to which stakeholder group they represented. And then I took the information that fell under each of the themes, regardless of stakeholder group, and I combined all of that information. So under a theme, I could see comments on strengths, weaknesses, opportunities, and threats from all of these stakeholders together. So mining that, I wanted to talk about a couple of themes that really jumped out, and they jumped out of the SWOT analysis across these groups. There's tension over working with content and the research process of working with content. There are differences of opinion among researchers, librarians, and content providers about the best ways to provide access to use-restricted data. Okay, that's a no-brainer. But it goes a little bit deeper than that. All of these groups shared the concern that there really isn't a shared terminology across disciplinary and professional boundaries.
There are ad hoc procedures for transferring data, uneven data quality, and idiosyncratic use of data formats among content providers. All of these things really hinder greater access to and deeper analysis of these data. So there was a lot of shared concern over this. Practices for providing access to these data are all over the map, and everybody, to a person, agreed that that really doesn't make sense if we want to promote this research. Even well-resourced universities struggle to provide access to content that has been delivered, say, on a FireWire drive that shows up in the library with the caveat, "Make sure you destroy this after you've done your research with it, and don't redistribute it." We also noticed that there is a chilling effect of use restrictions on TDM research. Some folks said that researchers continue to do work with use-restricted data, but they don't openly communicate their methods and their data sources. We're also in this sort of rough area where researchers really don't have a way to communicate how other researchers could replicate or repurpose the findings from their research. And I think this is one of the things that everyone agreed was probably a significant obstacle to TDM uptake throughout the disciplines. There were a number of legal and policy issues that were pointed out. I'm sure you will not be surprised by the fact that this theme was the most commented on across all of the stakeholder groups. In the United States, the fair use argument for text mining frequently is grounded in the concept of non-consumptive research, which was defined in 2010 in the rejected settlement agreement in Authors Guild v. Google. In practice, it's more complicated than it first appears. The boundaries between consumptive use and non-consumptive research are really not well developed.
The line between checking results, which is permissible in, say, the Google Books decision, and human reading is not a bright line. And often, that is the thing that researchers want to do. They want to look at the summative results that come out of running algorithms across one or more text corpora, and then they want to go back and read portions of the text and really more deeply understand what they're seeing. There was also some interesting conversation and tension around business models. There's tension about the role of commercialization in text mining services. Some fear that, if they haven't already, universities are going to lose ground to large corporations such as Google, who will serve as data brokers for researchers instead of libraries. Others noted that publishers' interest in data mining extends beyond building TDM platforms and provisioning data access to mining journal content for internal business purposes. Some folks noted that licensed datasets are a source of economic viability, a way to extend a thriving publishing industry, while a number of stakeholders are concerned about further monetizing access for mining purposes. So, what were some of our outcomes? There was one area around which there was pretty broad consensus at the end of a day and a half of conversation, thinking both as a whole group and in small groups, and working through some of the tensions that exist among stakeholder groups. We all came to this conclusion that TDM is really part of a larger conversation. If you think about the library's role, it's really about libraries making content more useful and more usable in the digital age. We struggled as a group around identifying words and phrases that would elicit a pretty high comfort level across our stakeholder groups. We talked about, is this really about open access for data mining?
And we realized that there isn't a strong comfort level across all of our stakeholder groups with talking about making everything open. But there was a comfort level with focusing on making content more useful and more usable in the research process. Building on that, there was a pretty good shared understanding that more useful and usable content really does mean it is accessible. And we need to figure out how to frame the conversations that, say, libraries have on behalf of researchers, or that libraries and researchers and content providers have together, so that those conversations express what it is a researcher wants to do, how it can be done, and how it can be done within the context of working with use-limited data, without raising tensions that sometimes lead to a unilateral "no, you can't," or "you can only do this, but you can't do that." We also realized that reading and content mining, as they were sort of outlined in that 2010 conversation about consumptive and non-consumptive research, are really not mutually exclusive research activities for all researchers. And probably they're not mutually exclusive for most researchers. Because inevitably, if you see something that's referred to or occurs in a summative way, you really want to dig in and understand, well, what is the context around that? How can I apply what I know about this body of research to these pointers in the non-consumptive research? And we also came to this realization that content mining can drive business models and revenue. If we work on this and if we use it appropriately, it doesn't mean that content mining as a revenue-generating activity necessarily needs to be an add-on to something that is already fairly expensive and out of reach for a number of institutions and groups. So I mentioned at the beginning of the presentation that we were really geared toward making commitments, and we were geared toward asking individuals to make commitments.
But we also worked in a very focused way on getting groups of people together, not just according to their stakeholder groups, but also giving them opportunities to talk across the stakeholder groups, to identify things that they wanted to do, things that were actually coming out of their conversations together. So at the end of the one and a half days, we had a number of activities that groups are working on together. One group is working on a declaration of principles around text data mining. Another group is working on making recommendations for academic library services, so a pragmatic approach. Another group is working on legal infrastructure for computational research. Yet another group is developing a grant proposal to develop legal and intellectual property workshops for librarians and for researchers. There are conversations happening around a pilot TDM service working with HathiTrust, Portico, publishers, and Crossref. There are more things happening, but we were really quite excited about the fact that there are a number of on-the-ground activities that working groups are pursuing, and we're actually in the process of setting up Google Groups for them to continue to do this. Next steps in addition to that include a white paper, which ACRL intends to publish this summer. And I also wanted to extend an invitation. If anyone wants to get involved, feel free to contact me. That's my email address. We will set up a more general email box for folks who want to know more about the project. And on my first slide, if I can page back to it, which I will in a moment, there is a link to the website for the project, and we're going to use that to keep people informed. So I would like to break and take questions and comments from those in the audience. Yes, Robin. If you want to get the URL for the project site, that is another way to keep track of what's happening with the project going forward.
Thanks everyone, I really appreciate your comments. Thank you.