Hi, my name is Rebecca Bryant, and this is a presentation on obstacles and opportunities in research information management, or RIM, systems in the United States for the CNI Fall 2021 Membership Meeting. I'm a Senior Program Officer with the OCLC Research Library Partnership, and I'm co-presenting today with Jan Fransen, who is the service lead for research information management systems at the University of Minnesota. OCLC has conducted a number of research projects over the last several years related to research information management, including these three OCLC Research reports. The last one is one that I have circled because I want to call it out. It's called Practices and Patterns in Research Information Management, and it was a project that we undertook with euroCRIS. We conducted an international survey of research information management practices, and it was very informative in helping us understand the landscape for RIM systems, or CRIS systems as they're called in Europe, across Europe, the UK, and Australia. We had great samples from some parts of the world, and we had a lousy sample from the United States. So, wanting to really understand the types of research information management activities that were happening in the United States, we decided to change tack. Instead, we took a case study approach and developed a project with case studies of these five institutions: Penn State, Texas A&M, Virginia Tech, UCLA, and the University of Miami. We chose these institutions because they represented a diversity of use cases, products, stakeholders, and scale, whether below the institution level, enterprise-wide, or even at the state or system level. We pursued this project through a series of semi-structured interviews, and in all, we interviewed 39 people at eight different institutions.
This was undertaken as an OCLC Research project with Jan Fransen and me as the co-leads, in part because Jan was able to spend her sabbatical with us on this project. We were joined by Pablo de Castro from Strathclyde University in Scotland, who is also active with euroCRIS and who gave us a nice outsider perspective on the U.S. landscape, as well as Brenna Helmstutler from Syracuse University and Dave Scherer from Carnegie Mellon. The outputs of this project include the report itself, coming out the week of November 8th, as well as many derivative works, including a blog series running this fall and into early spring on the OCLC Research blog, Hanging Together, and presentations at EDUCAUSE, at this CNI meeting, at user group meetings like the Pure International Conference, and more. We're also going to be engaging the Research Library Partnership in 2022 in discussions and webinars.
The goals of this effort were really to differentiate U.S. research information management practices from those in Europe and throughout the rest of the world, because things are evolving differently here. We wanted to take a look at the variety of U.S. use cases and unite them under a single umbrella. We also wanted to synthesize that information so that we could offer recommendations to institutional leaders. Throughout all of this, we observed the central role of the library across a variety of use cases, and we wanted to document that. Ultimately, we hope that our work will accelerate collaboration and innovation by seeding a vendor-agnostic community of practice in the United States and perhaps across North America more broadly.
So here are the two research reports that are resulting from this project. The first report, in blue, is the synthesis: the overall findings and recommendations. It's about 30 pages long, and it's the short read.
The white document is much longer because it provides the real details, the real evidence that informed the first part. So you can read in detail about all of the different case studies that we documented in this report.
So let's move on to talk a little bit about what we found. We looked at these five institutions, and we found that most institutions have multiple RIM systems and use them to support multiple use cases. You can see that for two of these institutions we documented just one system, but for Penn State, UCLA, and the University of Miami, we documented three systems each. Here, the red circles mark all of the use cases that represent some sort of public portal that displays the work, perhaps to support reputation management or expertise discovery. The dotted lines represent systems that are not fully deployed at this time but are in development. As you can see, this is a near-universal use case across the U.S. research institutions we studied. We also have three institutions with systems that supported a second use case, which is faculty activity reporting. At Virginia Tech, the same system is being used both for the public portal, which is coming, and for faculty activity reporting, whereas at Penn State and UCLA these are separate systems. Finally, we also had two institutions with systems that support some type of open access workflow in support of a campus open access policy.
This is another way to look at the RIM use cases. I've already mentioned the public portal, faculty activity reporting, and the open access workflow, but there are three others that I want to mention. There's metadata reuse, which may mean using an API to reuse the information, for example to show the publications of a department or a specific researcher on their website.
And then we have a very specific type of metadata reuse, which is strategic reporting and decision support. It's much more than just reusing the data somewhere else. It requires much more granularity in the data, and it may also require special tools like Tableau to make sense of it. The last use case, in the blue box, is compliance monitoring. This is a use case that is really the core of systems in Europe. We saw just a bit of it here in the United States, though it is something RIM systems are able to do.
In this slide, we have a different view of that landscape. You can see it's still positioned from left to right, with faculty activity reporting and compliance monitoring on the right. For compliance monitoring, we had only the one institution that was using this, in a very limited way, but you can again see that all the institutions are supporting multiple use cases that we're defining as RIM use cases. One last thing to point out is that under faculty activity reporting, all institutions do this. That's no surprise. They have faculty and they need to do annual reviews. But we only documented the practices at Penn State, Virginia Tech, and UCLA. That's because at the time we were conducting this research, both Texas A&M and Miami had decentralized systems, and there just didn't seem to be a large return on the investment of our time to document multiple systems. So we chose to document the faculty activity reporting efforts at the institutions where those are centralized.
Again, a major effort of our research is to understand RIM as multiple use cases and to combine them under one umbrella. So here is our definition: research information management systems support the transparent aggregation, curation, and utilization of data about institutional research activities.
When you use this definition, you can see that all of these three-letter acronyms, like research networking system (RNS) or faculty activity reporting (FAR) system, or even a four-letter acronym like current research information system, or CRIS, fall underneath this umbrella. Here's a different way of looking at it: the definition also unites a number of different products that hail from different sectors serving the university community. With Elsevier Pure and Elements, you have traditional CRIS systems developed in Europe, coming generally from the publishing sector. These are CRIS systems in use worldwide. You also have systems on the bottom left, like Profiles and VIVO, that are open source products seeded by the NIH, and their main goal is to serve as a public portal. You have products coming from HR or academic affairs, like Digital Measures or Interfolio, that specifically support faculty activity reporting workflows. They still capture a lot of this publication metadata, so they meet that definition. And then we also have new entrants in this space, like the Ex Libris Esploro platform, which has an institutional repository at its core. At this point, I'll turn it over to Jan, who's going to talk about the RIM system framework.
Thanks, Rebecca. As we were writing up the case studies, we found we needed a common language to describe the components of each system. It's one thing to say that an institution licenses Elements or Pure or uses VIVO, but that's only part of what they do to keep their RIM system running. So we came up with this framework. It starts at the top with the data sources, and then it flows down through the data store to the consumers of the data that fulfill those use cases you just heard about. Now, not every case study has or needs each component. And as we applied the framework, we described what the system actually uses, not what the products they license happen to offer.
So let's start at the top. We described three types of data sources. We considered the research outputs to be the core of the RIM systems, and the bulk of those research outputs are publications. The RIM systems use metadata from databases and indexes that are either freely available or licensed by the institution. The other major source of data is local. Most systems use the institution's HR system to identify the employees that will be included, their affiliations, and other information about them that's kept at an institutional level. Awarded grant data and courses taught typically also come from existing internal data sources. And then there's local knowledge. Unfortunately, not all the data one might want in a RIM system is readily available in another local source. We're thinking here about information that individuals might enter for themselves, like research statements, as well as information about centers and institutes and who's affiliated with them. One of the institutions we studied had no documented organizational hierarchy that could just be pulled from a local data source. The RIM system became the first complete documentation of those relationships.
Next we'll move down to the data processing layer. Once the sources are identified, they need to be transformed into a single database that can be used to meet the use cases. Moving the publications into a single database is often the biggest lift. There's no one database that has all publications for all researchers at an institution, and persistent identifiers just aren't used as consistently as we might like. Also, the publication databases themselves are subject to changing web services and structures. Most RIM systems rely on a publication harvester that pulls candidate publications from one or more databases based on what's known about each of the authors.
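To make the harvester idea concrete, here is a minimal sketch of how candidate publications might be sorted by match confidence. It is not how any of the products named in this talk work internally. The `Researcher` and `Publication` classes, the `harvest_candidates` function, and the matching rules are all assumptions made for illustration. The point it shows is why persistent identifiers matter: an identifier match can be accepted automatically, while a name-only match has to go to a person for review.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Researcher:
    name: str                      # name as stored locally (hypothetical format)
    orcid: Optional[str] = None    # persistent identifier, if the person has one


@dataclass
class Publication:
    title: str
    author_names: List[str]        # names as printed on the record
    author_orcids: List[str]       # identifiers on the record, often empty


def harvest_candidates(
    researcher: Researcher, records: List[Publication]
) -> Tuple[List[Publication], List[Publication]]:
    """Split database records into confirmed matches and ones needing review."""
    confirmed, needs_review = [], []
    for record in records:
        if researcher.orcid and researcher.orcid in record.author_orcids:
            # Identifier match: high confidence, no human review assumed.
            confirmed.append(record)
        elif researcher.name.lower() in (n.lower() for n in record.author_names):
            # Name-only match: ambiguous, so queue it for a metadata editor.
            needs_review.append(record)
    return confirmed, needs_review
```

In this sketch a record with neither kind of match is simply not a candidate. A real harvester would also weigh affiliation, co-authors, and subject area.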
We identified four publication harvesters in the case studies: Symplectic Elements, Elsevier Pure, Ex Libris Esploro, and the Profiles RNS author disambiguation engine. The preferred publication databases vary a lot by discipline. They don't vary by institution, but the same can't be said about HR systems, grant tracking systems, and course systems. Even institutions using the same software will implement it differently. That's where ETL processes come in. ETL stands for extract, transform, load, and it's where a lot of the institutions' developers spend a lot of their time. You might have also heard these processes called crosswalks. While some institutions rely on data analysts to occasionally extract data from the local sources and then run queries and other processes to get the data into a form the RIM system can consume, automating these processes ensures that the RIM system's data is current and accurately reflects the institution's records. In other words, if you're planning a RIM system, we suggest you plan for an analyst and a developer to fit the internal metadata to its purpose.
No matter how much you automate, there's always going to be a need for a metadata editor. This is where you review everything that came in from those data sources through the processes above, and also where you enter that local or even individual-level information that isn't stored anywhere else.
And now we get to the data store. The data store might be part of a licensed product, or a custom database developed, hosted, and maintained by the institution. The data transfer methods are the various methods used to extract data from the data store. In most cases, the system had a web service as well as tools for exporting and reporting. Some also have ways of querying the data store directly using SQL. The items in the bottom layer refer back to those RIM use cases. As we review some of the case studies, you'll see versions of this framework for each system we covered.
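As a rough illustration of the ETL crosswalk step described above, here is a minimal sketch that extracts rows from a local HR export, renames the columns to the fields a data store expects, and loads them so the result can be queried with SQL. Everything specific in it is an assumption made for the example: the HR column names, the `person` table, the crosswalk mapping, and the use of SQLite as a stand-in data store. Real HR exports and RIM data stores differ at every institution.

```python
import csv
import io
import sqlite3

# Hypothetical HR export. Real column names differ at every institution.
HR_EXPORT = """EMPL_ID,LAST_NM,FIRST_NM,DEPT_CD
1001,Lee,Jordan,CHEM
1002,Kim,Sam,HIST
"""

# Crosswalk: local HR column -> field name in the data store (both assumed).
CROSSWALK = {
    "EMPL_ID": "person_id",
    "LAST_NM": "family_name",
    "FIRST_NM": "given_name",
    "DEPT_CD": "unit_code",
}


def extract(text):
    """Extract: read the HR export into a list of dicts keyed by HR column."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: rename columns via the crosswalk and trim whitespace."""
    return [
        {CROSSWALK[col]: value.strip() for col, value in row.items() if col in CROSSWALK}
        for row in rows
    ]


def load(rows, conn):
    """Load: upsert people so that rerunning the job does not duplicate them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person ("
        "person_id TEXT PRIMARY KEY, family_name TEXT, "
        "given_name TEXT, unit_code TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO person "
        "VALUES (:person_id, :family_name, :given_name, :unit_code)",
        rows,
    )
    conn.commit()


def run_etl(conn):
    """Run the three steps in order against the given database connection."""
    load(transform(extract(HR_EXPORT)), conn)
```

Calling `run_etl(sqlite3.connect(":memory:"))` on a schedule, rather than by hand, is the kind of automation that keeps the data store in step with the institution's records. The upsert in `load` is what makes repeated runs safe.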
If the use case is met by the system, it'll be listed on this layer.
We decided that today we would cover just two of our five case studies to give you a sense of how the systems look at different institutions. I'm going to start with Penn State. Penn State is a very large institution with 24 campuses across Pennsylvania and a total enrollment of around 90,000. We explored three different systems that meet different use cases. Elsevier Pure is managed by the research office and the medical college, including its library, which is heavily involved. Activity Insight is managed by the library, and the Researcher Metadata Database is managed by the library but incorporates data from both. Altogether, the three systems are used to meet almost all the RIM use cases we identified.
So here's the RIM framework for Activity Insight. There's no publication harvester here at all. Individuals and their delegates enter their own information, including importing from various databases, mostly for the purpose of faculty activity reporting. A team of library staff members also supports faculty members, both answering questions and entering data from their CVs for them. The framework for Penn State's Pure system is mostly filled by Pure. The automatic Scopus feed comes through the Profile Refinement Service, which is a curation service from Elsevier. People can also import from other publication databases. And down at the bottom of the framework, the portal is also a Pure component. The two systems together feed into the Researcher Metadata Database, which is a very recent addition.
Moving on to Virginia Tech, we see something a bit different. Virginia Tech is a public, research-intensive institution. They have about 37,000 students, to give you a sense of size, relatively speaking. Virginia Tech relies primarily on a single licensed product for their RIM system: Elements.
In an excellent example of both technical and social interoperability, the administrative responsibility for that system is shared among three entities. One thing we particularly noted: the publication metadata is seen as an institutional asset. Among other things, that means the publication metadata is stored in a university data commons, where it's governed centrally like other types of institutional data and is available for budgeting and other institutional analysis purposes. Although the system is made up of multiple pieces, they are connected under one umbrella to meet most of the use cases we identified. The public portal is scheduled to roll out sometime this fall. Faculty members can deposit open access versions of their publications to the institutional repository from Elements, and a robust open access workflow is under consideration. The RIM framework for Elements at Virginia Tech shows the Elements product as the publication harvester, the metadata editor, and a data transfer method. Elements acts as a data store, but the data in Elements is also fed into the university data commons, which then acts as the data source for strategic planning. And now I'll pass it back to Rebecca.
In the course of our study, we also synthesized six recommendations for institutions to make the most of their RIM systems. These recommendations are specifically for university leaders, so I'm going to quickly walk through them. The first is that we encourage institutions to invest in institutional data curation. If you don't have good-quality data that is complete, you can't use it for multiple uses, so it's really important. And metadata curation is an important role for the library. Secondly, and related to data quality, we encourage institutions to support the adoption of persistent identifiers. These can help support metadata harvesting at scale and ensure that the content coming into your RIM system is accurately matched to your researchers.
Thirdly, we encourage institutions not to expect a turnkey system. A vendor may promise delivery in one to two months, but that's if everything is okay with your own internal data. What we often find is that those ETL processes can take a lot more time than anticipated. In fact, a lot of institutions, as Jan mentioned, find that it's not until they implement the RIM system that they have a full view of all of the units and hierarchies at their institution and how they map to the researchers, who are often affiliated with multiple units. We encourage cross-functional teams to implement RIM systems. And with that, you also must invest in dedicated personnel. This is not something that should be managed with 10% of one person's time and 10% of another's. You do need dedicated personnel, because this is a very large and important undertaking for your campus. And then finally, institutions have long dedicated resources to supporting student information and including it in their enterprise data warehouse. We encourage the same enterprise data governance efforts for research information, as was exemplified in the Virginia Tech case study.
Through all of these recommendations, a theme is that libraries are essential partners in research information management. It's a key finding of our reports, and we see libraries engaged in virtually all of the use cases that we documented. So as I conclude, I want to encourage you to pick up our reports. You can go to this link, which will take you directly to the publications themselves. With that, Jan and I conclude. And if you want to contact us, here's how you can do so. Thank you.