Hello, my name is David Lewis. I'm coming to you from Lancaster, Pennsylvania, and I'm here today to talk about a project mapping the digital scholarly communications infrastructure. I'm speaking on behalf of my two colleagues, Mike Roy, the Dean of Libraries at Middlebury College, and Katherine Skinner, the Executive Director at Educopia. This is the team that has been working on this project for the last couple of years.
A little bit of background on where this project came from and how we got to where we are. It began with my little paper, "The 2.5% Commitment," which I circulated to a number of my friends and colleagues. Mike Roy and I began a conversation, and we decided that what would be useful to do would be a map of what the open scholarly infrastructure looked like. We had an ambition of trying to create a map of that infrastructure that looked something like this map of the commercial infrastructure that Posada and Chen had put together. We were jealous of this accomplishment, and we wanted to try to do something akin to this with open.
We did a number of presentations, and that led a couple of years ago to a grant from the Andrew W. Mellon Foundation to do a study of the digital scholarly infrastructure. This was not quite what we had in mind. We wanted to look at both infrastructure and content in an open environment, but with the funding we got from Mellon we looked just at infrastructure, and we looked at both the open infrastructure and the proprietary infrastructure, both of which were provided on occasion by both commercial firms and not-for-profits.
So this is the team that we put together. Shortly after we got the grant we brought in Katherine Skinner as a core member of the team.
We put together a really nice and very talented advisory committee that we consulted with at a variety of points along the project. The survey and visualization work that I'll talk about a little bit later was largely done by Nathan Brown and his firm TrueBearing.
So in the beginning it's important to talk about what we mean by infrastructure, and this is a little bit tricky. The way we thought about it in general was that it was the tools, services, and systems that underpin scholarly communications. There are a couple of dimensions of this that become a little bit tricky. There is a gray area between content that lives on a particular system and the system itself, and in many cases it's really hard to disentangle the two. So with things like HathiTrust or archive, we tended to include some of those when they were very large and general. We also didn't look at discipline-specific infrastructure; rather, the things we included needed to have some general application across disciplines. And we looked at what communications meant, and we defined that very broadly, from discovery to access to preservation. But it was not the things and tools that were used to create scholarship. So it's the communication of scholarship, broadly defined, not the creation of scholarship. There's some gray area there as well. Tools used to write and bring data into articles we tended to include, but if it was a data manipulation tool or a digital humanities tool we left those out.
We tried to create a hierarchy, a description of the different types of tools. This is the one that we mostly worked with, but this kind of categorization really needs some work, and it's one of the things we'll recommend at the end. We started with researcher tools: there are some writing tools and collaboration tools. Then repositories and preprint servers, leading into a variety of publishing tools, both for monographs and journals.
The whole discovery piece, evaluation and assessments of various sorts, and preservation. And then there are a series of what we've called general services that overlay the whole system, so things like ORCID or things of that sort that make it all possible are also included.
So the project had six goals. The first of these was to create a census of the infrastructure providers. We thought it was very important to try to understand who was out there, what they were doing, and how they all fit together, to the best of our ability. So we tried to do a survey of that, and we called that the census. That was the first piece.
The second piece was a literature review. We looked very diligently across the web and harvested a lot of literature about a variety of general issues and also about specific projects. Both of those look at the provider side. We also did a number of case studies of the providers to get some qualitative data that would enrich the numeric and quantitative data that we pulled out of the census.
The next thing that we wanted to look at was how libraries in particular invested in these providers, to get a sense of who was making what kinds of investments and why. We did this in a couple of ways. We did a number of focus groups with library leaders, and we also did a survey of library investments in an attempt to capture the amount of money that was being invested by libraries of a number of sizes and types, and what things they were putting their money into in terms of supporting the infrastructure that they rely on.
And then, as I said at the beginning, our ambition was to create this map of the digital scholarly infrastructure, and ironically we accomplished, at least to some degree, all of the first five things but not the map. We have a draft that's very preliminary, but that was the one piece that we didn't accomplish in the way that we had really hoped for.
This is another way of looking at the things we tried to accomplish. You can look at the library side of it and the infrastructure provider side of it, and at qualitative and quantitative views. The focus groups, the literature review, and the case studies were qualitative; the survey and the census were web-based surveys that attempted to collect mostly quantitative data.
So I'm going to run through the five pieces that we really managed to accomplish. The first of those was the census of providers. This was a web-based survey. It was done largely by TrueBearing, and Katherine Skinner wrote it up. It was based on a tool that Educopia had put together that looked at the organizational maturity and financial maturity of organizations. This is a fairly extensive document; if you're really interested I would encourage you to take a look at it. Katherine also wrote a blog post about running a Red Queen's race, which is her interpretation of the data, and I would recommend that piece to you as well.
As for the findings of the census, the first one I think is pretty stark: we only got 42 responses to the census after haranguing and harassing a number of providers for some time. So we were disappointed with that result. We think it's pretty important to try to continue this effort, and I'll talk about that a little bit farther down. It's pretty clear that we need the taxonomy that I talked about a little bit at the beginning in order to get a better sense of how to think about and look at what's out there.
The other thing that became very clear to us is that many of the providers were challenged to provide the data that we requested. A lot of what we were looking for was financial data, and this was particularly hard for open providers.
Often they are projects embedded in different organizations, or are grant-funded or multi-organizational projects, and it's often not easy for them to bring together the information that we were requesting quickly enough to make filling out our census instrument worth their while. A lot of people got frustrated with that. So I think that effort needs to continue, and it's a sign that many of the providers' organizations are not as mature or as robust as they ought to be. The data we were requesting really should be straightforward if you had a good annual report, and a lot of people really didn't have that easily at hand. For many, especially small projects, just the time to do even a relatively short survey was a challenge.
Among the open providers who did manage to do this survey, many actually found it a really valuable exercise because it required them to pull data together that they hadn't before. As you'd expect, I think, many of them have a hard time raising and sustaining the level of funding that's needed to really maintain the projects. It also became clear that many of the providers are in need of guidance, mentorship, training, and other kinds of opportunities in order to enhance their organizational and financial health. So again, we have a lot of organizations that are providing important infrastructure that are not as robust as we would hope.
The other thing that's quite clear in looking at this is that there is no coordinated, or even uncoordinated, end-to-end workflow in the open environment, and that the variety of technologies and strategies that the providers are using will make it a real challenge to put all of this together. And this is true at a time when the commercial providers, particularly the large publishing houses, are working very diligently and with significant resources to try to put this end-to-end workflow in place.
And so if you believe, as we do, that open is important, this is a very dangerous sign for us. We really need to work on trying to figure out how to make this happen.
So the second piece was the literature review. I did this work, and I've called it a bibliographic scan of the digital scholarly communication infrastructure. It is an extensive bibliographic essay that looks at both the literature that deals with the general issues involved in this and the literature that documents the activities of the sector. I identified just over 200 projects. About two-thirds of them were not-for-profit, and many of these were very small projects. I also identified 67 commercial projects in a variety of firms. It's insightful, I think, that a relatively small number of organizations provide a significant number of the projects, both on the commercial and on the not-for-profit side. So if you look at organizations like the Space Public Knowledge Project, they have a variety of projects involved, and on the commercial side the big firms do as well.
So this is as close as we get to a map. This shows the workflow as the yellow arrow in the middle, running from creation, just off the screen, all the way through to preservation and assessment. You can see the number of different projects in each of those areas. I think it's interesting if you look at the researcher tools: the ones that are really significant, particularly around collaboration, are controlled by large commercial firms. In the discovery layer, the important tools are also managed by large commercial firms, but in this case the firms are not really part of the scholarly communications environment. They're Google Scholar and Microsoft, and the ones that really matter are from big firms that are sort of outside our sector. In the repository sector and in the publishing sector there are a large number of not-for-profit open alternatives.
One might say that in some cases there are really maybe too many, that there are redundancies in there that are unnecessary, and that we maybe need to look at sunsetting some of those projects or weeding them out, although how that happens is a really difficult thing to think about. There are a variety of preservation strategies, many of them open, although probably the most important is a commercial firm. And then when we get to assessment, again, the large commercial firms dominate, particularly in the CRIS systems. So both ends of the research workflow are dominated by commercial firms. In the middle there are many good open alternatives, but those are not coordinated in a way that would let you piece together a consistent workflow easily.
So the next piece has to do with case studies. These were done by Katherine Skinner. You can see the four firms that she looked at. Again, we have a fairly nice publication that brings these all together, and I would encourage you to take a look at them if you have an interest in any of these projects. There are also a series of case studies that have just been released by SPARC Europe, and I would encourage you to look at those as well, as well as some conversations that are similar in nature that the Invest in Open Infrastructure group has recently released. So there are a variety of case studies beyond what we did that can give you a feel for, particularly, the open side of the infrastructure system.
The next piece I want to talk about is the library focus groups. We did a series of groups: some at ALA in the summer of 2019, at ARL in September, and at CNI last November, and we did a series of virtual sessions in January and February of this year. The majority of participants were from large research universities, I think mostly because we were at ARL, and then there was a smattering of other kinds of libraries.
We asked them how much they invest, where they invest, and why they invest, as well as what the challenges and opportunities were from where they sit. I think the most striking finding that we had in the focus groups was that people often really have a hard time sorting out how much money they invest in the infrastructure, particularly the open infrastructure. And these were primarily library directors. With collections, the definitions of what ought to be counted are pretty clear, and you could ask most library directors how much they spend on collections or staffing and they would be able to give you a number off the top of their head almost immediately. This was not the case with open. They really hadn't done that exercise before, and so they often didn't have a number at the tip of their tongue. I think that's really important, and it would be important for library organizations that think about statistics to start to define these things so that we really have a better picture of how much investment is going in which directions.
When we asked them why they invested, a lot of the answers were what you would expect. They wanted to be part of a community, particularly if it was a tool they were using that they wanted to influence. Sometimes the investment got them a seat at the governance table, and that was useful. Interestingly, a lot of people said that it's keeping up with the Joneses, or "I trust my friend at a comparable institution, and she invests, so I'm prepared to do it because I trust her judgment": not so much that I've done the assessment, but that it's what everybody else is doing. And a lot of people admitted that they really didn't have a good sense of what the trade-offs were and whether or not they were making good investments.
When we asked about the factors, in the cases where they were looking carefully we got the kinds of analysis that you would expect, about privacy, costs, exit strategies, that kind of thing. There was a major concern about the sustainability of the system and, as I said when we talked about the census, that's really justifiable. I mean, the sustainability of the whole system is at risk, I think, and it is not as robust as we would hope. And there was a frustration with the funding model, which was, you know, an organization whose tool we use comes to us and says, give us $5,000 to $20,000. There's no overall strategy, and there's no clear way of creating an overall strategy for investments that would invest in the whole system. And there was a sense, and often this was based on a campus rather than a library perspective, that you really needed to invest in strategies that were, quote unquote, winners, or that had a sustainable strategy, and often those were commercial players. So bepress might be a better investment, even though you hate the idea of doing it, because your computer center says you need something you can trust for the long haul.
A technical glitch here, excuse me. So the next piece is the library survey. The library survey again was a web-based tool. We asked about investments, numeric investments in particular tools, which we had classified, and then some other data on staffing. We got 91 responses: two-thirds were mostly large research universities, about a quarter were small liberal arts colleges, and then a smattering of the other types. This was obviously a very low response rate considering the number of institutions, and we were focusing primarily on the US. It appears to us that there was a bias towards people who were invested in open.
And often the data was incomplete, and again I think this is the issue that the people responding to the survey often had a hard time pulling the data together.
So here is some of the data that we got. You can see that there was $14 million of investment by these 91 libraries. That's about $150,000 apiece, one and a half percent of the library budget, or two and a half percent if you take out salaries, and that was $8 a student. I should say we based our survey in large part on a survey that the Canadian Association of Research Libraries had done, and I would strongly recommend looking at their survey if you're interested in that data.
Here's another couple of ways of looking at it. A large portion of the total, over $10 million of the $14 million, was in staffing, so less than $4 million actually left the campus. So that's a relatively small investment in infrastructure providers. The majority of that was for hosted repository solutions, and you can see the rest down the way.
This is another interesting way of looking at it. You can see the graphs here: the higher up you are, the larger the percentage you're investing as a library. You can see the one really high large library; that's a university that supports a very large project, so they make a significant investment in it. The majority of the respondents, regardless of size, invested less than 2%, and there's a great deal of free riding. We asked about which projects they used, and often people would use a project but not make an investment in it.
So that's the work that we did. I think it's probably important to indicate that we did all of this before the pandemic set in, so our data is all based on that and may be subject to change as a result.
We have a series of recommendations that we made, and the first three are the most important. We think there should be continued efforts to survey the providers, particularly the open providers, so that we can get a picture of what that universe looks like and can begin to think about it as a coherent whole rather than bits and pieces. It's also clear that a variety of strategies that work to enhance the organizational and financial robustness of that community of providers are important, whether it's a community of practice or other kinds of work. And we think that the library survey ought to be continued in some way, whether that's through groups like ARL or other library organizations that might try to collect that data. It would be useful if all libraries did it, but if we could get even some large consortia or groups to work on it, they could get some of the kinks out of it.
Beyond those, a sort of annual report of the sector would probably be useful, and case studies continue to be useful. Some way of working with librarians to give them a better sense of what is going on would probably be important as well.
So we have a variety of reports and resources. Katherine Skinner has done a couple of these. I did the bibliographic scan. We have a couple of blog posts I would recommend; as I said before, the second one here, Katherine's Red Queen piece, is pretty good. There are a couple of other resources that you might want to look at, and I put a PDF of this presentation up at this tiny URL so you can use the links to get to them.
So if you have any questions about our project, any of us would be happy to hear from you by email. So thank you. Sorry for the little glitches going back and forth; I couldn't quite get my screen to work. But again, thank you very much.