in the 21st century, the role of information specialists and the global network of information resources curating the scholarly record. So, I'd love to get your take on how this sounds. Are we going in the right direction? Are there things missing? Well, I've not read the entire plan, just the kind of bulleted material that you highlighted here. So, I guess the first thing I should say is that I'm really delighted to see an institutional strategic plan that puts this much focus on this collection of activities and the library's role in it. And I think calling out these activities as well as the library's role is very important. It seems to me that the practice of scholarship and the way it's transmitted and documented are changing pretty radically. And I think many institutions have been slow to realize how extensive the implications of those changes are going to be. And that includes coming up with institutional strategies. And genuinely, they are institutional strategies, not library strategies, for things like research data management. I mean, the library is an instrument of the institution's strategy, but it's something that really needs to be absorbed at a high level in the institution's budgeting and setting of priorities and everything else. So, I was very happy to see these kinds of things take center stage. The other thing that struck me as really significant was the explicit recognition in, I forget which, probably the third bullet there, that Carnegie Mellon's library would be part of a network of peer research libraries that would collectively manage a record of information and make it accessible. And I think that's a very, very significant thing to see.
Again, library directors and their teams have for years been working with networks of peers collectively, but to see it recognized not just at the level of the library director but at the institutional leadership level, that we are not in a situation where any single library can successfully stand and function and do what it needs alone, is really important. So, I think that those kinds of observations called out in this strategic plan really bode well for the future of the library and for getting support for what needs to be done in the next decades. One of the strikingly difficult arguments to make as a library in, let's say, an institutional setting is that you do things as a research library that benefit the scholarly community as a whole, including your institution, but the benefits don't always accrue directly to your institution. There's a sort of circle of gifts phenomenon here, where everybody is stronger by contributing to the circle of gifts, but that's been a very hard sell sometimes to administrations who want to insist every penny is going to their direct benefit. And it's nice not to see that kind of thinking here, because in the long run, I think it will serve you better to think more strategically there. That last point is an interesting one in the context of open access, and we see a lot of debate in the community about moving towards a Gold Open Access model where the author or the funder or the institution pays to publish rather than paying to purchase access. And I wonder there about the shift in the burden of cost, because if we take a hypothetical journal bundle where the costs are distributed fairly evenly across thousands of universities around the world and flip that to an author pays model, we'll see the burden of cost be borne disproportionately by research-intensive universities. I've been working with a society in the US where Carnegie Mellon is the biggest contributing institution.
If we run the numbers for a subscription model, we pay about 20,000 a year. If we were to go to an author payment model, Carnegie Mellon's cost would be about 300,000. So yes, it's nice to have that notion of the public good and so on. But do we risk really undoing a system that, while far from perfect, at least levels the costs, or should we go for transformation? Well, I think that's a very interesting issue specifically in the context of open access. I mean, the kinds of things I was driving at before really go much more broadly than traditional open access and deal with the sort of collective stewardship of material that is of scholarly importance but not heavily used, and that kind of thing. The open access situation, and particularly the shift over to author-funded gold access, is really scary. And I've seen some of the kinds of numbers you're referring to, where universities look at the scenario of, well, we are a major producer of research here and a major source of articles for the scholarly literature, so as we go to author pays, it places a disproportionate burden on us. The University of California, for example, has done some projections there that are kind of disturbing. I've also had some conversations with the folks in the UK who are, you know, sliding around to this. So I think there are a number of responses. One is that I'm not sure that author pays is really a great model for open access. I think that going more directly to the funding of journals, perhaps by funders, perhaps by coalitions of universities, may be a better strategy and a fairer strategy. It also gets a lot of the transactional stuff out of the mix, which I'm always eager to avoid. It would help to deal with some of the problems of so-called predatory journals and take a bit of the burden off of faculty authors who are trying to sort through whether new journal starts are legitimate.
Now, I can also see a whole series of downside problems and dangers in that model about potentially restricting or constraining the start of new journals, which is a mixed blessing, and other things. I think we really need to talk about those sorts of things analytically. Again, when you come back to this working within a peer network, that's the group of people that need to sort these out. Just having the publishers pop up and say, you can ransom your articles, isn't necessarily the right thing. There are a number of very well established open access venues that don't take author fees. Those are wonderful. You sort of wonder how they limp along on grants and the personal commitment of the scholars who run them and things like that. But they are existence proofs that those can operate. I also think that getting to a purely author-side funded model is maybe going to be harder than you think. These hybrid journals, and the rate at which they flip from subscription funded to author funded or fee funded, are very problematic. Right now, there's at least some evidence that there are folks doing very well working both sides of the equation. Yeah, I think there are a number of interesting threads. There's a data point from the UK. They're spending about £40 million a year on author fees, and the rough calculation of the administrative costs across the system is about £7 million when you add up institutional administrators and so on. So it's not a cheap system. When the government brought it in, they said, we'll fund this for five years, which will give libraries time to transition and cut back on their subscriptions because authors will be paying. But what they had neglected was that the UK represents about 7% of the global scholarly output, and UK researchers still wanted to read the other 93%.
Therefore, library subscriptions were never going to be reduced, which makes me believe that the solution, if there is to be a solution, is to be found in some concerted global action, because no country, not even the US, has the dominant publishing power that can overcome that national system. How do we get the world's research funders together? Or is it to say that the eLife model needs to be pursued more aggressively? Well, I mean, I think that it's a remarkable oversight that some of these national initiatives towards open access did not realize very explicitly and very early that in the majority of cases, scholarly publishing is an international enterprise and is not fully dominated by any single nation. You probably could put together a coalition of not a huge number of nations that would be adequate to turn the tide, but it would take genuine coordination between the countries. I know that inside Europe there are some discussions at the European Union level about harmonizing or coordinating open access policies. The UK through the JISC is in there, the folks in the Netherlands are in there, the Max Planck Institute is active in there representing the German interest. And CNI, for example, pulls together a small meeting every other summer with some of the UK academic leadership through our partnership with the JISC. And this was very much on the agenda when we met in the summer of 2014. But the problem that folks over there have is they can kind of bring three people to the table and say, well, we can't exactly speak for the UK, but we sort of can. And you say, well, who comes to the table from the US? OSTP? We've seen how varied the agency interpretations of the OSTP directive for public access are here, which hasn't been the most helpful thing in the world. The private foundations? There are too many of them, they're too diffuse.
So there's a great problem, because I can't envision any international coalition that would have enough weight in the scholarly publishing arena without the US there. They are clearly a large player. But it's not obvious who should speak for the US. And there's also the financial aspect, which we can't ignore completely. And I know that our president, when he was head of the NSF, was asked by the research councils from the UK to form a global alliance. But the cost to the US science system for the NSF alone was going to be about $700 million. And I just can't see the scientific community giving up that amount of research grant funding. And I haven't heard anything in any of the presidential debates where they're saying, let's forget about building walls and sort of put money into the scientific enterprise. One can only hope. Maybe the publishing, you never know. The other piece of this that I think gets a little bit too much of a pass is the actual magnitude of the author costs. It's not obvious to me why some of these should be as expensive as they are. Which is a segue into the other player in all of this, which often gets ignored, which is the professional or learned societies. And in my time with Wiley, I really was astonished by the power that the societies have to drive the costs of journal subscriptions, because they're in a very lucrative market. Publishers want to publish titles on behalf of the best societies. The societies see this as a cash cow. And it becomes a cycle of income from publishers to societies driven by author fees. I recognize that societies do a lot of good with the money they receive, funding early career researchers and so on. Do they have a stake in the system that needs to be addressed somehow? Well, you know, we do see a lot of cross subsidy.
Some of it is really strange, where you started out with a society that, if you looked at its finances in 1940, basically was taking in most of its revenue from member dues, and a little revenue from a conference that typically maybe did a little bit better than breakeven, and a little bit from library subscriptions, which back then were about the same as individual subscriptions. And now you see these tremendously distorted revenue streams where libraries are a huge part of it, because the library subscription rates are an order of magnitude more than the individual subscription rates. And that's triggered a vicious cycle where membership is eroding in a lot of these societies, individual memberships, because people say, well, the only thing I get from the society anyway besides a bill is the journal, and now I can get that through my institution. And then you see the societies, as you say, doing unquestionably good kinds of things: subsidizing early career people, helping high school students to move into or get exposed to the disciplines, advocacy of various kinds. All good, but I'm not sure it should be coming out of library budgets, which is de facto what's happening now. There's a certain entitlement mentality at some of these societies that sort of says, well, we can just keep raising our rates and don't need to recalibrate what we're doing. I mean, I remember, just to name and shame, the AAAS, which publishes Science. I did a stint some years ago as the president-elect, president, past president of Section T, which is computing and information science. And that permits you to go to a several hour presentation on Sunday morning, where basically the society leadership talks about how the society is doing and the strategy. There's a board, which is separate and does really strategic things, but this is a kind of reporting out to the membership in an intermediated way. And they had these wonderful charts up about the subscriptions to Science and the revenue stream.
And Science is very unusual because they actually make a meaningful revenue stream off of advertising as well, except that it turns out that advertising is much less valuable online, at least for that marketplace. So as more and more of their viewing moved online and their paper subscriptions dried up, their advertising was concomitantly diminishing. So they said, oh, well, no problem. We know exactly what to do. We expect the, you know, the net revenue from Science to stay constant or go up a few percent a year. And we're just going to raise the subscription rates on the universities till it does. Simple. And I think you see a lot of that kind of, you know, financial strategy. They feel they've got a thing people can't live without. And they're just going to do that until the system really comes unglued, which it will. And I think Science and Nature have, at least in some parts of the world, the benefit of being a distinctive component of the Shanghai Jiao Tong University rankings. The more you publish in those two journals, the greater your position in the ranking system. And I remember when I was in Australia, Nature putting up the subscription 23% from one year to the next. And we thought about having a bit of a campaign on campus against this, threatening to cancel. And we were very firmly told that Nature and Science were sacrosanct because they were such critical components of our institutional reputation. And I mean, there are questions that I've not seen asked much. Both of those are these sort of sacrosanct things that are just viewed as so high impact that they've got to be accommodated. And, you know, in fact, careers ride on getting a Nature paper or a Science paper accepted.
It's very odd at the same time that these are among the rather few journals I can think of that come with a whole news apparatus around them to ensure that they continue to be high impact, and, you know, maybe there's a lesson there for other journals. Maybe there's something there we ought to think about as we're designing the system: either it should be more common or less common. I'm not sure which, but it is odd, you know; they're both oddly dual function. So there are another couple of strands here that I'd like us to talk about. One is the converse to the author pays model, and to have a bit of a conversation about repositories. And secondly, something about the arts and the humanities, because we're very much focused on the scientific article, which on a campus like this has its place, but on a campus like this we also have tremendous interest in the arts and humanities. Which would you like to go with first? Oh, why don't we start with repositories and go around from there? So I guess there are a number of interesting aspects, and you've been writing and talking about repositories for at least a decade. So thinking back to the early 2000s when momentum was picking up, do you get a sense that we're moving in the direction that was intended, or have publishers been able to impede progress? Have institutional mandates helped or hindered? So let me say a few things about this, and let me also point at this very nice book that has just come out. It's a good kind of compendium of what's happening right now. And Daniel can tell you much more about it, so we'll leave that for later. But I think there are a couple of things I want to say here. So I wrote this article back around 2002 or 2003 about repositories, which I guess was kind of in the right place at the right time, along with a couple of other articles. So people connect back to it.
But it's really interesting to me, because there really are two quite distinct visions of what institutional repositories are about that go back to that period. And they get, you know, kind of elided and cross-referenced in very interesting ways. So there's one view that basically says primarily institutional repositories are in the service of open access, and the reason you have them is to allow your faculty to make their publications available to the public. And from this come all the kinds of ideas that you hear from people like Stevan Harnad about author self-archiving. And there are some very nice things about that model. One of the really nice things about that view of the world is that you can measure success and you can measure impact fairly easily. You can measure your success rate by doing estimates, which are reasonably straightforward to do, of how many articles the faculty and students at a given institution are publishing per year. And then you can look at what proportion of them are represented in the repository. And you can do various things, funder mandates, faculty policy mandates, etc., or just regular old outreach, to up those numbers over time. You can measure impact by looking at how often those articles are getting downloaded, by doing various kinds of metrics on them. This is a pretty tidy world, actually. The final thing I'll say about that view of the world is that, at least in the States, the publishers have not been, I'm sorry, the funding agencies have not been very good about this. We spent a decade talking up the idea of open access to faculty, which I think is the right idea. You know, the fundamental notion that the products of research should be open to the public and to other researchers, and that's how you move scholarship along and move society along.
But the kinds of mechanisms that we mostly recommended to faculty, publishing in open access journals, putting things in institutional repositories, actually don't meet the funder requirements right now. So we're in this insane position where we have to go to our faculty and say, well, we've been telling you about open access and you got on board and we've all been doing the right things, and now your funders are on board too, except that they have a whole different list of things that you need to do in order to satisfy the funder requirements for public access. And personally that feels horribly awkward to me, and I think it's going to be a source of great confusion and problems over the next five years, and I'm very disappointed that we haven't been able to put more pressure on OSTP and the funding agencies to rationalize this better. There are some technical solutions as we get better at doing cross-repository propagation and things like that. One can envision the notion that if the faculty just put it in the institutional repository, we can build and connect systems that do the right things on their behalf. But that actually is a non-trivial collection of intersystem linkages, and it's not particularly facilitated by the fact that intersystem linkages work best when all the systems want to interoperate. So anyway, that's one view of repositories, which puts them as essentially infrastructure for the open access movement. There's another view of repositories, and it's the one that I espoused in the early 2000s and honestly still espouse, which is that we are in a world where scholarship is becoming more digital and the artifacts of this aren't going to fit neatly into templates like journal articles, and where universities, great universities, have always been intellectual centers. If you think of the number of events, performances, symposia, lectures that go on at a place like this every day, this is very significant.
It's typically not well represented in the traditional scholarly publishing streams, and it's getting easier and easier to document and to persist through various forms of recording and capture. All of this kind of material, all the institutional material, all of the work of individual scholars that shouldn't and can't necessarily be constrained to the simple templates of printed monographs and traditional journal articles, needs to be cared for. It needs a place where we can put it and take care of it and share it, and that was my view of institutional repositories: as platforms for experts at the institution to work together with faculty to take care of this material, organize it, preserve it and make it public. And sure, as an incidental matter, that should include all the articles and monographs, and indeed maybe rough drafts and the stuff that wound up getting cut out of the monographs because they needed to keep it to 500 pages, and really there's more scholarship under there that is valuable and should be available to the people who want it. All kinds of things like that, to my view, belong in the repository, and I continue to believe that repositories are important most of all in that role. If you just want to do pre-prints or e-prints or whatever you want to call them, it's probably easier and cheaper to do that on a disciplinary basis. I doubt that most universities can drive down costs as low as things like the arXiv that Paul Ginsparg runs at Cornell, or, well, I was going to say PubMed, but that's expensive, though there are also reasons why that's expensive. It just seems like if you just want to aggregate e-prints, there probably are more efficient ways to do it. So while that's a good thing to do with the repository, it's not why I'd build one. Now, I've gone on a long time about this, but it's a sort of story that needs telling. And I'll just note two more points.
So unlike the open access vision of repositories, one of the troubles with the proposal that I'm arguing for is that it's hard to tell how well you're doing or how much impact you have. How much material is in there relative to how much should be in there, or how much is out there that you might collect? Those are very speculative numbers when you move away from these kinds of transactional publications that we have been understanding and documenting. And what's the impact of this material? Very hard to know in some cases. It's very anecdotal. Computing things that I have to say I consider a little dubious anyway, like impact factors and things, we don't know how to do that at this point. We really mostly know raw numbers about downloads and anecdotes about impact. Some of the raw numbers about downloads are really weird too, by the way. I have seen, for example, numbers out of places like Virginia Tech when they started putting master's theses online that are astounding. I mean, nobody reads master's theses. Certainly nobody downloads one 30,000 times in a month. So there are things going on here that we just absolutely don't understand, and they may be as banal as robots that need a little programming help, or there may be something deeper going on. But really understanding the impact of institutional repositories in the broad sense that I'm describing them is a generational program. I mean, we're 10 years in running these things at this point. It's probably time to do some surveys, and probably in another 10 years it's time to start looking more seriously at what's in there, and in particular what's been preserved because the repositories were there that would likely not have been preserved otherwise. We actually need to go long enough down this path to see a significant number, for example, of faculty retirements and see whether some of the scholarship can transcend the active professional life of the faculty that created it because of this infrastructure.
So those are some of the things I think about there, and, you know, one of the great mysteries to me is how little effort is put into dealing with faculty towards the end of their scholarly careers. Yes, you know, their papers get hauled off to the archives and things like that, but many of them have amassed tremendous amounts of unpublished material, of underlying data, of things like that, and trying to figure out ways to organize that and make sure it's not just totally lost seems to me to be a very fruitful and relatively inexpensive activity, and one of, not the, but one of a set of strategies that should be put in place around repositories. I think, as you signaled, in the early days people thought of repositories as containers or buckets of journal articles, and now we see that soloist being surrounded by an orchestra of other products of research: data, executable content, images, videos of performances, music scores and so on. All of this becomes increasingly complex but also increasingly expensive to curate, and I wonder what your thoughts are about that cost versus the possibility of reuse. Is it better to err on the side of bearing the cost up front just in case, or to try to be more selective in what we curate so that we're not sucking up money just because we can? So I think that is one of the central challenges right now that libraries and archives and research data managers, for example, are facing. There are different nuances of it depending on whether you situate it in the context of, say, research data management or something else, but this is a really hard problem that I would like to see addressed much more head on. So there is one view that says that if you haven't done a thoroughly exquisite documentation and attachment of comprehensive metadata, you might as well not have bothered in the first place.
I would speculate that we have a very poor understanding of the interactions between the existence of certain kinds of metadata and reuse in various sorts of scenarios, and that there is an awful lot of mythology there that isn't borne out. There are bits of evidence that surface that are quite disturbing, and I can come back to a couple of those in a minute. But when we talk about sort of mid-term preservation and stewardship, let's say think in terms of keeping something alive for 40 or 50 years, not 200 but 30, 40, something like that, a bit beyond the time when the creator of the material is likely to be around to help with lapses in documentation, typically somewhere between half and a good deal more than half of the costs of that preservation activity are the initial ingest and documentation and the attachment of metadata and things like that. So that means you are really spending a lot of money on the hope that it might be reused. It would be much better, I think, particularly given the dubious connection between documentation and reuse, to be able to save more, and the way to do that is to reduce that kind of initial barrier to intake. So I think it's about being smart about that in various ways, and some of that is not collecting metadata that's expensive to collect and that you're not sure you need, and automatically picking up as much metadata as you can. And I think that we can do much better in the design of scientific experiments, in our use of notebooks and smart instrumentation and things like that, to pick up a lot of things just as byproducts of doing the work rather than coming back later as an explicit extra and sort of redundant step. I think there are very fruitful things to do there. Right now we have this tremendous problem, when we think about data reuse, about the discovery side of it.
We don't really know how much of the data that gets reused is discovered sort of totally from scratch by somebody rummaging around in a database, as opposed to someone who says, I want to build on these two studies and I'm going to contact the two authors and get their data and maybe even bring them in as collaborators if they're interested and build on from there. Or, you know, this is a sort of benchmark data set that's well known in the literature. Or even, I know that Professor Jones back in the 1990s studied this stuff extensively, and I never knew Professor Jones and he's been dead ten years, but if I start looking under Professor Jones in the archive I can start figuring out what's in there and whether it's relevant, which is a very different discovery mechanism than, you know, specifying the attributes of the data you want. We don't know what goes on there. We do know that it is exceedingly difficult to build systems that do precision discovery among very heterogeneous things. So, you know, if you are interested in fossilized dinosaur fingernails that are at least three inches long and you're looking in a database that mixes up paintings and weather observations from, you know, the British Navy in the 1870s and things like that, you probably can't formulate that query very well and you probably won't get a very good answer back. With these kinds of, you know, enormous data repositories, I think we've hyped the sophistication of the discovery capabilities we create quite a bit, so I'd be careful there. I also think that we need to accept the progress, and the vector of progress, in content-based retrieval for large classes of material. So for example, one of the ways you can spend a lot of money is attaching subject headings to records describing things, or subject description. Done very well, for example by the kind of folks who write abstracts for high impact stuff like medical journals, this can be quite useful, but it's expensive.
Now, doing this dates from a day when it was hard to get the underlying material, and so you wanted a surrogate for it. If you've got the whole underlying text to compute on, I would argue that there is a very significant class of material where the investment in this kind of secondary description probably doesn't make sense economically, but we've been very slow to let that go and it's been very painful. Every year our ability to compute on text corpora gets better. What we can do searching on Google now, you know, was unthinkable 20 years ago when those texts were starting to appear. We're getting really good at this, and we're starting to be able to do other hard things like voice recognition and voice-to-text kinds of discovery mechanisms, image recognition, facial recognition. Some of the things that, you know, we're able to do are untidy, but they're still things that can be brought into the service of discovery, and as long as they're computationally driven they're probably cheap and going to get cheaper. So I think that, you know, postponing that kind of human intellectual analysis till you're sure it's justified is probably a good strategy. That may be a nice branch into a topic that CNI had an executive round table on last year, which was digital humanities, and I'd be interested to hear your thoughts on how libraries in particular can support digital humanities both in a sustainable way and at the scale that the technology is enabling. Okay, so I guess the first thing I should just say is how much I hate the term digital humanities, because it's so misleading. It makes it sound like it's not humanities or it's something different. What I think is that basically this is humanistic inquiry that is conducted at least in part using digital tools of one sort and another, and sometimes has some of its outcomes communicated using various kinds of digital mechanisms, digital media mechanisms.
When you cast it like that, even some of the most hardcore anti-digital-humanities people have in fact been practicing digital humanities for a long time. Some of them use word processors, some of them search online catalogs for books, or even use Google when they think nobody's looking. It's very funny how this works. But I think when you cast it that way, what we really want to do about digital humanities at scale is to make it possible for as many of the humanists at our institutions who want to, to be able to employ digital tools and resources that are beneficial in their work and as part of their working methods. That's really what you're after, and that's been hard to do. We've had terrible problems from the IT side, partially because these tools are not well polished in many cases; they're specialized. You see this problem in the sciences too, but they seem to get enough money in most of their grants to be able to hire people who make recalcitrant software behave. So you look at the physicists putting up something like Globus; I mean, you wouldn't want to put this on the consumer market and have your grandmother try and set it up on her machine. The humanists have had less success attracting grants that can attract that level of hand-holding. Many of them are relying on institutional support, and we've had a lot of trouble getting the right mix of institutional support and tools that we can afford to support into those humanities venues. I think actually we've had more success than we give ourselves credit for in some cases, ranging from word processors to various kinds of databases. I mean, JSTOR is an amazing success in that setting: heavily used and adopted and really not questioned at all. I think that the biggest issue in some ways for digital humanities is on the output or result side, and this is the one where the libraries can make all the difference. 
If you look at the achievements of pioneers in digital humanities over, let's say, the last two decades, they have produced an astounding array of outputs of various kinds, most of which are technically very, very hard to sustain or preserve. And the people who work in this area are starting to realize that, gosh, I've got a real dilemma here. Do I want to spend my professional life making discoveries and seeking understanding and communicating that understanding, with the results of that work kind of quietly evaporating in a few years, whereas my colleagues who are still writing traditional print monographs, even if we pass them around as PDFs, can be quite confident that their work will persist for 200 years, just like the people 200 years ago who wrote monographs? Yes, the digital environment has its amenities and its power and its attractions, but not being able to have confidence that your work will stay around is really problematic. I think libraries collectively need to step up to this for a lot of reasons. Most of the scholarly publishing world in the humanities isn't going to do it, or isn't going to do it without a huge push and engagement from libraries as partners, I think. Yeah, I think we're seeing in the scientific field increasingly compound artifacts of publications, the raw article with data embedded. I'm not seeing that trend happening in the humanities, and that's where I see almost that separation between the library and the librarian, with the librarian bringing that information expertise into the research process, not because the researcher couldn't do it, but because we can hopefully do it more efficiently, more effectively, and add that layer of integrity to the scholarly record that, as you say, might otherwise be bypassed. I think the other thing that's really important here is about, and this will sound strange, constraining choices. In many cases, because so much of the sciences comes out in journal articles, as they start building these 
compound things they still want to often deal with them in the context of journals. And so the journals and the editorial practices of the journals, even if they are things like "well, you've got to put your gene sequences in GenBank, you've got to get an accession number, the accession number goes in the article, and that's how we do it," those editorial practices constrain and make uniform behavior around scientific communication. One of the really interesting experiences I had some years ago was with a Mellon project called Gutenberg-e. I don't know if any of you are familiar with this, but the one-minute story is that they picked recent PhDs, assistant professors, folks like that, who were basically starting to take their PhD dissertations and move them to the first humanities monograph, as is the traditional pattern. They tried to cluster these topically, and they ran this for about four years, in neglected areas in some cases. I think they did some military history and diplomatic history, one a year. Anyway, they probably gave out about 20 or 25 of these grants, and this included a guaranteed digital publication stream with the university press. I think it was Columbia that did most of them, but one or two were placed elsewhere. And they got a bunch of money to help with technical support for it. So this was actually quite a feather in the cap of these assistant professors, to get recognized by Mellon on a first monograph. The deal here was that they had to produce an electronic version of the monograph. They could also do a print one, but they had to lead with the electronic one. So I was able to sit in on the discussions where they brought these awardees together every six months or so to talk about the problems they were encountering and how they were doing, and show some of their work to each other as they went along. And this was a disaster in a certain way, because basically you took a bunch of humanists who were interested 
in the research they're doing and said, "Hi, here's $20,000 or $50,000. Now what we really want you to do is totally reconceptualize the nature of the scholarly monograph as it specializes to your work in the digital world. You have a blank slate. Isn't that wonderful?" And for most of them this wasn't wonderful at all. They really didn't want to spend a career thinking about what it means to totally reconceptualize the monograph in the digital world. They had some specific digital affordances that they wanted to be able to exploit in documenting their research, and in a lot of cases it was relatively mundane. It was making better connections and navigation between underlying source material that came out of archives or old manuscripts or something, and the exposition and analysis. In a few cases it was really interesting. I remember one guy who was doing a very detailed unit history on the Eastern Front in World War II, and he was tracking the movements of small groups of military formations day by day as they interacted and advanced and retreated. The treatment he could do of that through animated maps was really quite astounding and, at least to my naive eye, hugely enriched the accessibility of the exposition. But for most of them, really what they wanted was templated kinds of things that they could use. They didn't want a total clean slate. And the ultimate final insult here was that when they got done, the people at the press and the library looked at this and went, "This is a freak show of things. We've now got 25 totally idiosyncratic works that are going to cost an incomprehensible amount of pain to preserve for 20 or 30 years, because each one is its own little odd universe." And that's what we've got to get digital humanities past. I mean, there are a few people who want to explore that frontier quite explicitly, and we should let them do that 
understanding that's risky ground and, just like the people exploring digital art, there are some things that may not survive for all that long, and that's the price you pay. But for the vast majority of humanities scholars, we've got to give them a straightforward pathway to communicate and have their works preserved. I think so. We've talked quite a bit; I think it's time for the audience to get some of it. Oh yeah, David has got a microphone. Whilst you're looking for the first question, David, you might want to do a quick plug for your book. Yes, please. Oh, sure. I wasn't planning on doing that, but if I'm going to buy a copy so I can get a picture with Cliff at the end, by all means. The book that Cliff was talking about here, with the introduction, is our Making Institutional Repositories Work. It's a book I co-edited with my colleagues at Clemson University and the College of Charleston. It's a part of the Charleston conference series, it's available online through the Purdue University Press, and it will be made fully open access within the first year of publication, so it's soon to be coming out purely open access, hence the orange cover. So please raise your hand if you have any questions. I'll be walking around with the microphone, and we will be capturing the questions and the comments for the recording, so please just feel free to raise your hand. I guess, Cliff, I'm not a librarian, so I don't know what all the librarians know, but at some point, I forgot exactly where you were in saying this, you said something about capturing, as an alternative model, all the publications from a university by all the grad students and researchers and PhDs and all those. How does that happen? How can you search and know what's being produced at Carnegie Mellon, at Columbia, wherever? How does that happen? So let's differentiate back to the two views. One view is capturing faculty publication as it's taking place through the kind of 
well-established venues of journals and monographs and conference proceedings, and representing that in the repository. Now, it turns out that you've got a couple of windows into that. Some of the major bibliographic databases, like what used to be Current Contents and is now called Web of Science, carry institutional affiliations, so you can in fact do searches on there that will try and turn up material by authors with specific institutional affiliations. And in fact some institutions actually contract with some of these services to get extracts every year that represent an attempt to capture their faculty's collective publication output. Now, there are obviously limits to this. This works best in fairly well-recognized scholarly journals. You of course have faculty who are writing important things in trade journals, in op-eds to the New York Times, you name it, and you'll miss most of that stuff, but you will get a reasonable cut at the scholarly output. Monographs are much harder, because I don't really know of a central resource. There are some disciplinary-oriented resources where you might be able to get some information, but I don't know of a centralized resource that's really accessible by institutional affiliation. The typical bibliographic kinds of things, online catalogs and the records that feed those, don't carry that. I also should just note here, as a side issue, that saying you want to retrieve by institutional affiliation is a much messier process than you might think. The right way to do this would be to have institutional identifiers, to connect those to the published works, and to search by institutional identifier. In fact, while there is some slow progress happening to do that, this is text search right now, so you end up having to look for CMU, Carnegie Mellon, and there are all kinds of hyphenated variations. It's a lot like the problems you have with author personal names 
and people who can't decide if they're Clifford or Cliff from week to week, but the amount of variation people can produce for an institution can be genuinely impressive. There are also problems of time bindings on the affiliations and that sort of thing. But be that as it may, you can kind of get a sense from there. Now, the other side of this is that most institutions have some kind of faculty reporting system. It's very common, as part of both the explicit tenure and promotion system and also the annual reporting on what you've been doing within your department or school, to collect lists of publications, and there are many institutions now that are trying to do that in more structured ways and roll them up into various kinds of aggregations for analysis. So those would be ways you can kind of see how you're doing in terms of matching those against what's in your repository. Hi, thanks for coming to speak with us. I'm [name unclear], the second-year CLIR fellow here. Oh, cool. So, CMU-specific: one of the things that I think is part of the strategic plan for the libraries is to have the liaison librarians become more embedded research data specialists, right? So to really be involved in that research data workflow at every stage, from writing grants to saving operational data to data archiving and publishing, whatever, right? So the whole thing. But I guess my question to you is, where do you see within that research data life cycle the places where the library, and those of us who work here, can really add the most value to what researchers are doing? So at one level I really buy that notion of embedding with the research teams. I worry about the economic scalability of it. There are many researchers and few librarians, relatively speaking, and that embedding can be a very time-consuming process. My gut instinct on this, and I would love to see some efforts to get some genuine 
data about this, is that there's a very high payoff for getting involved early. Thinking about things like how to collect description as part of the work process, so that you don't have to be embedded in the work process, but so that they just design the work process to do the right thing in the first place, would be much better. It would be much better because you don't need any human in the loop. There are other situations where, for big projects, they are going to have IT people in the loop, and if you can give them the right guidance they can probably take care of a lot of the curation stuff too, as part of the basic data management. But it's important to get in there as the workflows are being designed and the data management plans are being written, to get them to do the right thing. So I would say front-loading these kinds of conversations is really, really helpful. And I would hope we get to the point where faculty say things like, "I was able to produce a much more credible proposal, including the accompanying materials like the data management and data sharing plans, because I had an expert in research data management from the library who worked with me on that, and now that I've got the proposal funded, we're just doing the right things that we said we would, that were credible and got us the funding in the first place." That would be really the happiest outcome, I think, in many cases. And that's why I've also been very supportive of strategies to inject the research data management folks into the proposal writing and cycling phases, through things like the routing of proposals through your, whatever you call it, your sponsored projects or grants office. Those kinds of things are just more ways to get you involved, really. Hi, I'm Matt Marsteller, here from Carnegie Mellon, and in your discussions on gold open access I was dying of 
curiosity about any specific comments you might have on SCOAP3, noting that, well, I'm on the governing council. You probably know more about it than I do. SCOAP3 is, in my view, a wonderful kind of attempt for a community to come in and take its scholarly communication system from a place where it doesn't want it to be to a place where it does want it to be. I think that they have faced some very difficult negotiations, and I wonder, I don't know, whether they've given up too much to get everybody on board to allow the transition to happen. I mean, one of the things that is striking to me, and this is not quite about SCOAP3 but it's not utterly unrelated, is that in one way the math-physics arXiv, which is really used for the vast, vast majority of the papers in some specific fields of physics, raises for me real questions about the extent of the value add for the traditional journals that sit behind it. You look at it and you say, yeah, but almost all the really significant refereeing and discussion and use happens very early on, and that's what the arXiv enabled. Yet the physicists, for their various reasons, have been able to afford to keep the peace by funding and keeping the subscriptions up for those archival journals on the back end of the process. I wonder, in the face of more severe financial constraints, what choices they would have made. I would at least like to believe that they probably would have been willing to throw an awful lot of things under the bus to keep arXiv alive. And arXiv is very cheap, if you look at the actual cost of the thing compared to the rest of the publishing apparatus there. I wish I could say something more coherent about the SCOAP3 work. I've not followed the negotiations in great detail. I think they illustrate how genuinely hard it is, even in a relatively constrained field with a 
limited number of publishers, to make a structural change, and certainly it should serve as a cautionary tale to anyone who wants to wave their hands about an international coalition to restructure scholarly publishing across arbitrary sets of disciplines easily. Erika Linke from Carnegie Mellon. So something you didn't talk about, but which is related to the publishing arm: what are your thoughts about how libraries seize, are granted, are gifted, or are dumped on with taking on the means of production, that is, publishing scholarly journals on behalf of others, or taking on university presses? How do you see this fitting into the changing future of the scholarly record or the means of production? Oh, that's a really interesting one. Let me try to answer that in three or four different ways, because I think there are actually some different questions that are threaded together there. So one question is about what are fundamentally platforms for scholarly publishing. There are plenty of platforms around; for example, a very prominent one is Open Journal Systems, and you find many, many research libraries now that are running instances of that so that their faculty can run publications. And it's kind of murky there. Clearly the library is putting some technical expertise and institutional consistency behind running the platform, but in most cases that I've seen the library is not much involved in the intellectual work of running the journal. That's being carried out by faculty, in terms of policies, in terms of refereeing and editorial choices and that kind of thing. So they're somewhere between what you think of as a traditional publisher and a platform provider. I suspect that the landscape there is going to change, but there are two different forces pulling on it in different ways. In terms of just running a raw platform, it's very hard for me to understand why 
individual institutions need to do that, and why it isn't just going to roll to the cloud in the next few years. So just as so much of refereeing now for conferences is done through things like EasyChair, and you wouldn't run an instance of that locally, if you want to publish a journal you just register with journals.com or whatever it is and do it. The other pull, though, is that if you look at what's involved in running a journal, there's the platform stuff and there's the content stuff, but now there's a whole collection of other odds and ends about getting search engines to index you, and getting identifiers, and connecting to data repositories, and things like that. These things are becoming a significantly larger part of the work that people who publish journals do, and somebody is going to need to do that. And those kinds of expertise are not typically the kinds of faculty expertise that make up an editorial board and a review cycle. They're really about understanding the mechanics of the scholarly distribution system and the stuff you have to do to put content into it: archival relationships with something like Portico, or something. And I don't know where those come from. Probably from an institutionally based thing that's supporting a cloud-based platform, but I think it's a little bit of early days there. Now, another thing to say here is that there is clearly a growing trend of changing the reporting structure for university presses. Often those used to report to some random person, the vice provost of finance and business affairs or you pick it, different at every campus, but now they are often getting attached to libraries. And that's happening with various degrees of intimacy, all the way from simply a reporting line, where the press is like a separate corporation that reports up, to very tight integration where they're sharing staff positions and lots 
of other things, and the library is providing a lot of the technical infrastructure that some of the smaller presses could never afford to buy as standalone stuff. I think the overall impact of that is largely that it's forcing a rethink of institutional strategies for disseminating scholarship and the role that the press plays in it, and that's important going both ways. I think that for too many years too many institutions blew off this question by just saying, "Oh, we subvene a press for so many dollars a year, and they do whatever we're supposed to be doing here." And I think that's just a totally unsatisfactory answer, particularly given the pressures for accountability and for public access to research. You look at state institutions and how they communicate what they do to their funding sources; it's very critically tied into this. So I think that's a very healthy set of issues that are getting teed up by that realignment. Exactly what role the presses come out of that realignment with is very unclear at this point. It may be, for example, that people move away from the idea that the press is supposed to be self-sustaining and more towards one that you should be subvening a press, and the amount of subvention is really tied to much more strategic goals by the institution. I should mention that CNI, ARL (the Association of Research Libraries), and the Association of American University Presses have been working together, and we're going to be convening a meeting, I think it's in either May or June, I'm forgetting the date, of presses that report to the libraries on their campuses, so bringing together the press director and the head of the library to try and better understand what's going on with those arrangements. So I think with that, we'll conclude. If you have more questions for our guest or for the dean, please remember to come up after the 
session has ended to ask questions, but please join me in thanking Cliff and Dean Webster for the conversation. Thank you, please find us.