Welcome back, or welcome, to the symposium. My name is Emily Bonjuivani. I'm the Open Knowledge Librarian at Carnegie Mellon University. Throughout the day, today's sessions and participants' questions and conversations have really demonstrated an energy to disrupt the research ecosystem, to call for some corporate social responsibility in publishing, and to shift what we think is collectively important. So I just wanted to note that this is really exciting, and I love being part of a community that's so eager to make change, make movement, and disrupt. It's great "down with the system" Friday energy, so thank you for that. Some quick housekeeping. As a reminder, I'll briefly describe the format of the session. Each speaker will give a short talk, and then we'll have time for one or two questions for that speaker individually. After all three talks, we'll invite all of our speakers back on screen for a panel Q&A. So if you have questions that might be good for all the speakers, I'd invite you to hold them for the panel at the end of the session. And please use the Zoom Q&A function to put all of your questions directly in there. This is our last session of the day, and it's about open access publishing and some new developments in that area. Unfortunately, one of our invited speakers, Stuart King from eLife, was not able to be here to talk about peer review and eLife's new publishing model, but he has been kind enough to share his slides, and those are available in the community notes, which it looks like will be linked again in the chat for those who need to find them. So we have three really great talks lined up for this session, presented by Sanjiv Singh, Joe Kraus, and Jason Portnoy. Our first speaker in this session is Sanjiv Singh, a consulting professor at the Robotics Institute at Carnegie Mellon University and the CEO of the startup Near Earth Autonomy.
He is the founding editor of Field Robotics, a new open access journal. So Sanjiv, please take it away.

Great, thank you so much for that introduction. This is a topic near and dear to my heart, so I thought I'd be a little provocative with the title, "Why Not Open Access?", and leave it at that as a teaser. Let me give you a little bit about my background and how I got interested in publishing. I have been at Carnegie Mellon since 1985, first as an engineer, then as a PhD student, then as faculty, and now I'm adjunct. I have this role called consulting professor, which means I go do something else. Meanwhile, I've been at the cutting edge of some of the work done in robotics all over the world, and I started in the research publishing world pretty early in my career, as a graduate student and then as a faculty member. I was working in a subspecialty we call field robotics, which is robotics outside the built environment. It was an interesting area, and for quite a while we didn't distinguish it as a distinct subspecialty. What we did notice was that people were pretty unhappy with the publishing world. This was in the late '90s, '98 or '99, and publishing was a very cumbersome process: abstracts weren't searchable, and you had to go looking all over the place for citations, et cetera. So a few of us started making noise in this area, and we were quickly co-opted by the leading journal in our field, the International Journal of Robotics Research, which is typically the number one or two ranked journal in a very broad field. The editor had heard some rumblings about us starting a new journal, so he said, hey, listen, why don't you help us fix the deficiencies you're seeing instead?
So he co-opted us, and we worked at it for a while. We helped bring about a new kind of journal that had electronic access. It seems so antiquated now, but you could search the abstracts and find papers online, et cetera. I was very happy with that. Then one day in 2005, our department head stopped by and said to me, hey, listen, Wiley is looking for somebody to take over the Journal of Robotic Systems. It's a failing journal, and they're looking for somebody to revamp it. This journal was in even worse shape than IJRR had been before we helped modernize it. Review was done entirely on paper. It's hard to believe, but people would actually print three copies of a paper, put them in a manila envelope, and mail them out. That's how you got a review, sometimes in three months, sometimes in four. And you'd get an acceptance by physical mail telling you what was going on with the paper, sometimes with annotations. There was no web presence; the printed journals were sent out to libraries. It almost seems like the World War II era now, but that's how most journals were. So Wiley was interested, and I thought to myself, there's no way I'm going to do this unless we can do something better than what we're already doing with the best journal in the field. So we told Wiley, okay, look, we will take over this journal, but it has to be completely reformulated. We have to ask the board to resign and bring in a new board. Those are the only terms on which we'll do it, and we'll need electronic submission, electronic review, and all the other things we had asked for at IJRR. So a new journal was born at the end of 2005, called the Journal of Field Robotics.
We hadn't even really distinguished field robotics as an area to focus on before. And this thing now had almost everything you'd want: an electronic workflow and electronic copies. They were still printing paper copies, though. As the editor, I would get 20 copies of every issue at first, then eight, then three, and then zero at some point. But essentially, it had access. There was a website, everything was up there, and you could get citations and search. It was like nirvana, right? The other great thing we bargained for was what we really needed: a paid managing editor and a stipend for the editorial office, which was used for holding meetings and for paying a half-time managing editor who would corral the authors and help track papers as they went through. The managing editor also helped immensely with special issues. The Journal of Field Robotics got a big start by having a lot of special issues. The special issues were fundamental: people would come in and take over one topic, such as agriculture, underwater, aerial, or space robotics. We got a very energized community to work on them, but often these special issue editors didn't know what they were doing, so the managing editor helped immensely. So it went really well, despite the fact that every two years there seemed to be a management change. Okay, we're now Wiley; we're now Wiley-Blackwell; now we're Wiley once again. We're Wiley in Hoboken; now we've moved to Boston; now we've moved back to Hoboken.
All of this was going on, but essentially the deal remained the same until about 2017 or so, when a new management chain came in. What they said was, hey, listen, you're living high on the hog here with a stipend, and we've got to get rid of that. We've got to corral you; you're not accountable to the publisher, and we need you to sign a new contract. Under the new contract, I, as the editor-in-chief and founder, would report directly to Wiley rather than to the board. We tried for two years to find some common ground with them. They wanted to increase revenue, they wanted to increase the cost, and they wanted to sell the cover, even though there was no cover at that time because no physical copy was being produced. So somewhere in 2019 or 2020, the entire editorial board resigned in protest, and we started a new journal called Field Robotics, which had green access. Of course, what also happened in 2020 was the pandemic, which made it very difficult to move forward. So I took on the task of getting this thing going with my own resources: partly my own personal funds, and partly discretionary funds I had at Carnegie Mellon that had basically been a rainy-day fund. It has had some success, but it has a difficult future. I'll tell you how that's going and about the big question mark over how to go forward. It all comes down to a few things, and I'll help explain that, okay? So here are some insights, and nothing here will be surprising to anybody on this call. First, the role of the traditional publisher is being challenged, okay? I feel for them, but the traditional publishers have been greedy and difficult to work with. I don't need to give you that whole thing.
You all know that. With electronic access, there's a question about what value they provide. When we had paper copies, they were at least laying them out, printing them, and distributing them. They don't even do that anymore. They run a website, and even though everything is electronic, they're trying to increase their revenue and decrease their costs. So the idea here is that the traditional publisher doesn't do much anymore, okay? You don't need them for that, but you do need some other things, okay? These key insights came just as the publishing world was going through its own throes. I think publishers are having trouble with their finances and with justifying their own role. But we, the people who generate the content, review it, and then cite it, basically realized, hey, we don't really need them anyway. Everybody in my world does LaTeX already, so forget about laying things out in some other system. So why do we need them at all? Okay, so here are some quick thoughts from having worked on scholarly publication for about 25 years. What a good journal needs, first and foremost, is a really good editorial board, and that's very hard to get. Sometimes people sign on to editorial boards and aren't really engaged. The next thing you need (sorry, I meant "engaged community of authors and readers" rather than "authors and community") is an engaged community: people who will read the journal and then submit to it, et cetera.
You need fast turnaround, because if you don't have fast turnaround, people get very cynical. They worry about how they're going to get these papers out and cited for their promotion cases, okay? These are easy to understand. But what a good journal also needs is money, okay? You need money for managing the papers. All the big publishing houses, even the ones that do this well, basically have a rotating staff of whoever's there. It's more like a help desk than somebody who has a relationship with you. But we found that a dedicated editorial manager was absolutely critical to the success of a journal. We built a really great following by having one person who would show up at conferences and talk to people, rather than just an email address that sent back anonymous responses. You also need an electronic, web-based workflow for submission, review, and decisions. We've used ScholarOne, and that's not cheap, so whether it's that or something like it, you need to be able to pay for it. You also need to be able to pay somebody to lay out the journal properly so that people will take it seriously, because the competition does that, okay? You could publish papers that are laid out quickly and not well, but there's a distinct idea that papers are treated better if they are laid out professionally, are proofread, have good captions, and have some consistency, rather than looking like something that came out of arXiv. And then you need the dollars for a website that can actually distribute these papers so that people can access them. So these are the things a good journal needs.
Now, you can do most of this, the first three items, with an engaged community, because they want to do it. But what's missing, and what I'm going to say open access doesn't have, is a good economic model. There's gold access and green access, so authors can pay. That's not satisfying. Authors sometimes have a hard time paying for these journals; they have to find money in their research grants, usually a couple of thousand dollars, to pay for it. There are countries where that's a fortune. So that's not a really great answer. Many journals have realized that they can offer open access by having the authors pay up front, typically a few thousand dollars in my world. We could also have a sugar daddy, right? I'm using informal slang here, but somebody could say, all right, go: here's an endowment of a couple of million dollars, or here's a hundred thousand dollars every year, you run this thing. But where do we find that? It has to be somebody who doesn't have a conflict of interest. So there's been a question of whether we can have the readers pay something, the way you might pay when you listen to a song or watch something on YouTube. One way or another, having the people who consume this information provide the monetization is a question we've been looking at. So what's happened is this. Let me go back to Field Robotics, which has green access: it is completely free to submit and completely free to read.
We've been running it for three years, 2021, 2022, and 2023 (we're in the third year now), and the cost has come to close to $50,000 a year. I have personally taken on that challenge to get this thing launched. And we're at an impasse: either we find that kind of money from the right kind of place, at arm's length, from somebody with no conflict of interest at all, or we go back to a society and see if we can get them to take this on as a Transactions journal managed in the traditional way. I've given a lot of thought to this over 20 or 25 years, and I don't have a satisfying answer for the economic model. I'd be interested to talk to anybody who has thoughts on how to make something like this sustainable, so that it runs with only light supervision, has a completely professional look, and satisfies all stakeholders. So I'm going to stop there. Emily, is that good for now? I don't know if I ran over.

Yeah, Sanjiv, that was great. Definitely on trend with this change-making energy, and a really interesting talk. Thank you for sharing that. I've heard a lot about boycotts of journal subscriptions, but you don't hear a lot about an entire board of editors walking out. So it was really interesting to see that that's what you found was necessary to get a response. We probably have time for one quick question directly for Sanjiv, if anyone has one they're burning to ask. As a reminder, we'll have all the panelists back together at the end of the talks for questions for all of them or for individual speakers. So let me check the Q&A. Yeah, we do have one. Sanjiv, we here at CMU Libraries have a nascent journal publishing service.
We would love to talk to you about potential paths forward. That's not a question but a comment, from Nicky, our new associate dean for academic engagement at the CMU Libraries. So it sounds like we'll be continuing the conversation.

Yeah, sure. We have spoken to the CMU library, and I'm really happy to continue the conversation about how to make this happen. There are some nuances there about how to do this, how to put content on there, and how to moderate it. But the big deal here is this, right? Where do we find the money to run the management and the layout? That's the issue. I'm just going to be completely forward here. What I heard in the previous conversation as a way to solve this was, hey, look, if you have content that's laid out, then we have some way to curate it, okay? Well, the question is how we handle the submission, review, layout, and all of that, and how we pay for all of that. That's the trick, right? Because I think the CMU service doesn't do that. For a while there was this thing called the Open OJS, and we were working with them during the pandemic. I believe that was something between Harvard and Pitt, and it was oversubscribed, or they didn't have funds themselves. After going through a pretty extensive process, we were told that they were not going to follow through with hosting our journal. So I'm happy to talk about that. If there's a way to do this, the economic model still needs to be considered.

Great, well, thank you for that talk. We'll go ahead and switch speakers. Our next speaker is Joe Kraus from the Colorado School of Mines. Joe co-administers the Mines Repository, he was an editor of the journal Collaborative Librarianship from 2009 to 2016, and he is a founding co-editor of the Journal of Creative Library Practice.
So Joe, I can see your slides in PDF format, so I think that's probably what you're intending. It seems like you're good to go.

Yeah, so hopefully I can just hit the down button. Okay, does everybody see the next slide?

I do, yes.

Okay. First, thank you very much, Emily, for the great introduction. I plan on putting my slides into the repository so that you can find them later and don't have to try to take notes on everything. I'm at the Colorado School of Mines, which I'm going to refer to as just Mines so I don't have to say "Colorado School of" every single time. I've been at Mines for a little over four years, and I've been helping move the Mines Repository from one system to another, both running DSpace. Before Mines, I was at the University of Denver from 1998 to 2016. I've been an open access advocate going back to 1993, when I first really started using the internet. I think the first time I used the concept of open access was in a presentation I gave in 1999, though even in 1999 I don't think the phrase "open access" was a thing yet. Emily mentioned the two journals I've been working on. The first, Collaborative Librarianship, used the OJS software. I did layout for it, and I also worked on a section of the website called Collaborative Librarianship News. The Journal of Creative Library Practice is run by a small editorial board, and we use WordPress as the platform; it's essentially just putting content into the WordPress website. I help with layout there as well as other editorial duties. But they're both pretty small. The Journal of Creative Library Practice usually gets about seven to 10 articles a year, and Collaborative Librarianship might be more like 40 to 50 articles or items a year.
And I think Collaborative Librarianship has since moved to a different platform at the University of Denver, using another repository system. I also want to note that I'm part of a team. I co-administer the Mines Repository with Christine Baker, and I also work with a working group that includes Lisa Dunn, Seth Voletich, and Nicole Bekwar. So it's not just me who runs the Mines Repository; I work with a lot of other people. I know today's symposium is more about open science as a big concept than just open access. There are different flavors of open access. The previous speaker talked a lot about gold OA; repositories often use the green flavor; and the Journal of Creative Library Practice is a diamond OA journal. So there are a lot of different types of open access. And open science involves a lot of other things that libraries take part in: working on open educational resources; providing open access fee support for gold OA journals that need financial support; giving advice on scholarly communication and copyright issues; and offering open data resources. We have also investigated, considered, and signed transformative agreements with publishers to incentivize open access for some of our researchers at Mines. There have been two recent developments. I'm sure most of you have heard that the White House Office of Science and Technology Policy came out with the recommendation that taxpayer-supported research should be immediately available to the American public at no cost, with the policy in place no later than the end of 2025. That has stirred some discussion on our campus. Before that, there was also National Security Presidential Memorandum 33 (NSPM-33), which asks researchers to use persistent identifiers, or PIDs. In many cases, that means using an ORCID iD as a way to keep track of researchers.
This memorandum has helped us start conversations with some of the other research groups on campus, since we want to help our researchers get ORCID iDs to help track their research. Since the title of this talk is "Supporting Open Science on a Small Budget," what do I mean by small? I know small is relative. We just recently moved up to become a Carnegie R1 institution. Just a day or two ago, I read in Inside Higher Ed that the Carnegie R1, R2, and other classifications are probably going to change in the next couple of years, so the way institutions are categorized will probably be a bit different soon. Compared to the other R1 institutions (there are a little over a hundred public R1s), I think we're pretty much the smallest in terms of student body, with about 7,000 total students: roughly five or six thousand undergraduates and maybe one or two thousand graduate students. We have a relatively small library budget compared to a lot of other R1s, with under 20 faculty and staff, so it's pretty small. The repository has under 19,000 titles, whereas repositories at some other R1 institutions might have 70,000 or over a hundred thousand items. We also have a small budget for the repository, and we outsource hosting to a commercial provider. It'd be nice if we could pay for full-time technical staff and a server, but it's all outsourced. So one of the issues we face because we're small is that we're limited to one repository system.
It would be nice to have two or three: one for documents; one for images (our special collections people would love a second repository to highlight some of the great images we have); and one for data. But we have just one repository system to do all three main tasks. It would also be nice if we could move to DSpace 7.6 when we want, but right now the hosting company still has us on an older version of DSpace. Regarding that memorandum from 2021, we wanted to add an ORCID API feature to the repository, but we can't do that until we move to DSpace 7.x. So we can't take advantage of that quite yet, because we don't have the staffing to move to the next version of DSpace. We also recently lost access to Google Analytics data because Google changed the way it collects usage data. We could pay the hosting company more to set up a different statistics-gathering system, but we'll just wait until the new version is up and running, hopefully in early 2024. How am I doing for time, Emily?

You're doing well; you've got about five minutes.

Okay, good. Some of our future directions: we want to include more content from more departments, since some departments give us more material and data than others. We want to increase the amount of research data; we're getting a smattering of smaller datasets, but not all that much. We want to make the self-submission system easier to use. We'd like to expand our sources of funding. And a long-term goal for me: I would love it if we had a campus-wide mandate for all researchers to put their postprint articles into the repository.
But if we do start those discussions about that kind of culture change, we'd probably want to start at the departmental level and then move the discussion toward a campus-wide mandate. Now I want to take a couple of minutes for a show-and-tell of the kinds of things we have in the Mines Repository. We've got videos, images, and aerial photographs. There are a lot of theses and dissertations, and I think those are some of the most highly used items we have. We have whole conference proceedings with hundreds of papers, a lot of open educational resources, and a lot of special collections materials. There are reports from research institutes, curriculum items, whole books, newsletters, and student publications. I've been working with different units on campus to get more student publications, but uptake has been slow. We've also got maps and datasets (there's an example there), and postprints of book chapters and journal articles. So I wanted this presentation to be less "how we done it good" and more "what can we do better?" How can we better effect culture change on campus to get more people putting things into the repository, and what incentives can we use? I know a lot of campuses use promotion and tenure guidelines to get more people to use the local repository. But I want to use this as a way to ask all the attendees for feedback. So I think that's it.

Well, thank you, Joe, and thank you for asking. I like that you've posed a specific question to the audience; hopefully that will get some engaging conversation going. I do have a question specifically for you. I'll ask it now, but I'd also be curious to see what the other speakers have to say.
I'm thinking about how there's been a lot of conversation today about educating our current students on these open science topics. How do you see, or have you used, the Mines institutional repository as a mechanism for teaching students about open science? Or more broadly, how do you see institutional repositories as a mechanism for teaching current students?

I've tried to do that on a one-on-one basis. I'm also a reference librarian as well as working on the digital repository, since all of us librarians have to wear so many different hats, so we'll get questions. Discussing the changing nature of scholarship and trying to get students to understand and take part in open science initiatives has worked in some cases. I've also tried workshops. We have a good workshop series where five, 10, or 20 graduate students can come to a session, so sometimes that helps us get the word out.

Great, thank you. I'll ask folks to hold any other questions until our next speaker is finished, but thank you, Joe.

Yeah, thank you.

All right, and last but certainly not least, our final speaker of this session is Jason Portnoy. Jason is a senior data engineer at OpenAlex, the open, comprehensive catalog of the global research system. It's his job to understand and improve OpenAlex's data and to show people how to make use of it. So Jason, if you want to share your screen, you'll be good to go.

How is that looking? I'm seeing... oh, we're good. I'm seeing what I'm supposed to see.

Great, so hi, everybody. My name is Jason Portnoy. I'm a senior data engineer at OurResearch, the nonprofit that makes OpenAlex as well as a few other products, including Unpaywall and Unsub. First, thank you for inviting me here, and thank you for the introduction, Emily. I also want to apologize: I'm recovering from some unexpected dental surgery.
So I'm not at 100% right now, and I'm adapting this presentation from a longer one, so I'm not sure whether I'll speed through it and take too little time or run over, but I'll do my best. OpenAlex is an open and complete index of the global research ecosystem. When I talk to friends and family outside this world about what we do, I usually start with: have you heard of Web of Science? Have you heard of Scopus? The answer is inevitably no, but I move on. Have you heard of Google Scholar? Sometimes the answer is yes there, and that's the world we operate in. It's about meta-science, the science of science, science as data: all research publications and the connections between them. We call these scientific knowledge graphs, or SKGs, and they're becoming essential infrastructure for a variety of use cases, including research discovery (if you're just looking for research), scientific metrics, and research intelligence and assessment. At a basic level, what we do is index all works. There are a lot of ways to define that, and we've already had some discussion about it today, but scholarly publications are what we're interested in. We also index things connected to them: sources, such as journals and repositories; the concepts they're about; publishers; institutions; funders; and authors. We keep track of as many of these as we can and link them to each other and to the works. These are some counts of what we have; you can go to our API and get up-to-date counts. It's not important to know all of this, but here's some of the metadata we have available for the different types of things we index, which we call entities.
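A minimal sketch of pulling those up-to-date counts programmatically, assuming OpenAlex's public REST API lives at `https://api.openalex.org`, that each entity type named above has a list endpoint (such as `/works` or `/authors`) whose JSON response carries the total in `meta.count`, and that a `filter` parameter such as `is_oa:true` narrows the count. The helper names `count_url`, `parse_count`, and `fetch_count` are hypothetical; check the endpoint, parameter, and filter names against the current OpenAlex documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public OpenAlex REST API.
API_BASE = "https://api.openalex.org"

# Entity types named in the talk; each is assumed to be a list endpoint.
ENTITIES = {"works", "authors", "sources", "institutions",
            "concepts", "publishers", "funders"}


def count_url(entity, filters=None):
    """Build a URL whose JSON response reports the total for one entity type."""
    if entity not in ENTITIES:
        raise ValueError(f"unknown entity type: {entity}")
    # One result per page is enough: only the count in the metadata is needed.
    params = {"per-page": 1}
    if filters:
        # Assumed filter syntax: key:value pairs joined by commas.
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    # Keep ':' and ',' unescaped so the filter stays readable.
    return f"{API_BASE}/{entity}?{urlencode(params, safe=':,')}"


def parse_count(payload):
    """Extract the total from a response body shaped like {"meta": {"count": N}}."""
    return json.loads(payload)["meta"]["count"]


def fetch_count(entity, filters=None, timeout=30):
    """Fetch the live count over the network (requires internet access)."""
    with urlopen(count_url(entity, filters), timeout=timeout) as resp:
        return parse_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(count_url("works", {"is_oa": "true"}))
    # https://api.openalex.org/works?per-page=1&filter=is_oa:true
```

Calling `fetch_count("works")` would return the live total at the time of the request; because the index keeps growing, no figure is given here. Splitting URL building and response parsing from the network call keeps those two parts testable offline.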
For works, we have titles, publication dates, persistent identifiers (including our own IDs, DOIs, and PubMed IDs), open access information, the type of open access, article processing charges (APCs), and funder information. These are all things we try to keep track of, and then there are institutions, authors, sources, and concepts, with various metadata about all of those. Anything else I wanted to... yeah. So, moving on to some examples of SKGs that already exist: we have Web of Science; we have Scopus, which is owned by Elsevier; Google Scholar, with a question mark, since it's not exactly the same thing, but it is mentioned a lot in this space; and Microsoft Academic Graph. Microsoft sunset it and it no longer exists, but they released their data to the community, and we basically took up the mantle from them and are building on what they did. Dimensions is another one, Crossref, and OpenAlex. OpenAlex is very young. Microsoft announced that MAG, the Microsoft Academic Graph, was going away, and they discontinued it in December 2021. OpenAlex launched in January 2022, and we've been moving very quickly since then. We're still young and we're still scrappy. Hopefully you've heard of us; if you hadn't, you have today, and you'll be hearing more and more about us, because we are up and coming in the space. We launched a user group and a customer support ticket system, which is really important, because we are intent on engaging directly with the community. We want a two-way conversation about what we're doing, we're building everything out in the open, and we're trying to differentiate ourselves from big players like Web of Science and Scopus in that way. We introduced full-text search in August 2022. I joined in February of this year, 2023, and soon after that we started offering a premium service, a paid service that's part of our sustainability model.
We are mostly funded by Arcadia, a charitable fund, but we are moving to a sustainable model, which Unpaywall is already doing a pretty good job of. It's a premium offering with certain benefits in terms of the services we provide, but our core offering, the data, is totally free and open and always will be. One big thing we did over the summer was launch an improved author disambiguation system. We are constantly working to improve the data as much as we can, so that it can be a viable source for all sorts of research intelligence and assessment use cases. So our big selling points are that we are big, easy, and open. We are big: we have about twice the coverage of other services, with almost 250 million works and much better coverage of non-English works and works from the Global South than our competitors. We are easy: we have a fast, modern, and well-documented service. And, probably most important to this audience, we are open: our complete dataset is free under the CC0 license, which allows for transparency and reuse. So I'm going to talk a little more about that open part. It's a major feature for us; it's half of our name, OpenAlex, of course, so we take it seriously. In contrast, with the other, paywalled SKGs, subscriptions are costly, their results can't be shared and, by extension, your results can't be shared, you can't necessarily build on them, and you inherit their exclusiveness. I'll talk a little bit about each of those. Their subscriptions are costly, which is obviously a problem. Pressure on budgets is intensifying at universities, paywalls systematically disadvantage less wealthy regions, and even after paying for a subscription, your access is limited. OpenAlex is free, enabling equitable access across the globe, and we don't limit access to the data at all, whereas with the others your access is limited and the licenses tend to be pretty restrictive.
So their results can't be shared, and by extension, whatever you do with them can't necessarily be shared in full. That limits transparency in decision-making and limits the reproducibility of research about research. Because OpenAlex is completely open, anyone can examine and replicate the analyses. Another point is that you can't necessarily build on the other SKGs. You don't have access to the full datasets, you can't use them commercially, you can't integrate them with internal or external dashboards, and you can't really develop derivative tools. These are all things that you can do with OpenAlex, and people have been doing them. It's completely open under the CC0 license, which is public domain. Anyone can examine it and make use of it however they wish, without getting any lawyers involved and without having to worry about that at all, even for commercial use. And we're already seeing this: we have a thriving community of people developing exciting tools and extensions based on the data, and we're working with a lot of them too. The last problem is one of equity and exclusion. The other SKGs tend to have inclusion criteria that can create biases. For example, to be included, you may need an English abstract; they don't include preprints; certain types of works, such as dissertations, are excluded; and certain types of peer review aren't accepted. In law, for example, there's a lot of research that just isn't included because peer review works differently in that field, and it's systematically excluded from those SKGs. And there are a few others. OpenAlex has a philosophy of being inclusive and letting you apply filters to pick which data you do and do not want to include, for whatever purpose you're using it for. So this is about how OpenAlex benefits from an open ecosystem.
We really couldn't have done this even a few years ago; it's the momentum around open data and open scholarship that is enabling us to do what we do. You can understand it this way: we inherited a lot of data from Microsoft Academic, which worked a lot like Google Scholar, crawling the internet and trying to find all the articles that way. Going forward, though, we're mostly using Crossref, which is all about publishers, and anyone publishing scholarship, making their metadata open for these types of purposes. These graphs show that, by publication date, the proportion of works coming from Crossref, with open data that we're able to use, is vastly increasing over time. We're also benefiting from open source machine learning, database, and architecture tools, and from a bunch of publicly available data sources, such as Crossref; ORCID for authors; ROR, the amazing Research Organization Registry initiative, for institutions; and Wikidata for a lot of structured, crowdsourced data around knowledge and concepts. OpenAlex has broader coverage than any other SKG. This is somewhat out of date, but comparing us to some of the other ones, we have almost 250 million works at this point, which is significantly more than Web of Science and Scopus. Google Scholar does beat us in the estimates that have been done, but their data is very closed, and we don't really know how much they have. If you want to look at how we're being used, we have a lot of testimonials on our website. This is one example, from VOSviewer, a popular visualization tool: they're able to ingest our data directly from the API and produce live, network-style visualizations of science. There are more testimonials from a variety of use cases: people using us in industry, in research, in nonprofits, in libraries, all sorts of cases, and we're working with a lot of them. We are also open about our limitations, so I'll go through a few of them.
We inherit bias from our sources, our biggest sources being Crossref and Microsoft Academic, and they do have their own inclusion biases. In terms of a research base, people doing scientific research based on OpenAlex and things like that, we are so young that there isn't a lot of that going on yet, but it's coming rapidly, more and more of it is being conducted, and it's very encouraging. We have great coverage, of course, as I've said, but we are missing some things. We have limited software and datasets; we do include a lot of them, but it's not our focus. We don't have patents, but notably, there's a professor at Cornell, I believe, who has done work linking our data to patents, which is the sort of extension of what we do that the open nature of our data allows. Another limitation is our stability. We are improving constantly, but even from day to day you will see changes in our data, and sometimes in our data schema. We're at the stage where we are building something rapidly, and it is not entirely stable. So there are three main ways to use OpenAlex, and you can use them right now. One is the snapshot, which is the entire dataset; you can just download it all right now, and that is a big difference between us and other SKGs. We offer it through Amazon S3. It's the full dataset, and it's not that easy to work with because it's a lot of data. We suggest you start with the API rather than downloading the whole dataset, because it's unwieldy and difficult to work with, but you certainly can, a lot of people are doing it, and there are a lot of use cases that it enables. The API has been the main way to work with OpenAlex. You enter a structured URL; you can apply filters, and you can apply group-bys to count things, and it will just send you back the data. So this URL will get you all the works, but you can apply filters: say you only want open access works, or you only want open access works from a certain institution.
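As a rough illustration of the structured URLs described above, the following Python sketch builds OpenAlex `/works` query URLs for the cases mentioned (all works, open access works, and open access works from one institution), plus a group-by count. It assumes the public base URL `https://api.openalex.org/works` and the `filter` and `group_by` query parameters, with field names such as `is_oa`, `authorships.institutions.id`, and `oa_status`; check these against the current OpenAlex documentation, since the schema changes often. The institution ID `I00000000` is a hypothetical placeholder, and the shape of the group-by response assumed by `summarize_groups` should also be verified.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint for OpenAlex works; verify against the docs.
BASE = "https://api.openalex.org/works"


def works_url(filters=None, group_by=None):
    """Build a /works URL from a dict of filters and an optional group-by field."""
    params = {}
    if filters:
        # Filters are joined as comma-separated key:value pairs (logical AND).
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    if group_by:
        params["group_by"] = group_by
    if not params:
        return BASE
    # Keep ':' and ',' readable instead of percent-encoding them.
    return BASE + "?" + urlencode(params, safe=":,")


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def summarize_groups(response):
    """Turn a group-by response into {display name: count}.

    Assumes the response carries a "group_by" list whose items have
    "key_display_name" and "count" fields.
    """
    return {g["key_display_name"]: g["count"] for g in response.get("group_by", [])}


if __name__ == "__main__":
    hypothetical_institution = "I00000000"  # placeholder, not a real ID
    print(works_url())
    print(works_url({"is_oa": "true"}))
    print(works_url({"is_oa": "true", "authorships.institutions.id": hypothetical_institution}))
    print(works_url({"authorships.institutions.id": hypothetical_institution}, group_by="oa_status"))
```

Under those assumptions, passing the last URL to `fetch_json` and the result to `summarize_groups` would give a small dictionary of work counts per open access status for that institution. The script itself only prints URLs and makes no network calls.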
You specify all that in the URL and it'll just return the data to you. That does require some programming experience, and for people who don't have a lot of that, we have a web interface coming out. It'll allow for all sorts of searches, group-bys, and analytics. That's coming very soon, so please be on the lookout for it. I'm going to skip through most of this; these are some research intelligence use cases that can be built pretty easily with OpenAlex. But just to demonstrate, this one analyzes an institution's work. As I said before, we tag the UN's Sustainable Development Goals (SDGs) in our works, as an example, so you can see the different SDGs at a given institution. It's a relatively simple structured URL that's actually included in our user interface. This is a way of tracking how we are progressing toward our own goals as an institution, using another structured URL. I'm going to call it there, because I'm running a little long, but we want you to use OpenAlex if you haven't already tried it. We definitely welcome your feedback at this URL. We have a lot of well-structured documentation, which could answer a lot of your questions, but of course I'm happy to answer any questions that you have. Thank you. Thank you, Jason. There are definitely some questions that have popped up for you specifically, so I'll toss a couple of those your way, and then we'll invite all the speakers back and continue with the larger Q&A. Thank you; that was very interesting, especially hearing about the nuts and bolts behind it all. So let's kick this off. One question says: I remember testing a beta version of the user-friendly interface of OpenAlex. Is that still available, or was it rolled back? So, we will be announcing the actual beta. What you saw, we were calling the alpha, and the beta should be coming in the next few days, actually.
So I can't say exactly when it will be out, but it is coming very soon. It's going to look a little different from the alpha version that you saw. If you're in the user group, we'll definitely make an announcement there, or you can just keep checking back at openalex.org. We're basically going to switch it over so that the beta version of the UI is what you see when you go to OpenAlex, so you'll be able to immediately start exploring the data. Great, thank you. Another question on OpenAlex says: I might not be understanding correctly how you collate your data, but if you're automatically pulling articles from multiple sources, do you worry about duplicate records? Oh, absolutely. It is a little complicated, and I apologize for not making it clear. That sort of deduplication is the core of what we do. We might get a lot of information about an article from Microsoft Academic (they released all their data when they sunsetted it and said, here are all the works), and then we get an update from Crossref. So yes, we do a lot of work to deduplicate. The easiest way is with DOIs, which are increasingly available, and that's great. Beyond that, we do our best to identify works, and we assign our own persistent identifier, an OpenAlex ID, to a given work. Then we have a concept called locations, which are the places on the internet where you can find the work, either behind a paywall or open. We list the primary location, which is what would be considered the version of record, say if it's in a journal; the best OA location, say if it's in a repository where you can just grab it; and any others, so if it's in PubMed, we'll list that.
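To make the deduplication idea concrete, here is a toy Python sketch. It is not OpenAlex's actual pipeline, only an illustration of the concepts described: records from different sources are merged into one work when their normalized DOIs match, each record becomes a location, the version of record is treated as the primary location, and an open copy (preferring the version of record) is picked as the best OA location. The record fields, the `W1`-style IDs, the `10.1234/...` DOIs, and the example URLs are all hypothetical. Records without a DOI are simply kept separate here, whereas the talk notes that OpenAlex goes beyond DOI matching.

```python
from dataclasses import dataclass, field
from typing import Optional

# Common ways a DOI may be written; everything after the prefix is the DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a leading URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


@dataclass
class Work:
    work_id: str              # hypothetical internal ID such as "W1"
    doi: Optional[str]
    title: str
    locations: list = field(default_factory=list)

    def primary_location(self):
        """Return the version-of-record location, if one is known."""
        return next((loc for loc in self.locations if loc["is_version_of_record"]), None)

    def best_oa_location(self):
        """Return an open location, preferring an open version of record."""
        open_locs = [loc for loc in self.locations if loc["is_oa"]]
        # False sorts before True, so versions of record come first.
        open_locs.sort(key=lambda loc: not loc["is_version_of_record"])
        return open_locs[0] if open_locs else None


def merge_records(records):
    """Collapse records that share a normalized DOI into a single Work."""
    works = {}
    for i, rec in enumerate(records):
        doi = normalize_doi(rec["doi"]) if rec.get("doi") else None
        # Without a DOI this sketch keeps the record on its own; a real
        # system might fall back on fuzzier title/author matching.
        key = doi if doi else f"no-doi-{i}"
        if key not in works:
            works[key] = Work(f"W{len(works) + 1}", doi, rec["title"])
        works[key].locations.append({
            "source": rec["source"],
            "url": rec["url"],
            "is_oa": rec["is_oa"],
            "is_version_of_record": rec["is_version_of_record"],
        })
    return list(works.values())
```

For example, a closed journal record with DOI `https://doi.org/10.1234/EXAMPLE.1` and an open repository copy listed as `10.1234/example.1` collapse into one work with two locations. The journal copy becomes the primary location and the repository copy becomes the best OA location.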
Ideally, if we're doing a good job, which we think we are, though certainly not perfectly, all of those locations will be collated under the same deduplicated work. Great, thanks. I've got a couple more questions lined up, but I'll actually invite all the speakers back, so that they can chime in across questions and we can keep this more of a conversation. So Sanjeev and Joe, we'll bring you both back, and Jason, you can stop your share. It's amazing how these Zoom wizards behind the scenes just make magic happen. Really. I have another question that came through for Jason specifically, but Sanjeev and Joe, feel free to chime in with your thoughts as well. This person says: I work in the evidence synthesis space and know that OpenAlex is already making possible completely new ways of synthesizing research and making things like living systematic reviews more feasible. How do you see knowledge graphs like OpenAlex changing the way we search for and discover scholarly information, maybe particularly in light of large language models and GPT technology? Yeah, so I don't want to say anything about our roadmap or what we're planning on doing; we don't actually have any specific plans for using large language models. I think the work we're doing will certainly enable all sorts of use cases for organizing scholarly knowledge and helping out with that. Personally, I've heard a lot of reports of large language models just making things up about the scientific literature, and I could imagine the work we do helping with that, because it's a really serious problem. You ask a large language model to give you a well-cited article in this area, and it might come up with something completely fabricated. If you ground it in something like what we're doing, where we're trying to keep track of what actually exists, you can hopefully put some bounds on that and keep it from happening. Yeah.
We also... oh, that's probably it, yeah. Thanks. Joe, Sanjeev, not sure if you have any comments, thoughts, or concerns in that area, but you're welcome to share them. Well, I do wonder why theses and dissertations are kept out of the system. I think it would be great if you could draw upon those, because they're such a huge part of our Minds repository. Yeah, so we do actually include a lot of theses and dissertations. I think the only reason at this point that we wouldn't include any given dissertation is that we're not doing a good enough job. I don't think we have any exclusion criteria, or any reason that we would exclude them. It's just that we are focusing more on trying to get all of the traditional scholarly publications, journal articles or whatever people consider similar to those. So we just haven't gotten to them yet, and we are working on it, but I don't think we have exclusion criteria around them, and we already have a lot of them. Great, thank you. We have a question that I think is specifically for Joe: do researchers at your institution self-archive their research output, and what method do you use for researchers to submit their articles? We do have a process. There is essentially a submit button on the website, but we just don't get that many authors to use it. A lot of the time, Christine and I will work behind the scenes putting a batch of items into the repository. I'm thinking mostly of the undergraduate research symposium: we'll have the students submit items through a different system, like Microsoft Forms or Google Forms, then we'll save the PDF documents, work on all the metadata, and put them into the repository behind the scenes, instead of having each individual student submit directly through the repository. Yeah, thanks.
Yeah, having systems in place can really help increase the content and the quality of content. This one is for Joe or others: what has been your experience with copyright compliance awareness with your repository? This is something we often run into at our institution. Folks are sometimes enthusiastic about submitting their work, but when we explain that we can't take the final published version, they lose interest or become frustrated. So, some frustrations with the green self-archiving model. Yeah, we've worked with some authors who just want to give us the final PDF that they've published with whatever publisher. Then we explain that we can take the peer-reviewed version as a final draft. Authors who do want more of their content available open access will take the time to find that Microsoft Word version, or whatever version came just before it was laid out as a PDF at the end. But I think it is a detriment, and copyright is just such a mess. My colleague Seth Valetta usually deals with a lot of the copyright questions; I know a good amount about copyright, but I'd rather have a more definitive answer from one of my colleagues. Thanks, Joe. So, a question for the group: I'd be interested in the panelists' thoughts on the Subscribe to Open publishing model for journals. Maybe I could take that one. We've looked at a couple of these things. I think it's a really promising idea. What you want, the way I understand it, is some sort of collective consensus, enough interest in a journal to make it worth being supported just by the subscriptions. I don't know how to make this work for places that don't have subscriptions, or how this changes the planning for an economic model, unless you are already well established.
If you're well established, you can say, okay, if I'm going to get enough money from my subscriptions, then maybe that's good enough and we'll make it open access. Maybe others have some way of thinking about this, but I would like to understand how this would work for anything other than established publishers, who can make this decision one year at a time. Maybe I could say a little bit more. The idea, as I understand it (and whoever asked the question, or maybe you, Emily, could also explain this), is that every year there's a decision as to whether there's enough support from the subscriber base to keep that journal open access or not. I think that works if you have large bodies that are dealing with subscriptions. Did I get anything wrong there? Yeah, and I'm also curious, since you were talking about the different economic models, about the approach to converting: if a journal hits an open access quota, it could be flipped. And I'm seeing another comment come through: it's exciting to hear about Field Robotics, and a similar thing happened at a well-known Elsevier neuroscience journal, NeuroImage, this past spring. Hopefully more journals will follow suit, as you did, and this will become sustainable. Yeah, thank you for that. It's not sustainable right now, right? We have been hunting for a model that will keep the journal high quality. One idea that was offered to us, just so you know, is: make the template for the papers available. Once a paper is accepted through the review process, you give the template to the authors and say, go to town and make this into a publishable PDF, and then we'll put it up on our journal. That could take care of some of the costs.
But I think it doesn't remove the economic model problem here. What we need is some way to think about it like this: when a song is played on the radio or on Spotify or wherever, the singer gets one or two cents or something like that, right? If we could come up with a model similar to that, maybe the publisher could actually recover their costs, and if a paper were hugely successful, the authors could get some of that money back too. I think this is where open access has to go. Otherwise, the authors are putting in a huge amount of work, more or less because they're in it for scholarly publishing, and they're not going to be able to afford to keep journals afloat, in addition to doing all the work of generating the research, writing it up, reviewing the papers, et cetera. If they have to pay for it too, it seems like it's going to be hard to scale this thing. So, just some thoughts there. Yeah. And the high APCs have been brought up a bit today: how are we making sure that we're also inclusive of researchers, specifically authors from institutions or countries that aren't funded well enough to pay them? And I understand, just as we said, that even right now the model is that big publishers are businesses and need to demonstrate profit to shareholders, so it's going to take a lot of creativity to make substantial change on that front. There are some interesting alternative models, like PeerJ's lifetime membership option, which waives membership fees for authors coming from low-income countries. So I'll be curious to see more creativity in that space as a response. We are just about out of time.
If you have any last thoughts that you really wanted to share with the group, I welcome you to do so. I really appreciate these great talks from the three of you, and really throughout the whole day. We have one more quick wrap-up session after this. So Joe, Jason, and Sanjeev, thank you again; I really appreciate it. And I'll turn it back over to Melanie. Hi, everyone. Thank you to those of you in the audience and to all of our speakers today. Before we wrap up with our closing remarks, I want to remind you that we have a survey linked at the top of the community notes document. We appreciate any feedback you might have, as it helps us plan the next open science symposium. So with that, we are going to end the day with closing remarks from the Dean of University Libraries, Keith Webster. Keith has been a champion of open science for a long time now, and he's going to share his thoughts on this topic. He had some travel disruptions, but he thought this might happen and had the foresight to record his comments for us, so I will play that recording for you now. It's just about 10 minutes of comments, so a brief wrap-up to the end of the day. Good afternoon, everyone, and welcome to this final session of the 2023 Open Science Symposium, brought to you by Carnegie Mellon University Libraries. I'm Keith Webster, Dean of Libraries. I wanted to offer a few remarks to help frame an institution's perspective on open science, and also to position the work we're engaged in today in some historical context. The earliest scientific journals began to appear in the middle of the 17th century. Until then, researchers to a large extent had little incentive, and often faced persecution and religious impositions that got in the way of sharing and opening up their scholarly endeavors. But scientific journals changed the dynamic.
These offered a mechanism through which researchers could share their ideas with each other. Fast forward 300 or 350 years, and whilst the journal business has, of course, expanded dramatically, it is really the advent of digital technology that has accelerated progress towards a truly open scientific market. Whilst many important endeavors have emerged along the way, I would point to the report from the Royal Society released in 2012 as a real driver of the momentum that we are enjoying today. As we look at progress in the United States, we see things like conferences and toolkits coming from the National Academies. And of course, we have all celebrated and appreciated the designation of 2023 as the federal Year of Open Science. As that year comes to an end, I think it's timely to look back on how we got here and where we go next. There's little doubt that the COVID pandemic has had a real driving effect on policymakers' perspectives. I think UNESCO framed this very elegantly: the pandemic brought together the global scientific community in ways that we simply hadn't seen previously. We saw players from universities, industry, government, and research organisations come together across their organisational boundaries, and the research community crossed national borders, because we all recognised that we were in this together and that the fastest way to discover treatments and pursue the quest for vaccines was through global open collaboration. I have heard many leaders in the past two or three years say that we simply can't go back, that the pandemic and its impact on scientific activity pointed to a way that has to be part of the future. But of course, in doing so, we recognise that each of us is part of an institution or organisation, and we all need to collaborate.
So let me say briefly how we think about open science at Carnegie Mellon. I do so, as you would imagine, from the perspective of Carnegie Mellon University Libraries, as that is the part of the university with which I am most commonly associated. But I will say that what we do in the University Libraries really is about building services and offering expertise for the university. These things are administratively housed in the University Libraries, but I very much view them as university activities. Our work over the past eight to ten years has been framed in the context of Carnegie Mellon University's strategic plan, which set out an ambition to create a 21st-century library. Inherent in that was a recognition that scholarly communication is changing and that we need to be conscious of developing our services in a way that aligns with that. When we think about open science from a library perspective, our historical reputation has invariably been built upon managing the record of scholarship, and in an open science environment we see that through the lens of providing open access to the publications, data, and software and code that represent the primary artefacts of the research process. But in seeking to maximise access to these, we need to recognise that the scientific workflow has to be optimised so that the products of research are developed, where possible, in a way that maximises their shareability, their usability, and their reproducibility. We have spent a lot of time building an end-to-end open science workflow that offers our research community a suite of services and tools that we have tested and believe offer great functionality, and we have done so in partnership with organisations like protocols.io, LabArchives, and the Open Science Framework. My colleagues will be glad to talk with you about their perspectives on these workflow solutions.
Let me turn to the ways in which we are supporting the opening of the products of research. First, for publications, we have a range of approaches to supporting open access. For example, we support preprint servers, we offer an institutional repository, and we provide financial support for open access article fees. But our institutional view has firmly been that we wish to allow our researchers to retain the rights to the work that they publish. We also wish to take the administrative headache out of researchers' submission and publication of articles, and therefore we have pursued agreements with our major publishers such that any article with a Carnegie Mellon corresponding author will be made open access immediately upon publication, assuming the author agrees. In 2018 I looked back at our publishing activity in the preceding years and recognised that 72% of Carnegie Mellon's output had been published in journals managed by half a dozen major publishers. That represented an opportunity for us to work with those publishers to arrive at agreements that met our needs, and over the last couple of years we have continually reached out to publishers to maximise those opportunities for our researchers. Perhaps most notably, our first agreement was with Elsevier, and it was the first institutional agreement of its sort that the world's biggest publisher had agreed to. In the intervening period we have worked with many other publishers. The one that was top of my list, the Institute of Electrical and Electronics Engineers, remains a tough nut to crack, but I do hope that over the next year they will come to the table with a business model that we can sign up to. On the data management front, we have a generalist repository that accommodates publications, data, code, images, presentations, you name it: as long as it's digital, it will fit into the Figshare platform, which we have branded as KiltHub.
We offer services and expertise to ensure that our researchers can meet funder requirements for data management plans and for data sharing, as well as collaborative opportunities where those are appropriate. And just to give you a sense of how we have to have a bit of tartan or plaid in the branding, that is the KiltHub homepage. To meet the needs of software, we were grateful that the Alfred P. Sloan Foundation in 2022 awarded us a significant grant to establish an Open Source Programs Office. You will have heard from Sayeed Choudhury this morning about our work in that area. That represents part of an overall approach in which we serve the university community through a collaborative network that reflects our aspirations around open science and open data. We provide training to the university community through The Carpentries series of workshops and through locally designed interventions. In doing so, we have focused on ensuring that our colleagues are as up to date as possible, and my colleague Emily Bongiovanni led a team that secured an IMLS grant to support the development of a national science training program. But the sense of community is also at the heart of what we do. We've seen things like rapid prototyping through hackathons. We have sought opportunities to work with policymakers to ensure that they understand an institutional perspective, and I had an article published in The Hill a few weeks ago about many of these issues. And we ensure, where possible, that the university community understands what we're offering. This is just the heading of a substantial news item that we released to celebrate the Year of Open Science and to ensure that our community is aware of what we can offer. But let me conclude by recognising software. The White House memo that came out last year about public access to the products of research was notably silent on software, for good reason, but we shouldn't ignore the fact that software is important.
Earlier today you heard about the development of the Carnegie Mellon Cloud Lab, and that is going to be a facility from which vast amounts of research data will be generated. It is built upon a commercial facility founded by two Carnegie Mellon graduates, and their primary market has been the biotech startup community in Silicon Valley, where a very closed system has been critical. But we were so grateful to Emerald Cloud Lab for working with us to make their programming language open source. That is a critical step in making the research from the Cloud Lab shareable. And we are working with the Cloud Lab and our partners at Figshare to ensure that research coming from the Cloud Lab can be deposited into our KiltHub repository where appropriate. I hope that's given you a sense of our agenda at Carnegie Mellon. We are focused on the needs of our community, but we are keen to partner with others, so that anyone who might benefit from the lessons we've learned along the way can do so, and so that we can benefit from emerging best practice at other institutions. With that, I want to thank everyone who has made this possible. An event like this takes the work of a lot of people. I'm truly grateful to our open science team, to everyone who presented today, to those who ensured that the technology worked, and to everyone who joined us online. We look forward to seeing you at our next open science symposium.

Thank you to Keith for the closing comments, and thanks to everyone who participated in this event: our speakers, all of the attendees who offered great questions and comments throughout the day, and the organizing committee. We will be sharing the recordings and the slides from this event in a follow-up email. We also have the survey that I mentioned before, so if you can, please take a moment to fill that out as well.
We also have our newsletter linked at the top of the community notes, and you can subscribe to it if you'd like to hear about future events such as the next open science symposium. As Dean Webster mentioned, we welcome partnership and collaboration, so please don't be shy about reaching out, and we will see you at the next symposium. Thank you.