Maria Guerreiro from eLife. Thank you very much, Eric, for the introduction, and thank you very much for the invitation to speak at this event. It has been really exciting so far, and it has been great to see eLife featured in so many of the talks, so thank you very much for all the mentions. I was asked to talk about the work we're doing at eLife to make the components of research accessible and discoverable. But before I start, I would like to say a few words about what we do. eLife is a non-profit organization led by researchers, for science and for scientists. We have the kind support of these four funding bodies: the Howard Hughes Medical Institute, the Wellcome Trust, the Max Planck Society, and the Knut and Alice Wallenberg Foundation. In our first few years, we dedicated our efforts to establishing eLife as an open access journal for the life and biomedical sciences. But we also do a lot of work around innovation and technology, especially towards supporting the development of open source tools, so I will also be saying a few words about that towards the end of my talk today. Before I talk about the work we're doing, I would like to quickly go through the reasons why we should encourage open science behaviors, although I believe that all of you will agree with what I'll say, and then I'll go through the work we're doing. There are various reasons why one should support open science and do our best to make the research components accessible and discoverable. One reason is that this is of great benefit to scientists and the whole community. Additionally, it helps advance science: it increases peer validation and the checking of scientific facts. And when I speak about components of research, I'm thinking here not only of the traditional research article, but also the underlying code, software, data, and materials, for example.
One of the other motivations to make all these components accessible and discoverable is, of course, funder mandates, and this has been discussed already today. To give you some recent news: as of last month, 13 funding bodies in Europe have mandated that the research they fund not only needs to be made available in open access, but needs to be released under a pure, or immediate, open access model. So it will be really interesting to see how this shifts behaviors in European research. Additionally, there has been increasing encouragement from funding bodies to not only make publications available in open access under the most permissive licenses, but also to make components of research such as data and code available. This is something that, for example, the Wellcome Trust in the UK has also started encouraging, so that when researchers apply for funding, they not only need to provide, for example, a data management plan, but they also need to provide a plan for how they will make the software, the data, and the rest of the outputs available after publication. Now, a few words about the work we're doing at eLife. Thinking of the more traditional component of research, the article itself: as I mentioned, we are an open access journal, so all our content is published either under the CC BY attribution license or under a public domain dedication. All content is available openly at the point of publication, with no embargoes. Because we are online only, we can take full advantage of the digital environment, so we do not impose limits on the number of figures, figure supplements, or videos one can include. And in the interest of discoverability, each of these individual components is assigned its own DOI, so it can be cited in its own right. Regarding data, the FAIR principles have already been mentioned today, and we also follow these guidelines.
So data should be findable, accessible, interoperable, and reusable, both for humans and machines. With this in mind, when researchers submit a manuscript to eLife, we ask that all data sets associated with the publication be made available, unless, of course, there are strong reasons not to, such as in the case of data on human subjects, for example. We also ask for the data to be clearly documented, and for all the procedures used to collect and generate the data to be clearly outlined in the methods section of the manuscript. We recently started asking authors to provide a data availability statement. This is something that other journals do as well, such as, for example, PLOS ONE, and it is a prompt, or an encouragement, for authors to disclose which data they have collected and where it can be found. The data availability statements, as well as any consulted or newly generated data sets, are tagged as such in the XML, which encourages interoperability with, for example, data aggregators. The data sets are also cited in the references list. In terms of code, another component of research, we ask authors to also provide the analysis scripts or custom code they have developed. They can do so by uploading the source code files with the submission, but we mainly encourage authors to deposit the code in a dedicated software platform, for example GitHub, GitLab, or other version control platforms. Also, as of a year or so ago, we've started forking authors' code when they have deposited it in such a repository. This allows us to keep a version of the code as it was at the point of publication, while the authors can still carry on and develop their work in their own repository. A link to the authors' own repository, as well as the commit number and the location of our forked copy, is included as a full reference in the references list.
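As an aside on why the commit number in such a reference matters: with a repository URL and a commit hash, a reader can retrieve exactly the version of the code that existed at publication. A minimal sketch in Python, assuming a GitHub-hosted repository (the owner, repository name, and hash below are made-up examples, and the `/archive/<commit>.zip` URL pattern is GitHub's snapshot convention):

```python
# Build the URL of a source snapshot pinned to one specific commit.
# The repository name and commit hash here are hypothetical examples;
# the point is that a hash identifies one immutable version of the code.

def pinned_archive_url(owner: str, repo: str, commit: str) -> str:
    """Return a download URL for the exact code version cited."""
    if not all([owner, repo, commit]):
        raise ValueError("owner, repo and commit are all required")
    return f"https://github.com/{owner}/{repo}/archive/{commit}.zip"

url = pinned_archive_url("example-lab", "analysis-scripts",
                         "9fceb02d0ae598e95dc970b74767f19372d61af8")
print(url)
```

The same idea underlies citing a fork frozen at publication time: the reference resolves to one fixed state of the code, even as the authors' own repository moves on.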
In terms of materials and resources, which we can consider another important component of scholarly research: as with most publishers, we ask authors to document in the article all the materials and resources that were collected, and to indicate how they can be provided to other academic researchers, in case they want to replicate the procedures, rerun the analyses, or reuse the materials. Whenever possible, these should be deposited in dedicated repositories, as per community standards. Similarly to what other publishers do, and following, for example, the encouragement of funding bodies like the NIH, we encourage authors to use research resource identifiers, or RRIDs. These are unique, searchable identifiers, just like, for example, the DOIs we have for publications, and they cover resources such as antibodies and model organisms used in the course of the research. Still with the availability and discoverability of materials and resources in mind, we ask our authors, when they reach the stage of revised submission, to complete a key resources table, where they can outline the main and critical materials they've used within the scope of their research. You can see an example on your right. These tables will usually accommodate information about strains, software, and databases, and whenever possible authors should also provide the RRID if there is one available. This does not only promote discoverability; we also think it's a good encouragement towards reaching standardization in terms of reporting. Another component I would like to mention today is protocols. Most of you may be familiar with Bio-protocol, the peer-reviewed protocols journal. We encourage submission to Bio-protocol at any point before or after acceptance.
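For a sense of what the research resource identifiers mentioned above look like: an RRID is a prefixed string such as `RRID:AB_2138153` for an antibody or `RRID:SCR_002798` for a software tool. A small, illustrative shape check in Python (a simplified sketch of the common prefixes, not the registry's full grammar, which also covers other resource types and accession formats):

```python
import re

# Rough pattern for common RRID shapes: a source prefix such as AB
# (antibody), SCR (software/tool), or CVCL (cell line), followed by
# an accession. This is a simplified illustration only.
RRID_PATTERN = re.compile(r"^RRID:(AB|SCR|CVCL)_[A-Za-z0-9]+$")

def looks_like_rrid(identifier: str) -> bool:
    """Return True if the string matches the rough RRID shape above."""
    return bool(RRID_PATTERN.match(identifier))

print(looks_like_rrid("RRID:AB_2138153"))   # an antibody-style identifier
print(looks_like_rrid("catalog #12345"))    # a vendor catalog number, not an RRID
```

A check of this kind is the sort of thing a key resources table can enable: because the identifier has a predictable, machine-readable shape, both publishers and aggregators can validate and link it automatically.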
For example, in the two figures on your right, you can see an eLife paper that then had two step-by-step protocols published in Bio-protocol. To recognize this, we issued a new version of the article with the DOIs for those publications, and also added two public annotations, which you can see in the image at the top. On the Bio-protocol side of things, they also included a link to the eLife publication, with the objective of increasing discoverability of the publication associated with the protocol. If you've deposited or written up a method using protocols.io, we also encourage you to include the DOI in the methods section when you submit a manuscript to eLife. And I thought I would take longer; that was really quick. Finally, I would just like to say a few words about a project we currently have underway, which relates to a lot of things that have already been mentioned today, towards encouraging the use of reproducible documents. We're working with several partners on this, and we have further conversations underway. The idea is that we would develop a reproducible document format that would be supported by the publishing infrastructure. What often happens with users who are already familiar with, for example, Jupyter notebooks, or other kinds of computational or reproducible documents, is that when you reach the publishing infrastructure, when you go to a journal to submit, you need to provide a flattened PDF. You've spent a lot of time creating this enriched manuscript, but when you reach the submission point, you can no longer provide all that beautiful document you've worked on. So with the Reproducible Document Stack, our aim is to bring the components of research together.
That is, the article, the code, and the data: to encapsulate everything, for the authors and also for readers, so that we can go beyond the usual static view of the research article and readers will be able to play with the code and data. And this would not only be useful for people already familiar with notebooks, but also for those who prefer Excel and Word-like environments. Specifically, this entails three lines of work. First, we are working on an authoring platform. Obviously there are already a lot of authoring platforms that support reproducible documents in a web browser interface, but we needed to develop a new one so that the publishing infrastructure can then support these formats. The publishing infrastructure uses JATS XML, and we needed to produce something in the same format so that other journals could then process the article. The image on your right is an example of our authoring tool; that's something we're getting user testing and feedback on at the moment. The second line of work is the creation of a container that would be able to accommodate a set of multiple digital documents: the article, the code, and the data. And finally, as I mentioned, we are working on creating the infrastructure so that the paper can be handled in this format from submission until publication. This could be particularly helpful for reviewers as well: when you have an article in peer review, the reviewer will be able to play with the data and manipulate it as they see fit. If you're interested in following the project, you can go to that website and subscribe. And if you're working on an open source tool yourself, feel free to get in touch with me at the end of the session or go to this website, as we'd be really interested in hearing your ideas and seeing if we could provide any support for your project. And that was me. Thank you very much. Thank you.
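To give a flavor of the target format mentioned in the talk: JATS is the XML vocabulary most journal production systems consume, with the article marked up in elements like `<article>`, `<front>`, and `<body>`. A toy sketch of producing a minimal JATS-shaped fragment programmatically (the element names follow JATS conventions, but this is nowhere near a complete or validated JATS document, and it is not eLife's actual pipeline):

```python
import xml.etree.ElementTree as ET

# Build a minimal JATS-shaped article skeleton. Real JATS requires much
# more (full metadata, DTD conformance); this only illustrates the idea
# that a reproducible document must end up as structured XML, not a PDF.
def make_article_skeleton(title: str, paragraph: str) -> ET.Element:
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title
    body = ET.SubElement(article, "body")
    ET.SubElement(body, "p").text = paragraph
    return article

doc = make_article_skeleton("A reproducible example", "Results go here.")
print(ET.tostring(doc, encoding="unicode"))
```

The design point is the one made in the talk: if an authoring tool emits the same structured format the journals already process, a notebook-style document can travel through submission and review without being flattened.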
Could you say something about the trial whereby a manuscript is accepted at the point that it's sent out for review, and then the author is able to decide whether or not they want it published? What feedback have you received on that? Yeah, yeah, so we introduced this peer review trial. This was an idea that we developed after a workshop organized by ASAPbio, and it was discussed with the fellows at HHMI. The idea, as we outlined it when we launched the trial, was that we would invite authors to submit, and if the paper was deemed suitable to go through to full peer review, then we would be committing to publish the article with the editor's decision letter and the peer review reports. In the event that big flaws were identified during the review process, the authors would still have the option to withdraw, or they could decide how to address the reviewers' concerns. And because all this correspondence is published with the article, the reader will have the chance to see how the authors decided to address the concerns. We've received 305 submissions; the pilot was intended to be for 300 papers. While most of them are still under review, we've published four articles so far, and we should be publishing the data and some of the results, mainly about the initial submission, or triage, stage, in the next few weeks. So our readers will know, for example, what the difference was in terms of encouragement rate to peer review compared with papers that go through the traditional workflow. This is data that should be ready for all of you to read shortly. Will you decide whether or not that program will continue? We haven't thought about that yet, but it is something that we may wish to do again, because we may not have enough data to draw substantial conclusions on, but we'll see. Yeah. So I'm wondering about this reproducible publication container.
First, the infrastructure, and second, the use cases for this. I mean, are you just putting the text into a container of some kind, or is there something expanded from there? Yeah, so the idea is that, sorry, I realize that I didn't mention a few things that I wanted to. The idea is that this could be a collaborative tool, and you would have a text visualization similar to what you can get with Word. And, for example, if you've written your work using a Jupyter notebook or Markdown, we would have the infrastructure to support the conversion of those formats into this container, which would then be turned into an article; we'll be supporting R Markdown as well. So the idea is that this would be platform agnostic to the tools or methodologies that were used, and could ideally then also be used by other publishers. This would be entirely open source. Obviously it's a bold and ambitious project, and we're working with various partners. I didn't actually mention that Substance and Stencila are the main companies we're working with on this, and we have other people, as I mentioned, also in the discussions. Hi, I think eLife is fantastic. We, yeah, we published with you, and I thought it was really great how you guys interact with the people who submit. My question is the following: now that you're saying you'll be putting forward all the data and all the code, what is the burden on the reviewer? Are you expecting the reviewers to analyze the data? I mean, that's all. We know that that doesn't happen, and I think that's just how science is, kind of how things are done at the moment across the publishing ecosystem; it's not eLife specific. But I think that with these documents you would be able to see the code that's underlying a plot or a figure and be able to manipulate it, and that actually may make the reviewers' work less complicated.
So we know that there are a lot of really keen reviewers who actually download the data, go and process it, and then give feedback, but that's just a minority. So I think that with these reproducible documents, reviewers may be more inclined, or more willing, to review the code and the data. I know that many journals are loath to look at impact factors, and I think eLife is one of those as well. So I'm curious what you think of as better alternatives for evaluating the success of articles, and what you use as criteria: citations, or visits to the website, or what? So on each of the eLife articles you can see the number of downloads and page views it has received. And, for example, we also encourage the use of alternative metrics, such as the Altmetric scores that have been mentioned before: references in media outlets, Twitter, that type of recognition. Yeah, feel free to do that. So altmetrics, I feel like they certainly seem at this point a little bit contaminated by all these bots that send tweets out and whatnot. So I'm curious if there are any other, more rigorous criteria, or if it's just downloads and altmetrics, or... So actually, we rely a lot on the editors themselves to make the judgment as to whether the paper was good enough to be published, and we believe that's enough recognition, and it's better than, for example, an h-index or an impact factor. I think it was someone earlier this morning who said his h-index is actually quite low. Obviously that metric, as well as impact factor, does not clearly reflect all the other work you do in terms of reviewing. Also, for example, if you're an advocate of open science, you may be involved in a lot of open science initiatives, and that should also be taken into account, for example when you apply for funding or tenure.
So yeah, I think it's a change that we will probably see in the next few years, not only in the publishing industry but also in terms of funding and recognition. Great, do you have any more questions for Maria? Okay. Maria also mentioned protocols.io, and they're here too; they'll be having a session tomorrow, so if anybody's interested in learning more about that platform, please feel free to join. Next, I'd like to introduce Gabriel Gasque from PLOS Biology. Because I'm the second to last, I was told that I can take an hour. I would like to thank the organizers for inviting me to speak at this really great symposium, which has been not only very interesting but also invigorating. I was asked to give an overview of some of the initiatives that PLOS has either triggered or embraced to support and promote open access and open science in general, and I will start with a little bit of our roots and history. Probably many of you are familiar with our role as a non-profit open access publisher, but we actually started as an advocacy organization, and we see ourselves as innovators. PLOS was founded in 2001, that is, many, many years after the formation of the universe and the appearance of life on Earth. In an open letter, we advocated for the establishment of an online library that would provide unrestricted access to the published scientific record, because we honestly believe that it belongs to the public. It wasn't until 2003 that PLOS became a publisher and launched its first journal, PLOS Biology. Today, we publish seven journals covering mostly biology, but also medicine, physics, chemistry, engineering, and the social sciences. At PLOS, we're constantly revisiting the concept of open and seeing how we can expand its boundaries. We started by advocating and promoting free and unrestricted access to published research.
But we have also taken steps to grant and promote access to raw data, and to redefine the way that scientific contributions are assessed, including increasing transparency in peer review. Overall, we work to open up scientific communication, to make it faster, more efficient, useful, and interconnected. In the next few slides, I'm going to describe some of the work that we have done to achieve this goal, starting with open data. There is evidence that data availability from the authors declines by about 17% per year since publication, indicating that authors themselves are not great stewards of data. Conversely, there is also evidence that studies that make their underlying data available are more impactful. What this graph shows is that for papers reporting microarray data, the number of citations is greater if the underlying data is available. So starting March of 2014, PLOS established a policy requiring that all of our papers provide the underlying data, within legal and ethical restrictions, via a third-party repository. During submission, our authors must provide a data availability statement that is published together with the paper and provides links to the data repositories. As of today, the PLOS journals have together published more than 100,000 papers with a data statement, and we have rejected less than 0.1% of our submissions due to the authors' unwillingness or inability to provide data. As I briefly mentioned before, we see many long-term benefits of data sharing via a third-party repository. In addition to reducing the burden on the author, it facilitates, or even invites, independent validation and re-analysis, which, in addition to bolstering reproducibility, adds value to the research output. We also believe that authors should be given additional credit for sharing useful data. Even though I feel that I'm preaching to the converted, in this slide I'm going to show you a cute example of how sharing data is a force for good.
So for every paper that we publish, we do a promotion campaign that includes tweeting. We tweeted about this article, and somebody counter-tweeted saying, this article seems cool, but please do not use this graph to promote it, because it sucks. So we challenged this person to come up with a better way of displaying the data, since the data were already available. He came back with three alternatives, and you might agree or disagree on the relative improvement, but the point is that once the data is available, it can be freely reused, re-analyzed, and re-plotted for scientific or teaching purposes. We have also wondered at PLOS how we can redefine the way that scientific contributions are assessed while at the same time taking a stance on reproducibility. We all know how competitive science can be, particularly in certain hot topics, and how damaging it can be when you get scooped. Therefore, considering this and the value of independent replication, earlier this year PLOS Biology implemented a policy that we call the complementary research policy, which allows authors to submit a scooped paper within six months of publication of their competitors' work, and that previous paper won't be considered detrimental during our editorial assessment. This paper, published in Genome Research at the beginning of this year, shows first evidence that in vitro-synthesized CRISPR guide RNAs can trigger an innate immune response in human cells. One day after this paper went live online, we received this submission, which reached the same conclusions. We sent it for full peer review, and it was published in July of this year, together with a perspective, written by the senior authors of both publications, on the value of independent replication. Since our policy was implemented, we have received ten complementary research articles, two of which have been published, four are under active consideration, and four have been rejected.
We have also embraced initiatives that other journals had already rolled out to increase transparency in peer review, by publishing the reviewers' comments, the authors' rebuttals, and all the decision letters. We believe that by making the peer review process fully available, we will increase reviewer and editorial accountability, provide a training opportunity about the peer review process for graduate students and postdocs, enhance the reader's understanding of the study in the current context of the field, and also create a pathway for crediting peer review, which currently doesn't happen. In this last slide I will mention a partnership that PLOS has established with Cold Spring Harbor Laboratory and their preprint server for the biological sciences, bioRxiv. As a result of this partnership, any manuscript that is submitted for preprint posting on bioRxiv can be directly transferred to PLOS for formal peer review, as it can to other journals, should the authors wish to select a PLOS journal. Conversely, a manuscript that is first submitted to any PLOS journal for peer review will be offered the opportunity of preprint posting on bioRxiv, and it will be PLOS staff who do the initial screening to determine the suitability of the manuscript for preprint posting. The editors will also use the comments left on the preprint server as part of the editorial assessment, and authors can opt in or opt out. With that, I think I've made the case that PLOS has significantly contributed to open science: by granting unrestricted access to the science that we publish, together with the associated data; by redefining the way scientific contribution is assessed; by embracing transparency in peer review; and by enhancing, or speeding, the dissemination of scientific output without compromising peer assessment. And with that, I will take some questions. You have probably 40 minutes.
I'm curious whether at PLOS, or at other places people are aware of, there are policies or procedures or programs to help junior scientists who potentially can't pay. The people I know who are most excited about open access are often early career scientists who don't have the ability to write funding for open access fees into their grants. And I know that there are often policies for people publishing from lower- and middle-income countries and things like that. I guess I'm curious if there's any discussion about ways to incentivize junior researchers who maybe don't have the fees, so they can get their foot in the door with PLOS or other open access journals. Yes and no. As was mentioned before, PLOS has fee tiers. I'm an editor, so I work directly with the manuscripts, and I'm blind to all the financial procedures, so I never assess a paper based on the financial situation of the authors. That being said, I know that there are waivers, and depending on where the senior author of the paper comes from, the APC, which stands for article processing charge, might be waived. That doesn't apply to authors in the States. Now, we don't have a short answer for how to solve that issue, but we're actually looking at a radical way in which open access can thrive, and that's not charging the authors directly, but considering dissemination of the scientific output as part of the scientific endeavor. So ideally, we believe that paying for distribution should come from either the funding agencies or the universities, and this idea is not originally from PLOS; it's been circulating, but that's where we are headed. So we don't have a short answer for your concern, but I think in the long term the way that open access is financed is going to change, and it won't depend on the specific project of each individual researcher. That's a really excellent point about APCs, and I'm actually going to take this opportunity to plug CMU Libraries.
If you are at CMU, we actually have a fund to help researchers pay APCs, so if you don't know about that, you can just come talk to one of us and we can help you out. Any other questions for Gabriel? Thank you, Gabriel. You gave such a great talk that you answered all of my questions, it seems. Can you explain a little bit, I know we've talked about this before, how the review process occurs? So you send something to bioRxiv; how does the review process work there, and how do you take in that review process? To bioRxiv, you mean? Both steps: from bioRxiv and then to PLOS. If you post on bioRxiv, they offer you the opportunity to directly submit to a collection of biology journals, including all of the journals at PLOS. And bioRxiv doesn't provide formal peer review. Actually, as far as I know, they do not post the final version. Somebody this morning was saying that you can version: if your paper has been revised, you can upload a newer version to bioRxiv, but they do not allow posting of the final published version. So they are exclusively a pre-publication preprint server. They don't provide peer review, but they have a comment section where your peers can freely go and leave comments. So they offer you the opportunity to transfer directly to many journals, and if you do that, it depends on the journal whether they will take into consideration the comments that have been left on bioRxiv. We do. We go to bioRxiv and read what people have left, mostly useless, but we still do it. If you submit to us first, then you will be asked whether you want us to post your study on bioRxiv, and it's up to you to say yes or no.
If you say yes, then we will screen it to check that, if you did a human study, there are no identifying details, or, if you did an ecology study, that you are not reporting GPS locations of, say, endangered species, and then we will post it on your behalf, and during peer review we will be checking the preprint to see if anybody has left comments. Am I answering your question? I'm just wondering about those papers where nobody left any review. Then you go and find reviewers. We don't actually substitute whatever is left on bioRxiv for selecting our own; your paper will still be formally peer-reviewed by us, by reviewers that we select, in addition to whatever is on bioRxiv. I'm just curious about the data availability statements that are in there, especially for, like, PLOS Medicine, where there's a statement that sharing can't happen because it's human data or things like that, but oftentimes there is an email contact put there that says, reach out to this individual. Do authors sign that they will keep that up to date, or do you check those often, or how is that handled? So yeah, as was mentioned this morning, privacy beats everything. So if there are legal or ethical restrictions on sharing the data, we allow the authors to keep the data themselves and be contacted. There is a commitment, a verbal or a written commitment, that the authors will share the data, but we don't have the power of enforcing it, and we don't do any follow-up. If, however, and this should apply to any journal, not only PLOS, you contact an author who is unwilling or unable to share the data, you can write to the editor, and then it's part of the editor's job; we have to follow up.
And not only with data, but with reagents: if you want to use a published mouse or plasmid and the author is unwilling to share, you should reach out to the editor and complain, and it is the editor's duty to go to the author. But we cannot enforce, you know; we can shame, but that's as much as we can do. That being said, I think that BMC, are you familiar with that publisher? They retracted papers because the author didn't share the code, even though the paper wasn't flawed. Any more questions for Gabriel? We'll move on to our next speaker then. Our next speaker will be Reinhard Schumacher from Carnegie Mellon's Department of Physics. Thanks for the invitation to speak for a few minutes about the arXiv. I'm an experimental particle physicist, and I am a user of the arXiv, but I don't have anything to do with the organization per se; I've been using it pretty much since it was established. So let's start with the origins and evolution of the arXiv. These days it's called arXiv.org, and it is an openly accessible, moderated repository for scholarly preprints, also called e-prints, in numerous disciplines. It was started in August of 1991 by a physicist, Paul Ginsparg, when he was at Los Alamos National Laboratory. He saw a need for bringing the distribution of physics preprints into the internet age; 1991 was a few years after the internet was really established and the first web browsers were floating around, so he came up with this idea and just set it up himself. It has a 24-hour submission and announcement cycle, so this is something that people in my field certainly look at every single day to see what's new. They strive to be a permanent archive of preprints, and it's totally free for the user and available worldwide. For its first years it was hosted at Los Alamos National Laboratory, it was called xxx.lanl.gov, and it had these death's heads here as part of its little logo.
In 2001 Ginsparg moved to Cornell and he took the arXiv with him, and since a lot of people were upset about skulls, nowadays the symbol is a smiley face. Let me say something now about the funding mechanism. It's free for the users, but last year it had a budget of just about $1 million, and that money came from the Simons Foundation, from Cornell University Library, and from 206 member organizations that pay anywhere from about $1,000 to about $4,600 a year to be part of that group. Don't ask me what the member organizations are; they're the usual suspects, I assume, large universities and so on, but anybody can use it. It started out in physics, in many branches, and I'll show you that in a minute, but very quickly it was adopted in mathematics and computer science, more recently in quantitative biology, quantitative finance, statistics, and electrical engineering and systems science, and very recently some economists have started depositing their preprints there as well. How is it organized? It's really run by Cornell University. There's a scientific advisory board that has 13 members on it; I don't mean to say they're all at Cornell, but it's centrally organized from there. Each of the subject areas has a subject advisory committee; for example, there are 10 members on the one for physics, and Paul Ginsparg, the originator of the whole thing, is still a member of this group and also of this group here. But the arXiv policy decisions are ultimately made by the Cornell University Library group. I found out recently that they're actually moving, this is some internal thing at Cornell, from the library to something called the Cornell Computing and Information Science area, though that's supposed to be seamless for the users. So when you go to the arXiv, in physics, my field, for example, all these bullet points here are subfields of physics: astrophysics, condensed matter physics, general relativity, high energy physics (experiment, lattice, phenomenology, theory), mathematical physics,
nonlinear sciences, nuclear experiment, nuclear theory, and everything else in physics. And then off the page here is the next one, which is mathematics. I highlighted one here: if I want to know something about my field, high energy physics experiment, I just click on it, and I find out every day what the new submissions are from the last 24 hours. I highlighted one of them here from last Friday because, hey, I submitted a paper last Friday with my graduate student. What you get is a title, the authors, a one-line comment that says what the thing is about, in this case that it was from an invited talk at a conference this past summer, and then the abstract, and so it goes. Now, if you're interested in that title and abstract, you can click on it and the actual PDF file will pop up on your screen, so it's extremely easy to use. If you click on an author name, you get a list of everything that author has ever had under their name on the arXiv, so you can check the pedigree of people submitting if you want. In addition to navigating to this point, you can also set it up so that the arXiv actually sends you email every morning with the titles and abstracts of all the things from the last 24 hours, so you don't even have to go there; you can just pop into your mailbox every day and keep up to date, or try to. Now, the way the arXiv gets used, at least in physics; I heard in our last talk that maybe in biology it works a little bit differently. In physics, if you post a result to the arXiv, that determines your scientific priority. You do not have to get the paper published in an archival journal to beat the competition. That's different from the way it was years ago; 30 years ago, whoever got the paper published first in an archival journal had scientific priority, but nowadays everybody uses the arXiv, and even though it's not final or official, if you put it there first, it's your baby. I mentioned
that there are the daily postings that anybody can get. Now, another interesting thing is that the arXiv is moderated but not peer reviewed. That means that for each subject area there is a group of people, and they're pretty much anonymous, I don't know who they are even for physics, who actually look at each preprint that comes in and decide whether or not it's suitable for the particular subheading it was submitted to. The moderators can reclassify a preprint, or they can reject it outright if it's just some nonsense from some crackpot. You can imagine that leads to a little bit of tension sometimes, but again, it's not peer review; they're just looking to see whether the thing fits their format and is on topic, and that's what they view their task as being in order for something to appear on the arXiv. So, not peer reviewed, but it's amazingly clean in the sense that you don't find much nonsense on there, even though it's almost wide open. In 2004 they also established the notion of endorsed submitters: anybody who submits to the arXiv has to be endorsed by somebody else who is already endorsed, so it's like a giant chain of mutual endorsements. This was viewed initially as maybe a problem for young scientists who don't have an established reputation and want to get their first paper in there, but as far as I can tell this is only minimally enforced: if you submit a paper from a recognized university, if you're from Carnegie Mellon and it's your first paper, they're not going to reject it. Certainly in physics, and I expect it's true pretty much across the board, and we heard about this in the last talk, the archival journals like Physical Review and the European Physical Journal will accept manuscripts first posted on the arXiv for real peer review and for publication in their journals. There are a few exceptions to that; I think Nature and
Science have rather strict rules about pre-release of results, but certainly all the American Physical Society journals will let you do that. The submission format is usually LaTeX, or you can do it directly with a PDF. In terms of copyright, the copyright is usually retained by the authors, but the arXiv makes you at least give them what they call an exclusive irrevocable license to distribute the preprint. There are some other types of rights you can choose, Creative Commons and so on, but usually you just grant them the minimum rights. Also, no citation or download statistics are kept on the arXiv, so you cannot go back weeks or months later and see how many people in the world have downloaded your paper. As an author, oftentimes you'd like to know who's reading my stuff, how many people have read my paper; the arXiv does not offer that. It's just the way it is. So, last couple of things. This shows how the growth has gone: on the left side here, this is 1991, over here is 2017. The blue, as an example, that's high energy physics, the branch where Ginsparg was trying to solve a problem, and he evidently did, but you can see that it's plateaued, and it's been plateaued for a long time. Then condensed matter physics came along, that's the green; the red is astrophysics; then phenomenology; this light blue one is pretty much all the rest of physics, nuclear physics, general physics and so on; the purple is math; whatever this color is, that's computer science; and these tiny slivers are newer things, including biology here, but I'm sure they'll grow as well. On the right-hand side, this is just the fractional usage, so again the blue is high energy physics: in submissions, its fraction of the entire arXiv world has just dropped as other people have joined in. And one more: within physics there are these subdivisions, high energy theory, high energy phenomenology, lattice, experiment, and the fractions of the total for each of those.
One more example is astrophysics: from 1993 to 2009 astrophysics was just one big listing, and in that year they switched and started having all these subheadings. Don't ask me to interpret these descriptive things, but they split it into a bunch of different categories of more topical interest. OK, so in summary: the arXiv is a free, permanent repository of scholarly preprints; it's moderated but not peer reviewed; postings are taken very seriously in the community; and the growth that began in 1991 continues, with new fields even now joining this system of preprint distribution. So thank you. Do you have any questions for Reinhard? I have more of a comment, actually: in contrast to, for example, bioRxiv, the arXiv doesn't deal with DOIs; it costs money, and the arXiv decided not to issue DOIs. One of the issues that raises for me personally is that on my NIH progress reports I'm allowed to report preprints, but only if I can provide a DOI, so that defeats some of the purpose of preprints. Yeah, I mean, there are some physicists who eschew the entire publication circus, right? They publish their stuff on the arXiv and they say, that's it, I put my stuff out into the world, I don't need it to be published in a real journal. But if you're a young scientist seeking support from a funding agency, the agency has got to agree with that argument, and they may not. Your comment makes me think, too: the physics arXiv at least does not accept comments. Our previous talk mentioned that in biology you can put comments on the preprint; that's absolutely not done for any of the physics fields. It's really just a repository for the preprints themselves. You can go back and change your metadata, so if you get a DOI and you want to put it in there by hand into your listing, you can do that, but that comes when the paper appears in an archival journal. Do we have any other questions? One last round of questions as we
get situated. I'll start it off with a question for our first two speakers. As someone pointed out earlier, "open" has a lot of ramifications and is sometimes poorly understood or applied, but one aspect of open science and open publishing is open reviewing, and there have been some real advances; eLife and other journals have really pushed on one side. Where do you see that advancing to? And also, I know there's plenty of pushback that says, as a junior investigator, or as a woman or minority investigator, because my identity might be known to the other reviewers, they are not as forthcoming. So what are some of the problems, and what are some of the advances we might see in the future, for really improving the peer review process? I think that there are multiple understandings of what open peer review means. I think that eLife has been really innovative in that sense; I don't know if you're familiar, she's better placed to explain, but they do a collaborative peer review in which, correct me if I'm wrong, every reviewer reads the manuscript and sends comments, and there's an editor that brings the reviewers together, and they have a live conversation and try to reach a single joint review. As far as I know, it's the only journal that does it; we don't do it. That's one form of open peer review. Another form would be requiring the reviewers to reveal their identities. The version that we hope to release in the second quarter of 2019 involves publishing the reviews, which is something that eLife also does. There are reviewer comments, but there is no dialogue about them; we just collect them. My job as an editor is to try to synthesize them, adjudicate, and write a decision; that's my job. Then I release their comments, I release my decision, and I will publish the author's response. She can elaborate more on how it works at eLife, but at least in this version, it will be optional for the reviewer to sign their comments; we will publish the content but not the
identity, and we don't think that will inhibit the reviewers or jeopardize their position. Identity, again, will be optional, so in that sense I think they will be safe, and it will be up to the authors to decide, at the end of the whole process, whether they want the reviews to be published or not; so the opt-in comes at the very end. As Gabriel mentioned, at eLife we have this consultative peer review process, and this goes somewhat in line with one of the points you made. At this point all the reviewers can see each other's reviews, and they can participate in a discussion led by the eLife editor. We actually ran a webinar for early-career researchers last week, and we sent a survey in advance, to get a feeling for how their experience had been if they had reviewed a paper for eLife. In most cases they indicated that they put a lot of effort into writing the review, but when it came to having that discussion, some of them were really honest in saying, obviously I felt a bit intimidated by seeing a lot of senior people in that conversation, and I didn't feel as comfortable taking the stands I would have liked to, because of the seniority of the other people involved. But I think this shouldn't be an argument against open peer review; obviously it's something we are aware of. Actually, I was just discussing this at lunchtime with one of our editors, who's based here at Carnegie Mellon, and he was saying that he had a similar case in a recent manuscript he handled. What he thought he would do was have a separate conversation with the junior researcher, encouraging them to make the points they wished to make, and give them the security that he would also support the stands they wanted to take. And just to connect with something Gabriel also mentioned, there are various descriptions of what open peer review could be, and one really innovative approach to open peer review is the F1000 model of
post-publication peer review. I think that we are all aware that open peer review is here, and I think we all agree that it's probably the way to move things forward; it will be interesting to see how each publisher decides to take its own approach. Just to round up, I think that we are all very aware that we don't want to compromise anybody; we don't want to hurt the authors or hurt the reviewers, so we do have these discussions and try to take steps forward carefully. So, a few weeks ago at the National Library we ran this workshop with a bunch of data scientists, and we tried to reproduce about 12 papers. We learned a lot of interesting things, but one thing that became abundantly clear is that none of the reviewers had ever tried to run the code in any of the papers. So do you have any experience or ideas about how to get reviewers to test code that is submitted with computational papers? Yes, actually someone in the audience asked me a similar question. I think this is something that, unfortunately, we cannot enforce; we can encourage, and for example checklists may be a good way to encourage reviewers to do that, or, for the more computationally oriented papers, to make sure that the journal tries to recruit someone with the right expertise to run the code and see whether the statements made in the paper actually hold up against the data that was processed. And there are tools, for example Code Ocean, which I understand will be here tomorrow, that could help facilitate the review process in that sense. I honestly don't know how familiar reviewers are with using Code Ocean, but I think it's a great tool for that purpose. I'm not sure if it is F1000, but I understand there is at least one journal already using Code Ocean; I'm not sure if it is integrated into the review process or whether there are just conversations about that.
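To make the idea of reviewers testing code concrete, here is a minimal sketch of the kind of check a reviewer or journal could run: recompute a paper's reported numbers from the deposited data and compare them, within tolerance, to what the manuscript claims. Everything here is hypothetical, the function names, the toy dataset, and the reported values; it is not any journal's actual pipeline, just an illustration of the principle.

```python
# Hypothetical sketch: recompute a paper's summary statistics from its
# deposited data and flag any mismatch with the values the paper reports.
import math


def recompute_stats(data):
    """Stand-in for the authors' analysis script: mean and sample SD."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
    return {"mean": mean, "sd": math.sqrt(var)}


def check_reported_values(data, reported, rel_tol=1e-2):
    """Compare recomputed statistics against the values reported in the paper.

    Returns, per statistic, whether it matched, the recomputed value,
    and the value claimed in the manuscript.
    """
    computed = recompute_stats(data)
    report = {}
    for key, claimed in reported.items():
        ok = math.isclose(computed[key], claimed, rel_tol=rel_tol)
        report[key] = (ok, computed[key], claimed)
    return report


if __name__ == "__main__":
    deposited = [4.1, 3.9, 4.3, 4.0, 4.2]            # data from the repository
    claimed_in_paper = {"mean": 4.1, "sd": 0.158}    # numbers from the manuscript
    for key, (ok, got, claimed) in check_reported_values(deposited, claimed_in_paper).items():
        print(f"{key}: reported {claimed}, recomputed {got:.3f}, {'OK' if ok else 'MISMATCH'}")
```

In practice a reviewer would point this kind of harness at the authors' real analysis script rather than a stand-in, which is exactly the workflow that platforms like Code Ocean try to package up.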
To note, everyone, I'm Keith Webster, Dean of Libraries. I'm the guy that Rebecca introduced as the librarian that doesn't like books; you can discern from that what you wish. You may be surprised that I'm using slides; I didn't plan to, but there were a couple of pretenders this afternoon who seemed to think that they could squeeze more slides per minute into a presentation than I can, and I didn't want the opportunity to compete to go by. That said, I am conscious that I stand between you and speed dating, and at a time when Carnegie Mellon is about to release its consensual relations policy, I don't want to get in the way or get caught up in that at all. When I looked at the delegates list, I realized there were a bunch of people from Carnegie Mellon, some of whom probably spend their lives in labs in this building, and there were a bunch of people from Pitt who may never have been on the main campus of Carnegie Mellon. So there it is, just by way of orientation for those of you from out of town: we are currently in a building round about here, Pitt and UPMC are around there, and the massive bit in the foreground is Carnegie Mellon. Those of you who don't know much about the university may not know that it was founded in 1900 by Andrew Carnegie; in 1967 we merged, or rather the Carnegie Institute merged, with the Mellon Institute, based in this building. So there's a lot of Scottish heritage on both the Carnegie and the Mellon sides, and a lot of Scottish symbols on campus. This is outside my office; people quite seriously practice bagpipes outside my office. One of the curious things about this whole Scottish thing is that there's a clause in the faculty handbook that nobody has really grasped apart from me, that anybody from the faculty speaking in public has to do so in a Scottish accent. I've been practicing for five years; I'm really from Wisconsin. We're in this building, as you've probably worked out, having come in this morning. Again, those of you not from CMU may not appreciate that, quite seriously, this building
was Gotham City in the Batman movies, so there you are, and that's someone from CMU, I think, playing Batman. That's open science, which is what we're here talking about today. It really has become significant over the last few years. For me, being a Brit, the real marker that something is important is when the Royal Society actually publishes a report saying this is a thing, because we've been around for a few hundred years and are slow to grasp things. But really, just looking at the number of reports coming out from the National Academies, the OECD, and others shows that policy makers have grasped what all of you have been doing in your daily jobs for years. I think there's a recognition that those who pay for research expect that they will have access to the products of the research they've funded. There's a growing expectation, as we've heard several times today, around reproducibility and accountability: partly to minimise errors, partly to tackle the nasty questions around scientific misconduct, partly to have fun with all of your hacking and coding. And also, in society at large, we recognise that the internet has democratised many aspects of our lives, and it's not at all unreasonable to expect science to be just as visible, just as accessible. It's also important for researchers, particularly at the beginning of their careers, that they can take open science approaches to increase their visibility and build their reputation. One of the products of open science has been a whole host of open science tools; you can begin to map these out and pick your pathway for particular approaches. You can spend hours having fun like this, and you can do so at the speed dating later on. But open science: as I was sitting here today, and I don't often spend a lot of time in this library, it's usually in and out, looking around made me remember that science has always been open. Researchers, from the earliest days of journals 350 years ago, have wanted their
work to be known, to shape science, to build reputations. Somebody made the point earlier about citations and impact factors, and I think it was really the arrival of the impact factor, an unexpected product of Eugene Garfield's citation index, that changed things. All of a sudden, journals that societies around the world had published to foster open communication amongst scientists, between members, became commercial commodities. When the impact factor started to be published in the 60s, the commercial publishers were able to rank journals. Whether you believe in the impact factor, whether you're trying to game it, I'm not getting into that debate, but quite simply the impact factor introduced some form of ranking; it's like a fantasy journal league. And that turned journals into commodities. Publishers realised that they could buy the best ones, give the societies a few thousand dollars so they felt they'd done a great deal, and then squeeze billions out of libraries in the 30 or 40 years since, and that has really transformed how people communicate. If you look at this, this is a PLOS article talking about all of the publishing; the critical thing here is the red line. Back in the 1970s, about 80% of articles published in the natural and medical sciences went to publishers other than the big four or five. Look at it now: by 2015 or thereabouts it's down to about 45%, with a similar pattern in the social sciences. So that is just one of the things that has really squeezed openness, because everything's behind a paywall. Here at Carnegie Mellon we spend about 7 million dollars a year on journal subscriptions, buying back the stuff that you've published. Perhaps the one good thing to have come out of this is a recognition from the academy that this is unfair and that we need to do things differently. But again, thinking as I looked at the collections and the stacks here, it made me realise over the course of the day that really, from my side of the business, what we're talking about
is in part the scholarly record, and what I really think about, in the context of the discipline, is what matters and what is useful to researchers and scholars. Now, to an extent, determining what constitutes the record, what designates the publications that form the scholarly record, has been one of the library's primary functions; to an extent it's what we have added to our collections that has made the difference. Of course we listen to scholars and we act upon what scholars tell us, but at the end of the day it's been what we librarians have done that, frankly, has mattered. That's a dirty secret that we don't talk about too often. But the scholarly record has shifted: it's gone away from what you see around you in this building to something that is now almost exclusively made up of digital objects and digital documents. We used to focus on the outcomes, the books and the project reports at the end; but now we look at the products of the research process: the protocols and methods, the community conversations, the evidence that is gathered, and, after the research has been concluded, the community discussion, the revisions, the reuse of data, code, publications, and so on. And we're seeing that expanding, because digital documents and digital objects are cheap and easy to replicate and deliver, but we're also beginning to see the use of machines in analyzing and extending the scholarly record. One of the challenges for people in my business now is how we try to help the scholarly community capture and curate what's important, not fuss about what's not important, and navigate what it is that we curate; but that's another presentation. Two minutes left? I was going to thank you at the end. OK, I'm sorry, I'm going to speed up. So, the objects that document the research process are increasingly openly visible, and there's a huge amount of stuff to capture and curate. Storing valuable research data is an important endeavor, but it's not going to be the librarians alone who succeed in this venture, because we will
not do that better than the high performance computing centers; we will collaborate with them and get on with it. So that takes us to this event, where the CMU Libraries and the Mellon College of Science collaborated to host today. I want to share a few broader reflections, which is what I was asked to do, though I couldn't resist some personal reflections first. These are not comprehensive, just a few highlights of things that struck me; otherwise I'd be like Lucy in Anisha's video, grabbing everything and spitting it back to you. So, some thoughts. Academic tribes and territories has been one of the recurring themes of today. As I think about the key players, the funders have come up repeatedly; they can encourage, even force, change in our behaviors. If a funder says, we will give you a grant but a trade-off is that you make your data open access, or whatever else it is, they can drive a lot of change. We see huge partnerships between researchers and funders, particularly in Europe, somewhat less so in this country; I'm still navigating the US political landscape. Coming to institutions: they need to think about the reward processes, what it means to get promotion, tenure, and so on, and whether it is time to give up on the traditional markers of scholarly achievement. In the disciplines we also see some drivers around preprints, data archives, and such things. We also heard about citizen science, which sits somewhat outside the academic tribes and territories, but there's an increasing alliance between those in the academy and those outside who are interested and able to contribute to what we do. I heard also a lot about workflows and tools; my big takeaway, I said to Anna, is that I need to go take a Software Carpentry course, because I've been promised a better job. Another big point that we need to hammer home to everyone is that sharing data by itself is not particularly useful; it's reusing data that matters. Grabbing data by pointing and clicking doesn't equal reproducibility; we need to build our workflows
for reuse; that was another takeaway. There was a lot also about observatories, and that's a really important point that I hadn't reflected much on before today. I often talk about the fourth industrial revolution in the context of libraries and open science; we also heard about the four research paradigms, with data-driven discovery being the fourth paradigm. There were some interesting questions around lifespans: data, code, research outputs, when, if ever, do they expire? We need to push for open file formats. We heard also about reproducibility. I like the tagline "as open as possible, as closed as necessary". The central motivation for the scientific method is to root out error; how does open science help that happen? Victoria presented some interesting reproducibility enhancement principles; how do we have a conversation about those? We've heard also about open access. Some of my colleagues get excited about the open access movement; I hate to dispel that myth, but I'd actually point out that it's a business model, not a movement, and simply a shift in the burden of cost, where we don't charge readers or libraries but cover the publishing infrastructure costs in other ways. We've heard about the advantages of open access to researchers, funders, and the rest of the academic tribes and territories. I was also struck this morning by the importance of open source, both in terms of programs and hardware, and how that makes science affordable for those who don't have national labs on their doorsteps. We also heard a nod towards Plan S, the European Union's attempt to conquer the world of publishing. Quite excitingly, this week the architect of Plan S is in Washington; according to Nature he's going to the White House, and they're waiting for him with great excitement to figure out how the US can do Plan S. It ain't going to happen. We've touched around the margins on the impediments to open science: the restrictions on data reuse, how researchers want to make additional discoveries with their own data before sharing it with others;
there are some commercial impediments around the publishers that we're trying not to talk too much about; and the overhead in data curation: how long does it take a researcher to organize data, assign metadata, and describe their data so that others can use it? How can those of us on my side of the shop help with that process? How do we make available useful forms of storage? The National Academies, as you're probably aware, released a report a few weeks ago, Open Science by Design, really worth reading. I'm not going to critique or summarise it here, but if you haven't read it, the URL is there. My quote of the day: open science is honorable; open science is strategic. What would you like to do next? Is this a one-off event, where you now go and speed-date and then go off and do whatever you want to do, and that's the end of it? Do you want another symposium like this, maybe next year? Do we want hackathons, or any other sort of forum? What about software training, Software Carpentry, to get better jobs? Collaboration incubator events? We'd love to hear from you and find out who's interested in this; we'll gather your feedback. We want to know some of these things: did you meet new people that you will collaborate with, did you try out new tools, whatever. Carnegie Mellon folks: your library is here to help you. Forget about the painting things around here; Rebecca was spot on, we are trying to rebuild a library that can help you navigate the open science tools that are out there. We've got experts, many of them sitting around here and in the back of the room, who can help you; many of them come from your disciplines, and they will be delighted to come and make the machines that go ping in your labs, and talk to you about your information activities. That's our figshare instance, for those who weren't aware of it. A few thank-yous from the university library team: Melanie and Jessica for planning and keeping us on time, Eric from the College of Science, all of our local organizers, and our sponsors. I'm done.
Thank you very much. I have just a couple of slides and a couple of logistics notes before we go to the speed dating and reception. Really, I need just one slide for a few thank-yous. These are our sponsors, which Keith just thanked, but I'll reiterate our thanks to them. I want to thank all of you for attending too; I know it's hard to put aside what you're doing on a Thursday and come here to listen to this, but I really hope this is starting to build more of a community. I hope that today's discussions, the reception afterwards, and tomorrow's workshops inspire some new ideas, and we would really love to hear what comes out of this and what you want to hear next. Those of you at CMU, certainly get in touch with any of us; we'll probably send a survey to all of the attendees, so be on the lookout for that. I want to thank all the other support that the four of us who organized the event have had: from the libraries and from the Mellon College, our promotion and design team, Shannon, Heidi, Gigi, and Al; our admin support in the libraries, who booked our guests' hotel rooms and helped us with catering orders and all of that; our donor relations; and then Ben, who's going to help you with the scientific speed dating in just one minute, as well as many others who jumped in to man our registration table and do all sorts of odd jobs. Thank you very much for helping us make today and tomorrow possible. At the end of the slides, a couple of other brief announcements. Tomorrow's workshops are in your program and on the event website; there's walk-in space available, so if any of you would like to attend, just let us know, and we'd be glad to have you. From noon to 2 tomorrow there's a free lunch coupled with a figshare and KiltHub deposit-a-thon: you can deposit your data on the fly, or just talk to us about what sort of data or code you might be interested in sharing, and enjoy a sandwich with us. Those events, the lunch and the workshops, will all be on the third floor of
this building, near where you came in at the Bellefield Street entrance, not in this library, although feel free to come up tomorrow if you'd like to just enjoy the space. Make sure you take a mug; we have many, so take one, take two for your children, and enjoy and reminisce back on the event; we ordered 12 dozen. The reception is next. Good news: there's wine and beer as well as other drinks, snacks, and an hors d'oeuvre spread. During the reception, Ben will lead us through the speed dating. The reception runs until 6:30, but we encourage you to continue your discussions, form informal groups, and go out to dinner, especially with those of you who are from out of town and graciously visiting us; those of us on the committee are happy to provide restaurant recommendations. So without further ado: Ben, with the speed dating instructions. This is really, really simple. We first ran this at a Great Lakes ISCB meeting; if you're interested in running this at your own institution, we actually published it in F1000, and it is indexed in PubMed, so you can look up scientific speed dating. But it's really simple. Basically, we will go and get in the beer line, and once everybody has a beer, the 8 people who are presenting software that they built will set up their placards at various tables. Everybody will spend 3 minutes at a given table to hear them pitch their software, you'll have 1 minute to ask questions, and then we will rotate tables until each person who came to present their software has presented it to more or less everybody. Does everybody understand? Are there any questions? Relatively simple; everybody good? Of course, I will tell you when we're starting. Presenters, people who are presenting software, you know what you're doing and have a pitch; you will get your placards from these people. Alright, great, let's go have a beer.