Alright, thanks for sticking around until the end of the day. My name is Karthik. Can you hear me in the back? Yes. I'm with the Berkeley Institute for Data Science, and I'm also involved with a number of open source projects; I lead a large one called rOpenSci. What I want to talk to you about today is how to get credit for any research software that you might develop.

This is my first time at FOSDEM. I actually thought this was a pretty small conference and that I would have to convince you that software is important. I clearly don't have to do that, but I'm going to do it anyway. The thing that always strikes me as an academic is how prevalent software is in every academic endeavor. It's not just heavily computational fields like astrophysics or bioinformatics; even many of the humanities that you would think are not computational are computational these days. It's hard for me to capture this better than Gaël Varoquaux, the lead for scikit-learn, did: software helps us make predictions from models, it helps us run experiments, and it helps us derive insight from data. That is universally true across a lot of different fields.

And this is not just anecdotal. Colleagues of mine have run surveys in the UK and the US to find out how much research software people use — not general-purpose tools like Microsoft Excel and Microsoft Word, but software designed for research. It turns out that more than 90% of people rely on this kind of software, and when asked what would happen to their research if that software disappeared, more than 63% said they would simply have to stop doing what they're doing. So research software is really quite critical for a lot of different fields.

More and more, I see academics spending a lot of time taking their code and packaging it into software; you see everyone pulling together a package these days. And it's really interesting that the skills we currently need to thrive in academia are not all that different from the skills needed in industry. Funnily enough, the person who said that, Jake VanderPlas, a friend of mine, was an academic and now works at Google. Luckily he's still doing the same open source work he did before; he's just getting paid a lot more money to do it.

But the challenge with doing software in academia is that it's not considered research, and it's very hard to get credit for academic software work. The biggest problem is that we really don't know how to give credit for software. Even very simple things, like how to cite software, are all over the place: sometimes people cite a paper, sometimes people cite an entire language to reference a single package. And this is not just true of tiny journals. Even in big journals like Nature and Science, people casually mention software without actually citing it, which means you will never be able to track down the version anybody used if you're ever thinking about reproducibility. And there's a big consequence to all of this: if we don't get credit for our software, we're not going to do a very good job of it. It's not going to be sustainable, it's not going to be collaborative, and that sucks for everybody. So we need to find a way to make software count as scholarship and give people credit for it. That's what my talk is all about.
A handful of us decided we needed to do something about this. When you think about software and citation, it's a mess of different kinds of challenges, and very few of them are technical — it's mostly cultural; we're not quite ready to do this. For example, there's no easy way to cite software, because we haven't agreed on whether to write papers about software or to cite the software itself directly. Software citations not being allowed in some venues is a weird cultural thing, and software is usually not indexed by the bean counters. Until recently, people didn't actually peer-review software — that's something I've been involved in for a handful of years now. And even though software has dependencies, we don't have a clear way of saying which software is connected to which other software.

Just a quick reminder about why we cite things at all: we're trying to give credit, we're trying to make it clear to everyone that we've done our homework and that we're not stealing from other people. But with software, the biggest thing we're trying to do is give proper credit to the people building it.

So how do we go about recognizing software for academics? There are two possible ways. One is to do things in ways academics are already familiar with: we do research, we write papers, we get the papers published in the best journal possible, people eventually cite our papers, the citations add up, and we get credit. Or we could think of something new and interesting instead of doing the same old boring nonsense. Because software has dependencies, there are more interesting things you can do that go beyond that very simple, one-dimensional credit model. Imagine a hypothetical example where Arfon Smith writes a paper: he references a couple of his old papers because he's building on previous work, he references a couple of large datasets, and then he references two critical pieces of software that he used — in this case, Astropy and scikit-learn. Because of dependency trees, we can tell that these two depend on NumPy and SciPy, so there should be a way to automatically assign credit down the tree without having to cite every single piece of software (I'll show a little sketch of this idea at the end of this passage).

But we're not very good at agreeing on standards, and getting a large group of people to agree to a big change is very, very hard. We would have to get buy-in from individual authors and editors, then whole communities and societies, and then get the journals on board — and that's really not going to happen. The alternate option is just writing papers about software, which is pretty easy to do, and it's not a bad thing, because a paper is something you can easily cite. We don't have to create any new infrastructure, and if you're writing software that is important to your community, the best way to bring it to their attention is to publish it in a journal people read. There's a challenge, though, with getting a software paper into an existing journal: if you've already written a full research paper, writing another software paper is really painful. You end up copying all of the documentation you've already extensively written for the software into a paper.
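Here is that sketch of the dependency-credit idea: a toy model in Python where each citation gives one unit of credit to the cited package and a fraction flows on to its dependencies. This is not a real system — the dependency table and the 50% pass-through share are made-up assumptions, just to show how credit could propagate automatically:

```python
# A toy model of transitive credit: each citation gives one unit of credit to
# the cited package, and a fraction of that credit flows on to its dependencies.
# The dependency table and the 50% pass-through share are made-up assumptions.

DEPENDENCIES = {
    "astropy":      ["numpy"],
    "scikit-learn": ["numpy", "scipy"],
    "scipy":        ["numpy"],
    "numpy":        [],
}

def assign_credit(package, credit, amount=1.0, share=0.5):
    """Give `amount` credit to `package`, passing `share` of it to each dependency."""
    credit[package] = credit.get(package, 0.0) + amount
    for dep in DEPENDENCIES.get(package, []):
        assign_credit(dep, credit, amount * share, share)

# Arfon's hypothetical paper cites two packages directly...
credit = {}
for package in ["astropy", "scikit-learn"]:
    assign_credit(package, credit)

print(credit)
# ...and NumPy and SciPy pick up credit automatically through the dependency tree:
# {'astropy': 1.0, 'numpy': 1.25, 'scikit-learn': 1.0, 'scipy': 0.5}
```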
Most journals don't publish software papers, and for those of you who contribute to open source: if you join a project that already exists, it's very likely you'll never get credit for that work, because the project already published one canonical paper. That's a common problem for people who joined the Jupyter team — everyone cites something that's quite old at this point.

So we've come to realize that trying to change the system is really hard, and we're going to have to stick with what exists and hack something around it. My colleague Arfon Smith, who founded the Journal of Open Source Software, got several of us talking a few years ago: could we create a new type of journal that makes it easy to publish papers about software? And so we created JOSS, the Journal of Open Source Software. It is entirely free, open access, it costs nothing to publish, and we built a system that is very developer-friendly. By developer-friendly, I mean that if you would like a publication for your software — assuming you have followed all the best practices: you've written good documentation, you have tests, you have clear installation instructions, you've designed a usable piece of software, and you've got a good open source license — then we expect it should take you no more than an hour to write the paper.

The JOSS paper itself is fairly simple. It's often no more than two pages long: a very high-level description of what your software does, aimed at someone who's not an expert in your field. We're really looking for you to cite who funded you and the major references that influenced you, and we really do not want you to put any results in it. It's just a simple, citable object for your software. We tried to be as conventional as possible in the scholarly space, so we didn't throw anybody off.

And this is what the submission form looks like. Has anyone here submitted a paper using Manuscript Central? A handful of you — you know how painful that is. Here, all we need is your name, your version control repository, and, if you have one in mind, an editor. That's really it. You can even skip the title and the description, because we already have those as part of your paper, which you write in Markdown and put in the same repository as your project (I'll show a sketch of what that file looks like in a moment).

And the thing we built is a robot — basically a Ruby bot that runs on Heroku and listens for every bit of activity on a GitHub issue. This bot, the one in the middle, can talk to GitHub and to a bunch of different services. We named it Whedon, because the Journal of Open Source Software is JOSS — Joss Whedon, for the sci-fi fans. As soon as a paper goes into the review queue, Whedon steps in and says: hello, I'm a bot, I'm here to help you; if you'd like to know all the commands I can run, just type @whedon commands, and Whedon will tell you everything it can do. It immediately identifies the language of the submission and tags it. Whedon is also nice in that it gives different powers to different people. If you're an editor, you can assign reviewers: I can just say, assign this person as a reviewer, and it will assign them. If I'm an associate editor, I can assign someone else as editor, and I can also start a review, which creates a giant checklist for the reviewer to work through. It also gives powers to the authors and the reviewers.
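As an aside, here is roughly what that Markdown paper looks like — an abridged sketch based on JOSS's documented template, with a made-up package, author, and reference:

```markdown
---
title: 'examplepkg: A toy example of a JOSS paper'
tags:
  - Python
  - research software
authors:
  - name: Jane Q. Developer
    orcid: 0000-0000-0000-0000
    affiliation: 1
affiliations:
  - name: Example University
    index: 1
date: 2 February 2020
bibliography: paper.bib
---

# Summary

A few short paragraphs, written for a non-specialist, describing what the
software does and the research it enables. No results.

# Acknowledgements

Funding sources and the major references that influenced the work [@astropy].
```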
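And to give you a feel for how a bot like this works: it boils down to listening for issue comments, parsing commands, and checking who is allowed to run what. Whedon itself is Ruby, but here is a minimal sketch of that loop in Python — the command names mirror the ones I just described, while the role handling and function names are my own simplification, not Whedon's actual code:

```python
# Minimal sketch of a Whedon-style command loop (illustrative, not Whedon's code).

EDITORS = {"karthik"}  # hypothetical set of GitHub handles with editor powers

def handle_comment(author: str, body: str, issue: dict) -> str:
    """React to a GitHub issue comment addressed to the bot."""
    if not body.startswith("@whedon"):
        return ""  # not talking to the bot; stay quiet
    command = body.removeprefix("@whedon").strip()

    if command == "commands":
        return "Here's what I can do: commands, generate pdf, assign <user> as reviewer"
    if command == "generate pdf":
        return "Compiling paper.md and the references into a PDF..."
    if command.startswith("assign ") and command.endswith(" as reviewer"):
        if author not in EDITORS:
            return "Sorry, only editors can assign reviewers."
        reviewer = command.split()[1]
        issue["reviewers"].append(reviewer)
        return f"OK, {reviewer} is now a reviewer."
    return f"I'm sorry, I don't understand: {command!r}"

# Simulate a couple of comments on a review issue.
issue = {"reviewers": []}
print(handle_comment("karthik", "@whedon assign @jane as reviewer", issue))
print(handle_comment("someone", "@whedon generate pdf", issue))
```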
So at any point in time, they can say: Whedon, generate a PDF. It goes through the Markdown and the references and generates a beautiful PDF. You look at it; if the formatting is bad, you keep adding commits and generate another one. It will also check references: it crawls all the DOIs and lets you know when something is broken (I'll show a small sketch of that below). And if you've got superpowers as an associate editor, you can say: Whedon, accept this paper, and it will deposit all the metadata and archive the paper — and right before that, you also archive the software itself. In the end, we produce a PDF that looks like a standard paper to most academics, which is good, because you don't want to confuse them by showing them a software paper. This is a paper that was recently published about the tidyverse; when you go to the JOSS page, you can see the review, download the PDF, and see the ORCID iDs of all the authors.

We've been running the journal for more than three years now. We wrote a paper right after the first year describing how the journal works; some of those statistics are outdated, but the trends still hold. We tend to get a lot of submissions that are Python packages and R packages, and most of our submissions come from the US or the UK. We're getting more submissions now and growing quite a bit: we've published almost 700 papers, we publish roughly 30 papers a month, and we have a lot of editors, with more joining constantly. In many ways, even though we created a journal, we really just created another open source project. In an open source project, users get excited, at some point they start contributing, and they end up becoming maintainers, so you can step away and hand things off to someone else. It works similarly for us: people submit to us, and because our reviews are 100% open and public, they see how the whole process works; they want to come back and review, and if someone reviews enough for us, we just make them an editor.

Lastly, I want to share a few insights we've learned running JOSS over all these years. One thing we've done is make JOSS, even though it's experimental and interesting, feel very much like part of the scholarly infrastructure. We don't have our own login system; we use ORCIDs, which are researcher identifiers. As soon as we accept your paper, we deposit all the metadata with Crossref, and we archive the paper and the reviews with Portico. Just a couple of months ago, JOSS papers started getting indexed by Google Scholar, and we're still working with Scopus. We love best practices: all papers are open access, authors keep complete copyright control over everything, our governance is fully open, and our business model is that we don't have a business model — it costs us very little to publish a paper. And even with all of this, we do a pretty thorough job reviewing your software. We're giving you a citation for a very short paper, but along the way we're checking that you have a good license and that your software functions as intended; if you claim any performance improvements, somebody will go test them. We go through a pretty big checklist. The process is quite fun for authors. We just heard a talk about how eLife is trying to make things open and transparent.
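That DOI check I mentioned is a nice example of automating the tedious parts. Here's a minimal sketch of the idea in Python, assuming the DOIs have already been pulled out of the bibliography (the real bot is Ruby and does more than this):

```python
import requests

def doi_resolves(doi: str) -> bool:
    """A valid DOI redirects at doi.org; an unknown one returns 404."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return resp.status_code in (301, 302, 303)

# DOIs extracted from the paper's bibliography (the first is the tidyverse paper;
# the second is deliberately fake, to show what a broken reference looks like).
for doi in ["10.21105/joss.01686", "10.0000/not-a-real-doi"]:
    print(f"{doi}: {'ok' if doi_resolves(doi) else 'BROKEN'}")
```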
JOSS doesn't really reject papers. Our goal is not to reject papers so we can brag about a low acceptance rate. We do desk-reject if the software is not fully complete, not appropriate, or not research software. But once you get past that point, we want to help you succeed, and the goalposts are very clear: if I tell you there aren't enough tests in your software, you know exactly what to do to get your submission accepted. And because everything is open, nobody's really a jerk about anything. We try to leverage the best parts of open source: developers are already on GitHub, so the journal lives on GitHub and the bot acts on GitHub, and we automate the things that are tedious and boring — that's what Whedon really does. So if any of this seems appealing to you and you have a software package, please submit a paper to JOSS. And if you have expertise in any open source language, please sign up to be a reviewer. I'm happy to take any questions. Thank you.

Yeah. So we have a potential solution to this — oh, sorry, I'll repeat the question. The challenge with contributing to an existing open source project is that there's already an old publication and you will not get credit. So we'd like to encourage people to submit a new paper when the software makes a major milestone leap; at that point, everybody who contributed during that iteration comes on board. That's one way we can give other people credit.

Yeah, thank you for that comment. It does work, but the challenge with traditional publishing is that you have to have novelty, so it's always hard to get a publication that adds to existing software. But if that works, that's great.

Oh, yeah. Okay. So we work through a checklist, and that checklist is very public. We're not really passing judgment on quality, but we want to make sure the documentation is easy to understand and the software actually functions. Reviewers have to step through every single function and every single example to make sure it actually works. So you end up with software that is not broken, that is not difficult to install on a different platform, and that actually has a license. We just make sure it meets a whole bunch of benchmarks that are signals of a good, usable piece of software — we're not doing a very deep code review. Right — and if someone else has done a much deeper code review, we will rely on that review as well. So reviews are transferable into JOSS.

Good question — and a difficult one. I'll repeat it: how small can a piece of software be and still get into JOSS? This is a common topic of discussion among the editors. We have a cutoff: you cannot publish a minor utility; it has to be somewhat substantial in what it does. Whedon does a quick scan of the software, and if there's any doubt, we have a discussion among the editors to reach consensus. So somebody who's an expert in the field will come in and say, this is actually a very trivial implementation of one single method, and then we reject it.

We do, actually. The question was: how do we deal with papers in languages that are not very common, like Haskell? Right now we don't get any, as far as I know, but we do struggle with some languages, like Julia, where we don't have enough reviewers. As soon as we know that's a problem, we just try to reach out to more people to sign up.
Our reviewer sign-up form asks about language expertise, and every few months the editorial team will decide we're lacking editors in a certain area and reach out to people to join the board. So if you know someone who's an expert in Haskell — maybe it's you — and wants to help out, feel free to reach out.

Excellent question. I don't know if it is, but if you want to publish here, you have to be open source — that's one answer. But overall, I think it's against the academic spirit not to be open source. Especially if it's publicly funded research: why on earth would you create proprietary software? Unless someone can come up with a good reason — I don't have one.

Oh, that's totally fine. Our review process happens on GitHub, but your code can sit in any version control repository of any kind.

Let me see if I follow your question: how do you cite the methods behind the software? It's tricky. We don't require a lot of citations as part of JOSS, because we're only looking for a very high-level description. If you want to cite a fundamental paper that describes, say, random forests, you can put that in your references — that's totally fine — but we're not looking for an exhaustive list. People usually put maybe five to ten references in their software publication.

Wonderful question: how long is the review process? I've seen one happen in a couple of days, and I've seen one take many months. It all depends on how fast we can work with the authors. For example, you submit something and I say, you're missing a test, and you tell me, I just started teaching this semester, I have zero free time, I'll come back to this next semester — that happens quite frequently. But if the software is pretty well used and pretty feature-complete, sometimes we don't have much to point out, and the few things we do point out people commit immediately, and then we accept it and it's done.

Good question. We haven't encountered that problem yet — how do we deal with plagiarism, meaning people stealing other people's code? We haven't done much about that so far. I'm sure we could build more functionality into the bot to try to detect some of it, but we haven't run into it yet. People do have to explain what they've done and how it differs from existing work, so you cannot just make an incremental change to an existing piece of software and call it your JOSS publication. If you're submitting something to JOSS, you have to demonstrate that you've contributed substantially to that piece of software, and the editor handling the paper has to verify that. But that's actually a great suggestion, and since we're making our bot smarter — two people are now working full-time on it — that's something we could add for the bot to look into.

Oh, yeah. Great question: how do we integrate references back into package managers? That's up to the authors. As soon as a paper is accepted, the very last thing we tell the authors is: congratulations, your paper is now accepted; please add this information to your README and your citation metadata. So if you're on CRAN, you add it back to the CITATION file; same thing with PyPI. It's up to you to advertise it back if there's no automatic way to do it.

Thank you. Thank you very much.