Okay, I think the trickle is slowing and everyone is starting to arrive at the webinar, so we'll just ease into it. First, I want to say hi and welcome everyone, and thank you for joining us. This is a webinar about leveraging open ecosystems to enhance reproducible workflows. I'm Nikki Pfeiffer, Chief Product Officer at the Center for Open Science, and I am super excited for this webinar today. One of the things I enjoy most about working at COS is the mission, which is to increase the openness, integrity, and reproducibility of research. The best part is that this gives us the opportunity to develop open source software for our communities, and to be open about our roadmap and our development plans, even going as far as collaborating with our user communities on what their priorities and needs are and what we will work on. And lastly, prioritizing an open API for access to OSF content, so that we can increase transparency, accessibility, and even interoperability with other systems, and create a community for developing innovative workflows that anyone can use with the OSF. For me, this encapsulates a really great group of use cases and examples of how that actually takes shape in real life. So I'm going to turn it over to Eric, who's going to talk a little bit more about how that works within an open ecosystem. Thanks, Nikki. I'm Eric Olson, on Nikki's product team here at the Center for Open Science, working primarily with the infrastructure we've developed to support the mission Nikki has spoken about.
I'd like to talk a little bit about that mission in terms of a seeming contradiction that our mission statement calls out: we talk about research infrastructure as a research commons that wants to be flexible enough to respect the many different ways researchers approach their work, based on their disciplines, their regions, or other needs, but also specific enough to be a solution for the problems researchers are trying to solve. It might seem like those are two very different things, but we think of them as part of our strategic plan. This is a community effort, as Nikki pointed out, and these are dialogues, not contradictions that clash; they are creative tensions that give us opportunities for creative solutions. For us, the primary piece of infrastructure we've developed to support those workflows is the OSF. As a workflow support and management tool, it has features that support each phase of the research lifecycle. Researchers may work through those phases differently based on their disciplines, their data needs, or other approaches, but they each have needs at these various phases. The OSF, and the other interfaces and tools within the OSF that we have developed over time, are in service of that mission: supporting, with infrastructure, better practices within these workflow phases. And quickly, the primary way all of these OSF features come about is a combination of several factors. We have integrators, which we're going to talk about in just a moment. We have members and supporters from that mission slide, the groups that fund COS through grants and other support. We also have members through a number of our fee-for-service tools, including universities, publishers, and funders.
And then we have users, researchers who contribute and use data on the OSF, and the technical outcome of all those things coming together is the OSF as a tool. But really it's a community effort between these groups, with us acting as a facilitator, all coming together and overlapping to turn that into a specific workflow management solution with the OSF. And looking at this workflow sequence, one of the primary benefits of using the OSF, and one of the exciting parts of working here, is that it isn't just the OSF in a vacuum; a lot of tools from across the research lifecycle and across the landscape are integrated with the OSF. These are only a few examples of organizations we have built connections with or hold memberships with, like Crossref and DataCite, or the storage providers and citation managers we've built integrations with, because research communities came and told us that was really important to them and would strengthen the OSF and their workflows. But there are many, many other connections with the OSF that are made possible by an API that we've spent a lot of time developing as its own tool, alongside the OSF as a user interface, making sure the API is well documented and highly usable. And that's not just an effort by us; it's an effort of many different groups across the community, including the speakers we'll hear from today. For those not familiar, an API is an interface, a tool, a way for systems that normally would not be able to communicate with one another, because they use very different kinds of information or languages.
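As a concrete taste of what Eric describes, the OSF exposes a public REST API that serves JSON:API documents over plain HTTPS. Below is a minimal sketch of fetching a project's metadata; the node id `abc12` is a placeholder, not a real project, and the endpoint path follows the OSF API v2 documentation as I understand it:

```python
# Minimal sketch of talking to the public OSF API (v2), which serves
# JSON:API documents. "abc12" is a placeholder node id.
import json
import urllib.request

API_BASE = "https://api.osf.io/v2"

def node_url(node_id):
    """Build the URL for a single OSF node (project or component)."""
    return f"{API_BASE}/nodes/{node_id}/"

def node_title(payload):
    """Pull the human-readable title out of a JSON:API node document."""
    return payload["data"]["attributes"]["title"]

if __name__ == "__main__":
    # Public nodes need no authentication at all.
    with urllib.request.urlopen(node_url("abc12")) as resp:
        print(node_title(json.load(resp)))
```

Because the responses follow the JSON:API convention throughout, any tool that can parse that one shape can navigate the whole API.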
It gives them the ability to talk to each other, so they can exchange information when otherwise they wouldn't be able to, or it would be very difficult for them to do so. So today we're going to see a few examples of research communities or individuals who, based on feedback from their users or across their communities, saw value in using features of the OSF by way of the API to strengthen their tools, and in effect strengthened the OSF and how valuable the OSF can be. You're going to hear from several speakers, and then toward the end we'll have some time to reflect on each of their individual tools, as well as how they interact with each other, with the OSF, and with other parts of their community. So use your Q&A box and drop things in; we'll reflect on those towards the end of our afternoon together. Otherwise, I think we are ready to hand off to our speakers. Yeah, great, thank you for setting the stage, Eric, I think you did a wonderful job, and I appreciate the articulation of interoperability, because I hear that word a lot; it's really important for tools to be interoperable, but really understanding what that means from a real example use case is what I'm excited to hear more about from our speakers. So to kick it off, we have Lenny from protocols.io, who has just finished a new integration with OSF for sharing open protocols. Lenny, are you ready to kick us off? Absolutely. Thank you. It's a pleasure to be here, and many thanks to the Center for Open Science for organizing this and for the invitation. I have 12 to 15 minutes; I'll spend just a couple of minutes on a very quick background of protocols.io and then really focus on why integrations and APIs matter, from the protocols.io side and from the Center for Open Science side, and why this is important. And I think you're going to hear me reinforce many of the things that Eric just mentioned.
I'm not going to give a demo of the functionality and features of protocols.io because I don't have enough time, but I am going to go through the slides we typically present when we talk to researchers or universities about protocols.io, and I think they will highlight why interoperability and integrations are so important. Our mission is very similar to that of the Center for Open Science: we're trying to improve the reproducibility of published research. I always start with this favorite tweet of mine from a biologist: looking for a protocol, the '97 paper says it's described in the '96 paper; finds the '96 paper, which says it's described in the '87 paper; finds the '87 paper, and it's paywalled. And this is a biologist at UC Riverside, a UC campus, so they have a lot of subscriptions, but not to this particular paper. It's a common and frustrating experience, and it's not particular to biology. Here's a physicist: devices were fabricated as previously described, as previously described, as previously described, and the original reference says devices were fabricated with conventional methods. This is very common; those of us who are researchers have all come across papers like this, and I have a long collection of tweets and blog posts around exactly this. But one other slide I want to show actually refers to work from the Center for Open Science in collaboration with the Science Exchange company: the Reproducibility Project: Cancer Biology. It was a big effort to take 50 published cancer papers and independently reproduce results from them, and when the first five papers going through this project were finished, there was a lot of media attention. One of my favorite articles on that is by Ed Yong in The Atlantic, who wrote that the hardest part of this project, by far, was figuring out exactly what the original labs actually did.
Scientific papers come with methods sections that theoretically ought to provide recipes for doing the same experiments, but often those recipes are incomplete, missing important steps, details, or ingredients. So the mission of protocols.io is to make method sharing easier and more reproducible: capture the important information, organize it, and make it easier to share and collaborate before, during, and after publication. It's very much in line with the mission of the Center for Open Science. The main benefits are that using protocols.io, whether publicly, privately within communities, or internally in your group, you have your protocols organized in a single place. There is concurrent editing and a history of changes; as people come and go, the knowledge stays, and you move away from the situations described in the previous slides, where no one knows exactly what was done for a particular paper. And then when you're publishing, you have a DOI and a citation on protocols.io; you can link to exactly the steps from the paper describing how the research was carried out, and hopefully you also get more credit for the method development. We launched in 2014; at this point there are 9,000 public and over 30,000 private protocols on protocols.io, and we're growing at about 1,000 new protocols every month from hundreds and hundreds of scientists. Our business model, which is important to highlight here, is similar to GitHub's: everything that's public is open access, free to read and free to publish. If you're using protocols.io privately, whether in academia or in industry, that's when there's a subscription fee. And there are over 500 journals whose author guidelines encourage using protocols.io to share detailed methods.
Then there are funders that recommend or require protocols.io, and universities, somewhat similar to the membership model with OSF, that have been signing up for internal campus-wide use of protocols.io. The reason I highlight the business model is that it really does tie into why we have the APIs and integrations. Typically, just before this point I would have done a demo of protocols.io, and before I take questions I always finish with this slide on preservation and backups. You can see OSF is on the slide, and I talk about the fact that we have public APIs, just like the Center for Open Science's OSF. We have integrated export into Google Drive, Dropbox, and OSF, and I'll talk about that in much more detail in a few moments. Daily backups; we send all of the public protocols to CLOCKSS; we mirror everything that's public in a GitHub repository and through the Internet Archive. And this is a really important slide, especially when we're presenting not just to researchers but to universities and librarians, because there's a lot of trust involved. When we talk about why our APIs and integrations, why this interoperability, is important, in my mind there are two answers. One is the researcher, and that's the key reason, because exactly as Eric was saying, there are a lot of tools, platforms, and interfaces that researchers use. There is a great resource from the Harvard Medical School Library: a survey of different electronic lab notebooks, I think there are 30 on the list, a matrix of what the pluses and minuses are and what the different features are. You can't expect your user to be on one platform or using one tool.
Given that there are a plethora of electronic lab notebooks and repositories, lots of different places where the scientists are, if you don't have APIs you're limiting yourself, and you're limiting the functionality and usability of your tool and your platform. So one obvious reason to have APIs is to allow the integrations: to be where the researchers are, facilitate their workflows, and make it easy to connect to the other tools. The second component, from the protocols.io side, is trust. The reason I show that slide about preservation and backups whenever I present is that we're for-profit, as I said when describing our business model. But open access or not, for-profit or nonprofit, there's always a question from the researcher and from the librarian, a question of trust: will you be around tomorrow? Are you sustainable? What is your governance? If you're for-profit, are you stable, or are you going to go bankrupt? If you're stable, who buys you tomorrow? Maybe the organization that buys you doesn't have the same principles and ethics, in line with what the librarian is seeking. So we highlight the integrations. We built the API very early on and put a lot of effort into documenting it, because it's a key part of that trust. And going back to that slide, it's not just these integrations behind the scenes; start thinking about who we are plugging into across this entire ecosystem, just as I was showing. You sign in with ORCID; when you publish a protocol, if you added your ORCID iD, it will show up in your ORCID outputs. We mint DOIs; we send everything to Crossref.
The Internet Archive integration actually comes through Crossref; they use Crossref to mirror our protocols. So these are just some of the things the APIs enable, and here are two of those 30 lab notebooks from the survey that are using the protocols.io APIs to connect to us and pull protocols from us; there are many, many other integrations taking advantage of the protocols.io APIs. And you can see it's bidirectional: we're connecting to OSF, we're connecting to Google Drive. That really is the future, and whether you're for-profit or nonprofit, it makes business sense, and it makes sense from a usability perspective and the perspective of the researcher, to invest in APIs and integrations. On that note, the next question is why we recently integrated with OSF. I'll show you; it's a simple integration, with well-documented APIs on the OSF side, so that made it easy to connect. But why did we email OSF and say we'd like to connect, what are the next steps? It's actually because of the presentations to librarians and universities, where they would ask. Over the past year, I think there were two or three presentations where a librarian from a different university would say: are you integrated with OSF? We have people using OSF. When that question popped up more than once, the next thing we did was go to some of our users, and it turns out, yes, indeed, they're using OSF, and they would welcome the integration, not just with Google Drive and Dropbox. And at that point, because the OSF APIs are well documented, it becomes a no-brainer that you want to do it: both for the presentations to librarians, for that trust aspect, and for the researcher. Those are the two key reasons. So here is the file manager on protocols.io: you can select an entire folder, all of your protocols, all of your files, one at a time or in bulk, click export, and navigate to copy to OSF.
You confirm this is the file you're about to send to your OSF account, and here is where the OSF API comes into play: you get authenticated, we ping OSF, you sign in, and you authorize the connection. I just tried it at 1:30 this morning after giving a talk to a Japanese group, so this is something that works; these are not Photoshopped, they're real screenshots from this morning. So this is my OSF root; I select where I would like these files to go, and then it takes a couple of seconds: done synchronizing. Those steps are on protocols.io, and then I navigate to my OSF space, and there's the file, uploaded today at 1:30 in the morning. And it is an investment to develop APIs; it is an investment to document them and keep them up to date. Even when someone does have good APIs, it's an investment to connect to them; we did have to have our engineers connect to OSF, even for this simple workflow. But it pays off. It's worth it for the researcher; our users welcomed it; here we are on the webinar discussing it. Just to finish up: whether for-profit or nonprofit, I don't see how you can really be a stable and trusted participant in this ecosystem, if you're trying to encourage collaboration and reproducibility among researchers, without participating in that collaboration and integration with others. So I'll stop there; I'm looking forward to the next presentations and happy to answer questions at the end. Thank you, Lenny, that was fantastic. I'm personally really excited for this; it's one that I've heard bubble up among different research communities over the years, so it's great to see it come to fruition. I don't see questions, and we're going to save them for the end anyway.
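For developers curious what an export flow like Lenny's looks like under the hood: after OAuth authorization, an integration sends files to OSF's file service. Below is a hedged sketch of the URL and header such a client might construct; the host, path shape, and query parameters follow OSF's WaterButler-style file API as I understand it, and the token, node id, and filename are all placeholders:

```python
# Hedged sketch of the kind of request an integration might make to
# copy a file into OSF Storage. "abc12" is a placeholder node id; the
# exact upload host and parameters should be checked against the OSF
# file API documentation.
import urllib.parse

FILES_BASE = "https://files.osf.io/v1"

def upload_url(node_id, filename, provider="osfstorage"):
    """Build an upload URL for creating a new file on an OSF node."""
    query = urllib.parse.urlencode({"kind": "file", "name": filename})
    return f"{FILES_BASE}/resources/{node_id}/providers/{provider}/?{query}"

def auth_header(token):
    """OSF personal access / OAuth tokens are sent as a Bearer header."""
    return {"Authorization": f"Bearer {token}"}

# Usage sketch (not run here; MY_TOKEN and file_bytes are assumed):
# requests.put(upload_url("abc12", "protocol.pdf"),
#              headers=auth_header(MY_TOKEN), data=file_bytes)
```

The `provider` segment is what lets the same call target OSF Storage or a connected add-on, which is why the destination picker in the protocols.io dialog can offer any of the user's linked storage locations.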
But if you did think of something while Lenny was talking, make sure to drop it in the Q&A so we can pick it up at the end. Okay, so next we've got Stefan, and he's going to talk a little bit about Jupyter workflows and OSF. Hello everyone. My name is Stefan Boilman. I'm a research fellow at the School of Information Technology and Electrical Engineering at the University of Queensland in Australia. I'm not a developer of any of these APIs, but I'm a user, so I'll show a workflow we built for a recent paper. Based on that workflow, the OSF team got in touch with me and asked if I could present what we did, in case it's useful for someone else. So let's get started. The first thing I often hear from colleagues, especially older colleagues, is: open science is easy, right? You simply put all your data in a zip file, upload it to Dropbox, and paste the link in the paper. Why do you spend so much time making things nicer for users? The reason is that I don't think that's good enough. If a user has to download a two-gigabyte file just to see which functions were used, that's not so cool. Also, if I just put my MATLAB code and my data together, I'm relying on the user having the right MATLAB version; I'm relying on the assumption that the reader has a MATLAB license at all. It might not work; it might not work in a few years; maybe MATLAB won't be around anymore. So I don't think it's that useful for people. And Dropbox is great, and it's been around for a few years, but, as Lenny also mentioned, what if Dropbox gets bought out by someone and they close the free service down? What if Dropbox runs out of money? What if they just stop the free service? Then you have a link in a paper, and that paper is a record of your research, and suddenly that record is not there anymore.
I actually had that problem in one of my earliest papers: there's a link in there, and the service was closed down a few years later. So I was affected by that, and I started really thinking about this problem. Then there's a more trivial thing: I found a bug, and it happens all the time, because the more lines of code you write, the more bugs you will have. I haven't had a single paper without a mistake in it somewhere, and I always had to do something afterwards to fix it. So the problem is, what do you do? Do you just replace the whole zip file and fix the bug? Do you just leave it in there? Do you add a little text file saying, by the way, that's actually wrong and it should be this instead? There are a lot of open questions, and as I said, I ran into exactly these problems while reproducing other people's work, and also with my own work. So I was wondering: can we somehow do this better? And this is where I am today. The first example is where I was a few years ago, and this is one of our recent papers, where I really tried to learn from my mistakes and do it a little bit better. First, I took the code that reproduces the results of the paper and put it in an interactive Jupyter notebook that runs in the browser on Google Colab. Google Colab provides a lot of free compute resources, including GPUs, so it's a very nice way for people to quickly play with the code. And the good thing is... oh, there's a hand up from Aaron, do you want to ask a question? Okay. So yeah, they can run it interactively in the browser; there's no setup required; they just click on a link and it works; they can just play with it. I think this is super nice and accessible.
But of course the problem is: what if Google Colab gets closed down tomorrow? Google is known for shutting down its services, so I was aware that I can't put a Google Colab link in my paper, because Colab will probably not be around in five years. And that's where OSF comes into play, because OSF will probably be around for 50 years; they have a preservation fund. So I thought, okay, let's combine all of that with OSF, and I'll show a little bit of how we do that today. On the slide I just wrote "a cloud-based platform" rather than naming one, because there are alternatives to Google Colab; I just used Colab because it was around. And the link to it lives in the OSF repository, so I can easily change it later down the track. Then also, instead of dumping my code in a zip file, it's of course on GitHub, because GitHub is super nice for people to interact with the code: to fork the repository, to fix bugs, to report bugs, to discuss the code. It's a very easy way to share code. GitHub also gives me this nice feature of a commit tag, where I can say: this is the code that was used for the paper. So I'm already assuming I will find a bug. With this paper I actually haven't found a bug in the code yet, so if you find one, let me know and we can fix it. And Lenny already said this as well: what if GitHub is not around anymore, what if GitHub gets bought? We were lucky with the Microsoft acquisition, in that Microsoft actually wanted to develop it further. But what if GitHub gets bought and shut down? Then again I have a broken link in the paper. In this case, I assume GitHub is so relevant to the research community and so many people out there that it will not simply shut down, so here I took the risk and actually put the link in, because I thought usability was better served that way.
But I have a backup, of course: the GitHub repository is linked to my OSF project. So in case GitHub goes down, the repository is preserved in my OSF, and I'll show later that it's part of the snapshot that OSF takes; so it is preserved, and here I just took the risk that GitHub might not be around. But if GitHub goes down, I think we have bigger problems. And then finally, as I said, I combined all of that on OSF. Here I can update the data, I can update the links if bugs were found, and I can preserve everything I showed in a timestamped snapshot, which is the registration feature I'll show later. All of these examples use APIs from all these tools, from GitHub, from Google Colab, so I want to show a little bit of how to tie it together. Let's get started. The first problem I had is: how do I get my data nicely accessible in this cloud-based platform? First I have to get my data there. The problem is that a lot of researchers use high-performance computing systems, and we're not just creating 10 or 100 files; my datasets are usually a million files. And this is bad: if I used the browser upload feature that OSF has to upload a million files, this paper would not get published in the next 10 years; it would just be too slow. So I need something that does it nicely from the command line, something I can script. And I'm probably not doing this just once, because as I said, I make a lot of mistakes, so I'll probably have to do it multiple times. So the first thing is: let's connect a storage backend to OSF. Of course you can use OSF Storage for this as well, to keep it simple, but I wanted to show that this is possible. In Australia we have really cool storage provided by our infrastructure provider AARNet, called CloudStor. CloudStor is basically an ownCloud installation that gives every researcher
one terabyte of storage for free, and if you ask, you can get a lot more; it's basically our research storage. And the thing is, if I used OSF Storage, I'd be limited to five gigabytes for private projects and 50 gigabytes for public projects. As I said, a lot of our data is bigger than that, so it wouldn't fit; that's why having these external storage providers is really useful. The setup is quite simple: you set it up once for your account, where you authenticate, and then you add it as an add-on to the project where you want to use it. There's a full webinar on OSF storage integrations already on YouTube, so if you want to look that up, it describes nicely how to do this and how to add your own storage APIs; though I must say the list of supported storage APIs is already pretty complete. This is also fairly straightforward, and if you struggle, for example with ownCloud, I wrote a little example on my blog where I show how to connect an ownCloud CloudStor account to OSF and go through really all the details, if you're interested. Okay, so now we need something to upload our data there from our high-performance computing system, and there is another really cool open project; it's not maintained by OSF, but by a community of OSF users who had exactly the same problem I did: they had millions of files and they wanted to upload them to OSF and also download them. This is the OSF command-line client, osfclient. It's a project that started a few years ago, and they're still quite actively maintaining it and fixing bugs, and it's an easy install: if you have Python already, it's just pip install osfclient. I've never had any issues with it; it's super easy, it doesn't have a lot of dependencies, and it just works on almost every system I've used.
Then it's as simple as navigating to the folder you want to upload to OSF, and that folder can have a million files below it. You do an osf init in that folder from the command line, and it asks you for your username, which is your OSF username, and then a project ID, which is the OSF project ID. One little thing if you're trying this: in an earlier version of the OSF command-line client I had problems when I used institutional login, something like ORCID authentication, because the project had not implemented the authentication token correctly in the background, and the authentication process failed. I had to create a regular OSF account with my university address, and then it worked, because then I didn't need the authentication tokens. But I just checked the GitHub repository, and that bug was fixed; they realized they hadn't implemented the token correctly, and it looks like it's working now. Since I now have my own account on OSF, I couldn't test it with my ORCID account, but it should work. Then let's test the client: if you do osf ls, it asks you again for your OSF password, and then it shows the files. Cool, I put a test file in my ownCloud that I have linked to OSF, and I see a screenshot in there, just as an example that the whole integration works. So that's super fun; we can see the files. Now, how can we upload our files? It's as easy as you could imagine: just osf upload, where -r is for recursive, the dot means I want to upload all files in the current directory, and osfstorage/data is the target: osfstorage means it will end up in OSF Storage, and data is the subfolder it will land in.
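Stefan's command sequence, end to end, looks roughly like the sketch below. It's wrapped in a function so nothing runs until you call it; the data directory is a placeholder, and osfclient's exact prompts and flags should be checked against its own documentation:

```shell
# Sketch of the osfclient workflow Stefan describes.
# One-time setup (not run automatically): pip install osfclient

upload_to_osf() {
  local data_dir="$1"            # e.g. ./results (placeholder path)
  cd "$data_dir" || return 1
  osf init                       # prompts for OSF username and project ID,
                                 # stores them in a local config file
  osf ls                         # sanity check: list files already on OSF
  osf upload -r . osfstorage/data   # recursively upload into a subfolder
}

# Usage: upload_to_osf ./results
```

Because the credentials prompt would block a batch job, osfclient can also read the username and password (or a token) from environment variables; check the osfclient README for the current variable names before scripting this on an HPC system.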
One interesting thing: if you exchange osfstorage for, say, owncloud, the files end up directly in the storage integration you set up with ownCloud. It's really transparent; they just land on your institutional storage. So I don't have to figure out how to connect to ownCloud myself, because they have their own token system too. I could use something like rclone and experiment, but with OSF you have one interface that connects to all the different backends, and you don't care if ownCloud goes away tomorrow; then I just upload to Dropbox, or anything else. Nothing changes for me except the word I put there. If you try this, note that the osfclient project has a list of supported storage providers, and not all providers are implemented yet. I added ownCloud to the list yesterday, and if the pull request gets merged, it should work. It should just be the name of the storage provider as it appears in their list, because that's how they check whether a provider is valid. So if anyone from the OSF team is keen: if you could contribute all the storage providers you support, with the correct mount names on your system, to the project, that would be super fun, because they have an open pull request where they say, well, we don't know what OSF calls these things, and we don't have accounts for everything, so we can't test it. If anyone is keen to do that, that would be super cool. So now we've got our data there; how can we use it, for example, in our Jupyter notebook? Because I said I want to pull that data for the users and make it super nice. And lucky us, again it's very easy: in a Jupyter notebook you can install everything you need; this is Google Colab as an example.
If you add an exclamation mark, the command is executed not in the Python interpreter but on the underlying command line; there's a Linux system running underneath. So you can just run pip install osfclient, and it will go and install it, and since it's a super small package it takes just a second. Again, if you're interested in more details, I wrote a detailed blog post about this last year; if you want to play with it, go for it, but it's really just the few steps I'm showing today. Then, if we want to use this OSF data in the Jupyter notebook, the easiest thing is to make our OSF repository public, because then we don't have to handle any authentication; and in the end I want this thing to be public anyway, so let's just do that. Then, to get all the files I uploaded: again a simple command, osf, with -p specifying the project, then clone, and then the dot says where you want the files; dot means here. If I then do an ls, I see it pulled all my data from OSF: it pulled OSF Storage, and it pulled ownCloud. That already works; the client can pull from all storage providers; it just can't upload to all of them yet, because for uploading it needs to know which ones exist. For downloading, it will just pull everything that's there. So that works very well. Cool. Now let's look at GitHub. As I said, GitHub might close down one day, and to guard against that, I of course connect GitHub to my OSF account; you can see GitHub is supported as well, in case you want to use it, which I did, and it's also working very well. It's the same procedure: you authenticate once, then you add it as an add-on to your project, and then it's there. And now here's a really cool thing that I was actually surprised works.
So GitHub now has my Jupyter notebook in it, because I version-control it with GitHub, and Google Colab actually saves directly to GitHub, so it's super easy to update your notebook and have it tracked on GitHub. Which is such a cool thing again, because it's APIs in the background letting very different services just talk to each other. And when you have the OSF GitHub integration on and you click on your Jupyter notebook, it actually renders in the OSF interface. Actually, Google does something nice there: they put a little "Open in Colab" button on top, like on GitHub, where you can just click it, it brings you into Google Colab, and it will just run the notebook. And as I said, if any of this goes away, if GitHub goes away tomorrow, well, I just put it on GitLab; if Colab goes away, well, I just put it on Azure. It's total flexibility, and I can just reiterate what Lenny said: you really need that flexibility, because this stuff has to be around for 10 to 20 years. We're doing medical research, and it might be interesting for people down the track what we did. Good. Last thing: how does OSF help us preserve all of that? I'm not sure if I'm misusing a feature here, or if this is the way to do it, because I was looking for a snapshot feature where I can just take a snapshot of the whole repository and archive it. And the way to do it, I think, is using a registration; maybe someone can tell me afterwards if that is the right way. But what I basically do for my project is, at the point of publication, I file a registration of the project. And I just say: this is the state at publication, please go and archive all the files that I have, put them in a snapshot, and preserve it as a basically immutable snapshot that no one can change and that also doesn't go away. So this works very well for me.
And yeah, now the cool thing is I have a snapshot of the paper as it was when it was published. And then I can also update the code later. So people have the state at publication, but the publication is also still alive. It's not just a static PDF record that never changes; people can actually go there and do things afterwards, get updates. As I said, there will always be bugs, there will always be changes, so we need to be flexible about that. Yeah, so I hope this was interesting and useful, and you saw that it doesn't take much to create reproducible workflows once you have such a great open ecosystem around us that enables all of us to do this. You just have to know that it exists. In the end, it's just a few clicks, and honestly, it's actually easier than making a zip file and uploading it to Dropbox. So this is pretty much what I had prepared for today. If you try anything I showed today and you run into problems, feel free to get in touch with me. I've hit many of these issues myself, so I might have seen that error before and be able to help quickly. So, thank you for your attention. I thank the OSF team for organizing this webinar and inviting me to present one of these examples of an open workflow using OSF, and I'm looking forward to your questions later. Thank you very much. Thank you, Stefan, that was awesome. I loved, one, the Easter egg you left in your code for us all to find as a challenge, and then the fact that you've already contributed back to the osfclient tool with the ownCloud capabilities. And just really quick before Aaron takes off (Aaron, if you want to start getting your slides ready, that would be great), I see there's a question. Yeah, I also want to touch on the use of registrations for the workflow you were describing; that's absolutely one of the use cases for it, so I think that was a beautiful illustration of it.
And there are different templates. The one you showed was a pretty rigorous template that asks for a lot of input; there's also the open-ended registration template that doesn't ask for as much, and it might be the best way to just provide a description of the archiving you're planning to do. Great. Good. Yes, I was already wondering why I had to enter so much just to preserve the record. Yes. Okay, that's good. Yes, and there was a question in the chat: does the client work with two-factor authentication? I haven't tried it. I looked through the code, and I didn't see anything that wouldn't work with two-factor authentication, but the team is quite open to these things. So if you propose it, and the API endpoints they use are very well documented, it should be possible; if there is an API in OSF for two-factor authentication, I think people could implement that easily. Okay. Yeah, I don't know either, but we'll certainly find out and try to get an answer, either during the webinar or in a follow-up after. All right, it looks like Aaron's already loaded up to share a little bit about workflows with R and OSF. Take it away. You're still muted there, Aaron. There you go. How about now? Good. Much better. Okay, can you see my slides okay? Great. So thanks for having me. I really enjoyed those two talks. I did not realize OSF could render Jupyter notebooks; that is super cool and I'm looking forward to trying that out. So I'm coming at this from a slightly different perspective. I am the lead developer and maintainer of osfr, which is an R package for interacting with OSF, and it's built on top of OSF's open API. I have my contact info here if anyone wants to get in touch offline with any questions, but first I'm going to give you a little bit of information about me, how I got involved in this project, and the motivation for it.
So I am what you would call a recovering academic; I'm currently a software engineer at TileDB, which develops a scalable universal data storage engine. But my background is in computational biology, and I've actually spent most of my career in research. The origins of osfr date back to when I was an assistant professor at Virginia Commonwealth University, splitting my time between research and developing VCU's Data Science Lab, which was critical to the development of osfr. This is an organization I co-founded with my former colleague Dr. Tim York, who is the DSL's director. It's focused on helping VCU researchers and students take advantage of modern computational tools to improve the reproducibility of their work through training and collaboration opportunities. We were motivated to do this by our own experience with collaborators, witnessing firsthand the non-optimal workflows that people can use to manage their research and work with their data. There's this problem that computers have become an essential tool in research, but basic computational skills are not usually part of the standard training curriculum. So for me, someone with a computational background, my research toolbox includes things like R for performing analyses and GitHub for managing my code. I rely on these tools for a lot of reasons, but most importantly, they allow me to automate repetitive tasks: things like downloading data and rerunning analyses. But I don't really know how to do anything that doesn't involve using a keyboard. So most of my research involved working with collaborators who would actually generate the data. These are incredibly smart and skilled individuals, but it's not reasonable to ask them to learn something like Git so they can push their data to my repository. So we'd end up passing files back and forth through many, many emails.
If attachments were too large, USB drives could be involved, and all those sorts of things that we know don't represent best practices for research reproducibility. So one of the first things we focused on at the DSL was getting researchers to use OSF to manage their research. We signed up VCU as an OSF institution, and we offered introductory workshops that really focused on how much more efficient your research could be if you based it around OSF as the hub. We would highlight things like centralized storage for your research materials, automatic version control, the ability to selectively add collaborators, all those things that make OSF great. And generally, adoption was quite high. I think a big part of that was how relentlessly focused COS has been on keeping the OSF interface very approachable and user friendly. But for me personally there was still a missing link, which was no direct connection to R, which would create a lot of additional opportunities for automation and generally make the computational side of research a little more efficient for people who use R. So I was delighted when Courtney Soderberg and Tim Errington, who work at COS, approached me and asked if I would be willing to work on an R package. I think I literally jumped at the opportunity, because it was something I personally really wanted, and I figured there were many other researchers out there using R and OSF who would also benefit. There were practical considerations that made osfr possible, though, that I want to point out. The first was that I had some protected time to work on this, because of my involvement with the DSL, which supported half my time, and the VCU Office of Research, which provided the funding; they agreed this was a worthy project to work on.
I didn't have to work on it on nights and weekends; it was something I could actually dedicate a little bit of time to. And I'm just mentioning that in case there are any funders, or developers who have never tried getting funding for a side project: if it's something that's really contributing to the open science community, it's worth trying to get some funding for it. The second enabler was the OSF API itself. The first time I looked it over, I was incredibly impressed by how comprehensive it is and how well documented everything is. As Lenny mentioned, building something like that takes a fair bit of time; it's not something that happens automatically, you really have to put work into it. But the benefits are obvious: now we have this wonderful resource that other people can build on top of, continuing to extend this open science infrastructure. So if there are any other tool builders in the audience, I highly recommend checking it out; it really is a delight to work with. So let's talk about the package itself and some of the design considerations that went into it. osfr provides a high-level interface to the OSF API. It's not a one-to-one mapping of R functions to API endpoints. The trade-off is that we support a much more limited set of actions than the API offers, but we use those as building blocks to provide interfaces that are more convenient for our users. So, for example, the API lets you upload a single file at a time. The OSF web interface lets you upload multiple files, but you can't upload folders. So osfr, like the Python client that Stefan mentioned, supports recursive uploading: you can upload an entire project all at once, which makes it very easy to replicate a local project on OSF. And the OSF API is incredibly comprehensive; it lets you do all kinds of interesting things with registrations and preprints and so on.
And osfr basically ignores anything that's not directly related to project management. We provide classes for representing three basic types of entities: projects and components, files, and users, and then a handful of functions for performing the most common tasks you would perform on those entities. Things like creating projects, uploading files, moving them around, deleting them if you're careful: those are all the kinds of tasks that fall squarely within osfr's scope. The last important consideration was user friendliness. Like OSF itself, I wanted osfr's interface to be very approachable; I wanted students who were just learning R to be able to use it. The data structure that R users are most familiar with is the data frame, which is how R represents two-dimensional tables of data. So anything we retrieve from OSF is returned as a data frame, where each row represents a single entity. Here in this example we have a data frame that contains two OSF files, and you can see the file names and the unique identifier that's assigned by OSF. And then we have this third column called meta. This contains all of the JSON data that's returned by the API, everything osfr needs to represent these objects and perform subsequent actions on them. But these are deeply, deeply nested data structures that can be pretty tricky to work with, and since most users don't need to access them directly, we just kind of tuck them away into this third column. So with osfr, and the OSF API in general, we can close the loop around this incredibly useful hub for managing research and create a workflow that anyone, regardless of technical skill level, can participate in without having to sacrifice efficiency. For me it meant all of my projects start with a script that downloads the necessary data from OSF.
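The "meta column" design described above, a flat row per entity with the raw API payload tucked alongside, translates to any language. Here is a Python sketch of the same idea; the field names are illustrative and this is not osfr's actual internals.

```python
# Sketch of osfr's "meta column" idea in Python: present a flat row per
# entity (name, id) while keeping the full, deeply nested JSON:API payload
# next to it for later operations. Field names here are illustrative.

def to_rows(api_response: dict) -> list:
    rows = []
    for entity in api_response["data"]:
        rows.append({
            "name": entity["attributes"]["name"],  # friendly columns up front
            "id": entity["id"],
            "meta": entity,  # everything else, tucked away but not lost
        })
    return rows

# A canned response with two files, mirroring the slide's example.
response = {
    "data": [
        {"id": "abc12", "attributes": {"name": "survey_q1.docx", "kind": "file"}},
        {"id": "def34", "attributes": {"name": "survey_q2.docx", "kind": "file"}},
    ]
}

for row in to_rows(response):
    print(row["name"], row["id"])
```

Casual users read the friendly columns; the library reads `meta` when it needs the full payload to perform a follow-up action.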
All of the results from my analysis are then uploaded automatically, so it's not a step I forget later. And then my collaborators receive notifications that, hey, there's a new set of results or tables for you to look at, and everyone goes home happy. Now I'm going to tempt fate here and give a quick live demo. We're going to use osfr to create a new project on OSF and then populate it with some files, just to give a sense of how it works. So I will switch over to R. Can you see RStudio here? Cool. The first thing you have to do is load the package, so library(osfr). Before this, I generated a personal access token, which authenticates me as my OSF user; we have documentation available that explains a little more about what's involved in that process. So I'm going to create a new project called COS demo, and I'm going to assign it to a variable just called demo_project. The function I need is osf_create_project, not the component one, since we're creating the project. The title is "COS demo", and I will make this public, just to keep myself honest, so we can verify everything actually worked correctly. So I'll run this. The output from that is a data frame, as you can see here. It tells us the title of the project and its unique GUID. So if you navigate to osf.io/b97hm, this project should now exist. Now I'm going to populate it with files; I have a couple of Word docs in my working directory here. I like to keep things organized, so I'm going to create a component called, let's see, research materials, inside that new project. Then I'm going to create a subdirectory inside that component called survey, since these are two documents from a fictional survey-based project. And then for the last step I will actually upload the files. I can use an R function to identify all the Word documents, the .docx files, in my directory, and then we'll run this. We will use the correct name of the variable. Run that.
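The osf_create_project() call in the demo ultimately issues a POST to the API's nodes endpoint. Below is a sketch of roughly the request body such a call would send, following the JSON:API convention the OSF API uses; the exact attribute names and endpoint should be verified against the API reference rather than taken from this sketch.

```python
# Sketch of the JSON:API request body behind creating a project, roughly
# what a client sends to POST https://api.osf.io/v2/nodes/ when you ask
# for a new public project. Attribute names follow OSF API v2 conventions
# but should be verified against the API reference.
import json

def new_project_payload(title: str, public: bool = False) -> str:
    body = {
        "data": {
            "type": "nodes",
            "attributes": {
                "title": title,
                "category": "project",  # components use other category values
                "public": public,
            },
        }
    }
    return json.dumps(body)

print(new_project_payload("COS demo", public=True))
```

The data frame osfr hands back after the call is built from the response to this request, which carries the new project's GUID.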
I should have had some filler material here in case the upload took longer than expected. There we go. So again the output is a data frame; these are the two files we uploaded. You can verify everything worked using osf_open: give it the demo_project variable, and that should open your browser to the project. So there you go. We have our COS demo project. It has one component, research materials, and then our subdirectory with our two files. Let's go back to the presentation. Okay, so that was all I have. In the slides I have a few links if you'd like to learn more about the package: there's a website with all the documentation, the source code is on GitHub, and there's also a blog post published by rOpenSci that looks at the Open Science Framework from the perspective of a heavy R user, sort of comparing it to GitHub: how do the two services compare and differ? Yeah, so I'm looking forward to any questions you might have. Big thanks to the Center for Open Science for organizing this. And that's it. Thank you so much, Aaron. That was fantastic. I learned so much; I'm actually not an R user, so I got a little tutorial on how that works. It's an awesome package you've built, and I think it's very beneficial for the community to see how it works and how it could benefit them. I'm thinking at this time we should turn to any questions that have been surfaced. Eric, I think you either have some pre-planned questions for us, or we'll read from the ones that came up during the talks. Yeah, there may have been some raised hands among the panelists while we were going, so if you do have one, raise it. Otherwise, I will throw questions to you one at a time, and you can think about them a little. I have one question for Lenny.
So you showed, for example, that your protocols are PDF files, and I'm wondering, is that the preferred use case? Does protocols.io also support structured data? We have MRI protocols, for example, that are basically XML files with a structure of maybe a hundred parameters, hierarchically organized. Is that anything protocols.io supports? Sorry, that's the limitation of not giving a demo. protocols.io is very rich: it supports components, timers, run functionality, versioning, forking, and it's set up for dynamic and interactive protocols, with discussions at the step level, materials, videos inside steps, images, tables. So it's very rich content, with API access. With the OSF integration and Dropbox we just export a PDF, and likewise with CLOCKSS for long-term preservation, but through the protocols.io API we also provide JSON files with all of the structure. And of course the whole point of protocols.io is that it's not a Microsoft Word document, it's not a static PDF; it's dynamic. That's cool, because that's a limitation of some of the tools we've already used for this: they just support text and no JSON things, so that's really cool. Okay, great. Thank you, Stefan. So, Melissa in the Q&A panel has given us a really good question. Aaron mentioned a few minutes ago that part of the way osfr got started, and was able to become the tool it is today, is that he and his team had some protected time to contribute to that community-developed tool. So how can funders, workplaces, or universities identify when it's the right time to invest resources and time so that folks like yourselves can put that time and energy into community-developed tools like these? And when should they be contributing to the open source tools that their communities may rely on quite a bit? Anybody can pick it up, but Aaron, you were the initiator on this one.
Yeah, I think that's a really good question. So much of research depends on open source tools, and it's really amazing how often these widely used pieces of software are developed as side projects. And there's sort of a fairness component there, because so many people extract value from them that it seems like someone should be willing to invest the funds that provide the protected time to continue their development. So I think we just have to be more honest about how time-consuming it really is to develop these tools. It's not something you can necessarily do on your lunch hour every day; development takes time, but it also takes time to respond to user requests and bug reports and make fixes. And part of it is that developers need to speak up and say, I'm developing something that creates value for the research community, and I think you, whoever that is, should consider funding it. But you have to be able to demonstrate the value. Part of the question was when is the right time to consider asking for funding, and I think you have to be at the point where you can demonstrate that it's going to benefit others beyond yourself, that you've solved a more general problem for people in the research community. Yeah, Lenny has a link in the chat that I think everyone can see; do you want to tell us a little about it, Lenny? So, just an example of a funder group or initiative that recognizes this: the link that I shared in the chat. They recognize that it's not just, okay, we've built something; who is maintaining it, who is continuing the development, the community? Those are things that, with open source software, we're often not cognizant of; we just rely on volunteer time, and it is a big problem. But I see more funders recognizing it, like the Gordon and Betty Moore Foundation, the Chan Zuckerberg Initiative, Sloan.
The Wellcome Trust. And, you know, OSF itself is an open source project that's philanthropically supported. There needs to be more awareness of this, more advocacy, and less reliance on goodwill in perpetuity, right? Funders do need to recognize that it's amazing when people volunteer and create something, but these things need continued support. So if there is community use, adoption, and reliance on something, you have to come up with ways to support it. That was a great question; I totally agree. I can just add one little thing to that. I think often it would also help if people, instead of starting a new project, would actually look around at what projects already exist and maybe help contribute there. That's one thing that I think is often a little underestimated, because people often say, oh, it's so easy to start your own project, but then, exactly as Lenny said, the maintenance comes in, and if you started something, you can't just drop it. There are already a lot of great tools out there, so I find that just going to a community that already does something and saying, look, that is a feature I could help with, can I integrate that into your project, people are often quite happy to take that on board. That could also be one way of helping. One way users can help is to cite the software in the research that they're extracting value from. That's an easy way to signal to funders that this is a critical component in my workflow. Yes, please do that, researchers in the audience; that would be great. Well, we really only have a few more minutes here, so I do want to move quickly. Actually, Aaron did me a favor there and showed off the API documentation earlier, but let me drop that link in the chat.
And I'm not going to give you a tour of it, because I think we've already seen better examples of what the OSF can do than I could show you in two or three minutes of scrolling around the documentation today. So thank you all for providing such strong insight into how the API works and why this documentation is valuable. So really, I want to wrap up on this: if each of us could, in a couple of words, talk about, for those of us trying to develop, or partner in developing, resources like these open source, community-developed tools that involve a number of stakeholders: what's the next challenge for us? What's the next action we can take to meet those challenges in 2021 and beyond? Aaron, do you want to go first? Good question. I think COS is a very prominent research organization, and you have a big platform. I think this webinar was a great step towards publicizing some of the community-contributed tools that have been built on your platform, because I don't think there's much awareness about them. So I think that's something that could be improved upon, and that sort of benefits everyone. That sounds great, and, if we didn't say it earlier, this is not the last API or community-development event we're going to do this year; you'll see many of these in various formats throughout the year. I know there are several API developers in the audience today, so I'm not forgetting about you, Joe; we're going to get to you. So, Stefan, tell us a little bit about what you see as an action we can all be taking, not just on the OSF itself but in this space of developing and using as a community. I think, as I said, contributing rather than reinventing; I think that's really good.
Because I also see that, for example, a lot of people try to build new services where services already exist. So if more people go and figure out what OSF actually offers, and I didn't know a lot of the things OSF can do and had to go and figure them out, I think communicating what's out there, and building a community where people support each other, would help. It often doesn't take much support; sometimes you just need to be pointed in the right direction. So if you can build that community of developers, where people can help each other and see what other people did, I think that is really, really helpful. And that's pretty much it. And keep the service running, basically, for the next 50 to 100 years so we can rely on it; I think that's probably an important factor for a lot of users. Perfect, thank you. That's certainly very important to us, and supporting not just ourselves but all of you is really important to us, so please don't hesitate to reach out with opportunities to help you. Lenny? I'm going to mention something that's hard, and it relates to part of Stefan's presentation where he was talking about, you know, you shared a link to a data set, to code, but in five years, can people open it? Right, and this is something that has been on my mind a lot for years. I don't know if this is for 2021, or if it's something for OSF and the Center for Open Science to solve. But all the wonderful integrations and exporting that we do, it's critical, it's important. You have to rely on something being there and not disappearing, yet being dynamic and versionable, so those are all great things. But we export PDFs to CLOCKSS.
The Internet Archive is mirroring PDFs, but I did say that we support videos. People are adding lots of links, to the project, to the data, to the protocols; their links go to YouTube, or maybe they've uploaded a video file to protocols.io, and maybe protocols.io is around in 50 years, but can those video files still be opened? Have we done real preservation if we send a PDF with a link to YouTube, and maybe that doesn't render in two years? So I think PDFs are a good start, the low-hanging fruit, but that is not enough if we want reproducibility and real long-term preservation in the way Stefan was talking about. That's a huge challenge, and the proliferation of different tools and different places where we put the data and the videos and all the files, and the file formats, doesn't make it any easier. And even if you have the file, how many of us can open a diskette with a PowerPoint file from 1999? That's just 20 years ago; we're not talking 50 or 100 years, or even 10. So I think that's a good point: integration and preservation. We have to start thinking about what we do with multimedia, what we do to ensure those links keep working. I wish I had a solution for it. Well, for today we will take a well-articulated problem statement; I think that was terrific, and the next webinar will solve it. That's the plan. All right, Nicky, do you want to take us home? Yeah, I thought this time we spent was wonderful. I want to thank everyone who joined to listen in, and thank all of our panelists, partners in the OSF community development work that we've all invested in in one way or another. I just want to say how much I appreciate your time in articulating your use cases and examples with the community. We recorded this so that we can keep it going and share it on, and as Eric said, there will be more of these types of webinars to come, so stay tuned and we'll be sure to share those upcoming opportunities. So thanks again. Goodbye.