And we are now on the Binder lesson. So the question is: when is sharing code alone not sufficient? Let's reflect on what happened in the first two lessons today. We had scripts that worked roughly okay for many people, but there were many different problems. Once we start having dependencies, everyone's computer is different. Some people are running the Anaconda Navigator, some are using Anaconda Cloud and making their own environments, maybe some are on clusters, and you multiply all of this by three different operating systems. So even if someone has the code, it often doesn't work that well. Look at this picture here. What looks really simple, some Python code that says import pandas, has all these things underneath it, and they can all go wrong in different ways.

But what if we want to make our code a little more reusable? We want a way to share code so that someone else can easily try it out, even if we aren't giving them all the resources directly. That leads to the question shown here. Lea is a PhD student in computational biology, and after two years of work she's ready to publish her first paper. The code used for analyzing the data is available on GitHub, but her supervisor is an advocate of open science and says that just sharing this code is not sufficient. So if the code is only on GitHub, what kinds of problems could happen? By the way, GitHub is a version control platform: a way to share code and track changes to it. This course is definitely not about Git and GitHub; our other big course, CodeRefinery, covers that. So don't worry about the name, but what kinds of things could go wrong? Please write in the notes and I will flip back there. So what can go wrong? The easiest one is probably that we miss requirements, that is, we miss dependencies.
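A common way to avoid the missing-dependencies problem is to record the versions you actually used in a requirements.txt file; running pip freeze > requirements.txt does this for everything in the current environment. The sketch below is a more selective alternative, assuming Python 3.8 or newer (for importlib.metadata): it pins only the packages you name. The function name pinned_requirements is our own illustration, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return requirements.txt lines pinning each named package to its installed version."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed here: keep a visible comment instead of silently dropping it
            lines.append(f"# {name}: not installed in this environment")
    return lines


if __name__ == "__main__":
    # Print the lines; redirect the output to requirements.txt once they look right
    for line in pinned_requirements(["pandas", "matplotlib"]):
        print(line)
```

Exact pins make an environment easier to reproduce, but they can age: years later, an old pinned version may no longer have ready-made packages for the current Python.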
Okay, so basically the NumPy or pandas that worked five years ago doesn't work anymore because something has changed. That, or we don't even know that a dependency is there, or that we need it in the first place. Yeah, if code is shared without an environment file or requirements file, there's hardly any chance it will run. And without documentation, it can be really hard to use. What about the version of Python itself changing? For example, a lot of my old code was written in Python 2, and I never updated it because I'm not working on it anymore. That code is probably effectively dead by now. Very likely, though I would make a distinction: it's dead for current development, but it might still work if someone has Python 2 and the matching dependencies. So the code might still run if you get the right dependencies and know which versions it needs, but it's probably dead for further development.

Then what about the practical difficulty of installing all of this? Say you're sharing with someone who isn't a Python expert and doesn't use environment files and so on. Is there any way they can explore this code and do anything with it at all? Well, yes, there are tools like Binder, which essentially provide a ready-made environment, running online, that is independent of your machine. If they don't want to deal with installation, this kind of virtual machine, or rather virtual interpreter, can help. I don't know of anything else you could really do if they don't want to install anything. So that leads to our current lesson, which is sharing computational environments with Binder. Back to Thomas's screen. This is something you can try to follow along with if you want, but for most people I would say: just watch us and ask questions. Why is that?
It requires knowledge of Git and a GitHub account, and for a later part a Zenodo account. These are definitely not prerequisites of this course, so we don't expect everyone to be able to do this. Try if you want, but focus on getting the point of what we're doing, ask questions, and get inspired to do this later. So let's get started. I'm arranging my screen so I can see what Thomas is doing. We start off with an example from a previous lesson, which one was it, data visualization. It's a little plot. We see the code here, in a notebook. Can you run it? I hope it works and I didn't do anything wrong. Yes, okay, there's a plot.

Now we want to make this reusable by other people. The first step is to make a Git repository. We would probably do this from the command line, but instead we're doing it from GitHub. So here we are on github.com. Thomas is logged in and clicks New, which lets us make a new repository. We can call it binder demo. No description is needed. Make it public; that's relatively important, because otherwise Binder might run into problems. Then click Create repository. So far, this is a pretty standard way of using GitHub. Next, on this page we find the option to upload an existing file, so we can do everything here. It's right around there. Yes. Okay, did you save the notebook in JupyterLab and give it a name? Yes, I renamed the file. I'll just navigate there, or actually, choose your files. Here it is; we called it plotting. It's been added, and we write a commit message. This provides a history and explains what we're doing. We commit the changes, and there we go, there's a notebook. Can we click on it and see what it looks like when we open it? Yep, GitHub has a preview. Okay, that looks like what we just did.
In fact, maybe someone can copy the link and put it in the notes. This is something that anyone can see. Okay, Thomas is doing it; I just need to find the notes again. There we go. So now, what's the minimum first step for anyone to be able to run this code? Telling them what requirements there are, that is, what dependencies it has. Exactly what we just covered in the previous lesson. So we create a new file: we copy the requirements, paste them in, and call the file requirements.txt. Now we've made another file in this repository from the GitHub web interface, containing the requirements.txt version of the dependencies. We click Commit changes; it suggests a standard message, Create the file, which is fine, and we click Commit changes. Now if we look at our code, we have plotting.ipynb and requirements.txt.

So, what is Binder? If we go to mybinder.org, which is listed there, we see it's a cloud service that starts Jupyter in the environment specified in a repository. Binder reads the requirements, builds an environment, and then starts Jupyter in there for anyone online. So we copy the repository URL from GitHub and paste it in. The site tells us not to click Launch yet, but to click the little arrow lower down. This gives us a launch binder badge, just like the one we see up there. We copy that, go back to the GitHub repository, and make a new README file; there's a nice button there to do it. It's titled binder demo, and we paste the badge below the title. Now we see the Binder badge, and we commit the changes directly. Okay. So now our repository is described. In a real project, this README would say something like: this is my example code for such-and-such a project.
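The badge that mybinder.org hands you is just a Markdown image wrapped in a link to a launch URL. The sketch below rebuilds that snippet, assuming the URL pattern mybinder.org uses for GitHub repositories (https://mybinder.org/v2/gh/USER/REPO/REF) and its optional labpath query parameter for opening a specific file in JupyterLab. The user and repository names in the tests are hypothetical; in practice, copying the snippet from the website as in the demo is the safer route.

```python
from urllib.parse import quote


def binder_badge(user, repo, ref="HEAD", notebook=None):
    """Return Markdown for a 'launch binder' badge pointing at a GitHub repository."""
    # Launch URL for a GitHub-hosted repository at a given branch, tag, or commit
    url = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    if notebook:
        # Ask Binder to open this file in JupyterLab once the environment starts
        url += "?labpath=" + quote(notebook)
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({url})"
```

For example, binder_badge("example-user", "binder-demo", notebook="plotting.ipynb") gives a badge that opens the plotting notebook directly once the environment is ready.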
Here's my citation, and so on. If you're following along online, you will see this there too. Now you can click the launch binder button. Maybe open it in a new tab, but it's okay. We see a preview, and it's running some stuff here: this is building the environment. Can you zoom in a little? We see a lot of code-like output, and it looks like it's installing from the requirements.txt file. It might take a little while; I'm not sure why it took me so long yesterday. Did you open a new tab? No, I just went over. Okay. So we're going back here; just open the Git repository again. If there are any questions, please write them in the notes, because I'm sure there are all kinds of things here. It's still building; let's hope it finishes soonish. It took a few minutes for me.

So while we're waiting: in what cases is this especially useful? I would say for research code that you want people to be able to explore and that doesn't depend too much on particular datasets you have. As soon as your code depends on really large datasets, it might not be the easiest or best approach, because the data needs to be stored somewhere and you only have limited resources on Binder. In CodeRefinery, I think we showed a demo of a scientific project that analyzed some data and made some interesting figures, and the authors created a Binder repository from it. So anyone with a little Python knowledge could start it up, regenerate the figures, and see how small changes would affect the results. This was really good for science, because the more people who can use your work, the more likely you are to get citations for it. And that is presumably one of your goals here.
I mean, of course it's also good for science if other people can explore and reuse things. There's a good question coming up: would Binder be suitable for projects where the same project folder has the raw data files used by the study? Yes, if the data is small, it makes sense to keep the data within the repository. I would say that's reasonable up to a few megabytes of data or so. Otherwise GitHub and BinderHub might not be very happy, because you're basically using their resources to host data. One thing that can be done instead is to put the data somewhere else and write the notebook so that it downloads the data when it runs. The code checks: does the data already exist? If not, download it. Then it uses the downloaded file in that directory.

Next question: does Binder run on your local machine, or does the Binder instance run remotely? It runs remotely. Binder is run by a nonprofit group that receives donated computing resources from Google and other places. Is it running? It's running; it's installing things. So they get donations of these resources, and BinderHub is basically a modified JupyterHub running in the cloud. It only provides resources for a short time: after something like 15 or 30 minutes of inactivity, it times out and removes everything. So it's not suitable for storing things online for a long period, and it's not suitable for long-term work or doing your main work. You know, if we do this again, maybe we could have a break while the Binder is launching. That sounds very reasonable. While we're waiting, we can preview what comes next, or actually do it at the same time. The next section is about how to get a DOI from Zenodo. DOI stands for digital object identifier, and it's basically what is used to cite papers and other things.
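The "download the data only if it isn't there yet" pattern described above can be sketched with the standard library alone. The function name ensure_data, the file path, and the example URL are hypothetical; the fetch parameter defaults to urllib.request.urlretrieve and exists mainly so the download step can be swapped out, for example in a test.

```python
import urllib.request
from pathlib import Path


def ensure_data(path, url, fetch=urllib.request.urlretrieve):
    """Return the local data file path, downloading from url only if the file is missing."""
    path = Path(path)
    if not path.exists():
        # First run (e.g. a fresh Binder session): create the folder and download
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(path))
    # Later runs reuse the file that is already there
    return path
```

In a notebook this might look like data_file = ensure_data("data/measurements.csv", "https://example.org/measurements.csv"), followed by pandas.read_csv(data_file). On Binder the file is downloaded again in each new session, since everything is removed when the session times out.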
So the idea here is: say you have a project, you write some code, and you want your code to be cited. Zenodo is a publicly funded data repository. It's funded by the EU for open science purposes and run by CERN. People can upload data there and it gets permanently archived. Then you can refer to it and cite it, and you don't have to worry about storing it yourself. What we will do is connect GitHub to it, since Zenodo has GitHub integration. When you go to Zenodo, you can log in with GitHub, and it requests permission to connect, so we authorize it. (Firefox is managing the passwords, and there's multi-factor authentication.) Okay, here we are. This is making the account on Zenodo: complete registration, and so on. This is for the sandbox. We wouldn't want to do a course demo on the main Zenodo, because everything there is permanently archived, but sandbox.zenodo.org is just for testing and teaching. As you can see from the recent uploads, it's all sorts of test material. Okay. If we go to the three-line menu at the top right, there's GitHub down below. That's it. We choose our repository, which would be binder demo, flip the switch to On, and Zenodo tells GitHub to set up this repository. And that's all we need to do on the Zenodo side.

Can we go check Binder? It's still building, still running. I really wonder why this is taking so long. Actually, I bet it's because we gave it these exact requirements, so maybe it's taking a long time to figure out which packages are compatible with pandas 1.2.3 and matplotlib 3.4.2. Possible. But actually, it's building the wheel, so it does know what it wants; it's not at that stage anymore. So is it compiling pandas or something? Anyway, we don't know. Okay, let's go back to GitHub.
So here we are. In order to make this code appear on Zenodo, we have to make what's called a release. On the right side under Releases, there's Create a new release, and for our first release we need to choose a tag. There's a good question: how big a dataset can be shared on Zenodo? The default limit is 50 GB, but I have a feeling that if you ask, they can allow more. The default has to be limited, because otherwise the service gets filled up with random stuff. Okay, so we choose a tag towards the top: Create a new tag, by pushing the plus button. There we go. First release, and we click Publish release. It has this archive here: a zip file and a tar.gz of the contents. If we go to Zenodo again and click on the profile, maybe... nope... maybe click there. We have a DOI for this. The page says that after your first release, a DOI badge that you can include in your GitHub README will appear next to your repository. So this is our badge. Do you want to add it to the README? We can copy it. It's a bit funny that we make the release first and then add the Zenodo badge, which means changing things after the release. But the release is what Zenodo archives, so anyway.

Let's click there, maybe open in a new tab. This works. Here on the Zenodo sandbox we see the record: the files that were there, which we can download, and the citation for it. We can see it tracks different versions when we make more releases. This actually looks a bit more organized than when I checked before. Anyway, that's the citation. Do you know if this citation updates if you have multiple contributors? Oh, that's a good question. I don't know. Maybe someone can answer in the notes, or someone else can join for our wrap-up, or someone wants to make a pull request for it. Okay.
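As in the demo, Zenodo's page gives you the DOI badge to copy. For reference, a badge for a record on the main zenodo.org (not the sandbox) usually has the shape built below; this sketch assumes that URL pattern, and the function name and the DOI in the usage line are made up for illustration. 10.5281 is Zenodo's DOI prefix.

```python
def zenodo_doi_badge(doi):
    """Return Markdown for a Zenodo DOI badge that links to https://doi.org/<doi>."""
    # The badge image is served by Zenodo; the link goes through the DOI resolver
    return f"[![DOI](https://zenodo.org/badge/DOI/{doi}.svg)](https://doi.org/{doi})"
```

For example, zenodo_doi_badge("10.5281/zenodo.1234567") produces a line you could paste under the Binder badge in the README.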
Has Binder finished building yet? One second. Binder: building wheels for matplotlib. Okay, so I guess it really is just slow today. I also remember that this used to be faster. I'd propose we go back to our notes and have a final discussion for today. I'll switch to my notes screen again, and if you see it has built, we can switch back. Okay, here we are. We've got plenty of time for a wrap-up today, so what would you all like to discuss? Is there any news about day four before people start leaving? So tomorrow there is one thing that uses the command line and other environments a little more, but there are two things that can be done purely in JupyterLab, so it's worth coming.

Oh, hello, Simo. I heard about the problem with the requirements. What I think is happening: because you see it's building wheels, pip didn't find a ready-made package of the requested matplotlib version, so it's building it itself from source code. The version numbers are locked to an older version, and Binder is probably running the most recent Python, 3.12 or something, so with those exact versions it has to build everything from scratch. That might be why you're seeing this. So essentially, we need to update the requirements: either change the versions we ask for, or just remove the version requirements. But now it seems to be finishing, so maybe you don't want to do that at this point. This is the kind of thing to watch for, though: if you see building wheel, it may mean pip is building from source code instead of installing a prebuilt package. Okay, we're showing Thomas's screen again, and it's pushing the built image, and oh, look, it's starting Jupyter. It just started. There we go.
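Simo's second suggestion, removing or relaxing the version requirements, can be done by hand in requirements.txt. The sketch below mechanizes it for simple lines such as pandas==1.2.3: mode "minimum" turns an exact pin into a lower bound, and mode "none" drops the version entirely. The function name and modes are our own; this is a deliberately simple text rewrite, not a full requirements parser (lines with extras, environment markers, or other specifiers are left untouched or lose their trailing parts), and loosening pins trades exact reproducibility for a better chance of finding ready-made packages.

```python
import re

# Matches a simple exact pin such as "pandas==1.2.3" at the start of a line
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def loosen_pins(lines, mode="minimum"):
    """Rewrite exact '==' pins: 'minimum' gives 'name>=version', 'none' gives bare 'name'."""
    if mode not in ("minimum", "none"):
        raise ValueError("mode must be 'minimum' or 'none'")
    out = []
    for line in lines:
        match = _PIN.match(line)
        if not match:
            # Comments, blank lines, and already-flexible specifiers stay as they are
            out.append(line)
            continue
        name, ver = match.groups()
        # Anything after the version (e.g. an environment marker) is dropped in this sketch
        out.append(f"{name}>={ver}" if mode == "minimum" else name)
    return out
```

With the requirements from the demo, loosen_pins(["pandas==1.2.3", "matplotlib==3.4.2"]) returns ["pandas>=1.2.3", "matplotlib>=3.4.2"].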
And along the side there's the plotting notebook, so I guess Thomas can click that to open it and then run it. And there we go. Do you want to try it without the exact version requirements and see what happens? Well, we can try that later on. Yeah, we can try it later. I think we should do the wrap-up. Yeah, I agree, that is very...