Okay, so Binder: what is Binder? That is the plan for the next hour. Let me see whether I'm already sharing something. No, I'm not, but it doesn't matter; we will share in a moment.

Binder is a tool that we will demonstrate in the next 40 to 50 minutes, and it connects nicely to dependency management. I think it also connects nicely to the data visualization lesson. The problem we are trying to solve is this: let's imagine that we created a great visualization on Tuesday. We read the data with pandas, we did some statistical analysis, and at the end we got some nice plots. Now we want to share our plots and our research with the community. The traditional way is to share it in a PDF, but that is maybe not the best way, as you know if you have ever tried to copy and paste something out of a PDF or to change something in one.

So this lesson is really about how we can do better than just putting our research in a PDF. How can we make it reusable? How can we make it reproducible and even modifiable? And it should still work in five years. That's what it's all about. It connects nicely to Jupyter notebooks, but it's not only for Jupyter and it's not only for Python. The tool that we will show can in principle also be used for RStudio, actually for anything. Right, yeah: anything that can be in Docker, which is anything. Yeah, so we will share a container.

So Binder is a cloud service, I believe. Yeah. Should we open the lesson and have a look and start? Yeah, let's go. You have it open, I believe. Yes, I have it open. I know the font is too small; I just wanted to show where to find it. So now we are here, in Binder. I'll open it up and then I will zoom in. Binder.
Okay. Zoom again. Did you already say overall what Binder is? It's a web service for running code, I think, with the Jupyter interface. We can imagine it as a free web service where I can run Jupyter notebooks in the cloud, dynamically, without having Jupyter or any of the other dependencies installed. So anybody else can revisit my whole pipeline, from data import to explanation to the figures. They can rerun it, and they can even modify it. I guess this is the main point: you can do this, and then anyone else can run your code. So it's a way of sharing your work that allows people not just to see it, but also to use it. Yes, they can interact with it. They can go in and ask: what would happen if I change this factor two to factor three? How would the plot change? They can just go and try. Maybe the most common scenario is just to rerun it.

As an example, one of our examples in CodeRefinery was that someone writes an article and has their code and data in Binder. Then someone else can start it, recreate their exact plots, and try adjusting the parameters to see what is going on.

And now the things that Sabri mentioned, requirements.txt and environment.yml: we will need these, because they define the environment and tell Binder which precise environment to recreate for us. So now these things become important.

Just some questions and motivations that we want to clarify. The take-home message I will try to pass along is that sharing code alone may not be sufficient, and in this case the code can be a data visualization Jupyter notebook. We will have a bit of a discussion in a moment about what problems can appear. Some of them we have already discussed in the dependency management lesson just before.
We will show one way of really sharing the computational environment, through Binder. There are other ways, but I think this is a very nice way for Python and Jupyter, and we will demonstrate it. For one example I will demonstrate how to do that step by step, but the steps are also documented here, so I will really follow this lesson. Depending on how the timing goes, maybe we also have time to discuss how to make this citable: how to get a digital object identifier (DOI) for your notebook through services like Zenodo. Then you can cite it, and it's preserved forever. Okay, should we go? Yeah, let's do it.

Okay. This exercise one here was designed for group work, but I think we shouldn't assume that people are in groups, although watching this in a group would be really fun. So we can take exercise one as a HackMD discussion. Maybe somebody can help me by copy-pasting this into HackMD; I will have a look at it, and it would be fun to hear from you.

In the meantime, let me introduce the question. Here's an example from, I guess, the geoscience community. You may come across code that does some Python stuff and at the top imports modules and functions from some packages. And this may not be enough: it may be difficult to rerun it a few years down the road. To give a tangible example: Leah is a PhD student in computational biology who has spent two years of intensive work, and now the paper is ready to be published. The code she used for analyzing her data is available on GitHub. GitHub is a service that many people use to share code and Jupyter notebooks; I will also demonstrate it. But her supervisor, who is an advocate of open science, told her that sharing code alone may not be sufficient. So what problems can you anticipate?
And I will have a look here. Okay, I'm switching to HackMD. Thanks. So we see: data might be missing. Yeah. Code without documentation, that's another point. And we have also discussed keeping data and code close to each other if possible. Yeah. We just discussed dependencies: dependencies will have newer versions within the five years. Yeah, which always happens. So we want a place somewhere where we list all the dependencies that we have, and we might want to know which versions.

Even if I do that, can I still get problems? Even if I list all the Python packages that I need, and their versions, can I still get problems? Let me see. Anybody? Have you ever had a case of code that just doesn't run sometime in the future? Like it was made for some processor architecture which doesn't exist anymore, or it somehow depends on the basic system C libraries or something. Yeah. Those evolve slowly, but they do evolve. I have seen code that only depends on the system libraries and doesn't work, because it's 10 years old.

Great, these are great keywords. System libraries: I also see an answer in HackMD that when we import Python packages, they might have something other than Python underneath. An example is Matplotlib: it's not only Python. There is also Fortran, or NetCDF, or these geo libraries, or SciPy. So there might actually be more underneath, and it might depend on libraries in your system, which evolve as well. Then we might need more, and one rescue can be a container. A container is like documenting the operating system as well, not only the Python dependencies. And the Binder service that we will demo now is one way of running this in the cloud and having it reusable and reproducible. But there are other great points here in HackMD: tests, operating system. If something was compiled, how was it compiled? There is processor dependency. It's very complicated.
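The point about listing packages, versions, operating system and processor can be illustrated with a minimal sketch. It assumes only the Python standard library; the helper names `describe_environment` and `to_requirements` are hypothetical and are not part of Binder or of the lesson. It records what is installed right now and formats the result as pinned lines of the kind one would put in requirements.txt.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, OS, CPU architecture and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "system": platform.system(),    # e.g. Linux, Darwin, Windows
        "machine": platform.machine(),  # processor architecture
        "packages": versions,
    }


def to_requirements(versions):
    """Format the installed packages as pinned 'name==version' lines."""
    return [
        f"{name}=={version}"
        for name, version in sorted(versions.items())
        if version is not None
    ]


# The package names here are only an illustration.
env = describe_environment(["pandas", "matplotlib"])
print(env["python"], env["system"], env["machine"])
print("\n".join(to_requirements(env["packages"])))
```

This only documents the Python level plus a little platform information. As discussed above, compiled system libraries are not captured this way, which is where a container helps.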
And I think we shouldn't try to over-engineer it either. We shouldn't be afraid of sharing something just because it may not work in 20 years. It will probably not work in 20 years, but at least we can try to make it work for a couple of years. And even if it doesn't work, at least the documentation can help, because somebody can then go in, do some archaeology, and try to recreate it. Yeah. I guess at least if it's shared, then if someone needs it, they can do the work to maintain it themselves. Right.

So let's try it out. Maybe we can go back to my screen share, to the material. Okay, right here. We will demonstrate one way of doing it. There is a very nice picture here. In step one we created this Jupyter notebook. In step two, I will put the notebook on GitHub. This is not a course about GitHub; we have other courses and workshops on that, but we will upload the notebook to GitHub. And then from there we will run it through this Binder service, and anybody can visit it and anybody can rerun it. So that's the goal.

You can try it with any other Python code, but the example that I will use is one of the visualization exercises from Tuesday. We have seen it before. So, step number one: I will take that and put it into a Jupyter notebook on my computer. I'm still on my computer here. Let me start it. I understand this is tiny; I will make it a bit larger. Let me first copy-paste, but before I do anything else, we should rename it. Let's call it, I don't know, visualization. And I should also try to actually run it. Does it still work? Nice to see what it does, yeah. Okay. Yeah, it seems to work, good. So let me save it now on my computer. Saved.
So far this is only on my computer, but now I want to share it with the world. So step number one: I will put it on GitHub. GitHub is just one of the many places one could put it. There's GitLab. Yeah. We will not go into that much. I guess this is the reason why this is a demo, because we don't want to require it. Exactly. It would have taken 15 minutes to explain Git and GitHub and to create accounts, which would have been a bit too much, but I still encourage you to go through this exercise. Yeah.

So now I'm on GitHub, the place where I can put code and notebooks. I will create a new repository, a new project, just for this, just as a demo. Okay, let me zoom in, and we give it a name. I don't know: Python course, Python demo 2021, just demonstrating Binder. I want this project to be public because I actually want to share it. So in this case I choose public, and I will create a README file there. The other things I'm not worrying about now. So I create this project. Right now there is not much in it; there is a README file. Yeah.

In this project I will now collect Jupyter notebooks, for instance the one that I just created. So here I can edit. Oh, nice, using the web interface; that makes it simple. Just using the web interface: create a file, upload files. I want to upload; choose your files. Is the notebook saved? It's saved on my computer. This is now maybe outside of the screen, just because I don't want to share everything that is on my hard drive. Somewhere there is a folder where I saved the notebook, and I selected it: visualization.ipynb. I need to document what I did here: "uploading my notebook". And I commit, I save this. Okay, uploading, uploading. GitHub and Git are wonderful for so many reasons that I will not go into now. It's a different course, a different workshop. We now have this notebook on GitHub.
And actually anybody can visit it now, and feel free to; I'm not afraid of that. I'm pasting it. So anybody can visit this on GitHub. It's already pretty good to share code like that, but it's not perfect: there are two problems.

First, the good part: you can actually have a look at the notebook. That's great. If you click on it, it shows the notebook with all the content. So someone can see the plot, see the stuff. I mean, that's better than most of the ways we share stuff already. Very often this is good enough, because anybody can now go and copy-paste it out and reuse it. And of course you can download the notebook and run it on your own computer. Good point, you can do that as well. How would I do that? Somewhere down here. I think the three dots there. Is it Raw? Well, at least I would do that and save the file, but that's not exactly ideal. Yeah. Okay. So there are ways.

One problem is that I can't actually edit this thing. It's just an image; I can't go in and modify it. It's not really a dynamic notebook. The second problem is that I could decide to delete this project on GitHub, and if I do that, then my work will not be reproducible in five years. So maybe we will solve both problems in this lesson.

Maybe you can help me by relaying any questions from HackMD. None so far, so carry on. Yeah, no questions. That always means that it's maybe too fast, or too slow, or completely out of scope. I think it's good; just keep going. All right. How much time do we have left? Do we have like 20 minutes left? Yes. Oh, 10 minutes? 20, 20 minutes. Yeah.

So now we will take it one step further, and this is this part here. Now we remember what we explained about requirements.txt, because this is a file that Binder will understand.
So I will create this file and add these dependencies to it. I want to document that this project depends on pandas with this version. It probably also runs with other versions, but I know for a fact that it runs with these, because I created it with these versions. It also depends on Matplotlib. So, copy, and let's go back to here. Here I will create a file called requirements.txt. Add file, create new file: requirements.txt. Now I must not make a typo, because that would confuse Binder. Copy-paste. And now I need to write what I did: "documenting dependencies". In the previous lesson we mentioned that there are many programs that expect a file to have exactly the right name, and this is one of them. It is actually looking for requirements.txt by exactly that name. So I save this. The file is now here. That is already useful, because somebody who wants to run the notebook could open the file, have a look, and see that it will probably work with these versions.

But now we take it a step further. Let me just adjust the screen here a little bit. Now we will visit this wonderful, wonderful service called Binder, mybinder.org. I'll open it up. This service can turn a Git repository into a collection of interactive notebooks. We will have an interactive notebook in the cloud. The notebook can be in different places; it can be on GitHub or in all these other places. In my case it's on GitHub, and all I need to do is copy-paste the address into here. Okay. So we tell it the name. It refers to GitHub, okay. What it will do is go in there and look for this requirements.txt file, or alternatively environment.yml; it understands both. It will create this environment, and then I can run whatever is in there in this environment. I could click on launch and it would just do it. I will not do that; instead I will do something else.
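What the mybinder.org form does with the pasted address can be sketched in a few lines. This is a minimal sketch, assuming the public `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>` link pattern and the standard badge image; the owner and repository names below are hypothetical placeholders, not the repository from the demo, and the two helper functions are made up for illustration.

```python
from urllib.parse import quote

BINDER_BASE = "https://mybinder.org/v2"
BADGE_IMAGE = "https://mybinder.org/badge_logo.svg"


def binder_github_url(owner, repo, ref="HEAD"):
    """Build a Binder launch link for a public GitHub repository.

    ref is a branch, tag or commit; HEAD means the default branch.
    """
    return f"{BINDER_BASE}/gh/{quote(owner)}/{quote(repo)}/{quote(ref)}"


def badge_markdown(launch_url):
    """Return a Markdown snippet that shows a 'launch binder' badge in a README."""
    return f"[![Binder]({BADGE_IMAGE})]({launch_url})"


# Hypothetical owner and repository name, for illustration only.
url = binder_github_url("example-user", "python-demo")
print(url)
print(badge_markdown(url))
```

In practice there is no need to build this by hand, since the form on mybinder.org shows both the link and the badge snippet for copy-pasting; the sketch only shows that the link is nothing more than the repository address in a fixed pattern.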
It suggests that I copy-paste this into my README file, so let me do that, because then anybody visiting this repository can launch Binder. So into my README file I add this thing. And maybe instead of explaining what's happening, we will see the effect of it. "Adding our Binder badge"; commit, save. And what I got now is this button here. Anybody visiting this project on GitHub sees this button. Now I will click on the button and open it in a new browser tab.

And now Binder does some thinking and some working. It went in there, it looked into my requirements.txt, and it is now installing an isolated environment. It actually sets up a Docker container and installs these dependencies into it. We will wait a couple of seconds and use this time to have an eye on HackMD. In a couple of seconds, at most one or two minutes, it will have this notebook running for me. Then all I need to do is share this with the people I want. I could even refer to this in my publication. Later, if we have time, we can try to give it a digital object identifier. This is so much more useful as supporting information for a publication than just an image in a PDF.

There is a very good question on HackMD while we're waiting: who is running Binder? Who is Binder? It's a great question. I think they got some public funding and a grant to work on this and have been slowly developing their thing. Some of the resources are donated by different cloud providers and organizations, as you can see on their page. I can also add that Binder itself is open source, and you can also run Binder yourself. Well, you probably wouldn't do it as a person, but it might be an idea for a community to set up their own Binder instance, or for a university, or as a national service. So it's possible. And I can say that the Binder people overlap with the JupyterHub people.
So that's basically the same team, and they're using it to fund JupyterHub development in part. Yeah.

This is still spinning up, and it takes a bit of time the first time, because it is creating this environment; after that the environment is cached, so the second time I visit it will not take so much time. I can use the time to answer the question about programs written in C++ and other languages. So yes, you can in fact run any Docker image in Binder. Of course, if you use mybinder.org, there is a time limit, so there is a resource limitation. I don't remember how large the limit is. It's normally no problem at all to run these notebooks, but for heavy-duty calculations you may hit a limit there. But it doesn't have to be Python.

Okay. We can still wait. Let me look at the lesson to check that I didn't forget anything. Still working on it. In a moment or two there will be a notebook; there will actually be a JupyterLab running on this address, so running on somebody else's computer. The nice thing about that is that somebody who visits this and is patient and can wait a minute or two can run it, and all they need is a browser. They don't need Jupyter, they don't need Python, they don't need Matplotlib; all they need is a browser. I think this is really wonderful. Maybe we can let it continue spinning.

Aha, here it is. Here it is, JupyterLab launching. This looks somehow familiar, but it's not on my computer. It's on this address here. Now I can go in and open the notebook, I can run all the cells, and I can even go in and modify things. So what happens if I change this? Then it will become more transparent. So this is now interactive, it's dynamic. Wonderful. Yeah, great.

Now, depending on timing, we could talk about this part here. Depending on the schedule; I think I'm already happy with what we have achieved.
So this part can also be consumed by reading. We can download it as well, if you want to. We do have 10 more minutes, and I'd say we're answering the questions pretty well, so let's press ahead and see how far we get.

This is another wonderful service: it's Zenodo, run by CERN and other organizations. It was designed to host data from CERN experiments, but these days you can make your code, your notebooks and your data citable through the service. It's not the only such service. What I will now demonstrate in a very few minutes is something that is useful not only for Python but, I would say, for any code that we create: to eventually make it citable. Actually, it solves two things: we make it citable, but we also preserve it. Even if I go in and delete my GitHub account or repository, Zenodo will keep a copy that will hopefully never go away.

Here are a few steps on how you can achieve that. If you want to practice this, I recommend doing it with the sandbox. Zenodo has a sandbox; it's like a practice field where I can experiment and don't have to be afraid of breaking things. Once you know that this is working, you can go for the real thing. The reason why I will now use the sandbox is that this is only an example, only an exercise. I don't want to preserve this exercise for the next 30 years. Zenodo makes it difficult to remove things, for a good reason, because that's the point of it. All right, so let's practice on the sandbox, where we can just experiment, we don't have to be afraid, and things can be removed.

Good. Once you visit, this is how it looks. There are already some articles and maybe codes and data published by other people. I will log in here with my GitHub account. You can log in with ORCID, or you can create an account. I verify that I'm still on the sandbox, and I log in with GitHub. Zenodo has a very nice integration with GitHub. So now, what can I do?
All we need to do to get a digital object identifier and to preserve the notebooks is to activate this repository on GitHub. There are many of them; I just need to find it. What was it called? Python something. I have too many repositories on GitHub, but somewhere there is Python demo 2021; that's the one. That's the project we just created. All I need to do is switch this to on. Now Zenodo will watch this repository.

A release is like a milestone. A release could be, for instance, when I publish something, or a release could be at the end of the PhD thesis. Whenever I now create a new release, Zenodo is watching, and for every new release it will create a new digital object identifier. So all I need to do here is create a release. What should we call it? 1.0.0 or so. This could be the published version, or it could be the preprint version, and later we can make modifications to it. And now I publish the release. Publish. Has this happened? "We weren't able to publish the release." Okay, I forgot to do something. It works like this: create a new tag, okay. It's a detail, not important now.

I created this release on GitHub, and in the meantime, if I go in here, Zenodo actually already did something: it created a digital object identifier for me. Now I can share that. People visiting this DOI will find a copy of my project. If you now try to go to this DOI, you may not find it, and this is just because I'm on the sandbox, but the mechanism is the same.

And connecting it back to Binder: the even better thing, sort of the gold standard, would be that I now go to mybinder.org and, instead of binderizing the GitHub repository, create it from the Zenodo DOI here. Super.
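Launching Binder from a Zenodo DOI instead of a GitHub address can be sketched the same way. This is a minimal sketch under two assumptions that should be checked against the Binder documentation: that the link pattern is `https://mybinder.org/v2/zenodo/<DOI>/`, and that only DOIs with the production Zenodo prefix `10.5281` are accepted, so a sandbox DOI like the one from this demo would not work. The DOI in the example is a made-up placeholder and the helper name is hypothetical.

```python
import re

# Production Zenodo DOIs look like 10.5281/zenodo.<number>.
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.\d+$")


def binder_zenodo_url(doi):
    """Build a Binder launch link from a Zenodo DOI (assumed link pattern)."""
    if not ZENODO_DOI.match(doi):
        raise ValueError(f"not a production Zenodo DOI: {doi!r}")
    return f"https://mybinder.org/v2/zenodo/{doi}/"


# Placeholder DOI, for illustration only; it does not refer to a real record.
print(binder_zenodo_url("10.5281/zenodo.1234567"))
```

The attraction of this variant is that the link points at the archived copy, so it keeps working even if the GitHub repository is later deleted.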
I think this is all I wanted to show. I really encourage you to try this out. It will probably take more than 50 minutes. Try it out, test out the Zenodo sandbox as well, try out Binder; here are all the steps. There is much more you can do, and there is also the citation side of it. You can tweak and configure, but just getting it to run was really very few steps.

Key point: what was the point of all of this? We have a mechanism to really share an interactive, reproducible computational environment. Binder is not the only example; I think there are other tools, but it's one example of many.

Someone in HackMD mentioned Google Colab and asked what the difference is. Well, Google Colab essentially uses Google Drive to house the notebook. So it's not on GitHub and it's not using Git, but otherwise it's very similar. Yeah, it's probably good. Two comments there. First, of course, there is a for-profit company behind it, which is not the case for mybinder.org. The other comment is that Google has terminated projects in the past, so there is always a bit of a question: will it still exist in 10 years? Will they decide that this is maybe not something they want to continue? So in terms of sharing research output, I think Zenodo is maybe the better place. One can still collaborate on Google Colab, but it's good to then put the result somewhere where we know it is not going away. Yeah, I would compare Google Colab with Binder, not with Zenodo. Zenodo is for keeping something forever, or as long as possible. Binder and GitHub, or mostly actually GitHub and Colab, are for collaborating on something in real time and for sharing.

Great, I'm checking whether there are any other questions. Okay, there is another question: are there other tools equivalent to Binder? I don't know of anything we haven't mentioned, but do you? I don't, but I'm sure there must be.
CoCalc is something I know of, but I don't know if that's like Binder. Binder is a very specific kind of thing, because it wants anyone to be able to do something, and when you don't have accounts, it's hard to monetize somehow.

There are questions about code changes in Binder. So what if I make changes? I can make changes, I can experiment, but if I later log out, or if I leave my tab open and forget about it, after a while it will stop working, and Binder doesn't save anything. It creates this instance for me, and later everything is vaporized. So if I really want to keep my changes, I need to save them: I would save the notebook to my computer and commit it to GitHub. If you change it on GitHub and restart Binder, you will see the changes on Binder, but not the other way around. Right. Good questions.

So I think we have perfect timing. We could soon go into a break. Yeah, should we do that and come back exactly on the hour, and not go too much ahead? Sounds good. And then after the break we will learn about packaging, then have a wrap-up discussion, and then the after-party, right? Yes. Exactly. Thank you for listening, and see you after the break. See you soon. Bye.