of today. The first lesson of today will be scripts: going from Jupyter notebooks more into the command line, which is the very first step toward making code reusable without modifying the code itself. That's related to the second icebreaker question here: how do you run the same thing over and over with slightly different parameters, like the same code with different data? So, should I share my screen so we can get straight to the lesson? Yep. Sounds like a plan. I will push a button here and it transfers. Let me adjust that. Okay, so as a reminder of the schedule, we're now coming to scripts. I'll open it. So what does the word "script" even mean? To me, a script is a self-contained, very small program that doesn't have a lot of interaction. It can call additional libraries, but in itself it doesn't contain a lot of code. That's what I think of when I talk about a script. When it gets more complex, I would rather call it a program. So is this lesson really about scripts or programs? I think both. It's about the transition between different ways of using code. So far we've been showing things in Jupyter, where you can click and run cells in whatever order you like. Now that we have scripts, we'll be able to plug our code into other things a little better. Yes, at the very least we'll be able to rerun our code with different parameter sets without modifying it. If you edit the initial conditions or the input file names in a Python script or a Jupyter notebook, you're always modifying the code. Okay. Up here there's the section on why scripts. If you had to convince the people listening that they should consider making a script or a program out of their code, what would you say?
Well, the main reason, I would say, is that it very often comes down to this: I have a pipeline that I've developed for one example, and now people want to use it for multiple different data sets. Of course, if I have it in a Jupyter notebook, I can change the input variables every time and run it for each data set. But that means doing it again and again and again, and I will likely introduce typos along the way. I'd rather have a folder with the data sets and go through all the files in that folder. And potentially I want to run this on a cluster. Yeah, that's a good point. Some people come to use our cluster with hundreds of input data sets and say, "I have this Jupyter notebook; how do I run it on a hundred different inputs in parallel?" That's hard. But if you make a script out of it, we can make it extremely easy to run it with a hundred different input parameters. Yeah. And that's where the transition from a notebook to a Python script comes in. We have two exercises prepared for today. The first one is coming up shortly, and it's just about converting a Jupyter notebook to a Python script. Yeah. Here's our short-term roadmap: I will demonstrate converting a notebook to a Python script, and then you'll have five minutes to try it yourself. Then we'll come back, talk, and demonstrate more, and after that there will be a much longer exercise where you can finish exercise one and try some more things. Does that sound good? Yeah, hopefully, because that's what we're going to do. Okay, I'll show my JupyterLab here. These are the instructions; should we go to the exercise and show it? Yeah, you can also just download it now. Okay, I'll use the same trick I used before.
I will right-click here, choose "Copy link", go to JupyterLab, choose File > Open from URL, and paste it here. That both downloads it, as you see on the side here, and opens it in JupyterLab. Yeah, and you're running it to show that it works. Yes. Okay, there we go. Now you essentially want to export this into a Python file. The easiest way is File > Save and Export Notebook As, which downloads it to your computer. Yep. Not "Save File As", but "Save and Export Notebook As", and then further down, "Executable Script". Executable script, okay. But this was downloaded, and where is it now? So we'll also try it from the command line here, because I think what I'm about to show will work for more people, or at least it's easier. I'll go to File > New Launcher, and from the launcher I start a terminal. I see this is the directory where I'm doing course things. This may be the first introduction to the command line for some people. For some it will seem impossibly hard; for others it will be trivially easy. That's just how it goes, unfortunately. If it's too hard, take this lesson as a demo, work through it slowly, and come back to it later now that you're inspired. That's all I can recommend. So I'm copying this line here and pasting it into the terminal. For me, Ctrl+Shift+V works; I thought Ctrl+V alone might do something else, but actually Ctrl+V works too. Anyway, if I run this, it converts the notebook: it says it's writing data to weather_observations.py. And you can check whether it worked. If I open the file browser, I see weather_observations.py here. Then we can try running it with python weather_observations.py. It looks like it ran. If you now type ls or dir, depending on your operating system, you see that a weather.png has been created that wasn't there before. Back in the file browser, we see weather.png.
If I click on it, it's the same image that was generated before from the notebook. But it was updated: as you can see, it was created seconds ago. Okay, it actually works. So let's give our viewers five minutes to repeat this and see that it works on their own machine. Yeah, and if things go horribly wrong, well, we're sorry; sit back and follow this as a demo. Okay, thanks. Five minutes. See you shortly. Bye. And we should be back on. Now we're coming to the actual use of putting this in a script. Yeah. Reading through the notes, there are some different problems here, and that's expected. Of course we don't want problems, but there are always places where someone's computer is different, or the operating system is different. I have a feeling some people are using a cloud service that doesn't provide shell access. In that case there's not much we can do about it. Especially if Jupyter runs on a web service, it's mostly out of our control, unfortunately. So if it doesn't work, just watch. Or, what I would suggest is to download the notebook and copy its code cells into a new file whose name ends in .py, because a Jupyter notebook is mostly Python code and doesn't contain a lot of other stuff. Yeah. Okay, let's continue. The nice thing about having this as a script is that we now have access to command-line arguments. If you run it from the command line, as we just did in the terminal, you can pass additional information in.
That information is accessible inside the code through the sys module, as sys.argv. Argv stands for "argument values". It's worth knowing that the first argument value, sys.argv[0], is always the name of the script you're calling, because these are really the arguments passed to Python. Yeah. So I could run something like python weather_observations.py hello.png, and the code can figure out that hello.png is where it should save the figure. The code needs to be modified accordingly, of course. So from the command line we pass arguments A and B, and in the code they're accessible as sys.argv[1] and sys.argv[2]. Yes. The example in the lesson also adds the time range, because with weather observations we may want to change which period we actually look at. Yeah. But we can first change it so that it just sets the output file name. So I will open the script itself. Here the icon shows it's a Python script; I double-click it, and now I see the script. It looks a lot like the Python code that was in the notebook cells. So what do I change? We'll depart a little from the code in the lesson at the moment, just so people are aware. If we want the first input argument to be the output file name, we need to go essentially to the end of the script, where it says fig.savefig. Well, let's do what the lesson shows here in the screen share.
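To see what sys.argv actually contains, here is a minimal sketch, assuming a throwaway file with the hypothetical name `show_args.py` (not part of the lesson). It prints the script name at index 0 and every user-supplied argument after it:

```python
import sys


def show_arguments(argv):
    """Print each command-line argument and return the user-supplied ones."""
    # argv[0] is always the script name; the user's arguments start at index 1
    print(f"script name: {argv[0]}")
    for index, value in enumerate(argv[1:], start=1):
        print(f"argv[{index}] = {value}")
    return argv[1:]


if __name__ == "__main__":
    show_arguments(sys.argv)
```

Running `python show_args.py hello.png` would print the script name followed by `argv[1] = hello.png`. Note that every value arrives as a string, so numbers or dates have to be converted by the script itself.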
So the first thing we need to do is import sys, the system module, so that we can access this information. It's always available because it's part of Python's standard library, but we still need to import it. Yeah. And if you're still working on the exercise, stop and just listen to us; that's what I'd recommend. We will change the script so that we can set the start and end dates with the first and second arguments. Command-line arguments are always separated by spaces. The first one is at index 1, because index 0, as we said before, is the script name. So the start date is sys.argv[1] and the end date is sys.argv[2]. Then we want the third argument to be the output file name. We can either use sys.argv[3] directly, or create an intermediate variable that holds the output file name. Okay, there we go. I've saved it. Now back to the terminal: python weather_observations.py, from the first of March 2021 to the end of May. Let's see. Did it work? In the file browser I see there's now an image named "spring in Tapiola", and it looks different for sure. It's still quite cold. Actually, there's something odd here: it might be that the data doesn't contain March at all, because it only starts somewhere at the beginning of May. Okay. So what do we get from this? We can now call the same code with different input values without going back into the code and modifying the script.
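The change described above boils down to reading three positional arguments. A sketch of just that part, assuming the dates are passed as plain strings (the date format the lesson's script expects is not shown here) and leaving the plotting code itself out; `parse_positional` is a hypothetical helper name:

```python
import sys


def parse_positional(argv):
    """Return (start_date, end_date, output_file_name) from sys.argv-style input."""
    # Exactly three user arguments are expected after the script name.
    if len(argv) != 4:
        raise SystemExit(f"usage: {argv[0]} START_DATE END_DATE OUTPUT_FILE")
    start_date = argv[1]        # first argument: start of the time range
    end_date = argv[2]          # second argument: end of the time range
    output_file_name = argv[3]  # third argument: where to save the figure
    return start_date, end_date, output_file_name


if __name__ == "__main__":
    start_date, end_date, output_file_name = parse_positional(sys.argv)
    # The existing notebook code would filter the data to the range
    # start_date..end_date here and finish with:
    #     fig.savefig(output_file_name)
    print(f"plotting {start_date} to {end_date} into {output_file_name}")
```

Checking the argument count up front gives a readable usage message instead of an `IndexError` when someone forgets an argument.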
So you can run this script multiple times with different input parameters. The advantage is that you can call it from some other tool, or from another script, and automatically run it over a list of inputs. Yeah, and that makes it a bit more robust to potential changes. So before we go to the exercise, there's a little more. What we've done in this code is very basic: we're just using sys.argv directly, but there are more advanced modules for this. What can you tell us about that? Well, there are a lot of so-called command-line argument parsers, and all of them have their advantages and disadvantages. Essentially, the more complex the things you want to do, the more complex the code you'll likely have to write in the end. For simple programs, I think argparse is quite efficient because it doesn't need a lot of detail, but it's not as flexible as some others, so you may have requirements that just aren't achievable with argparse. All of these parsers commonly support positional arguments and named arguments. Named arguments help when you have multiple arguments, some with default values: you don't have to make sure that position one is always this and position two is always that, and keep checking whether it's the right thing. Instead you can write something like weather_observations.py --start-date followed by the start date, --end-date followed by the end date, and --output (or similar) followed by the output file name. That makes the command-line call a lot more readable and a lot less prone to "oh, that was the wrong position." Yeah.
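With argparse, the named-argument call described above could be set up roughly like this. The option names `--start-date`, `--end-date` and `--output` are illustrative assumptions, not necessarily the ones the lesson uses; the default `weather.png` matches the file name the original script produced:

```python
import argparse


def build_parser():
    """Named command-line options for the weather script (option names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Plot weather observations for a date range."
    )
    parser.add_argument("--start-date", required=True, help="first day to include")
    parser.add_argument("--end-date", required=True, help="last day to include")
    parser.add_argument(
        "--output",
        default="weather.png",
        help="file name for the figure (default: %(default)s)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse turns "--start-date" into the attribute args.start_date
    print(args.start_date, args.end_date, args.output)
```

Because the options are named, their order on the command line no longer matters, and `python weather_observations.py --help` prints a generated usage summary for free.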
So I believe the next exercise has you modify this weather script to do what we just demonstrated, and also use argparse. Yeah, let's see. Yes. It's up to you what you work on. You can keep working on the first part if needed. You can try to do what we just did. You can use argparse if you want and have extra time. Then there's another exercise, which we will talk about, but you can try working on it now. Okay, should we go to the exercise now? I think so. When we come back, we'll quickly discuss a few more details. This is 15 minutes, so a relatively long time. Okay, great. See you at 45 past. At 45 past. Okay, bye. Hello, we're back. From the notes, it was clear there were lots of different problems here, and that's unfortunate. But this really is an interesting and difficult step, because it's the first time we go from pure Python to connecting with the larger operating system you're running on, and there are so many variations there. There are things like "can't access pandas", for example. That means the Python on your command line is running in a different environment from the one your Jupyter uses. So if you're having really big problems here, don't give up. Go to one of your colleagues who can sit with you at your screen and ask, "I'm trying to do this lesson; can you tell me what's going on here?" You'll get help that way, and it will work much better. As long as you understand the general idea of what we've done here and why scripts should be used, you've got the main points. So with that said, what have we learned here in the end?
Well, I hope we've learned that by turning the notebook into a script, that is, into pure Python, you can use command-line arguments to change what your program does without modifying the program's code. And I would call this a program, because in the end that's exactly what distinguishes the two here: something self-contained can be a script, but something you call with different arguments is a program. So if someone comes to us wanting to run a program on our cluster on many different input files, how would we approach that? That depends a little. If the input files follow a naming convention, or are just numbered, it's relatively simple: you can have a for loop in a bash script that goes through the numbers and calls your script with each individual file name. If they aren't ordered in a specific way, you probably need to either provide a file that contains the file names and read those in, or change your script so that it reads a certain folder and goes through the files in it. But if each computation is quite expensive, so that doing everything in one Python script would take ages and you want to parallelize, you'll likely write an additional script that loops through the files and passes each one as input to your script. Those runs are then executed individually, each as one job on the cluster. There's no single right answer, because it depends a lot on your situation.
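The speakers mention a bash for loop; the same idea can also be written as a small Python driver script. This sketch assumes the data files sit in a hypothetical folder called `data` with a `.csv` extension, and that the weather script accepts hypothetical `--input` and `--output` options; none of these names come from the lesson. It runs the calls one after another, whereas on a cluster each command would instead be submitted as its own job:

```python
import subprocess
import sys
from pathlib import Path


def find_inputs(input_dir, pattern="*.csv"):
    """Return all files in input_dir matching pattern, in a stable order."""
    return sorted(Path(input_dir).glob(pattern))


def build_commands(data_files, script="weather_observations.py"):
    """One command per input file; --input/--output are hypothetical option names."""
    commands = []
    for data_file in data_files:
        data_file = Path(data_file)
        # Name each figure after its input file, e.g. station1.csv -> station1.png
        output_name = data_file.with_suffix(".png").name
        commands.append(
            [sys.executable, script, "--input", str(data_file), "--output", output_name]
        )
    return commands


def run_sequentially(commands):
    """Run the commands one after another; stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_sequentially(build_commands(find_inputs("data")))
```

Building the command list separately from running it makes it easy to inspect the calls first, or to write them into a job submission file instead of executing them locally.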
But one thing that happens quite regularly is that people want to go through, say, a thousand different input parameters that are more or less predefined. If they are predefined, you can define them once in the script, and the command-line argument is then just the position of the parameter set in that list. Each run then takes one parameter set. Okay. So with that said, should we go to a break? I would say a few more words about additional options, like configuration files. The problem with command-line arguments is that at some point the command line gets really long, and even with an argument parser it stops being readable. We're running out of time. Please continue, but let's try to be quick. I just want to mention that there are methods, which you can look at later, for writing the configuration of your program into a configuration file and passing only that file as an input argument, which is then parsed. That's all I wanted to mention. Yeah. Like here, for example, all the parameters for the plot are defined, and the plot can be generated from them. Okay, great. So let's take a break until two minutes past the hour, and we'll see you then. Bye for now. Bye-bye.
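The parameter-index pattern from the last part of the discussion can be sketched as follows. The parameter sets and date values here are hypothetical placeholders, and `load_parameter_sets` shows the configuration-file variant as a JSON file, which is one possible format among several. On a Slurm cluster, for example, an array job exposes each task's index in the `SLURM_ARRAY_TASK_ID` environment variable, which could be passed as the index argument:

```python
import json
import sys

# Hypothetical parameter sets, defined once in the script.
PARAMETER_SETS = [
    {"start_date": "2021-03-01", "end_date": "2021-03-31"},
    {"start_date": "2021-04-01", "end_date": "2021-04-30"},
    {"start_date": "2021-05-01", "end_date": "2021-05-31"},
]


def load_parameter_sets(config_path):
    """Alternative: read the same list from a JSON configuration file."""
    with open(config_path, encoding="utf-8") as config_file:
        return json.load(config_file)


def select_parameters(argv, parameter_sets=PARAMETER_SETS):
    """Pick one parameter set by the index given as the only argument."""
    if len(argv) != 2:
        raise SystemExit(f"usage: {argv[0]} INDEX")
    index = int(argv[1])
    if not 0 <= index < len(parameter_sets):
        raise SystemExit(f"INDEX must be between 0 and {len(parameter_sets) - 1}")
    return parameter_sets[index]


if __name__ == "__main__":
    params = select_parameters(sys.argv)
    print(f"running with {params}")
```

Each run then needs only a single short argument, so a thousand runs differ by nothing but an integer, which keeps the individual command lines short.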