So, next up is Simo with his talk, Tour of Scientific Computing Skills and Tools. Let's see if I get the share going. Are you ready? There you go. Simo is one of my colleagues at Aalto. He's been here longer than I have, and he's sort of a wizard on software and things like that.
Yeah, I can quickly give an introduction. Hi everybody, I'm Simo Tuomisto. I've already been here for a few talks. As Rich said, I've been at Aalto for a long time already. I did my master's in physics, computational physics, but I chose this route instead of the academic route because I felt that the crafting atmosphere of scientific computing is better suited to my skills and interests. At the same time, I'm very interested in what people are actually using these things for. I'm more on the side of the spectrum where I like to do the things, but not necessarily the writing of the paper.
But let's start with the talk. I want to go through a quick journey of scientific computing and what kind of tools you need along the way. The journey of scientific computing is a winding one, as we heard in the previous talk by Radovan and the one before that. A typical scientific computing workflow might look something like this: you have some experiment, then you have some models, then you have some raw data, and there are lots of different things you need to handle throughout the process. So it's usually a good idea to adopt some best practices, or to take tools that you know are good and reuse them, because if you have bad tools, you might end up reusing those too.
There's this quote by the sociologist, I think, Abraham Maslow, who said: "I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail." Because you're learning constantly while you're doing things, you immediately start to pick up some tools. And if those happen to be the wrong tools, you might end up using the same hammer for every nail. In scientific computing, it's usually better to be a bit more flexible, go with the flow, and use the tools that are good for the situation. That way you won't end up like in this XKCD comic, where you try to fix problems that you created yourself, and so forth. You can build upon what you've learned instead of learning something bad and then sticking with it.
So I would highly recommend keeping a critical eye on your workflow. I do this constantly myself. I often feel like the person in those infomercials, the one in the black-and-white footage who goes, "There's got to be a better way." If you get that kind of feeling, it usually tells you that you're not using the correct tool. It's you telling yourself that something is wrong with the situation. It's good to recognize that feeling. If you're already halfway through your paper and you want to get stuff done, you don't necessarily have the luxury of switching completely, but it's still a good idea to recognize the feeling.
Okay, but let's go to actual good examples.
Can I ask you a question quickly?
Yes.
Have you ever been in a situation where you changed from the hammer to a better tool, but your collaborators were not excited about having to change to the better tool?
Well, yes. We have the software building environment that I have built for us internally. It's an ongoing hammer that is hammering nails, and it's currently in, I think, its fifth iteration. It's been rebuilt about five times, because the nails turned into screws and then you need a different tool. It's unfortunate that people often have to look at what has changed and how to deal with it. But at the same time, when you have created something and nobody else understands it but you, you know that you have a problem. If you have created a tool so complicated that it needs a big instruction manual, you know that maybe you went overboard with it. This has happened to me multiple times. It's not something you can learn your way out of immediately, but you can try. Usually, if you try to make it simpler, as the saying goes, "keep it simple, stupid", you end up moving towards the correct solution. Of course, there's no guarantee that you will ever get something that is finished. It's good to be critical about your flaws, but it's also better to be done with something than to always aim for perfect, because then you might end up not doing anything. You still need to get stuff done. So it's a balancing act: striving for perfection, falling short, and still catching at least some nugget of gold.
Okay, let's go through a few situations you might face when you're doing scientific computing. The first situation: let's say you go on a small journey of exploration. Maybe you have a new data set, or something where you don't know if it will work or what will happen, and you want to do some preliminary studies, or check out a new framework or package to see how it works. What should you pack for this? How should you start working on this kind of project? These are completely arbitrarily chosen by me, but I would say that a few things really help a lot of our users. They pick a scientific programming language that can do more than one thing. They pick an editor or IDE with syntax highlighting. And they use an interactive IDE that allows them to write their ideas as scripts. Let me quickly make this a bit bigger.
Let's go through the first point: what are scientific programming languages? This is my definition, but I would say they are languages that you can do general things with, but also scientific calculations. First off, you want easy file input and output. If you have ever written C or Fortran and tried to read data in or out, it's not fun. You have lots of CSV tables, and you don't want to be writing wrappers for your input and output. You need a language that makes that a lot simpler, so you don't have to start writing printf statements and things like that. It should have mathematical functions such as linear algebra, sine, cosine, the typical stuff, but also integration and other things you need.
It should also have easy plotting features, so you don't have to go around searching for something else for that. So what are the options? You can look at the rest of the list later, but the most popular options nowadays are these. Python is a general programming language, but it has so many scientific packages. It's currently the most popular language for scientific computing. Of course it probably won't stay there for millennia; somebody will replace it eventually. Then there's Matlab, which is a commercial product. That's immediately a minus in my book, because you or somebody else has to pay for it. But it already comes with a good IDE, it's easy to use, and in many fields, like signal processing, it's very popular. Then there's Julia, which is a newcomer to the competition, but it has the benefit that it's been designed from the ground up to be a language for scientific computing. It's fast and it's designed for these kinds of problems, but it's also newer, so it's maybe a bit more for the programming-oriented. If that's you, it's a great language. And then there's R, which is very popular in bioinformatics, statistics, and similar fields. It's an old language, but it has a huge number of packages for those users.
Then, and you can look at the list later on, you can pick from a bunch of good graphical editors and IDEs, these development environments that have everything you need. I personally use non-graphical ones, Vim or actually Neovim, but it doesn't matter what I use.
Everybody should use what they feel is the best tool for them. Choose one of the good ones. These are some of the popular ones; there might be a lot missing, so you can even put your favorite in the HackMD. It will help you a lot: if you have an IDE or an editor with syntax highlighting, it will tell you, okay, I made a typo there. That alone will save you a lot of time debugging while you're writing the code.
Another thing is that you want something you can write as a script. You can of course type things into an interactive terminal in R, Matlab, Python, or Julia, but that won't fly in the long term, so it's better to start writing a script right away. That means you have the whole story in one file: you start from the top, you read to the end, and you have gone through the whole adventure. It doesn't have to be the whole pipeline or the whole workflow, but some part of it is done by the script. You have everything recorded there, and you can run it as a whole. That is much better, because if you run simple commands one at a time, the next time you want to run them you might not remember which commands you ran. With a script you don't have to remember; it's all written down. Notebooks are also very popular now.
Especially in Python, though you can use notebooks in all kinds of places. Basically, notebooks are a type of document made of cells that contain code or documentation. One file contains both the code and the documentation, and you can run it cell by cell. It's very easy to start using and to do quick data analysis with. So these are something you might want to use when you just want to check if something works.
Okay, what if you want to graduate from this? You want to start a project. You don't know yet what kind of project it's going to be, but it's one that will eventually produce something for you. What should you pack for that? What should you keep in mind when you start a project?
Well, the number one thing is: use version control. Version control is amazing. If you have ever used it, you know you no longer have to wonder which one was the correct file, source.bak.1 or something like that. You don't need incremental backups named version one-point-something, and you don't need to copy the whole source code folder to another one and end up with a copy of a copy of a copy.
Should we say what that is?
Yeah, I'll go through the rest of the points and then we can look into it. So that's the first thing. The next thing is to start documenting early, because it's much easier to document early than later: you have less to document then. If you document constantly, you never reach the point of, okay, now I need to document everything. The next thing is to keep track of what requirements your project has. And last, you should use existing packages.
So let's return to version control. Sorry if the talk goes one way and then another, but Radovan asked a good question. How would you describe version control?
We save snapshots of the project as we go along. It can be code, it can be a script or text. We record all the changes as we develop, so that we can go back and we can compare.
So how does version control manage these differences? How does it recognize that something has been changed?
We can think of it as recording a snapshot at specific moments, and then it has a way to compare them.
Yeah, and the comparison is actually line by line when it comes to text files. These version control systems can handle binary files as well, like an image, but usually they're used when you have text. Let's say you have lines of text, like this file over here. This is under version control as well; I've put it into Git. Every line of text can be stored. Whenever you add a line, you can see that the line has changed, and you can add it to version control so that there is now a new version. Nowadays, by far the most popular version control system is Git. There are alternatives, but it is the most popular.
Some people will say, well, I'm not really programming, so is this something for me?
And I would say yes.
Yes. I think this is also for you if you want to record your workflow, a reproducible workflow. It doesn't have to be programming as such. That is also really good to version.
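To make the line-by-line comparison described above concrete, here is a minimal sketch using Python's standard `difflib` module. It is not how Git is implemented; it only illustrates the idea of comparing two snapshots of a text file line by line. The file name `analysis.py`, the helper `line_diff`, and the snapshot contents are all made up for the example.

```python
import difflib

# Two hypothetical snapshots of the same script (contents invented for illustration).
old_snapshot = [
    "import math\n",
    "radius = 2.0\n",
    "print(math.pi * radius)\n",
]
new_snapshot = [
    "import math\n",
    "radius = 2.0\n",
    "print(math.pi * radius ** 2)\n",
    "print('done')\n",
]


def line_diff(old, new, name="analysis.py"):
    """Return a unified diff (as a list of lines) between two lists of lines."""
    return list(
        difflib.unified_diff(old, new, fromfile=f"a/{name}", tofile=f"b/{name}")
    )


diff = line_diff(old_snapshot, new_snapshot)
# Lines starting with "-" were removed, lines starting with "+" were added,
# and lines starting with a space are unchanged context.
print("".join(diff), end="")
```

In the printed diff, unchanged lines serve as context, the old `print` line shows up with a leading `-`, and the two new lines show up with a leading `+`. A version control system shows you a comparison in much this form when you ask what changed between two versions.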
Yeah, I personally keep my environment files under version control, so if I need to use a different system I can just get them from my Git repository. I keep my notes there too: if I make notes of talks or meetings, I keep them under version control. I try to document as much as possible of the code I've written, and keep that under version control as well, like instructions on how to use things. And whenever I start a new project, if I know that it's going to become something, I put it under version control. Of course, if you have, let's say, a data set that is not changing, that's immutable, just files, that's something version control doesn't handle well. It's mainly for the ideas that you write down yourself. Some data files can fit there as well, but it's mainly for your ideas, once you have written them. You want to be able to transfer them easily, and you want to keep their history. Ctrl-Z won't work if the computer crashes.
And I think we have four minutes left.
Okay. So, commenting and documenting. I'll quickly note that there are a few tools that help you with it. In version control, by the way, you have these amazing systems where you can push your work. You can read more in the documentation here, and there are good courses there as well. For commenting, there's the easy option of Markdown, or CommonMark, as the standard is called. This document itself is written in Markdown. It's easy to write documents with it, so I would recommend learning how to use it. It makes writing documentation very easy.
And then, if you want to know how to comment properly, you can look into style guides from various organizations and see how they suggest you write your comments, so you don't have to work out for yourself how to formulate them. Think of it like a diary with prefilled fields: you just fill in the fields instead of starting from a blank page. Similarly, there are linters that can automatically recognize that, okay, there's a missing comment, put a comment here. They can help you look through your code and keep its style in check.
Then, keeping track of requirements. It can be instructions, it can be a script, it can be whatever, but you should have some way of writing down what you have done to make your coding environment work and what it needs from the system. Often you end up in a situation where you're far from the starting point, and then something breaks. You need to do it all over again, or your collaborator needs to, and they don't know how. So it's a good idea to keep track of the requirements.
But when it comes to starting a new project, well, these are all important, but one of the most important things is to use existing packages and frameworks. There's this quote by Isaac Newton: "If I have seen further, it is by standing on the shoulders of giants." It's been theorized that it probably wasn't meant as sincerely as the quote reads here, but the idea is that if you build on already existing knowledge, you don't have to recreate it. And basically, there are lots of people who are doing the same thing as you are.
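Returning briefly to the point about keeping track of requirements: one possible way to write down what an environment contains is to let a small script record it. The sketch below assumes a Python project and uses only the standard library (`importlib.metadata`) to list the installed packages with their versions. The function name `snapshot_environment` is hypothetical, and in practice tools such as `pip freeze` or a conda environment file serve the same purpose.

```python
import sys
from importlib import metadata


def snapshot_environment():
    """Return lines describing the Python version and installed packages."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    lines = [f"# python {version}"]

    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no recorded name
            pins.add(f"{name}=={dist.version}")

    # Sort case-insensitively so the output is stable between runs.
    lines.extend(sorted(pins, key=str.lower))
    return lines


if __name__ == "__main__":
    # Print the snapshot; redirect it to a file and keep that file under
    # version control next to the code it describes.
    print("\n".join(snapshot_environment()))
```

Keeping the resulting file next to the code means a collaborator, or you on a new machine, can see what the project needed at the time it last worked.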
And if you have this kind of feeling, like I mentioned, that there has to be a better way, usually somebody has a product for it, and compared to those infomercial products, it's actually a working product. I don't claim to understand everything about computing, and I'm pretty sure there are a lot of people who are a lot better at it, and a lot of organizations, like Google, that have a lot of money riding on their products being good. Those people usually make the code available to you, along with a lot of frameworks. So why not use those instead of trying to create your own? I know how, let's say, Newton's method works, but I don't want to code it myself. I would rather use an existing package where I know the algorithm already works.
I mean, it's not only about using standard packages but also a standard way of setting up a project. We have a couple of questions in the chat about how to structure a project, and I think many people struggle with that. There too, I think: reach out to the community, reach out to your colleagues.
Yeah, do what they do. I would recommend taking the CodeRefinery lessons. There's excellent material there on how to start a project. And this last topic is basically about that question. This last section is very hand-wavy, but I would say that for a lasting project, you need to work together with other people. You're going to be working with code that somebody else has created, and you're going to be producing things that somebody else has to read at some point, especially if you need help with them. At some point you need to publish research, so you need to make the code presentable.
And when you're doing this, you're constantly working in an ecosystem with other people, so it's usually a good idea to involve them as much as possible throughout the whole project. It doesn't mean that you need to constantly bother them and poke them on the shoulder, like, okay, what's your opinion on this? But if you have a problem, ask other people. Radovan had a great talk on how to ask people. If you know that you're going to use a framework, check what the community around the framework is talking about and what kind of issues they have, and use that as a strength. Look at other projects that you aspire to be like, say an amazing machine learning model from DeepMind. I often go and read the DeepMind blog to see what people in AI are doing, not because I would necessarily do the same thing myself, but just to see what people at the forefront of the science are doing, and to gather what can be learned from them. It doesn't have to be, okay, I need to be as good as the team of engineers at Google. That's not really possible, and you shouldn't put that kind of burden on yourself. But it's a good idea to be motivated by the nice things you see around you and to think, okay, maybe I could try to do something similar. So: learning from other people.
I think we are running slightly over time. We will take a break soon, but my understanding is that after the break we will have a Q&A and panel discussion, and there we can pick up some of these questions and topics and discuss them in more detail.
Yeah, that would be the perfect time for many of these practical things. We'll have a lot of people here, and you can see that there's not always one answer.