By its very nature, high-performance computing requires that code is extensively tested and extremely reliable, and that any changes do not introduce vulnerabilities or risk into the project. Dr. Mattia Montanari of Oxford University will talk to us about scientific computing and engineering design accelerated by CI/CD, and about using CI/CD as the enabler for large-scale high-performance computing projects.
Welcome to the session about scientific computing and engineering design accelerated by continuous integration and continuous deployment. Scientific computing and engineering design are what enable engineers to make history. And engineers make history when they build things. When they build buildings, or the fastest car of this century, engineers need to have computer simulations at hand, and certainly the astronauts who recently left our planet and came back did benefit from some of this technology. So scientific computing and engineering design is what I've been working on for over 10 years now, and it is a combination of topics that I find really interesting and still fascinating.
I'm Mattia Montanari and I'm a postdoctoral researcher at the University of Oxford. At Oxford we are committed to creating better simulation tools for our clients and sponsors. What is better? What does that mean? Well, we mean better value, and in particular, we have to show that the impact our research has on their daily workflow happens very quickly. So we needed to accelerate that. And obviously, I'm here to talk about how we accelerated that delivery with continuous integration. But the steps that led us to identify GitLab as a key enabler are not quite straightforward. So I'll talk you through the story and give you a little bit of context on which specific problems we solved with GitLab, and the hope is that you can apply the solutions we designed straight away, or use this recipe to come up with your own solution.
So about the problem we needed to solve, here's a little bit of context. In our research lab, what we do is develop software, and in front of you, you've got the history of commits since January 2001. If you are a software developer, you might see that this is a rather bumpy road. It looks like a roller coaster, pretty much representing the fun we've got in our lab. The truth is that in a research team like ours, there is no smooth development, by the nature of our job, because we need to go back to the whiteboard, design the math, implement it, and then spend time writing papers and presenting at conferences. And all the time that is taken away from software development also creates challenges for software development itself, because we tend to lose a lot of information in the process. If you are a fan of GitLab, you know what I'm going to say when I talk about keeping all information in one place.
So I started approaching GitLab about seven years ago, immediately after I joined Oxford, because we had some specific problems. At that time our software development stack was rather simple. We had literally one IDE, one compiler, a debugger, and a version control system which wasn't even Git. And we had the problem of knowledge transfer and development over the years. When I looked at possible solutions online, I found various ones, particularly Git and GitLab. So I migrated to Git and I adopted GitLab kind of early, despite their initial logo. GitLab in the end developed and delivered a much better logo. But that's not the main reason why we're still using GitLab. The other reason is the capabilities that GitLab carries.
To go a little bit deeper into the problem that we needed to solve, I'm going to take you a few years back, to the time when a senior member of the team left. And I was just about to pick up C and the software architecture and the math.
And there was very little overlap. This lack of overlap was reflected in a lack of productivity. And so the challenge was on my end. I wish that person, when they left, had left more information behind. But when they were gone, everything was gone. And so through GitLab, over the years, we managed to improve on that front. And I believe quite a lot. So that's hopefully a solution that you can adopt to improve your workflow as well.
But there was also another key aspect that I didn't consider initially. With GitLab, with continuous integration, what we really managed to improve was the documentation. Documenting code, not just written documentation, but also how to use the code and how to develop the code, because with continuous integration you need to have a pipeline that is always alive, always up to date. That's how we convey information on how to use the code. And we did that by bringing more and more tools into the software development stack. And we got a rather large software stack. Over just a few years I brought in most of these tools, really, just to have a more modern software development stack and a more performant pipeline, to be able to create documentation, render, test, and so on and so forth, and also to support different platforms.
Now that was a very good result, and the infrastructure we built is now underpinning a project called Asimov. Asimov is a really challenging problem because it's about simulating a whole engine model. A whole engine is an extremely complex system that takes more than five to ten years to build, and the physics, the combustion, the fluid dynamics all happening within the engine are extremely challenging. So the Asimov project has got something to do with science fiction, really, because we don't know how to solve all these problems yet. And we have to do all this modeling from scratch.
So the way we develop our methods starts from the whiteboard, then we run prototype codes, serial code in scripting languages, and ultimately we have to run on HPC systems, that is, high-performance computing systems. Why do we do all this? Why are simulations important? Because simulations enable engineers to predict the future, and if engineers can predict how a component will behave, that can help them prevent problems like this one. As you see in this picture, half of the engine is missing. It just came apart, and it was a terribly nice flight experience for the passengers sitting just over the wings. So simulations are important, and simulating a whole engine is a challenge nobody has managed to complete yet.
Now I'll use this example, Asimov in particular, to illustrate the pipeline that I built. It supports projects like Asimov, but it will also support projects running in parallel to it and after Asimov is completed. The pipeline we built needed to be simple: the physics of the problem we wanted to solve is complicated enough, and we didn't need extra complexity in the workflow. So the pipeline we built is linear and very simple, and yet it has to be flexible to accommodate different projects and, in particular, communication with external partners.
And here is how it looks. These are five simple steps: build, test, run, render and publish. These are the continuous integration stages. Not all of these are required. Only publish is always required, and that's because of the nature of our work. As researchers we need to publish, whether outside to the world through conferences, or just results to the client. So that needs to happen, and everything else will be included if needed. For those of you who are new to continuous integration, the very first thing you need to do, at least in my recommendation, is to start from a very simple pipeline. Don't overcomplicate, just make it simple and linear.
And then create a file called .gitlab-ci.yml and put it in the root of your project folder. Again, those more experienced with GitLab will know what I'm talking about, but everybody else, please just follow along, because it's really simple to get your pipeline to work. This file will begin with an image, which essentially specifies which virtual machine you want your jobs to run on. This is a Docker image: you can create your own, or you can download some from the internet for free, and it's a very flexible environment.
And the key here is that through Docker and virtual machines we can close the gap between the early-stage development of the math that we do in the lab and the production environment, the HPC, so the supercomputer environment. It was a big barrier for us to take the preliminary code we tested on our workstations and translate it into something that could be operational on the supercomputer. Now, because the GitLab pipeline can mimic the production environment, effectively what we can do is bring that production environment into the development stage in our research labs. So I think that was a key enabler that closed the gap.
Then what we're going to do is list the stages. I've got five stages in total, but for the sake of argument in this example I'm just using three: build, run, publish. This list defines which stages I want to run, but also their order. So I'm going to have three jobs. The first one is where I create documentation, so it's called docs build. It's in the build stage, and the script can be whatever you want. The second and the third jobs are in the run and publish stages and are called solve and pages.
That's all I needed you to appreciate before I take you to the next five examples, where I'll illustrate this pipeline in more and more complex settings, and where ideally you will find useful information. So let's go ahead and kick off with the very first example, which is about creating and publishing a static website.
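Before the first example, here is a minimal sketch of the three-job file just described. It is an illustration under stated assumptions rather than the script shown on the slide: the image name is a placeholder, the job name docs-build is an assumed spelling of "docs build", and the echo commands stand in for whatever each job really does. The image, stages, stage, script and artifacts keywords are standard GitLab CI syntax.

```yaml
# .gitlab-ci.yml, placed in the root of the project folder

# Docker image every job runs in (placeholder; use any image that suits your code)
image: alpine:latest

# Stages run in the order they are listed here
stages:
  - build
  - run
  - publish

# Job 1 (build stage): create the documentation
docs-build:
  stage: build
  script:
    - echo "Build the documentation here"

# Job 2 (run stage): run the simulation
solve:
  stage: run
  script:
    - echo "Run the simulation here"

# Job 3 (publish stage): GitLab Pages serves whatever this job leaves in public/
pages:
  stage: publish
  script:
    - mkdir -p public
    - echo "Copy the rendered results into public/"
  artifacts:
    paths:
      - public
```

Each job starts only after all jobs in the previous stage have succeeded, which is what keeps the pipeline linear. The examples that follow build on this layout.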
This example is taken from the GitLab documentation, and the website we're going to build is previewed at the bottom left. The website is created with Hugo, and on the right-hand side you have the adapted CI script. As you see, we have to specify an image. This image was provided by the GitLab team. So if you don't know enough about Docker and building an image, you don't have to worry about it. There are plenty of examples for you to grab and to learn from. The only stage I'm going to use, out of the five I listed here in the center, is publish. I could have omitted this statement, but again, I think it's good practice to have it. The first of the two jobs in the publish stage is development. In this job, Hugo simply builds the website, and that enables you to preview the website and then go through the review process. Once you're happy with the changes, you can close the branch through a merge request and move on to the next job, which again builds the website, but also publishes it. So the pipeline we built is five simple steps, very linear, but also flexible enough to accommodate workflows in which you need collaboration. So there has to be...
Let's now have a look at another example. This is again about creating a website, but this website will contain simulation results. So what needs to happen in the CI pipeline is that first we run a simulation, then we extract some results, and then we publish them online. You'll find the source code for this example online. It's essentially lecture notes that helped me teach FEM, the finite element method, to students. So the two stages we're going to run are run and publish. In the run stage, we do a couple of things. First of all, we install the Python dependencies. Then, in the script phase, we run the simulation. And in the artifacts section, what we do effectively is take the output of that simulation and store it somewhere on the cloud.
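A hedged sketch of how this two-stage pipeline might be written is given below, including the pages job that the talk turns to next. The file and script names (requirements.txt, requirements-site.txt, simulate.py, make_site.py) and the results/ folder are hypothetical placeholders, not taken from the lecture notes repository, and the python:3 image is an assumption. The before_script, script and artifacts keywords are standard GitLab CI syntax.

```yaml
# Assumed image; any image with Python and pip available would do
image: python:3

stages:
  - run
  - publish

solve:
  stage: run
  before_script:
    # Install the Python dependencies (hypothetical requirements file)
    - pip install -r requirements.txt
  script:
    # Run the simulation (hypothetical script that writes into results/)
    - python simulate.py
  artifacts:
    # Store the simulation output so later jobs can download it
    paths:
      - results/

pages:
  stage: publish
  before_script:
    # Extra dependencies needed only to build the website (hypothetical file)
    - pip install -r requirements-site.txt
  script:
    # Build the site from the stored results; the simulation is not rerun
    # (hypothetical script reading results/ and writing public/)
    - python make_site.py results/ public/
  artifacts:
    paths:
      - public
```

By default a job downloads the artifacts of jobs in earlier stages, so the results/ folder produced by solve is available to pages without any extra configuration.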
In the pages job, what happens is that we install a few more dependencies and we don't run the simulation again. We just take the results and create a website with them. This doesn't just enable us to create lecture notes more quickly and correct them in a timely manner, it also enables students to learn how to run simulations and to understand what dependencies a simulation needs. So setting up the development environment is more transparent from the very beginning, and again, in the case of much larger projects, it could be the very same environment as in production. This is something we didn't consider about GitLab, about the CI pipeline in general. And I find it really handy.
Now let's build a little bit more complexity with the next example. This is about my own research, in which I created a library implemented in C that calculates distances between two 3D objects. As you see, we need to build an executable, and because the software is rather complicated, we need to have all sorts of tests: unit tests, application tests and so on. Then we need to run a simulation, take the results, process them and move them online, or into a report, or into pictures for papers, and so on and so forth. So effectively we need to publish them. The CI script for this project is a little bit more complicated, so I didn't want to put the whole script here. But I think it's obvious that this pipeline will accommodate more complex projects.
And there's also another thing. Some of this code is open source, but there are good reasons why some of your code should not be open source, or why maybe only part of your project should stay open source. Through CI and GitLab, what we managed to do was keep private the information that we didn't want to disclose, and disclose the information that we could disclose. And there's another improvement I recently made to this pipeline.
I managed to add a rendering step that basically renders a 3D object from the simulation results. In fact, if you visit the website that I'm linking here, you can see that there is a hand moving left and right on the green cube. That means you can drag that cube and interact with it. And so this gave us enough confidence that this pipeline would be flexible enough and strong enough to support projects like Asimov and the software development of the whole team.
Now, going to Asimov. What we need to do ultimately is to mimic some work that was done by the Virginia Tech guys, which is a simulation in which, for example, here you've got a drone that impacts the blades of a turbine engine, and all the fragmentation, spallation and damage that happens in the engine needs to be captured accurately, with high fidelity. That means including various physics and having performant code. What we need to do is model not just the front part of the engine, but the whole engine. So we've got a great deal of challenges here, but I believe it's going to be really good fun, and it's going to be a very productive project in which we can learn quite a lot. And because we want to track every single change we make in the code, and we want to test it, and we want to be transparent, well, then we need to have a CI pipeline in place.
Before I conclude my talk, I would like to leave you with two notes. Two reasons why I trust GitLab. The first one is transparency. Very early on, when I looked at various providers and solutions, I really liked what's in their DNA, which is transparency. Funnily enough, you see that happening in the very first GitLab issue. Issue number one is about updating some licenses and updating terms and conditions. And they also have another thing, the handbook. Some of you may already know that the GitLab handbook is their recipe for how GitLab operates.
And regardless of which stage you are at in your career, I think you can go and have a look at the handbook provided by GitLab. And I believe you will find something to learn. I certainly learned a lot from GitLab's handbook.
The second note is about security. Again, funnily enough, issue number two is about GitLab security. But through the examples I just showed, I hope you've got good confidence that the security and the flexibility of the platform enable all of us to have a secure workflow and a secure pipeline. Some of that pipeline is obviously well documented in the manual. You can talk to people, you can read through blogs, but ultimately what you really need to do is get your hands dirty.
This concludes my talk. I hope you found it useful. If you do have any questions, please don't hesitate to get in touch. Thank you very much for listening.