Okay, let's get started. So the next talk is by Bart Janssen, on the Julia programming language and the integration he has done with the Trilinos library. Take it away, Bart.

Okay, thank you. Good afternoon, everyone. Before starting the actual talk, let me give you a bit of information about my background. I work at the Royal Military Academy here in Brussels, which is, let's say, the university of the Defense Department in Belgium. So why did I get involved with the Julia language? Well, my background is mainly in computational fluid dynamics, for which I have written lots of C++ code, which I find a lot of fun to do. But the problem with C++ code is that even though it is fast, it is very difficult to convince others to continue working on it. Especially students doing a master's thesis, who typically have a scientific background and no real programming background, have difficulties working on a code in a language like C++. And so then I came across Julia, which promises to be fast and easy to use, so that's why I got started on this. Most of my time using Julia is spent on making it interoperate with large C++ libraries, things for solving large linear systems like Trilinos, which of course are not easily rewritten in five seconds in a new programming language; you have to interface with the existing libraries. So that will be the main topic of this talk.

Just a brief outline: first I will introduce the Julia language itself. Just to know, how many of you have tried the Julia language at some point? Okay. After introducing Julia, we will look at the integration of Trilinos and MPI, so some typical high-performance tools. The notebook that I'm presenting here is also available for download on the website.
There's a link on the FOSDEM website. So the first question you might ask is: why another programming language? The specific goal of the Julia language is to eliminate what they call the two-language problem, which is something that typically happens in scientific computing. A scientist will quickly write something, typically in MATLAB, and that will prove that the code works, but unfortunately it will be far too slow for production use. So then someone with more knowledge of computer science will have to rewrite it all in C++ or even Fortran, because they know they can get a fast result that way. The idea of Julia is that you can basically use the same language for both kinds of applications. It's the same language, but not necessarily the same code, because if you write your code naively it can be slower than desired as well; but with a bit of optimization and a bit of tweaking you can take this prototype code and make it fast.

So what are the key features of the language? It's a high-level programming language, high-level in the sense that the syntax is quite easy to understand, a bit like Python, geared at scientific computing but with potential for generic programming, let's say general programming. It's a dynamic language, so you basically have a script-like file that you run using the julia command. Unlike most other dynamic languages it has strong typing, and you can define your own types; you can even define generic types, like template types in C++. And it is just-in-time compiled using LLVM. The central concept of the language is multiple dispatch, which we will explain in a few moments.

So let's start with a simple function where we just want to add two things, a and b, together. The syntax here is pretty simple, but it's such a short function.
You can also write it in one line, like this. The return here is also optional: if you omit the return, the function just returns the last value it computes. So let's test this, and we see that it works correctly.

So I said it was a strongly typed language; where are these types? For everything that you create, every variable you create in Julia, you can request the type using the typeof function. Here we see it's a 64-bit integer if we just have the literal 1. If we apply our add function to two integers, we see we get an integer back, and if we do the same for floating-point numbers, we get a floating-point value back.

So why is that important? It is important because Julia will compile a new function for every possible combination of types that you supply it. Comparing the integer version with the floating-point version, we can look directly at the machine code using the @code_native macro in Julia, and we see that the machine code for both these functions is very similar, except for this instruction here, which in one case is an integer addition and in the other a floating-point addition. So it compiles a specialized function for every combination of types that we have.

How does that compare to C++? In C++ we can achieve the same thing; the syntax is slightly more verbose, but basically, if you remove all the green-highlighted keywords, you have the Julia code back again, which is just the a + b. This C++ achieves the same thing as we just saw in Julia: it's valid for any combination of types A and B, it automatically computes the return type, and when you invoke this function, C++ compiles a new version for every combination of the argument types. So in a way C++ can do the same thing as Julia, but the syntax is more verbose, and in C++ it is purely a static computation over types, happening only at compile time.

In Julia, what the multiple dispatch mechanism means is that, based on the types of the arguments that you supply to
the function, it will choose which method to call. It can do this statically, just like C++ template functions at compile time (in this case just-in-time compiled), or it can do it dynamically: if the type of the argument is not known at compile time, it will choose the method to call at runtime, a bit like a virtual function in C++.

So let's take a brief look at user-defined types. We can define a type like this, which contains one single field, an integer value, and we can then create a MyNumber containing the value 2, for example. How does that work with our add function? As you can imagine, I didn't specify any type restrictions on the add function, so I can just call it on a MyNumber, but of course the plus operator in Julia is not defined for the MyNumber type. To fix this error we have two options: we can override the plus operator of Julia, or we can specialize the add function directly. I chose the latter here: you annotate the add function, creating a new version of it, let's say, that takes a as a MyNumber while b can be anything, and implement it to return a MyNumber. We see now, after executing this, that there is still one add function, but it has two methods. Then you might wonder which methods those are, and for that we have the methods call: we can see that we now have the two add functions, the one we first defined and the specialization, let's say. And now if I call it with a MyNumber first, it will call the specialized function and return the correct value. In this way it's very easy to build a complex system using your own types and overriding the necessary functions; that makes for very flexible code in the end.

So the next topic is interoperability.
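As a sketch of what was just described: the one-line add function, the typeof queries, and the MyNumber specialization. The names follow the slides; the exact field name is my assumption.

```julia
# Generic one-line definition: the last expression is the return value.
add(a, b) = a + b

println(typeof(add(1, 2)))      # integer inputs give an integer back (Int64 on a 64-bit machine)
println(typeof(add(1.0, 2.0)))  # floating-point inputs give a Float64 back

# A user-defined type with one single integer field.
struct MyNumber
    value::Int
end

# Specialize add: a must be a MyNumber, b can be anything.
add(a::MyNumber, b) = MyNumber(a.value + b)

println(length(methods(add)))   # still one add function, now with 2 methods
println(add(MyNumber(2), 3))    # dispatch picks the specialization: MyNumber(5)
```

Inspecting the per-type machine code with `@code_native add(1, 2)` versus `@code_native add(1.0, 2.0)` shows the specialized compilation described above.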
Natively, Julia is made to be compatible with C in terms of memory layout, let's say, and it also provides a nice interface for calling C functions directly, out of the box. Then there are many packages that help when you need to call C++ or Python or R or MATLAB; there is a whole series of them.

Calling C functions happens through the ccall primitive. It's not really a function; it does something at the LLVM level directly. So we can call the fabs function, to get the absolute value, from the standard math library, here on the value minus one, and it returns one as expected. Using the BenchmarkTools package we can get an idea of the overhead that this induces: calling the C function here takes about 3.6 nanoseconds. We can also compare this with the native absolute value function in Julia, and we see that, considering that this is such a small function, the overhead is really small.

Next is the integration of MPI. Using this ccall primitive, with which we can call C functions, we can of course also call the MPI C functions, and then if we write a Julia program using these wrapped MPI functions, we can just run it using mpirun with julia and then the script that we created. So we'll compare a bit between C and Julia using a simple MPI reduce. This is the C code, where you have all the variable declarations in the beginning and the MPI initialization, and then the sum: the idea is that we will calculate the sum of the elements of an array in parallel. So we allocate our array and fill it with a value equal to the rank plus one, and then we want to time the sum. As in any MPI program, all of this code will be executed on each process, and so every process sums
its part of the array, and then it calls MPI_Reduce to combine the partial sums calculated by each process. Once the reduce is finished, we stop the timer and show how long this took.

In Julia the code looks pretty much the same, except that all the stuff about memory allocation and so on is much simpler using the native Julia arrays: we just create our array using this one line of code here. To keep the similarity with the C code, I manually wrote the sum here. Of course there is a sum function that will calculate the sum if you call it on an array, but then you might think I called an optimized sum and so I was cheating; so here it is just a basic for loop, as we wrote in the C code, and we call the reduce on that. Note that the MPI package puts some niceties around this reduce call: we can replace MPI_SUM with a plus, and it just returns the value instead of putting it in a reference argument. And this gives us the same result as in the C code. Actually, this runs in the notebook on the slide on one CPU, because I have the MPI package installed, and those are the timings that you see here on one CPU. Running on my home computer on four cores, the timing in the C code was a bit unstable, but I got between 0.022 and 0.065 seconds, and in Julia we get equivalent timings. So for this kind of low-level loop we see that we are at the equivalent speed of C; it really shows that Julia makes good on this promise.

Maybe I'll go over this a bit more quickly, but CxxWrap.jl basically works in the same way as Boost.Python.
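Going back to the MPI example: the manual sum loop might look like the sketch below. This is a serial version with array sizes of my own choosing; the reduce step is only indicated in a comment, since the MPI.jl call assumed there needs mpirun to be meaningful.

```julia
# Fill the array as in the benchmark: every element gets rank + 1 (here rank = 0).
rank = 0
a = fill(Float64(rank + 1), 1_000_000)

# Manual sum: a plain for loop as in the C version, deliberately avoiding
# the (possibly optimized) built-in sum.
function mysum(a)
    s = 0.0
    for x in a
        s += x
    end
    return s
end

local_sum = mysum(a)
# In the parallel version, the partial sums would be combined with something like
# (MPI.jl API, assumption):
#   total = MPI.Reduce(local_sum, +, 0, MPI.COMM_WORLD)
println(local_sum)  # 1.0e6
```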
It allows you to write interface code in C++ that will automatically generate Julia functions to call from the Julia side, and that's what I've used to wrap the Trilinos library, a large matrix library. Here are some comparisons showing the overhead. If we loop over 50 million elements, doing a multiplication of two floating-point numbers, we get this time in Julia and this timing directly in C++; so again, a low-level loop will run as fast in Julia as it will in C++. However, if we do the multiplication by calling a function from C++, we see that the difference is not that great either, so there is little overhead in calling external functions.

So, the Trilinos library. Who here is familiar with the Trilinos library? Not that many, okay. It's basically a library that allows you to solve large linear systems on a compute cluster. The example that we will use here is the 2D Laplace equation: you solve this differential equation on a grid of, in this case, one thousand by one thousand nodes, and you get this kind of parabolic surface as a result. It's just a benchmarking problem. Now, the part of the code that is performance-critical is the assembly of the linear system, that is, filling the matrix with all its elements. This is the C++ code using Trilinos. The core of this algorithm is a loop over all the elements of the matrix that are on the current process, so that's really a low-level loop, and in that loop you fill in the elements. The values of these elements are basically where the link with the physics of the problem is, so understanding where these values come from is basically the job of the scientist. We see that we have very close proximity here between a scientific value and a very technical loop over the elements. The code in Julia looks basically the same, because of course you keep the same names and so on when wrapping a library. Running these two things and comparing the performance shows that it is very favorable.
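Trilinos itself cannot be shown in a few lines, but the shape of such an assembly loop, with a plain Julia sparse matrix standing in for the Trilinos matrix and a 1D Laplacian instead of the 2D one, might look like this (my simplification, not the code from the slides):

```julia
using SparseArrays

# Assemble the 1D Laplace operator on n interior nodes: the low-level loop
# over the locally owned rows, filling in the stencil values (-1, 2, -1) / h^2.
function assemble_laplace(n)
    h = 1.0 / (n + 1)
    I = Int[]; J = Int[]; V = Float64[]
    for row in 1:n                # loop over the rows on this process
        push!(I, row); push!(J, row); push!(V, 2.0 / h^2)          # diagonal
        if row > 1
            push!(I, row); push!(J, row - 1); push!(V, -1.0 / h^2) # left neighbour
        end
        if row < n
            push!(I, row); push!(J, row + 1); push!(V, -1.0 / h^2) # right neighbour
        end
    end
    return sparse(I, J, V, n, n)
end

A = assemble_laplace(4)
println(A[1, 1], " ", A[1, 2])  # diagonal and off-diagonal stencil entries
```

The physics sits entirely in the stencil values pushed inside the loop, which is the "close proximity between a scientific value and a very technical loop" mentioned above.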
I ran this on multiple computers until I found one where Julia was slightly faster, but you can get the opposite result as well. The main point is that this is a very basic example where there is very little mathematically heavy computation going on, and most of what you measure is overhead. So if it is already this close, you can safely assume that you will get no real performance loss from doing this in Julia. These are just the different steps of the algorithm that I consider to be performance-critical. Basically, this result is what motivated me to continue using Julia, because even for performance-critical parts of the code, you can rely on the fact that Julia will not impose a performance bottleneck that you will not be able to get around at a later time, with the advantage that the code will be understandable by a scientist in the end.

That brings me to the conclusions. Julia is a fast high-level language, as we've seen, and we can interoperate with existing work, which may be very extensive, as is the case with the Trilinos library, and it really delivers on the performance, matching the C and C++ performance. There is much more that I didn't have time to mention in this brief presentation; there are also other good reasons to choose Julia.
For the types, you can have generic types, a bit like template types in C++. There is of course support for parallel programming: we have seen how we can reuse MPI, but Julia also has its own system for parallel programming, and the two can in fact work together. A very important aspect is metaprogramming: you can take any Julia expression and manipulate it in Julia itself, and you can write macros and generated functions that leverage this expression tree to do, for example, loop unrolling automatically. So you can take an expression and, if it has a for loop, rewrite the expression to just execute the statements in sequence instead of having a loop. A very interesting package is CUDAnative.jl, where you can write NVIDIA CUDA kernels directly in the Julia language and make them interoperate with Julia types and Julia arrays; you get very intuitive code, but it is running on the GPU, and you don't have all the mess of dealing with the NVCC compiler and so on. And of course there is a very large and expanding ecosystem of excellent packages, also for machine learning, differential equations and so on; there are a lot of cool packages that allow you to dive right in and use it for real-world problems.

So this concludes my presentation. Thank you very much.

Thank you very much. Are there any questions for Bart? Yes?

When calling into C or C++ code, how tedious is it in practice to convert between Julia types and C or C++ types?

Yes, so in the case of straight C code you can, like the struct I wrote here, reuse C structs directly: the layout of this struct is the same as in C, and the typical way of doing it is to write a Julia struct that mirrors the C struct.
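As an illustration of that mirroring, here is a small sketch using C's div function from libc, whose div_t result struct is mirrored field for field. The field order follows the C standard's div_t; looking the symbol up by name like this assumes libc is already loaded into the process, as it is on a typical Linux Julia.

```julia
# Mirror of C's div_t: struct { int quot; int rem; };
# Field order and types must match the C layout exactly.
struct CDivT
    quot::Cint
    rem::Cint
end

# Call libc's div(7, 2); the struct is returned by value.
r = ccall(:div, CDivT, (Cint, Cint), 7, 2)
println((r.quot, r.rem))  # (3, 1)
```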
If you have to do that manually, it's a bit tedious. What I do in CxxWrap is different: basically, the C++ class is just a pointer in Julia, and you wrap all the methods and functions of the class as Julia functions, so that way there is no real data conversion and everything happens at the function level. And then there is Cxx.jl, which allows you to in fact directly intermix C++ code with Julia code, because it works at the LLVM level, so it will actually compile the C++ code using the same LLVM that Julia runs on. But I haven't used that myself that much. Another question?

Related to the previous question: how do you enforce boundary checks with C strings or arrays?

There are a few special types in the ccall interface, such as Cstring. Here you specify the types of the arguments, so if you put Cstring here, for example, it will know that it's a null-terminated C string.

Okay, but so you treat C strings the same way C does?

The native Julia strings are different, and if you put Cstring here and pass a Julia string there, it will do the conversion automatically. Yeah.

Julia sucks when you're doing plots: it takes a long time to do the first plot.

Yeah, that's absolutely true. I myself used to plot with Python, and I still use Python a lot for plotting. If you have the habit of typing python myscript mydata, in Julia that's slower: you have to launch Julia once and keep it running, and then calling a plotting function every time after that is fast. But I think it's a temporary problem, I hope.

Is there a package manager to install packages from a repository?

Yes, you just call the command Pkg.add and then the name of your package.

Any more questions? Does the package manager have Trilinos in it? Not yet. One more in the back; I'm going to have to run all over.

Hi. Is there any scope for
writing an executable in Julia for OpenFOAM, and using the C++ class libraries from OpenFOAM?

I think you could interface OpenFOAM using Cxx.jl or CxxWrap.jl. Or, the other way around, you can get a C function pointer to a Julia function from Julia and call that from OpenFOAM; that's a possibility too.

Do you think there would be any performance benefit?

It can never really be faster than C or C++, because basically it's just LLVM compiling the code in the end, so the maximum performance that you can get is the same in both languages. Okay, thanks.

Okay, thank you very much, Bart. Thank you.
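The "C function pointer to a Julia function" mentioned in that last answer can be sketched with Julia's @cfunction macro; the function being exposed here is a made-up example.

```julia
# A Julia function we want to expose to C (or to a C++ caller such as OpenFOAM).
mysquare(x::Cdouble) = x * x

# @cfunction produces a C-callable function pointer (Julia 1.0+ syntax).
cfun = @cfunction(mysquare, Cdouble, (Cdouble,))

# Simulate the C side calling back through the raw pointer:
result = ccall(cfun, Cdouble, (Cdouble,), 3.0)
println(result)  # 9.0
```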