Awesome, so I'm going to talk about black holes and Python, so let's see how those go together. This talk is about an experiment we did: how to build a particular piece of software called a general relativistic ray tracer, which is basically a fancy term for saying we want to take simulated photos of black holes. The idea is that we are going to go together over the journey of how we started with a simple and inefficient pure Python version of this thing, and how we used different tools in the Python ecosystem until we achieved a much faster version of the software. I'm not going to go into detail on all of these frameworks and libraries; it's more about transmitting my experience of doing this, the main ideas on what to expect, and the different steps you can take to optimize an application. But first, let's talk a bit about black holes. The great question is: how do we know that there are black holes? How can we see a black hole? As you know, a black hole is a region of spacetime from which not even light can escape, so these objects are black because they don't emit light. The way you discover black holes is that they warp the spacetime around them, and they warp light. So if you have sources of light behind the black hole, like galaxies or stars, their images will appear warped to an observer in front of it. The situation we'll have in this ray tracer is this: the image in the background is a source of light, let's say a galaxy; this red ball is the black hole, which is neither a ball nor red, but you know, artistic representation; and then we have a camera floating around in space. The idea is that it works very similarly to a normal ray tracer.
In a normal ray tracer the light comes from the background and reaches the camera, but because it's very difficult to track all the infinite rays of light that come from the background, we do this backwards: we start at the camera and integrate the rays backwards to see where they came from. So the algorithm, in reality, is very simple. The catch is that in general relativity, in a region of spacetime with strong gravity, light doesn't travel in a straight line, and that's the main problem. We have to calculate, with some equations, the curved paths of the rays until they reach the background. So the algorithm is: for every pixel in the camera, trace the ray back along its curved path, check where the ray came from, grab the image information of that pixel in the background, and reconstruct the image this way. So how do we know how these paths of light work? This is the boring part, because there is a lot of math; these are the equations, a lot of boring symbols just to impress you. The main idea is that if you expand all of this with a lot of hard work, you end up with very big equations. But they are big equations that are easy to solve: they are ordinary differential equations, not partial ones, so they are easier to solve because we have very strong algorithms and whole collections of methods for solving them with bounded error and things like that. You can go into the boring details of how to do this, but the only thing you need to know is that once you expand these equations, I can give you the formulas, you can copy-paste them into your program, and at every step you can compute the next point the light ray will pass through. So that's the boring part. Let's talk about the interesting part, which is the optimization rules.
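As a hedged illustration of why ordinary differential equations count as "easy": a minimal classical Runge-Kutta stepper, the kind of fixed-step integrator you could use for equations like these. The equation integrated here is a toy stand-in, not the actual geodesic equations from the talk.

```python
import math

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, t0, y0, t1, steps):
    """Integrate from t0 to t1 with a fixed number of RK4 steps."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

# Toy stand-in for a geodesic equation: dy/dt = -y, exact solution e^{-t}.
approx = integrate(lambda t, y: -y, 0.0, 1.0, 1.0, 100)
print(abs(approx - math.exp(-1.0)))  # the error is tiny
```

The same stepping structure, applied to the expanded geodesic formulas, is what "trace the ray one step at a time" means in practice.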
So this is going to be a very slow program, and we want to know how to optimize it. And this is the most important thing, because what I'm going to show you is the most important rule of optimization. It works in every language, on every platform, and it's super important: don't do it. And there is a more advanced version from this gentleman, Michael A. Jackson: the first rule of optimization is "don't do it", and the second rule of optimization, for experts only, is "don't do it yet". You may also know the famous sentence from Donald Knuth: premature optimization is the root of all evil. The point is: let's first achieve a working version of what we want, and only then focus on optimizing it. This is the reason we started with Python: pure Python is a very easy language, we already know it, it's very easy to debug, and it's very easy to track down errors. If we achieve a working version in Python, we can then profile it, find the parts that are slow, and proceed to make it faster. The algorithm is super simple: basically it's three nested for loops. You iterate over every column, then you iterate over every row; these two loops just mean "iterate over every pixel in the camera". Then you have to integrate every ray backwards, one step at a time, following where the photon goes until it reaches the background or falls into the black hole. So it's basically an O(n³) algorithm, and this fancy pure Python here is the code that solves the equations I showed you, but it's just a very slow function. You can think of the code inside the loop as a very slow function that is independent for each given pixel of the camera. If you try this in pure Python, you will soon discover that Python is not very suitable for numeric computation.
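A hedged sketch of that triple loop. The `trace_ray` here is a toy straight-line stand-in for the real geodesic integrator, and the camera setup is invented for illustration; only the loop structure matches the talk.

```python
WIDTH, HEIGHT = 8, 8  # tiny camera for illustration

def trace_ray(col, row):
    """Toy stand-in for the slow per-pixel function: decide whether the
    ray falls into the black hole or reaches a background pixel."""
    # Hypothetical setup: rays aimed near the image centre "fall in".
    cx, cy = WIDTH / 2, HEIGHT / 2
    if (col - cx) ** 2 + (row - cy) ** 2 < 4:
        return None          # captured by the black hole: black pixel
    return (col, row)        # otherwise it maps straight to the background

def render():
    image = []
    for row in range(HEIGHT):          # loop 1: every row
        line = []
        for col in range(WIDTH):       # loop 2: every column
            hit = trace_ray(col, row)  # loop 3 (the ray steps) lives here
            line.append(0 if hit is None else 255)
        image.append(line)
    return image

img = render()
```

In the real program the body of `trace_ray` is the step-by-step integration of the geodesic equations, which is why every layer of this loop nest is expensive.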
One of the reasons (there are a lot of them) is that Python has to create Python objects all the time. When you add, for example, two floats, it needs to create a new Python object: it allocates the C struct and puts the real float inside. So in the first line, when I add x plus y plus z, it first needs to create a temporary Python object for x plus y, and then add that to z to get the result. Another reason is that Python is dynamically typed. When you write, say, three plus five plus six plus seven, Python grabs the first two numbers and asks: what is this number? Oh, it's a float. What is this one? Oh, it's a float. And what is the operation for adding these? Ah, it's this one. It takes a lot of time answering these questions before it resolves the actual C function that is able to add the numbers, and then it has to construct a new Python object for the result. Compare that with C, where the add instruction is normally implemented directly in hardware. Another reason is that Python is normally very cache-inefficient. You can use better data structures, but a list, for example, is full of pointers to Python objects. The main problem is not the indirection itself, the fact that you have to dereference a pointer to reach each object; the problem is that it's very cache-unfriendly, because the real floats live in different parts of memory, so the processor has to fetch them one at a time and you lose all cache-friendly behavior. Another reason is that when you start the interpreter, it has to create all these references to different objects. There is a little experiment you can do about this: you can grab all the small integers between zero and 1,000, check how many references there are to each one, plot it, and you will obtain this graph.
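A hedged sketch of that experiment using `sys.getrefcount`. The exact counts depend heavily on the interpreter version, so treat the numbers as illustrative only.

```python
import sys

# Count how many references the interpreter itself holds to each small int.
counts = [sys.getrefcount(i) for i in range(1000)]

# Small integers like 0 and 1 are cached and used everywhere internally,
# so they carry far more references than a larger number like 999.
print(counts[0], counts[1], counts[999])
```

Plotting `counts` against the integer value gives the decreasing curve shown in the talk.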
What it means is that when you start up the interpreter, there is a huge number of references to the number zero, and as the numbers increase there are fewer and fewer references; but the Python interpreter has to create all these references, even to the same number. So these are expensive operations, and they make your program very slow compared to C programs. But the main idea to keep in mind is that Python is very easy to debug, and it's very easy to know whether the result is correct or not. Those are the main properties, and that's why I suggest starting in Python even for a very expensive computation such as this ray tracer. So the question is: OK, but does it work? It actually works, and you can produce these beautiful images. What you are seeing here is a simulated photo of a black hole. Maybe the projector isn't showing it well, but there is a very black circle in the middle and a warped region around it. So what are we seeing here? If we draw the rays cast from the camera in two dimensions, they do something like this. The camera is on the left, and the black hole is this area where some of the rays converge. You can see that the camera is launching rays upwards, but some of those rays actually bend down and come back. This means that in this image, in this warped ring, what you are seeing is things that are behind you. Even though you are looking forward, because the light is bending and going around, this ring corresponds to images of things that are behind you, or even at your sides. And even if you think this black region in the middle is the event horizon, it's actually not: the correct term is the shadow of the black hole. This is because if you throw a lot of rays, you will see, like the rays here in red, that they fall into the black hole.
This means that those red rays never really reach the camera, because they would have to come from the black hole, and the black hole has the property that it doesn't emit light and nothing that goes inside can come back. So all the rays in red will appear black in the image, and only the rays in blue are ones you will actually see. This means that the big black circle in the image is actually bigger than the event horizon, because it corresponds to this whole bundle of rays. So basically it's this situation: the event horizon is inside, but the shadow, this black area, is actually bigger than the event horizon, because it corresponds to all the rays that fall into the black hole. So, OK, how much time does this take on my crappy laptop? An hour and 32 minutes. You really don't want to wait that long, and this isn't even a good background image; it's, what, 1080p? So let's try to make it better. One alternative is to start using NumPy. NumPy is basically a C extension module for Python; it's very famous in numeric computation and data science, and a lot of libraries build on it. One quick experiment you can do: take a Python list and square every element, and you will see this takes about 100 microseconds. The good thing about NumPy is that you can do this operation in vectorized form. Instead of saying "raise each element to the second power" yourself, you just grab the array and write the operation once, and that means "go to every element and do this". If you time that, it takes one microsecond. If you do a little calculation, you will see that NumPy is basically taking one nanosecond per floating-point operation while Python is taking a hundred. My laptop does about one gigaflop per second, so this means NumPy is very close to peak efficiency.
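A hedged version of that timing experiment with `timeit`. The array size is mine and the absolute timings will differ per machine; the ratio is what matters.

```python
import timeit
import numpy as np

xs = list(range(10_000))
arr = np.arange(10_000, dtype=np.float64)

# Pure Python: squaring element by element through the interpreter.
t_py = timeit.timeit(lambda: [x ** 2 for x in xs], number=100)

# NumPy: one vectorized expression, the loop happens in C.
t_np = timeit.timeit(lambda: arr ** 2, number=100)

print(f"python: {t_py:.4f}s  numpy: {t_np:.4f}s  speedup: {t_py / t_np:.0f}x")
```

The two computations produce the same values; only where the loop runs (interpreter vs. C) changes.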
So this is good, and the reason is that NumPy underneath is written in C. The algorithm now is: for every row in the camera, for every column in the camera, just use these vectorized operations. You still have a lot of Python overhead for calling these functions, but it's much faster, because the function that iterates the differential equation we saw is done in C, and that part is super fast. You can even get fancier and collapse those loops into a NumPy universal function, which is a way of saying "apply this function to every pixel in the camera", and simplify the algorithm to this code; but that's more like syntactic sugar. The catch with NumPy is that you are not free of low-level details, and this is a tiny example of how they can affect you. Take an array of random numbers and do a simple operation, say summing all the elements bigger than 50, and then do the same but sort the array first, so b is the sorted version. You will see that the sorted version actually takes much less time in NumPy than the unsorted one. This is the effect of a very low-level detail called branch prediction. When the array is unsorted, the branch predictor fails on almost every element, because it cannot find a pattern between elements below 50 and elements above 50. But if you sort the array, the branch predictor guesses correctly through the whole first half of the array, fails once in the middle, and then guesses correctly again. So even if you are in Python using NumPy arrays, you are not safe from low-level details. You have to think, and if you really, really want to keep using NumPy and reach peak efficiency, you have to know something about C and how these kinds of things are optimized.
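A hedged sketch of that sorted-versus-unsorted experiment. Whether the gap actually shows, and how large it is, depends on the NumPy version and the hardware, so I only print the timings rather than promise a particular ratio.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=1_000_000)
b = np.sort(a)  # same values, sorted

# The same reduction on unsorted vs. sorted data.
t_unsorted = timeit.timeit(lambda: a[a > 50].sum(), number=20)
t_sorted = timeit.timeit(lambda: b[b > 50].sum(), number=20)

print(f"unsorted: {t_unsorted:.4f}s  sorted: {t_sorted:.4f}s")
```

Both variants compute the identical sum; any timing difference comes purely from how predictable the data layout is to the processor.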
Also, NumPy can be a bit difficult to read. For example, I can have these tuples representing the positions in space of different particles, and if I want to calculate the difference between the position of every particle and all the rest, the efficient way of doing it in NumPy is this, using a NumPy mechanism called broadcasting. It's basically a way of manipulating these matrices so I can express the operation in vectorized form. But for a newcomer, it is not clear at all what is happening here. The end result is this: the difference of every particle with itself, the diagonal, is zero, and the matrix is antisymmetric, because the difference from particle A to particle B is the same as from B to A with a minus sign. If you use NumPy and run this experiment, you can get down to 72 minutes, which is a bit better, but there is still a lot of overhead from these Python interfaces. So you can go a step further and use Cython or Numba. Cython and Numba are ways of somehow transpiling your Python code to very efficient native code and using that instead of the Python code. This is the typical example of a Cython function. Here is a pure Python function that computes Fibonacci numbers; calling it takes 930 nanoseconds. The Cython version looks like this: basically, you type everything. You say, OK, this i is an integer, this a is a double, and you do the same operations. So it's very similar, although you have to compile it. But you will see that it takes 195 nanoseconds compared with those 930. And this is awesome, and you say: wow, I need to do this. OK, it's some C-ish code, close to C, but this is really, really great. The thing is that some hours later, this kind of monster will appear, because you will start saying: OK, maybe I need to start passing functions around, and then I need to use the C version of things.
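A hedged sketch of that broadcasting trick; the particle coordinates are invented for illustration.

```python
import numpy as np

# Positions of 4 particles in 3D space, one row per particle.
pos = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0],
                [0.0, 0.0, 3.0]])

# Broadcasting: a (4, 1, 3) array minus a (1, 4, 3) array yields all
# pairwise differences in one (4, 4, 3) array, with no Python loop.
diff = pos[:, np.newaxis, :] - pos[np.newaxis, :, :]

print(diff.shape)  # (4, 4, 3)
```

The diagonal `diff[i, i]` is zero (every particle minus itself), and swapping the first two axes flips the sign, which is the antisymmetry mentioned above.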
So we end up with this void-star madness and all these types, and it's not that good. Another possibility in the Python ecosystem is to use Numba. Numba is an incredible project from Continuum Analytics, and it's fully open source now. This is the first example in Numba: the function is the same, you literally don't type anything, and you just put this @jit decorator on top. The first time you see this example, you go: this is suspicious, what is this doing? What the decorator does, and it's really, really great, is take the Python abstract syntax tree, transform it into LLVM intermediate representation, compile the code for you, inferring the types, and execute that compiled version. So this thing literally goes straight to machine code, and it's much, much faster. If you apply this, you will see that you can reach 40 minutes with Cython and 30 minutes with Numba, which is really, really great. But think about it: this is an embarrassingly parallel application, and we are using only one core. It's like those companies that have 40 managers and one software engineer; we need more people working, right? So you can start using parallelism. The bad thing about Python is that threads don't help here, because you have this global interpreter lock that only lets one thread run Python code at a time. So you usually need pthreads, threads that run outside Python land, in C land. The good thing is that you can still use processes. This is how you use the multiprocessing module in Python (I'm going a bit fast because I need to finish): you can use this, but then you have to serialize all the data between the different processes. So the main idea is: try to use native threads as much as possible.
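A hedged sketch of the multiprocessing approach. The per-pixel function is a toy stand-in, and I assume a Unix system by pinning the `fork` start method to keep the example short; note how the pixel coordinates have to be pickled over to the worker processes, which is the serialization overhead mentioned above.

```python
import multiprocessing as mp

def trace_pixel(coords):
    """Toy stand-in for the expensive per-pixel ray integration."""
    col, row = coords
    return (col * 31 + row * 17) % 256  # fake brightness value

def render_parallel(width, height, workers=4):
    pixels = [(c, r) for r in range(height) for c in range(width)]
    # Every pixel is independent, so Pool.map can split the work
    # across processes; each (col, row) tuple is serialized to a worker.
    with mp.get_context("fork").Pool(workers) as pool:
        return pool.map(trace_pixel, pixels)

image = render_parallel(16, 16)
```

`Pool.map` preserves input order, so the flat result can be reshaped back into rows exactly like the serial version.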
And Cython has this very good integration with OpenMP, which is a library for spawning these threads. Basically, you go from this code, which is just calling a function in a loop, to this other code; essentially it uses prange instead of range, plus a couple of imports, and this will distribute the iterations of your loop across threads. You can do this very easily here because the problem is embarrassingly parallel, so you don't need to worry about concurrency and locks and things like that. You do this, and you can get down to 10 seconds. But then you say: OK, let's do better. Let's use GPUs, because this is an awesome thing and we live in an awesome era, the best time of all. So, GPUs. In this case we use CUDA, which is NVIDIA's proprietary, evil way of doing GPU computing. When I talk about GPU computing, I like to bring up this image: Dante's representation of the Inferno. Every circle is a different level of hell, and the lower you go, the worse it is for you. And I like to think that CUDA programmers are right there; I don't know if you're there because you code CUDA, or you code CUDA because you're a bad person. I don't know exactly which way it goes. I say this because this is typical CUDA C code: you have to call these arcane functions, and there are a lot of stars going on. The CUDA version of malloc, instead of returning your pointer, takes a pointer to a pointer, so you end up with stars everywhere; it looks like the old Microsoft starfield screensaver, if you remember it, all stars going by. The good thing is that, although this is horrible, we are Python developers, and in Python land everything is awesome. So the code I just showed you can be written as this other, much sparser code: you just create a matrix on the CPU and then you pass it to the GPU.
In CUDA you still have to write some kernel code, but here this code can be passed as a string. You need to know a fair amount of C, C++ and CUDA, but the main idea is: you take this code, you tell CUDA "I have these matrices on the CPU, bring them to the GPU", then you take the kernel code you just wrote in the string, compile it, extract the function, and call it. That's about 10 lines in total, and literally you can get down to 0.001 seconds. And this is incredible, because the GPU has all these threads. They are less powerful than CPU threads, but there are many of them; so if you have a lot of pixels, even if each thread is slower than a CPU thread, the sheer number of them is the power. So now that we have very good efficiency, and this is the end of the Python journey, let's see what we can do about the images. Because this is similar to the movie Interstellar, we can try to see what a more realistic depiction of a black hole looks like. Black holes are usually surrounded by a disk of matter slowly falling into them, very hot matter that emits light. If you add this effect, you end up with these images. Of course this is a simulated texture, but this is how it looks; it's more similar to the movie you remember. So let's analyze exactly what we're seeing. One of the interesting features here is this bump that you see up there. This is an image of the camera going around the black hole, and you can see the distortion, so let's analyze what that bump is. The bump you see there is light that, traced back from the camera, bends up and then bends down, which tells you it comes from the back part of the disk.
So you have the disk, you have the black hole in the middle, and the bump corresponds to light that goes over the top, then bends downwards, and comes from that back part of the disk. So basically it's this situation: rays that go down and then come back tell you that they come from the far side of the black hole. That bump you see really is the back part, and you are able to see it because the black hole is not occluding you; the light is simply going around the obstacle. Another cool thing is that the disk doesn't map the way you would normally expect. It's not the case that rays arriving from the right correspond to the right part of the disk and rays arriving from the left correspond to the left part. This is the real situation: rays that come from the left part of the disk are actually seen as if they came from the right, and vice versa. So in this image of the black hole, the left side is actually the right part of the real disk, and the right side is actually the left part. And from a different perspective here, you can see the situation when switching general relativity on and off. With general relativity, you are able to see the bump there; and if I switch it off, so it's a normal ray tracer where light travels in a straight line, you are not able to see the back part of the disk. But then we see another feature, this other bump down here. That bump is rays that go down and then bend upwards: it corresponds to the bottom part of the disk, which you can see because rays bend down there, come back up, and arrive from the back of the disk. So in this image you are able to see the whole disk at the same time, both the upper side and the underside, and this is because the light rays are bending.
So this is a textured version of the whole thing, with the camera going around, and you can see how it actually flips if you pay attention. At this point you are seeing the opposite image; just there, it basically flips. If you look at the inner ring, which is a second image of the disk, you'll see the inner ring becomes the outer ring. That is because the inner ring corresponds to light rays that circle the black hole twice before reaching you. If you zoom in on that area (this is a close-up of the event horizon, and this part over here is the shadow), you will see that there are more images of the disk, and there is an infinite number of them, closer and closer to the event horizon. These other images of the disk correspond to light rays that are basically doing this: circling the black hole some number of times and then escaping. This creates an infinite series of images, and you see them as smaller and smaller copies of the disk approaching the event horizon. The last feature I'm going to tell you about is that the shadow you're seeing is not only the front part of the shadow; it's actually also the back part, because there are light rays that bend around and reach the back side of the event horizon. Even though it's black and you cannot see it, because it doesn't emit light, if you texture it as a sphere, you will see the sphere; and the closer you look to the edge, the more you are actually seeing the back part of the sphere. That's because there is light going around and coming from the back.
It's not really light, of course, because at the horizon, in the shadow, there are no light sources and nothing can escape. But if you ask how this maps to the actual sphere corresponding to the shadow, then you can see that the shadow you're looking at, even though it's black, is the complete sphere, the full 360 degrees. Then you can go for a more artistic representation: with these kinds of things you add a bit of bloom, which is totally fake; you just blur the image a bit because it looks better. You end up with these nice images, which are heavily embellished, but the physics underneath is real: we calculated all of it, and we made it fast. Then you can frame it and give it to your friends for when they travel to space and things like that. And that's how you do it: you start with Python code that takes an hour and a half, and you end with a, OK, semi-CUDA Frankenstein Python version that takes 0.001 seconds, and you get these nice images. So I don't know how much time we have for questions, but... So we have a couple of minutes for questions. In the talk before, I saw some people taking notes; I don't think you've been taking notes in this one, I don't know if you were a bit more lost. So, first question here. Thank you for the talk. I was wondering if you tried combining Numba with the GPU, and how does that compare with native CUDA? Yes, I actually tried that. Numba has the possibility of targeting the GPU as a backend, but you still need to be aware of the very strict mapping between the GPU programming model and the actual hardware, how it's structured in a grid and all these things.
So actually, surprisingly enough, Numba really impressed me here, even compared with Cython. It is slower than the pure CUDA version, but only because I hand-crafted that CUDA code. And I mean, I don't want to have to do these things, but I had to analyze very carefully exactly how everything maps, because optimizing GPU code is horrible. That's why I showed the hell image: you have to be aware that branches on the GPU are extremely costly, so you have to shape the code around that. The more you are able to tune the actual code, taking care of the instructions, the faster it gets. Numba does all of this automatically for you, but you have little control. So Numba is about 60 times slower than my hand-written GPU version, but it's much less code, far fewer Python instructions. You still need to be aware of how one model maps to the other, but my overall opinion is that it's extremely impressive how little Numba code you have to write and how fast the result is. If you still want peak efficiency on the GPU, though, today at least, you have to craft the code yourself, be really, really careful, and be aware of all the typical low-level tricks. So, I'm afraid we don't have more time for questions; we have to go to the next talk, but I'm sure that Pablo is willing to chat through any other questions you have about black holes outside or in the green room. OK, so please, another big applause for Pablo.