So, hello everybody, thank you for coming. First of all, can you hear me? Is the volume fine? In this talk I will talk about PyPy, and about the PyPy JIT in particular. First, a few words about me: I have been a PyPy core developer for more than 10 years now. I am also the author of some other open source projects like pdb++, vmprof and others, and you can find me on Twitter. And before I forget: I will show you some demo code in this presentation, and you can find the source code at this link, so you can try it by yourself. So let's start talking about PyPy. Usually when we are at conferences and talk with people about PyPy, the first question is: how fast is it? And actually, the answer is: it depends. There is no single answer to this question, because it really depends on the kind of code you are running and, as we will see, on how this code is written. If you go to speed.pypy.org, you can see the benchmarks we run every night, and there is also this nice summary graph. As you see, we compute that on average we are 7 times faster than CPython, but of course this number doesn't mean much by itself; it really depends on your code. And as I will show in this presentation, the cool thing is that, because of the way the JIT works, it is able to remove a lot of the overhead of the abstractions in your code, which means that the better the code you write, the greater the speedup against CPython. So what does it mean to write good code? First of all, it needs to be correct and robust. And then it's nice to have code which is readable, which is easy to maintain, which has a nice API; if you are writing a library, designing the API is very important, because this is how users will use your library, and if the API is easy to use, well, your library is much better than the alternatives. And of course, it would be nice if our code is also fast.
And in particular, it would be nice if, by writing nice code which is readable and maintainable and so on, we don't impact the speed of the program. Usually, one way of writing good code is to make good use of abstractions. It's something we all do when writing code: we factor out common pieces of logic into functions, we group common pieces of data and behavior into classes, we use inheritance to share behavior, and so on. And yes, this makes the code better and more readable, but sometimes this style of writing code has a cost, especially on CPython, as we will see shortly. In the following, I will show you a demo with various ways to write the very same piece of logic, and we will compare the performance of CPython and PyPy on the various versions of the code. The demo is about some image processing: we are running a Sobel filter on a video stream from a webcam, using the result to do some edge detection on the image. This is just a copy and paste from Wikipedia; I am not really a computer vision guy, so I don't really know how it works, I just copied it from somewhere else. So, how do we represent an image? In this demo, we start by representing an image in a very simple way: an image consists of the width, the height, and an array which holds our data. We are talking about a grayscale image, so each pixel is represented by one byte, from 0 to 255. And so, yes, we have this linear array, and we can index a single pixel at position x and y by doing a simple calculation. So we go to Wikipedia, we see what the math behind it is, and we just write a simple loop in which we do everything inline, basically. You can see that here we have our image, which consists of three variables; we create the output image, and for each pixel we do the calculation.
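The representation described above can be sketched in a few lines. This is a minimal reconstruction, not the talk's actual source: the names `make_image` and `pixel_index` are illustrative.

```python
import array

# A grayscale image is just a width, a height, and one flat byte array.
def make_image(width, height):
    data = array.array('B', [0] * (width * height))
    return width, height, data

def pixel_index(width, x, y):
    # the pixel at (x, y) lives at this offset in the linear array
    return y * width + x

w, h, data = make_image(4, 3)
data[pixel_index(w, 2, 1)] = 255   # set pixel (2, 1) to white
```

In the first version of the demo, this `y * width + x` computation is simply repeated inline at every use site.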
You can see, for example, that to address each pixel, I do the computation manually every time. And basically, this works. I can show you the demo: this is the program running on PyPy, and you see that it's reasonably fast, it works in real time. You see the number of FPS; it's a bit slow because it's limited by the frame rate of my webcam. If you use a video file, it's much faster. For comparison, let's try with CPython. Yes. It's not even too bad, I mean, more than one frame per second. From now on, I will avoid showing you the real-time image. I also have a quick program to benchmark the various versions of the code. So this is CPython, which does almost five frames per second, and this is PyPy, which does more than 270. So, yes, this is very good: PyPy is many times faster than CPython. Here are the histograms, but as you see, the y axis is at a different scale, because otherwise the CPython bar would be too small to be seen. But yes, I mean, this version works, it's fine, but it's very bad code, because, for example, here we have the logic for computing the index of a single pixel, which is repeated again and again, and that's very bad. So, what we do is start to make it a bit nicer, and we write some functions to get a pixel and to set a pixel, so our code becomes a bit nicer; you see that we can call this function to get the pixel. What happens if we run it? Let's try on CPython. It's computing only ten frames on CPython, because computing all the frames of the video file would take forever. So, you see, it's about half the speed of before. Let's try on PyPy. Almost the same speed as before. Actually, the numbers vary a lot because, of course, this is a laptop with hyper-threading, and the temperature of the CPU matters, so take them with a grain of salt.
But yes, in general, this version is basically the same speed as before. So we wrote code which is nicer and better, and PyPy is already much faster than CPython, relatively speaking, than before. So, we do more: we start to represent an image by using a real class, an actual instance. We have this class, we save the width, the height and the data array, and we have these __getitem__ and __setitem__ special methods, which compute the actual index in the array for us. So now it becomes much nicer to write. I don't have a slide here, but I think it's version two. Yes. You can see here that we can index the pixel directly inside the image. And, as you can guess, we run it on CPython, and it's again slower than before, and we run it on PyPy, and guess what? Still the same speed. So, you start to see a pattern here: we introduce more and more layers of abstraction, and every layer of abstraction introduces an overhead on CPython, and no overhead at all on PyPy. This is because, as we will see later, the JIT has logic to remove this kind of overhead, and so, as I said, the better the code we write, the greater the speedup.
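A hedged sketch of this "image as a class" step: __getitem__ and __setitem__ hide the index arithmetic, so callers simply write `img[x, y]`. The class name and attributes here are assumptions, not the talk's code.

```python
import array

class Image(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.data = array.array('B', [0] * (width * height))

    def __getitem__(self, idx):
        # idx is an (x, y) tuple; the index math lives here, once
        x, y = idx
        return self.data[y * self.width + x]

    def __setitem__(self, idx, value):
        x, y = idx
        self.data[y * self.width + x] = value

img = Image(4, 3)
img[2, 1] = 200   # no manual index computation at the call site
```

On CPython each `img[x, y]` costs a method lookup and call; on PyPy the JIT inlines it away.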
So, now we want to make it even fancier, and we start to represent a point inside the image with a class, and we can add a point to another point so that we can compute offsets. We also want to abstract the notion of how to iterate over the image into a nice method, because, for example, in this version of the code we run the kernel of the computation on all pixels of the image except the border, since, as I said, at the border we don't have all the neighbor pixels to consider. But there are various other ways to handle the edge cases, and we didn't want to hard-code them in the for loops. In the earlier version of the filter, we have this range from one to h minus one, which basically means skipping the border, but it's much nicer to put this logic in a method, and as we see here, we basically return an iterator which iterates over all the pixels of the image by computing them using itertools.product. And so we can start to write our code in a slightly different way: for example, here you see that we iterate over all the pixels with only one loop instead of the two nested loops of before, and to compute the neighbors we can use a nicer syntax, because we can just say "take the point which is one pixel to the right and one pixel to the bottom", and so on. And, yes, let me show you: on CPython, it starts to become very, very slow; we can even drink some water in the meantime, maybe I should drink the whole bottle, I don't know what's happening. Yes, so remember, we started from almost five frames per second on CPython, and now it's something like ten times slower. Guess what? Oops, sorry. Version three. Yes. No, this is slow. Ah, yes. I think this is my laptop, which is getting hotter, but yes. No, actually, there is a bit of a penalty here.
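The Point abstraction and the border-skipping iterator described above might look like this; all the names here (`Point`, `noborder`) are illustrative reconstructions, not the talk's code.

```python
from itertools import product

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # adding two points gives the offset pixel, e.g. p + Point(1, 1)
        # is the neighbor one pixel right and one pixel down
        return Point(self.x + other.x, self.y + other.y)

def noborder(width, height):
    # iterate over every pixel except the one-pixel border, using
    # itertools.product instead of two explicit nested loops
    for y, x in product(range(1, height - 1), range(1, width - 1)):
        yield Point(x, y)

pts = list(noborder(4, 4))        # the four inner pixels of a 4x4 image
neighbor = pts[0] + Point(1, 1)   # Point(2, 2)
```

Other border policies (clamping, wrapping) could then be added as alternative iterator methods without touching the filter loop.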
If you run it many times, you see that you get a slightly smaller number of FPS; probably the JIT introduces a bit of overhead here, but it's really tiny compared to the overhead introduced by CPython. I didn't look at the exact code produced by the JIT to see where this fraction of the performance is lost, but it might be, for example, that itertools.product actually creates a new object which needs to be collected by the GC, so it adds a bit of GC pressure or something like that. I don't know, but really, it's negligible compared to CPython. So, we want to go even further. Is this code big enough to be read? I hope so. Basically, what we did so far was a manual computation: we have this matrix, which is shown here as two matrices, and the idea of the filter is that for each pixel in the image we compute the multiplication by this matrix. Here we do it manually, but it would be much nicer to abstract this computation as well, to write it in a saner way. So here we have a class which represents the kernel; we give it the matrix, which is a copy and paste from the Wikipedia page, and when we call this object, passing an image and a point, it computes the matrix multiplication at that point. And yes, the code now starts to be really nicer than before, because, for example, if we want to try another filter, we can just edit the values in the matrix and be happy. So let's try it again on CPython. Yes, now I really can drink. Yes, basically, the talk is scheduled to last one hour, but most of the time is spent waiting for CPython to finish. Is it ready yet? Yes, even slower, though not much slower than before. So now, guess what happens with PyPy? Any guess? Now it's much slower, like 10 times slower than before. I mean, it's still much faster than CPython.
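The kernel abstraction can be sketched as a callable object holding the 3x3 matrix. The names are illustrative; `img` is anything indexable as `img[x, y]`, like the Image class sketched earlier.

```python
class Kernel(object):
    def __init__(self, matrix):
        self.matrix = matrix   # 3x3 list of lists

    def __call__(self, img, x, y):
        # weighted sum of the 3x3 neighborhood around (x, y)
        total = 0
        for j in range(-1, 2):
            for i in range(-1, 2):
                total += img[x + i, y + j] * self.matrix[j + 1][i + 1]
        return total

# the horizontal Sobel operator, copied from Wikipedia as in the talk
Gx = Kernel([[-1, 0, 1],
             [-2, 0, 2],
             [-1, 0, 1]])
```

Trying a different filter is now just a matter of passing a different matrix; the price is the extra nested loops inside `__call__`, which is exactly what makes this version slower, as explained next.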
If you start from this version of the code and run it on CPython and PyPy, you see that PyPy is 76 times faster, and you are happy and you go to conferences to say that PyPy is really nice and it works. But actually, here we are running much, much slower than we could, and since the title of the talk is "abstraction for free", this is not free at all; so PyPy sucks, I'm a liar, and please go home. Well, of course not. I will show you later why this happens, but the point is that the PyPy JIT works by detecting hot loops and compiling assembly versions of these loops. Normally, when we have this kind of code, we have one loop, which is the for loop, and the JIT sees a linear list of operations, so it emits assembly code for it, it runs in a straight line, and the CPU can do it very fast. But here we start to have a lot of nested loops, and basically it means that, if you look at the generated code, when you are iterating over all the pixels, you do the equivalent of a function call to the other loop, which has been compiled by the JIT separately. And this kind of operation is slow-ish on PyPy. As long as you stay inside one loop, it's very fast; if you jump between loops, it's still fast, I mean, it's still 70 times faster than CPython, but not as fast as staying in one loop. So one possibility to solve this issue is to unroll the loop manually. Basically, here you see that we have this function which creates a kernel function from the matrix, and I'm using this pypytools module, which I wrote and which is available on PyPI. It does nothing special, it's just a nice wrapper to generate code on the fly. You see, I'm generating a function, which I call apply, which takes an image and a point, and I manually unroll the two nested loops here, and for each combination of j and i, I accumulate the value of the computation in this variable, which I then return.
And then, with this nice library I wrote, you can ask it to compile the code. "Compiling" this code really just means running exec on the string, nothing fancy. And then I return the function. Basically, if we pass it this kernel, Gx, it generates this kind of code, which, as you see, is very, very similar to the code I showed you earlier, which I wrote manually, but now it's much nicer because the API of my library still looks like this: I create a kernel, and it automatically gives me the nice, fast function. So let's try it again. I won't even try on CPython, because you can guess by now that it's utterly slow. And yes, we are fast again. Now I'm getting fewer FPS, probably because the laptop is really getting hotter, but trust me, earlier I got a higher result. So this is the final graph, and we see that PyPy is again 400 times faster; 400 times faster than CPython, I repeat. So what we learned so far is that on CPython, every time you try to be nicer, you pay a cost, because Python is a dynamic language: every time you call a function, it has to do a dictionary lookup in the global namespace; every time you call a method on a class, it has to look up the method in the dictionary of the class, or of the instance if it has one; every time you multiply two numbers, it has to create a temporary object; and so on. On PyPy, we saw that the abstractions are almost free. Yes, there is a bit of overhead from the first to the last version, but it's not much, and it's something you can pay quite easily if what you gain is much nicer code. So, in the second part of this talk, I want to give you a very rough idea of how PyPy is able to do this kind of magic and optimization. We will see some examples of code and see what the JIT produces. But I warn you: this is not a detailed explanation, and it is not even a completely correct explanation. I simplify things because I think it's easier to understand that way.
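The code-generation trick can be reconstructed roughly like this: build the unrolled source as a string and "compile" it by running exec() once. This is a hedged sketch; the function and variable names are assumptions, not pypytools' real API.

```python
def compile_kernel(matrix):
    # emit one flat function with the two nested loops unrolled by hand
    lines = ['def apply(img, x, y):', '    total = 0']
    for j, row in enumerate(matrix):
        for i, coeff in enumerate(row):
            if coeff != 0:   # skip multiplications by zero entirely
                lines.append('    total += img[x + (%d), y + (%d)] * (%d)'
                             % (i - 1, j - 1, coeff))
    lines.append('    return total')
    namespace = {}
    exec('\n'.join(lines), namespace)   # the "compilation" step
    return namespace['apply']

# the horizontal Sobel operator again; the generated apply() is one
# straight-line sequence of additions, ideal for the JIT
sobel_x = compile_kernel([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]])
```

Because this happens once at module level, the JIT sees a single, stable code object and can compile the whole filter as one linear trace.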
If you are interested in a deeper, more detailed and more correct explanation, I gave a talk at a past EuroPython, I think it was 2013 or so, and you can look at the slides or the video of that talk. So let's start from a very simple piece of code. We have this function, which computes the sum of n numbers, and it's just a while loop. When you run it on PyPy, what happens is that at some point PyPy recognizes that we are running a loop, that we are running the same code again and again, and after a certain number of iterations the JIT kicks in and compiles the code of this loop. What it produces is something like this; it's pseudocode, but I think you should understand more or less what's going on. The first thing to notice is that we are compiling only the loop: you see that there is no code for the two lines before the loop or for the line after the loop. We are compiling only the loop, and loops in PyPy are always infinite loops; the way you exit a loop is by failing a guard. What is a guard? In this pseudocode I wrote the guards as asserts. Basically, in the JIT code we insert some checks here and there to ensure that the preconditions we assumed are still true. So, for example, if we pass an integer to compute, then total, i and n are all integer variables, and we can produce a specialized version of this code which does an integer addition here. If we pass a float, the assembly code is no longer the same, because at the CPU level we have different instructions for adding two floats or two integers, so the JIT will have to generate another version. Here you see that the guard is checking that the variable we passed is actually an integer. And also, because of the semantics of Python, every time we add two integers we need to check for overflow, because in that case we need to switch to longs; I'm talking about Python 2.7 here.
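The running example is roughly this: a plain while loop that the JIT compiles after enough iterations. The guards are sketched here as comments; the real trace PyPy emits is lower-level than this.

```python
def compute(n):
    total = 0
    i = 0
    while i < n:
        # in the trace: guard(i is an int); guard_no_overflow after the
        # addition; the loop condition itself becomes another guard, and
        # failing that guard is how the compiled loop exits back to the
        # interpreter
        total += i
        i += 1
    return total
```

Only the loop body is compiled; the two assignments before the loop and the return after it stay in the interpreter.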
So, you see that every time we do this addition, we also need to check for overflow; this is also a guard. What happens is that in the normal case you have a very fast loop which sums integers, and these are really low-level integers: like in a C program, it's not some fancy structure malloc'ed in memory, it's really an int variable stored in a CPU register. And as long as there is no overflow, everything goes fast. Also, you see that the condition of the loop has been turned into another guard here, which means that at some point this guard will fail, and this is how we exit the loop. And then there is a lot of complicated code in PyPy which falls back from the JIT code to the old interpreted one. So, let's see what happens when things are more complicated. Suppose you have an if inside your loop, and in this case every other iteration you do one operation, and every other iteration you do the other. What happens is that at some point we run the loop, and after a certain number of iterations the JIT kicks in and compiles a version of it. When the JIT kicks in, it sees only the iteration which is being executed right now, so it sees only one path through the code, not the other. For example, in this particular case, we only saw the then branch of the if, but not the else. You see that here, in the code generated by the JIT, it's more or less like before, asserting that we are in this branch of the code; the other branch is not considered at all. So, what happens is that the next time you do an iteration which takes the other branch, this guard will fail and you get out of the JIT code. And there is special logic which checks whether a guard is failing very often; if so, it probably means that it's worth compiling the other path as well. So, after a certain number of guard failures, the JIT compiles what we call a bridge, and basically it attaches another piece of assembly code to the code which was already in memory.
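The branchy loop just described might look like this in plain Python; the comments mark which part becomes the first trace and which becomes the bridge (the shape of the example is mine, reconstructed from the description).

```python
def compute(n):
    total = 0
    i = 0
    while i < n:
        if i % 2 == 0:    # trace 1: guard(i % 2 == 0), then this path only
            total += i
        else:             # bridge: compiled after repeated guard failures
            total += i * 2
        i += 1            # replicated in both the main loop and the bridge
    return total
```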
We really go into memory and change the CPU instruction from being a jump to the JIT fallback code into a jump to the newly compiled code. So, in this case we have two pieces of compiled code, the main loop and the newly compiled bridge, but one important thing to notice is that we never merge back afterwards: for example, this instruction, which is executed after the if, is replicated both in the main loop and in the bridge. And this is basically one of the reasons why on PyPy you need to take warm-up into account when you are measuring performance: if the program doesn't run for long enough, you spend too much time compiling the various loops and bridges. After a while, and not much, in my example a couple of seconds were more than enough for warming up, everything is stabilized, all the hot paths of the code have been compiled by the JIT, and you are happy. So, going back to our example: if you remember, the first improved version of the code was the one which put the logic to compute the index inside the array into functions, and we saw that PyPy did not suffer any performance penalty. This works because, thanks to the way the JIT is written, inlining happens automatically.
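Inlining is only safe because the JIT guards against Python's dynamism. As a small demonstration of what it has to guard against, even a function's code object can be swapped at runtime (this toy example is mine, not from the talk's slides):

```python
def fn(x):
    return x + 1

def other(x):
    return x * 100

print(fn(1))                   # 2
fn.__code__ = other.__code__   # same function object, new behavior
print(fn(1))                   # 100
```

The name `fn` can also be rebound in the module's globals, which is why both the globals dictionary and the code object need guards, as explained next.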
So here we have the same function as before, but this time the addition is done inside this fn function, and what happens is that the JIT doesn't care: it just sees that inside the function we do the addition, and so it puts the low-level instructions directly here. But, well, Python is a dynamic language, and if I just give you this code and ask you to execute it, you cannot be sure that fn is always the same function: it might be that someone monkey-patched my module and fn is now something different, or it might even be that I manually changed the __code__ attribute of the function, making it execute something completely different, so the function object is still the same but the behavior is different. There are a lot of crazy things that can happen in Python, and this is one of the reasons why it is hard to optimize. The PyPy approach is to insert a guard for each of these conditions. So, for example, here we check that the global dictionary, the dictionary where fn lives, still has the same version as before. The version is just an internal number we use to keep track of the state of the dictionary: every time we modify the dictionary, we increment the version. So if nobody touches the dictionary of the module, which doesn't happen often after initialization and importing, this guard will never fail, and it will be fast. The same thing applies to the __code__ attribute of the function. But also note that we are smart enough to put all these guards outside the loop, because the JIT has knowledge of what's going on, and it knows that by executing these few operations we cannot change the globals and we cannot change fn's __code__ attribute. So basically we hoist the guards out of the loop, which means that when you enter it, it does a couple of quick checks at the beginning, and then in the inner loop, where the time is spent, there are no guards, or only a few of them. And now you start to see why the version in which we had two separately compiled loops
was slower: because we need to do these checks again every time we enter a compiled loop. The same goes for classes. For example, here we have the class Point, which has two attributes, and I can compute the distance from the center by calling this function from the math module; and suppose I have a list of points and I want to compute the total distance. Here we insert more and more guards, because a lot of things can change: to make sure that the code inside p.distance is really this one, I need to check that the global dictionary didn't change, that the dictionary of the math module didn't change, that the dictionary of the Point class didn't change, and that this function still points to the same global as before, because it's doing a global lookup of the math symbol, and other things. So, after all these guards, we are sure that the code inside is this one, and you see that again, inside the loop, we can inline the code of this function, which is written in C, or in RPython in the case of PyPy. So I think that now you start to see a pattern: every time you have some dynamic behavior of Python which makes it hard to optimize, PyPy tries to reduce the dynamicity by putting in guards and hoping that the situation doesn't change, which is very often true, because, for example, modifying the __code__ attribute of a function almost never happens. But if it does, PyPy is still correct: it means that the guard fails, so we cannot use the jitted code; we are slow, but we are correct, which is important. And finally, one of the most important optimizations of the JIT is the one about virtuals, and this is where you win a lot. For example, suppose I rewrote the code in a different way, and at each iteration of the loop I create a new Point object and then I compute the distance. I mean, I could call math.hypot directly, but to show you this optimization I wrote it this way. What happens on CPython is that at every iteration of the loop I create
a new object: so I do a call to malloc, then I call the __init__ and I set the x attribute to i and the y attribute to i plus 1, then I call distance, so I do another method lookup, et cetera, and you see why you are paying a lot of performance. What happens on PyPy is that PyPy recognizes that at every iteration we are creating this Point object, calling a method on it, and immediately destroying it: it never escapes the loop. So the JIT is smart enough to explode the structure of the Point object into local variables: instead of creating an object in memory and storing x and y in memory, it stores them, for example, in some CPU registers. So here, instead of having an object p with a p.x field, we just have a px local variable, and the same for py. And so now we can again inline the call to p.distance; note that we have fewer guards than before, because now we know that p is of type Point, since we just created it. Well, we didn't really create it, because it was virtualized, but from the abstract point of view it is an instance of this class, so we can even reduce the number of guards. And basically, that's it, and that's why PyPy can be so fast. By applying this kind of optimization, you can see that in all the versions of the demo code I showed you, we removed all the overhead of the abstractions and got more or less the same performance as before. It's possible that the bit of slowdown you saw is because of these extra guards: you pay a bit for the dynamicity of Python, but not much. And there is another thing I wanted to show you, related to this guard I showed you before: when we do an inlined call, the JIT has to check that the code object is the same as seen before. But let's go to version 5: here we are creating a new function dynamically, so it means that we are creating a new code object again and again, and in this particular case PyPy is fast because this function is created at global level, so we create it only once
and the JIT always sees the very same code object. But if I modify the code, and actually I didn't try this, so I hope my theory is correct, but I'm quite sure it is: if we move this call from outside to inside, then every time the JIT will see two different code objects, so it cannot apply the same optimization as before, and it will probably be very slow. This is an important thing to know, because I have seen it in real-life code which, for example, creates classes inside functions, or things like this. On CPython this does not change much, but on PyPy it changes a lot, because even if it is the same source, each created function has a different identity, so the JIT cannot be sure that the behavior is the same, and it has to recompile the same code again and again. So, basically, that's it. There are more PyPy events here at EuroPython: tomorrow we will run the PyPy help desk, so you can come; on Friday there is a talk about the general status of PyPy, which will be given by Armin, and I don't know what he is talking about in particular, but if you are interested in PyPy you might be interested in coming. Or you could just stop us during the conference and ask questions, because I'm happy to answer all the questions you have.

So now, first of all, let's thank our speaker for the presentation, and we have time for questions.

Thank you for your talk. I wanted to ask: why do you need to check the version number? Is it not enough to just check that the code pointer is still the same?

Sorry, could you speak slower? I didn't understand.

Why is it not enough to check that the code pointer stays the same? Why do you also need to check the version number of the globals?

Why is it not enough to check that the code pointer stays the same? Because the dictionary might change: it may be the same dictionary, but the content of the dictionary has changed. I'm not sure I understood the question; it was slide 31, if I remember correctly.

But it would still have the same code pointer, because we
are also checking the code pointer. Both need to be checked, because you can change what the name fn refers to, but you can also keep the same name fn while changing what fn.__code__ refers to, so we need to check both. I don't know if this answers your question, but let's try: so now foo is free, but I can do this... So this is the kind of dynamicity of Python that gets in the way when you want to optimize. And, for example, you could have two functions with the same code object but different globals, so it's easier to check for the function object; or maybe you have a global lookup of a constant, and then you want to check that the constant is the same, basically.

Hello, thank you for the presentation. I have a question about the benchmarks you showed earlier. I think it's kind of an unfair comparison, because most CPython developers would not represent an image as a list of lists or an array.array; any sane developer on CPython would go straight to numpy for this. So I was wondering if you had benchmarks and comparisons against numpy in this case.

Yeah, I didn't try; this was not the point of the talk. I know that if you write for CPython, you can probably be much faster by using numpy or some specialized library, but sometimes you just need to write your own Python code which is not in some library, so it's useful for you to know what you pay and what you don't pay. It was just an easy example to show the point, basically.

Going back to slide 31, I think it was: you're assuming, essentially, that you're running single-threaded there. How do you cope with the world changing asynchronously? Somebody could, in another thread, change one of the built-ins. That would be an insane thing to do, but what happens with threads? In your loop, do you periodically check that the state of the world hasn't changed, or...?

I don't know. Armin, do you know?
So, what happens is: if you have a small enough loop, then the loop can be written in such a way that you know that it's not going to release the GIL, so as long as the GIL is not released, you use the fast path; and in the case where you call something and maybe the GIL has been released, then after the call returns, you need to do the checks again.

When you went from version 4 to version 5, that sort of optimization: if we run our own code on PyPy, how do we find out that that's the sort of optimization we should be looking for? How difficult is it to find out?

Sorry, the question is: how did I find what?

If we run our own code on PyPy and it is not very fast for a similar reason, how do we find out, on our own code, that that's the sort of area we have to explore?

Well, actually, I knew because I know how it works. What happens in real life is that you have a program and you see that it's slow, or not fast enough. So what I usually do is first to profile, for example with vmprof, to see where we spend most of the time; then there is a tool to look at the code produced by the JIT, and if you know which optimizations you expect to be applied, and you see that one optimization is not applied, then you have a hint of what's going on. For example, a lot of the time it happens that you expect an object to be virtual, this kind of thing. When I write code with PyPy in mind, I often know that the object I'm creating is temporary, so it will be virtualized, but sometimes I look at the trace and I see that it's not, because maybe it's passed to some function which was not compiled by the JIT, or other things, and this kills the performance. So if you fix the code by removing the line of code which forces the object to escape, it becomes virtual again, and it's much faster. Yes, unfortunately, there is no easy-to-use tool to detect these scenarios: you basically have to know which optimizations you expect, and if you see that one is not applied,
well, then you try to understand. And that's why this kind of talk is useful, I think, because you start to get an idea of what you can and cannot expect from the JIT, basically. Did I answer your question?

Are there any plans to remove the GIL from PyPy?

Sorry, what was the question?

Are there any plans to remove the GIL from PyPy?

The author of PyPy?

Plans to remove the GIL from PyPy, the global interpreter lock?

Ah, yes, then I understood a completely different question. Yes, there is an ongoing branch, but I don't know much about it, and you should probably come to the Friday talk to know more.

Hello. You had some guards asserting that the addition doesn't overflow, but I don't think you can guarantee this unless you can solve the halting problem. So if the guard fails during the loop, does it just revert to plain Python code, or what happens then?

So, the question is: what happens when a guard fails?

Yes: inside the loop there was an assert that it doesn't overflow, but you can't know this ahead of time.

Well, basically, what happens is that we exit the jitted code, and we have a special piece of code which tries to recover the state of the computation, and we go back to the interpreter. So failing a guard is expensive, but most guards never fail, or fail very rarely.

Thanks for your talk. In your abstract, you mentioned that you were going to compare how using PyPy compares to other popular optimization methods like Cython. Could you shed some light on that?
A: Yes, and I am sorry about that: I wrote the talk proposal months ago, and when I prepared the talk I forgot that I had written about this, so it did not end up in the slides. But I do have real-world experience with it, because I wrote a library called capnpy, for parsing Cap'n Proto, a binary protocol, in Python. The goal of that library is to be fast on both CPython and PyPy, and it was hard, because as you saw, the two interpreters have very different performance characteristics. What I ended up doing was to write pure Python code, which is very fast on PyPy, and annotate it externally with Cython annotations: on CPython I compile the pure Python code with Cython, which makes it a bit faster, while on PyPy I just run the pure Python code, which is much better. In general, when you are running code on PyPy and you have to choose between pure Python and code written in C, pure Python is faster, because the JIT has more knowledge about what is going on. There were a couple of places where I could not find a version of the code that was fast on both implementations, so I had to write two different versions, one for PyPy and one for CPython.

Q: Your example is pretty small. How well does this work on real-life, complex code, where each trace traverses a lot of levels and a lot of functions are called, maybe ten functions deep? Do you have a rule of thumb for how far it works?
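Before the next answer, the Cython approach just described can be made concrete with a sketch. This is not capnpy's actual code: `checksum` and the fallback shim are invented, and capnpy's "external" annotations would more likely live in a separate `.pxd` augmenting file, which Cython's pure-Python mode also supports. The idea is one source file that runs unmodified on PyPy and compiles to faster code with Cython on CPython:

```python
try:
    import cython               # Cython's pure-Python "shadow" module:
except ImportError:             # decorators are no-ops when not compiled
    class _FakeCython:          # minimal stand-in so the code still runs
        int = int               # as plain Python when Cython is absent
        def locals(self, **kwargs):
            def decorator(func):
                return func
            return decorator
    cython = _FakeCython()

@cython.locals(i=cython.int, total=cython.int)
def checksum(data):
    """Plain Python, fast under PyPy's JIT as-is; when compiled by
    Cython on CPython, the annotation turns i/total into C ints."""
    total = 0
    for i in range(len(data)):
        total = (total + data[i]) & 0xFFFF
    return total
```

On PyPy the decorator does nothing and the JIT optimizes the loop; on CPython the same file can be fed to Cython, where the annotations remove the boxed-integer overhead.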
A: I don't have a precise answer, of course, because it really depends on the kind of code. I have seen real-life code on which PyPy is ten times faster than CPython. But sometimes it happens that you start from an existing piece of code, you try PyPy, and it is slower; then you look at the traces. This is what I did at some point: I saw that something I expected to be optimized was not, and by tweaking a couple of lines of code I managed to make the code something like ten times faster. In that particular case, the problem was a dictionary in which the user mixed unicode and byte-string keys, which is perfectly fine in Python, but PyPy has a special optimization: if a dictionary contains only strings, or only unicode keys, you get a specialized implementation, while if you mix them you get the non-specialized, slower one. So in that case, by switching to a homogeneous type for the keys, I got it much faster. But no, I don't have any magic rule.

Q: What developments can we expect for the JIT in the future? Can you make it more intelligent, for example, so that it handles your last example, where you had to move the code around, better?

A: I think the JIT is fairly mature nowadays, so we have already done all the easy things to make it faster. But there are always new ideas for making particular cases faster, for optimizing one specific behavior or another. I don't think we have any magic idea in mind right now, but we keep adding new features, and it is very nice, because every time you add a new optimization you unlock other optimizations.
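The dictionary key-strategy issue mentioned a moment ago can be sketched as follows. The talk's case mixed `str` and `unicode` keys on Python 2; this Python 3 sketch mixes `str` and `int` keys instead. The strategy switch is internal to PyPy and invisible from the code, so the comments describe the claimed behavior:

```python
def make_homogeneous(n):
    # All keys are str: PyPy can use its specialized string-dict
    # strategy, with faster hashing and lookups.
    return {"k%d" % i: i for i in range(n)}

def make_mixed(n):
    # Mixing key types forces PyPy to fall back to the generic
    # object strategy for the whole dictionary.
    d = {"k%d" % i: i for i in range(n)}
    d[0] = "boom"      # one non-string key is enough to de-specialize
    return d
```

Both dictionaries behave identically as far as Python semantics are concerned; only the internal representation, and hence the speed on PyPy, differs.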
I did not show it here, but you can see in my other, more in-depth talk that the optimizations of the JIT really work well together: sometimes one optimization, for example making something virtual, produces code that is handled very well by another optimization, so in the end a lot of operations are removed or simplified. So maybe, if we add some new optimization that removes one particular piece of overhead, the other optimizations will start to work much better than before, and you get a real speed-up. But I don't know how to answer the question more precisely than that, basically.

Host: If there are no other questions, I think we can thank our speaker once again.