Thank you. So, the GIL is dead, or that's what I pretend. What do I mean by this sentence exactly? The GIL, the global interpreter lock: if you were at the previous talk in this very room, you already know exactly what the global interpreter lock is. I assume a lot of you do; if you don't, it is simply one global lock that, in the CPython or PyPy implementation of Python, needs to be acquired in order to run any Python code. Python code can run only while holding the lock, which means that if you are writing a Python program using multiple threads, and all these threads are trying to do some CPU-intensive computations, then actually only one of them will run at a time. This is the basic story, and this has been the case since forever: since 1992 in CPython, and from the start in PyPy.

So here I'm talking about PyPy. What is PyPy? PyPy is an alternative implementation of the Python language. It was started ten years ago. It has better performance most of the time, because it has a just-in-time compiler built in. Try it, it is great.

Now, STM means software transactional memory. What do I mean by that? Please see part two of my talk. PyPy-STM is an alternative to PyPy, so you need to use another version of PyPy, but this alternative does not have a built-in global interpreter lock. That's the point.

So let's start with an example. This is the result of running Richards, a random CPU-intensive benchmark, which I ran earlier on my laptop. Here I run a total of 10,000 iterations, divided over four threads. It just naively starts four threads, each of these four threads runs 2,500 iterations, and we see how much time it takes.
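The threaded setup just described can be sketched like this. Note this is a rough reconstruction, not the actual benchmark code shown in the talk, and iterate() is a hypothetical stand-in for one Richards iteration:

```python
import threading

def iterate():
    # hypothetical stand-in for one CPU-intensive Richards iteration
    total = 0
    for i in range(1000):
        total += i * i
    return total

def worker(n_iterations):
    # each thread runs its share of the iterations
    for _ in range(n_iterations):
        iterate()

# 10,000 iterations in total, divided over four threads: 2,500 each
threads = [threading.Thread(target=worker, args=(2500,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Under CPython or regular PyPy the GIL makes these four threads take turns; under PyPy-STM they can genuinely run in parallel.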
This is the time: 8.01 seconds is how long it takes on the regular PyPy, with a GIL. Without a GIL it takes only 2.5 seconds on this laptop, which is a 4-core laptop. So it's good; this is an example where it works great, I suppose.

Okay, well, obviously, for an example such as Richards: if my goal was only to run the Richards benchmark in parallel, I could just run four processes anyway. That's kind of obvious. And if you are into running subprocesses, then maybe you are already using the multiprocessing module in Python, which is basically just running subprocesses. The advantage of running sub-Python processes is that each process has its own global interpreter lock. Typically in this case you are not using threads at all, so the GIL does not matter. But then, of course, the big drawback is that you have different processes, and you need to pass data around between them, which can be hard. Basically, you cannot take an existing big and complicated program and just turn it into a multiprocessing program; that usually does not work.

So the value proposition of PyPy-STM is about running multiple threads in one program. The idea is that you have shared data, and every thread can use the shared data, which is both good and bad.

So here is an example where I want to look into the source code. This is an example where I create a random graph. So it's about here: I create my list of vertices, then I have edges that go from a random vertex to another random vertex. The goal of this algorithm is to find the two points in this random graph which are as far from each other as possible.
This is a specific problem which, I'm sure, has nice solutions from a graph-theoretical perspective, but here I try the most naive algorithm: I take one point and measure the distance to all of the points that I can reach, and so on. It means I really try, from every single point, to reach all of the points. So this is a function that, given one point, finds the furthest point from it.

And here is how I look for the answer. For every starting point, I put the starting point in a queue, and then I get the results from another queue. My worker loop is this: I get a point, compute what is furthest from it, and put the result back in the other queue. Why am I talking about queues? Because I want a multi-threaded program. So what do I do? Here, yes, I have a loop that starts some number of threads and gives me the two queues. This is mostly standard Python: if you ignore the fact that the queues come from the transaction module, you can think of them as queues from the standard queue module.

But the point is that if you run this in standard Python, then you don't get any benefit from this multi-threaded hack, basically: the program is unable to run on multiple threads in parallel. Well, it should in theory, but this particular example, which I quickly wrote yesterday, does not actually show any huge speed-ups. Too bad. Yes, I know it's a bad example. But as you have seen before and as we'll see afterwards, there are other examples that are larger programs, and for these larger programs it seems to work better, which is good.

So basically I'm going to assume that you trust me, and I will show you now another, larger program in which I did the exact same transformation: a program that was already around in 2011.
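The queue-based pattern being described can be approximated with the standard queue module, as the talk itself suggests. This is a sketch, and distance() here is a hypothetical stand-in for the actual search that finds the furthest reachable point in the graph:

```python
import queue
import threading

def distance(start):
    # hypothetical stand-in for "find the furthest reachable point from start"
    return start * 7 % 13

work_q = queue.Queue()    # starting points go in here
result_q = queue.Queue()  # (point, furthest-distance) pairs come out here

def worker():
    while True:
        point = work_q.get()
        if point is None:        # sentinel: no more work for this thread
            return
        result_q.put((point, distance(point)))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

points = list(range(20))
for p in points:                 # every starting point goes into the queue
    work_q.put(p)
for _ in threads:                # one sentinel per thread, after all the work
    work_q.put(None)

results = [result_q.get() for _ in points]
for t in threads:
    t.join()

# the starting point whose furthest reachable point is the most distant
best = max(results, key=lambda pr: pr[1])
```

With standard queue.Queue this runs correctly everywhere; swapping in the queues from PyPy-STM's transaction module is what lets the CPU work actually proceed in parallel.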
I think that is where I did the same demo. It is a 3D renderer written entirely in Python, and 0.9 is the number of images per second, because that's on CPython. It takes forever, basically, but you can see it's changing. And then, if I run it on PyPy: whoa, it's much faster. PyPy is great. That was the 2011 talk.

But now let's run this renderer on multiple threads; that's the point of this talk, here, now. And you see that it's about twice as fast. It's great. Yes, we use just a pure-Python rendering system. You can see that it's not just calling OpenGL or whatever, because it's using a cylindrical projection: there's a wall here, and it's not rendered as a straight line. Just a random comment.

Okay, so what does it mean if I have a multi-threaded program in Python? This is a summary of what you can expect from a multi-threaded program in normal Python, that is, CPython or PyPy without STM. You need to divide your threads into two categories. On one hand you have the threads that are doing input/output, so they are typically blocked in some operating system call, like file.read, or receiving from a socket, or things like that. On the other hand you have the kind of thread that is CPU-intensive and wants to compute something.

In standard Python, with the GIL, typically you get this result: you can have any number of threads doing I/O, that's fine, that works great, but you can have only one thread doing CPU-intensive computations at a time. Now, if instead you are using PyPy-STM, you get this slightly strange result. You can have any number of input/output threads.
And instead of one, you can have up to N threads that are doing CPU-intensive things, where the number N is actually compiled into PyPy-STM. That means, for example, here I'm using a PyPy-STM with N equal to 4, I think, but you can have one with N equal to 8. Typically you will get one where N is the number of cores on your machine, or, well, a number larger than the number of cores on your machine would work as well. That's the point. So far so good.

However, the problem with this approach in PyPy-STM is that only one thread at a time can be switching between these two modes. What does that mean? I have an example, a completely trivial one. I have my run_thread function where, with the print commented out, it is purely computing things on the CPU. If I run this on PyPy-STM, it will happily use four threads. Good. However, if I put the print back in and run this on PyPy-STM, then things get slow again, basically because each thread keeps switching from "I need to compute stuff", which is this line here, to "I need to do input/output", which is that line here.

This is the main drawback, basically; this is the point to look for when you are using PyPy-STM. You need to find these spots and, well, fix them, quote-unquote, somehow. And in general, yes, this can occur. For example, a thread is doing some complicated computation, like rendering a web page or this kind of stuff, but in the middle there is a call to the logger to write the logs. In this situation you need to think and refactor your code.
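The kind of refactoring being suggested might look like the following sketch: instead of printing or logging inside the CPU-intensive loop, each thread collects its messages locally and the I/O happens once, after the computation. The names here are made up for illustration:

```python
import threading

def run_thread(n, messages):
    total = 0
    for i in range(n):
        total += i * i          # pure CPU work: parallelizes fine under STM
        # print(total)          # I/O inside the loop would force the threads
        #                       # to take turns, killing the speed-up
        messages.append(total)  # refactor: buffer locally instead
    return total

# one private buffer per thread, so the threads don't contend on it
buffers = [[] for _ in range(4)]
threads = [threading.Thread(target=run_thread, args=(1000, buf))
           for buf in buffers]
for t in threads:
    t.start()
for t in threads:
    t.join()

# all the input/output happens here, outside the CPU-intensive region
for buf in buffers:
    print(len(buf), "messages")
```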
So the call to the logger would be moved somewhere else, for example. To conclude the first part of my talk, which is PyPy-STM: you get something compatible with standard Python, well, as much as PyPy itself is compatible with standard Python, and you get multi-threaded programs that happily run in parallel out of the box. There are still a few bugs, but it works straight out of the box.

Now, I promised at the beginning that I would explain what STM, software transactional memory, actually means. Well, what is a transaction? Let's look at it this way. You have a program; think about how the GIL, the global interpreter lock, works in CPython for example, or in PyPy. The GIL is acquired, then you run stuff, then you release the GIL; while it is released you do some input/output, maybe, or maybe not; and then you re-acquire the GIL. What I'm going to call one transaction is everything that happens between an acquire of the GIL and the next release of the GIL.

Why is it called a transaction? Because it is actually very similar to database transactions. Why is it similar? What is transactional about it is the complete memory, all the objects used by your program: think of this as the database, as a regular database. What you have with a regular database is that you start a transaction, you do some reads, you do some writes, and as long as you are doing reads and writes you have a consistent view. At the end you try to commit, and this commit may work, or it may cause an abort, meaning the commit fails. The commit fails if someone else ran another transaction in parallel, and that other transaction happened to change an object that you have read. This condition is exactly the same as the one used in software transactional memory.

So this means that it runs like this; let me draw pictures in the air. With the normal CPython, you run one thread, then you pause it; then you run another thread, pause it; then you run the first thread again, and so on.
With this transactional approach, instead, you start running both threads at once. One commits: okay, the other one continues. Then the other one tries to commit, and it may either succeed in committing, or fail if there was a conflict. If there was no conflict, then you are happy: the two threads worked as if they had been executed one after the other. And if there are conflicts, you are unhappy: you throw away what you did and you restart it. So one difference with the standard transaction system of databases is that here the throwing away and retrying is completely transparent; you don't see it at the Python level.

Yes, so here is how it works under the hood; maybe I will go a bit faster here. Basically, during a transaction you flag all the objects that you read, and you record the list of all the objects that you write. When you commit, you save that list of objects that you wrote into some log, so there is a growing log. And you have a relatively simple-to-describe algorithm to know whether you have a conflict or not, which means, basically, whether you have read an object that another transaction has written in parallel, concurrently.

So, as I said, aborts and retries are transparent, but that also has a consequence for input/output. If one thread is really trying to do input/output, then obviously, once you have done the input/output, you cannot be cancelled and retried. So only one transaction at a time can be flagged as "this transaction actually did input/output", and that transaction cannot abort anymore.

Okay, and everything that I described so far is basically something that looks a bit like what Larry Hastings explained previously in his talk about how to make a Python interpreter that does not have a GIL.
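The commit rule just described (abort if you have read an object that another transaction wrote concurrently) can be modelled in a few lines. This is a toy illustration of the idea only, not PyPy-STM's actual implementation, which works at the C level:

```python
class Transaction:
    def __init__(self, memory):
        self.memory = memory        # shared state: name -> value
        self.read_set = set()       # objects flagged as read
        self.write_log = {}         # pending writes, applied on commit

    def read(self, name):
        self.read_set.add(name)
        # see our own pending write if there is one, else the shared state
        return self.write_log.get(name, self.memory[name])

    def write(self, name, value):
        self.write_log[name] = value

    def commit(self, concurrently_written):
        # conflict: we read an object another transaction wrote in parallel
        if self.read_set & concurrently_written:
            return False            # abort; the caller retries transparently
        self.memory.update(self.write_log)
        return True

memory = {"x": 1, "y": 2}

t = Transaction(memory)
t.write("x", t.read("x") + 10)
assert t.commit(concurrently_written=set())       # no conflict: commits

t2 = Transaction(memory)
t2.read("y")
assert not t2.commit(concurrently_written={"y"})  # read/write conflict: aborts
```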
However, this approach that I'm using here, STM, is better, I think, than the approach of adding locks a bit everywhere, because it also allows another style of programming. What do I mean? Well, you know that there will be no switch to another thread in the middle of a bytecode. But with a small hack, "with atomic", you can make sure that there is no switch to another thread across a larger region. This actually gives an interesting model for programming with multiple threads: you can make programs that have large regions, and for these large regions, just by saying "with atomic", you make sure that there is no possible switch; you make sure that this whole block of Python code runs in one transaction.

What this means is that you can have this kind of interface. This is just a simple thread pool: you add some function that will be called later in another thread, with some arguments; you say tr.run(), and then all the functions that you have added are executed at that point in multiple threads.
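The interface being shown could look roughly like this. This is a sketch with made-up names, and a plain lock stands in for what "with atomic" provides under PyPy-STM, so this emulation gives the serialized semantics but none of the parallelism:

```python
import threading

class TransactionRunner:
    """Toy stand-in for the thread-pool interface described in the talk.

    Under PyPy-STM each added function would run as one large atomic
    transaction, possibly in parallel with the others; here a plain lock
    merely emulates the run-as-if-serialized semantics.
    """
    def __init__(self):
        self.pending = []
        self.lock = threading.Lock()

    def add(self, func, *args):
        self.pending.append((func, args))

    def run(self, num_threads=4):
        def worker():
            while True:
                with self.lock:          # stands in for 'with atomic'
                    if not self.pending:
                        return
                    func, args = self.pending.pop()
                    func(*args)          # runs as one indivisible unit
        threads = [threading.Thread(target=worker) for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

results = []
tr = TransactionRunner()
for i in range(10):
    tr.add(results.append, i * i)
tr.run()
```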
However, this runs them with the atomic context manager that I described before, so it gives exactly the impression to the programmer that each of the functions has run completely serialized. So even though the various functions are run in parallel, the transactions that are done here under the hood are each large enough to contain a complete function call, which basically gives an incredible model.

Yes, I'm a bit running out of time here, but what I'm trying to say is that this is the model that I think is kind of the future of multi-threaded programming, at least if you are fine with not getting the last 10% of performance or something like that. This is a very nice model to program with. My 3D viewer from before is parallelized exactly like this: you compute the image column by column, and each column computation is sent to another thread, so you have a thread pool that computes columns. But each column computation is done in a "with atomic" block, which means all the logic to compute one column, even if the logic depends on some global variable or does things like that, it can do whatever, and it will work the same way. You don't get any race at the level of Python.

Okay, so yes, that's it. We have time for two questions, and then we must leave the room; I will be outside afterwards.

Q: I think you worked around the effect of the print statement, right? Because you cannot make a rollback then. Transactional memory, as you said, works on some memory, matrices or whatever, but when you have an effect like the print statement, you cannot roll back. And Python has no effect system or anything like that, so effects are not annotated. So how does PyPy-STM know when it detects an effect, something which manipulates the outside world?
A: Well, it's when the standard Python would release the GIL. At that point, you know that something strange is going to occur. I mean, that's not the complete answer; I can give you a more precise answer, but that's roughly it.

Q: A simple question: will you merge the PyPy-STM branch into the main PyPy branch someday? PyPy-STM is a branch of PyPy, I think, so will it become the default in PyPy some day, or will it always be separate?

A: Excellent question. I don't know so far. This has the problem that it is between 25 and 40 percent slower on a single thread, so again the same problem. So I don't know, basically. But well, here we are in PyPy, so it's possible to think about more advanced ways where the PyPy interpreter switches between, I don't know, JIT-compiled machine code for the non-STM mode and for the STM mode, or something. So it's possible that at some point it will be the case. Yes, thank you.