The first talk in this room will be by Xavier Thompson, and he's going to talk about extending Cython with GIL-free types. So that's going to be a very interesting talk, also a very highly technical one, I suppose. If you don't know the GIL, then I'm sure that Xavier is going to talk a bit about how it works and give you an introduction to Cython as well. I think overall it's about making your code run faster. Please take it away, Xavier.

Yes, hi, Mark. Thanks. Yeah, so there will be some technical stuff, but I'll explain all of it along the way, so no worries about that. Well, I'm super excited to be here at EuroPython. It's my first live talk like this. How do I go about sharing my screen?

You have already shared your screen. We just need to put it on the stage. Excellent, so I will go off stage now, but I will stay around, collect the questions, and then be back again near the end for the Q&A. OK, thank you.

OK, great, thanks. So hi, all of you watching. Wow, live at EuroPython. It's really cool. I'm super excited to be here. So I'm starting. I'm Xavier. I'm a developer at a French company called Nexedi, there in the corner, where I work on a super cool research project involving the Cython compiler. We call it Cython+, because we're extending Cython with experimental language features geared towards speed and multi-core execution. Also working with us on this are Inria, the people behind the scikit-learn library, Abilian, a French open source software editor, and Teralab, a French cloud operator.

And before I dive into the somewhat technical stuff, let me begin with an explanation of our motivations for working on this. Why are we working on this? We all love Python. I assume you do too, since you're watching this. And we all have our own slightly different reasons for loving this language. It's a very coherent language with a great many strengths. One particular strength is its amazing productivity: it allows even small teams to build extraordinary things, sometimes in a short amount of time. Projects like Instagram, YouTube and Dropbox were all built on Python. And people use Python all the time to do all kinds of things, and then they come to conferences like this and share them with the rest of us. At Nexedi, we have used Python to create ERP5 for enterprise resource planning, SlapOS, one of the first edge computing projects, and Wendelin, a big data engineering platform.

But not every project chooses Python. There are also other languages like Go and Rust, which have their own strengths. They are attractive because of their strength in multi-core execution and speed. Even Dropbox rewrote parts of its code in Go. But this should not be taken to mean that Python cannot compete in those areas. There are Python projects like the scikit-learn library which are extremely performant and have multi-core execution. And their secret, so to say, the thing we found very interesting with scikit-learn, is that they use Cython, which you can quickly frame as a faster form of Python that can bring significant speed-ups. And so this talk is about how we're extending Cython, because we want to combine Python's productivity with the raw speed and multi-core execution that Cython is capable of.

So here's where it gets a bit more technical. What is Cython? No worries, it's all fairly easy, very progressive in the explanation. Cython is a project that's been around for more than a decade.
It's led by Stefan Behnel, who I believe has a talk just after this, so check it out. He'll probably explain it much better than I will. But still, I'm going to give you a quick overview.

So Cython is essentially a kind of Python compiler. It takes code like this, fairly easy Python code, just a simple Python addition, and it compiles it to C or C++ code that looks like this. So this has been edited a lot for readability, but this is part of a C or C++ extension module that makes calls to the Python C API to do, from a C file, essentially the same thing: the same functions that the CPython interpreter calls internally when doing the same Python work. So what's very interesting here is that this is still Python. This is still essentially the same thing, but it's in a C or C++ file.

And this means we can do this. In Cython, we can write this line. This is not valid Python; it would be a syntax error in Python. But it's valid Cython, which means Cython is actually, as a language, a superset of Python. And this line is a type annotation that says that a, b, and c are not Python objects; they are instead C integers. And so if you add this line to your code, just this line, the rest remains the same, Cython will take it and compile it to a pure C or C++ addition. So you can use Cython just by adding a few type annotations to avoid the Python runtime and go straight to C speed. And this is very useful for critical code sections: you can pick and choose where you need to go to C speed. So this is great for linear code acceleration, and it's one of the things that makes scikit-learn so fast.

Cython is also useful for other things. It's a great way to bind external C or C++ libraries to Python in a way that is very natural. So in this example, Cython already provides bindings for the C standard library and its printf function. So you can just call printf, and this will compile to C or C++ code that includes the correct header and calls the function. And the interesting thing with Cython is that if you compile the generated C or C++ file further with a C or C++ compiler, you get a shared library that you can import in Python as if it was a normal Python module, completely transparently. So you can do this: in Python, you can import this shared library object and call from Python any C function as if it was a Python function, completely naturally.

So this is very interesting, but all it does so far is bring linear acceleration. One thing that's also important is multi-core concurrency, and there, too, Cython can help. So what's the deal with multi-core concurrency in Python? Well, in Python, there's this thing called the GIL, the global interpreter lock. And quickly, what the GIL means is that when you have several Python threads, only one of them is actually running Python code at a time. The other threads can do IO or call external C or C++ libraries, but only one is running actual Python code; only one is calling functions from the Python C API. And then the threads can swap, and other threads can start running, but only one at a time. So this makes the CPython interpreter thread-safe, and it provides amazing thread-safety properties to Python code. But it also means that only one of your computer's cores is actually running Python at a time. And I think this didn't matter much when the GIL was first introduced in Python, because back then, most machines had only one core.
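(As an illustrative aside, not a slide from the talk: here is a minimal sketch of the kind of typed Cython described above, with the variable names a, b, c from the example; the file and function names are just for illustration.)

    # add.pyx -- illustrative sketch, not code from the talk
    from libc.stdio cimport printf   # Cython ships bindings for the C standard library

    def add(int a, int b):
        # Because a, b and c are declared as C ints, this addition compiles
        # to a plain C "+" instead of a call into the Python object protocol.
        cdef int c = a + b
        printf(b"result: %d\n", c)   # calls the C printf directly
        return c                     # converted back to a Python int for the caller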
But nowadays, people want and need to write code that goes faster when you add more cores. And you might ask, why not just remove the GIL from CPython? The thing is, there have been several attempts, but it's very hard to do. I think one of the reasons it's so hard is the way reference counting works in CPython.

So this here is the C structure that represents a Python object at the C level. This is taken directly from the CPython source code, so the Python interpreter's source code. And the field that we're interested in here is the ob_refcnt field in the middle. That's just a C integer that represents the reference count of a Python object. The way reference counting works is that when you alias a reference, when you bind it to a new name, when you do a = b, you increment the reference count, because now there's one more reference. When you drop a reference, when it goes out of scope, when you rebind it to something else or delete it manually, you decrement the reference count. And when the reference count reaches zero, you know the object is no longer used, and you can free the memory. This is how automatic memory management works in CPython.

The thing is, this is not thread-safe. If several threads could do increments and decrements on this reference count at the same time, we would end up with bogus reference count values; they would mangle the value. And this is very bad, because at best, you get a reference count that is too high compared to what it should be, and then it never drops down to zero and the memory is never freed, so you get a memory leak. But at worst, it's too low, and then it reaches zero too soon and the object is freed while there are still references that use it, so you get a segmentation fault. So this is a fundamental part of how Python works, and it wouldn't work without the GIL. So we cannot remove the GIL just like that.

But Cython provides a way to release the GIL. There's even a syntax for it, and it even looks like Python. Magic, boom, you have released the GIL. But of course, as I said, there's a catch. If you do this, you cannot use Python. If you do this, you cannot write this code, for instance, because this is a Python addition. Cython sees this, knows that it's a Python operation, and throws a compilation error, because it won't allow it. But if you add this line, then it's fine: then it's just a C addition, so no problem. Same thing, you cannot use a Python list without the GIL. You would have to use a C buffer instead, which is more error-prone and harder to use. It's not Python, it's C. And also, you cannot call functions from the Python standard library. You cannot open a file. You would have to call equivalent functions from the C standard library. So you can open a file this way, but it's more error-prone. And essentially, what you're doing in this nogil section is C code with a Python-like syntax.

So this is great, but it kind of seems like you can have one or the other. If you look at Cython as a language, it's actually very powerful, because it's a language in which Python and C or C++ can coexist. So you can have the best of both worlds. You can have Python objects with all their high-level features and guarantees, but they're not so fast, and you have the GIL, so only one core is used. Or you can do C or C++, which can go very, very fast, but you don't have as many high-level guarantees: no thread-safety guarantees, no automatic memory management.
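(Again as an illustrative aside rather than a slide from the talk: a minimal sketch of Cython's nogil syntax; the function and variable names are made up, but everything inside the block has to be plain C-level code, as described above.)

    # nogil_sum.pyx -- illustrative sketch of a GIL-free section in Cython
    def sum_up_to(int n):
        cdef long total = 0
        cdef int i
        with nogil:                  # the GIL is released for this block
            for i in range(n):       # compiles to a pure C loop over C ints
                total += i           # C addition: fine without the GIL
            # items = [total]        # would be a compile error: creating a
            #                        # Python list needs the GIL
        return total                 # back under the GIL, total becomes a Python int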
And so this is great, but what we would really want is to have all of it at the same time. It would be great if we could have high-level features and multi-core concurrency, multi-core performance, together. And since it's not there yet in Cython, we thought, well, maybe we can add it ourselves. Maybe there's something to be done here. And so we started hacking a bit in the Cython compiler. We experimented a bit, we tried several things, and we came up with this class system, this language feature.

So this is a class system that looks a bit like a Python class, if a Python class had static typing. And it behaves as much as possible like a Python class, so it's familiar to a Python programmer. And the interesting thing is that this compiles directly down to a pure C++ class with C++ methods. This class inherits from a base class we call CyObject, which kind of plays the same role as the object base class in Python. And this base class has an atomic integer as its reference count field. The fact that it's atomic means that the C++ compiler will compile it to low-level hardware atomic instructions that guarantee that even if several threads increment or decrement this reference count at the same time, there will be no bogus value. The reference count will remain correct. So it's a thread-safe reference count, which is great, because this means we can do this: we can use our class system without the GIL, safely, and the memory will be automatically freed when it's no longer used.

So this is great, but all we have so far really is a C++ class with an atomic reference count, a Python-like syntax, and a Python-like behavior. It's not an actual Python object. So we made it one. We made CyObject Python-compatible. We made it possible to do this: once you compile this class, you can import it in Python, and it looks like a Python class. Python thinks it's a Python object. You can access its fields and call its methods seamlessly. It all works like you would expect. And how does this work? What we did here is we took again this PyObject struct I talked about earlier, the one that represents every Python object, and we made CyObject inherit from it. So now every CyObject is also a Python object. And then we reused Cython's extensive expertise on how to wrap C and C++ functions for the Python interpreter using the Python C API. Cython already knows how to write all the boilerplate for a C extension type, all of this. So we just adapted it so that all the C++ methods our class has can be called from Python. And then it works great.

Except I kind of glossed over one thing. If you pay attention, you'll notice there's something weird: we now have two reference counts. We have the atomic reference count and the Python reference count. And we have to use both, because the atomic reference count is the only one we can use without the GIL, the other one is not thread-safe, and the Python reference count is the one Python will use when you import the object in Python, because it doesn't know about the atomic one. So we kind of have two independent reference counts, and we need to free the memory only when both reach zero. So how do we do this? Well, we came up with a neat scheme, I think.

What we do is, when the object is created, it's first seen only as a CyObject, so the Python reference count is still zero. And then at some point, we will cast it to a Python object, and the Python reference count will go to one. We can notice when it goes from zero to one, because we know it was zero before. And at this moment, if it's the first time the object is cast to Python, we not only increment the Python reference count, we also artificially increment the atomic one. So this is completely artificial, but it means that as long as the Python reference count is not zero, the atomic reference count is not zero. So this is like half our problem solved. And then, when the Python reference count drops back down to zero, instead of freeing the memory like a normal Python object would, we just decrement the atomic reference count. This compensates for the previous increment. And then we only free the memory if the atomic reference count also drops back down to zero. If it doesn't, there are still other references, and the object will be freed later. The way we do this is a slightly technical side note: in the C structure that represents every Python type, there's a tp_dealloc field, which is the function the interpreter will call at the C level when the object needs to be freed. We override this, and instead of freeing the memory, we just decrement the reference count, the atomic one.
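(To make that double reference-count bookkeeping concrete, here is a small toy model in plain Python; the class and method names are hypothetical and this is not the project's actual implementation, only the logic described above.)

    # toy model of the dual reference-count scheme (hypothetical names, for illustration)
    class DualRefCount:
        def __init__(self):
            self.atomic_rc = 1      # C++-side atomic count: one reference, the creator's
            self.python_rc = 0      # CPython-side count: zero until first cast to Python

        def cast_to_python(self):
            if self.python_rc == 0:
                # first time the object is seen from Python: artificially bump the
                # atomic count so it stays nonzero as long as Python holds references
                self.atomic_rc += 1
            self.python_rc += 1

        def python_refcount_reached_zero(self):
            # stands in for the overridden tp_dealloc: instead of freeing the memory,
            # undo the artificial increment and let the atomic count decide
            self.drop_atomic_reference()

        def drop_atomic_reference(self):
            self.atomic_rc -= 1
            if self.atomic_rc == 0:
                self.free_memory()  # both counts are now zero: safe to free

        def free_memory(self):
            print("object freed")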
So this is great. Now we actually have a full-fledged Python object. But we still need to make it fully thread-safe. For now, the reference count is thread-safe, but several threads can still access the object's fields at the same time. And so we tried several things, and the idea we chose in the end, the one we found is the best, is reference capabilities. This is inspired by languages like Pony and Rust. It's still experimental, it's still a work in progress. We were also inspired by this thesis, which explains it all quite well.

But quickly, the way it works is that every object, at any point in time, should be in one of these states. It should be either thread-local, which means that all references to it live in the same thread. Or it should be immutable, which means that all the references can only read from the object; that's safe too, because the object cannot be changed. Or it should be locked, which means that all references share a lock that they use to access the fields and call the methods, and this serializes the access. Or it should be active; that's a reference to the actor model. The idea is that they don't use a lock, a mutex. Instead, they use a message queue to access the object asynchronously. So when you want to access the object, you just push a message onto the queue, and at a later point, in a separate thread, the object will pop the message, handle the task, and do it asynchronously. And this is quite safe, because only one message is handled at a time. And then there's this last one, which is more peculiar: isolated. This means that the object is only reachable through a single reference. In other words, if we deleted this reference, the object would be forever unreachable and the memory would be freed. It's like the last reference. And this is useful because, thanks to this isolation notion, we can make the type system more flexible. So the type system will guarantee that these properties are respected.
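(As a rough analogy for the locked state, again in plain Python with hypothetical names rather than the project's actual syntax: every read and write of the object goes through one shared lock, which serializes access across threads.)

    # toy analogy for the "locked" capability: all access is serialized by one mutex
    import threading

    class LockedBox:
        def __init__(self, value):
            self._lock = threading.Lock()
            self._value = value

        def get(self):
            with self._lock:         # only one thread at a time can read or write
                return self._value

        def set(self, value):
            with self._lock:
                self._value = value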
By default, an object is created as thread-local. This means you can do all you want with it, like a normal Python reference, but you cannot share it with another thread, and the type system, the compiler, will enforce this. So you cannot do this either: you cannot alias it to a locked capability, because then the object would be seen as thread-local and locked at the same time, and so you would be able to share the locked reference with another thread. So this is not safe.

But what if you want to stop seeing your object as thread-local and change its state to say, from now on, I will use a lock to access it? Well, you can do this if the object is isolated. This is OK if the reference a is isolated. In that case, you can consume it; we introduced an operator for this. And then the reference a goes out of scope, and this is OK, because now there is no other reference to contradict the idea that the object is locked. So now there's only b, which sees the object as locked. So from this point on, the object is locked, and then you can share it with other threads. And this is OK if a is isolated. How do we know it's isolated? Well, in some cases we can do just a bit of static analysis, or we can use the type system if we already know the reference is isolated. But if we don't know any of this, we can just look at the reference count at runtime, and this works too. So this makes the type system very flexible. And I think it's a good idea, because it means that in single-threaded code, you don't have any performance impact, no overhead, and you don't need to think about any of this: you just use thread-local references and everything works as normal. And in multi-threaded code, you have some overhead, but only where you actually need it. And you also have a guarantee that once your code compiles, you will have no weird concurrency bugs. So this is good.

So I see I'm a bit short on time, so I'm going to speed it up. Now we have all we wanted, we check all the boxes: we have high-level features and fast speed and multi-core execution. And so, can we use this to speed up Python code? Well, it's still a research project, it's still preliminary, but we already have two results which are interesting. The first is that at Nexedi, we're using Cython+ to run a program in the initrd to implement a secure boot strategy, and we discovered an unexpected bonus: we can actually compile it directly to a C++ executable. If we avoid using any Python features, we don't even need to start the Python runtime, so we save on Python startup time and on executable size. And the other very interesting result is that at scikit-learn, they have started experimenting with Cython+ to try it out and see if these features are useful. And just the other day, developer Julien Jerphanion sent me this benchmark, which compares the initialization time of a KD-tree, which is a useful data structure. And we see that in blue, the scikit-learn version, which is already quite optimized, is slower than the yellow version, which is the Cython+ version. So we already see that Cython+ brings interesting speed improvements, and also that it scales very well over multiple cores.

So since I'm a bit over time, I'm just going to stop here. We're very excited by these results. I want to say thank you to you for watching, to the EuroPython staff, and to all the people that made this talk possible. Thank you very much.

Thank you very much. That was a very nice talk. Right, that's a good slide, because that's what I wanted to ask: where can we get more information about Cython+? Yeah, I'm definitely going to have to have a look at that. That's very interesting what you're doing there. Do you have any plans of putting this code back into the original Cython?
Yes, totally, that's... Well, we hope we can come up with something that is polished enough and works well enough to be accepted by the Cython core team into Cython. That would be a great end goal. For now, it's still a research project, so it's still in a trying-things-out phase.

Okay, excellent. And then I have one question here. So if CyObject basically resembles and extends PyObject, is this similar or comparable to replacing CPython with basically a new interpreter?

No, because... So when we use CyObject without the GIL, the difference is that we have to use static typing. We cannot have all the dynamic aspects of Python. So everything is typed, and it behaves a lot like Python, but it's typed, and you have no choice about this. And this is what allows us to not need the GIL, because if it's typed, there are lots of things that are easier to do. When you see it as a Python object, if you cast it to a Python object, then it uses Python features from the Python C API. So it's normal Python wrapping this underlying C++, and that requires the GIL. So you can see it either as a full-fledged Python object with the GIL, or as a statically typed object without the GIL.

Okay, thank you very much. So if there are any additional questions, I would suggest that you go to breakout room 2, and then you can ask more questions, and Xavier will hang out there to answer them. Thank you very much, Xavier. Thank you. Bye-bye.