That was a good introduction. Thank you. All right, so today I'm going to be talking about building Python extensions with Rust. This talk came about because I wanted to write a backend for dateutil and found that there were a bunch of different options available, and I wasn't really sure which ones to choose or how they worked. So a couple of caveats I want to start with. One: if you don't know Rust already, you should probably leave, because this talk really requires deep knowledge of Rust. I thought that you would laugh. No, you do not need to know any Rust. There are also a bunch of slides with a lot of code on them. Just get the general feel of the code. You don't have to look at all the lines and make sure that I did things correctly, because those are more for reference later, so I don't want you to feel overwhelmed by all the code on the slides. Yeah, so let's get started then. Okay, so why would you even want to write any sort of extension for Python? Well, it's a fairly common thing to do, because Python is often known as a good glue language: Python is very expressive, and sometimes people call it pseudo-code that you can actually run. And it's great that way, but that often comes at a runtime cost; it's slower than many compiled languages. But you can have the best of both worlds if you write the slowest parts of your application in another language, or even just use Python for something like system orchestration, where you're mostly calling system functions, which are fast, and then you provide a high-level API so that your users can write their code in Python. In fact, if you look at the Python ecosystem, a huge number of the fundamental libraries that we use, including the standard library itself, are essentially glue libraries. NumPy is glue around super-optimized C and Fortran scientific code, with a nice Python wrapper that we can all use.
OpenCV, TensorFlow, PyTorch, Pillow: these all have compiled backends. So let's look at exactly what it means to have a compiled backend. Here is the demo function that I'm going to be using for these comparisons: an implementation of the Sieve of Eratosthenes. What this does is you give it a number, and it calculates all the prime numbers up to that number. The way it works is that it allocates an array of all the numbers from two up to that number. Then it goes to the first prime and eliminates all the multiples of that; then it goes to the next prime, which is the next number in the list that hasn't been eliminated, and eliminates all the multiples of that, and so on. And you can see this Python implementation is about six lines. It's nice, easy to read, and the output seems to work: we have two, three, and five for five, and if you give it something like 20, it gives you all the prime numbers up to 20. And this is the implementation if you use the C API. You'll notice that the font is a little bit smaller, because I couldn't really fit it all on one slide, even using a weirdly terse style of programming where I have a for loop all on one line and a bunch of places where I'm just cramming things together. So it's more verbose, but does that mean it's harder to program? Not necessarily, but notice that the core of the algorithm is the little part in the middle, the part where it says "sieve out composite numbers." The first part is casting Python integers to regular integers and then checking for errors; there's a lot of error handling. I'm allocating memory here, and then I have to keep track of all the references and construct a Python list from a C array. So why would I bother doing this?
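The algorithm described above can be sketched in a few lines of pure Rust (a minimal version for illustration; this is not the exact code from the slides):

```rust
// Sieve of Eratosthenes: return all primes up to and including `n`.
// is_prime[i] tracks whether the candidate number (i + 2) has survived,
// since the candidate array starts at 2, not 0.
fn sieve(n: usize) -> Vec<usize> {
    if n < 2 {
        return Vec::new();
    }
    let mut is_prime = vec![true; n - 1];
    for i in 2..=n {
        if is_prime[i - 2] {
            // Eliminate every multiple of i, starting at i * i (smaller
            // multiples were already crossed out by smaller primes).
            let mut m = i * i;
            while m <= n {
                is_prime[m - 2] = false;
                m += i;
            }
        }
    }
    (2..=n).filter(|&i| is_prime[i - 2]).collect()
}

fn main() {
    println!("{:?}", sieve(5)); // prints [2, 3, 5]
    println!("{:?}", sieve(20)); // prints [2, 3, 5, 7, 11, 13, 17, 19]
}
```

The same structure maps almost line for line onto the six-line Python version the talk describes.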
The Python version seemed easy enough to write. And the reason is performance. If you compare the performance of these two, you can see that for some modest number you get a 30 or 40 times speedup with the C version, because, again, it's much faster. But it has downsides, a lot of which I mentioned: the memory has to be managed manually, you have to handle all the reference counting that's usually done by the interpreter itself, and C is not a memory-safe language, which means you can do things like double-free something, or allocate some memory and never free it. In fact, you can also do things like this. I don't know if you are all just amazing programmers and spotted the error right away, even without the comment, but this is an actual section of code from the original version of this that I didn't notice, and it passed all my tests. What was actually happening was that when I went to iterate over the array, I didn't realize that because the sieve starts at the number two, its length is actually N minus one, not N. So I did the thing that's almost rote, which is iterate from zero to N. But what happens in C is that you just go one past the array in memory, take whatever's there, turn it into a Python object, and put it in the list. So my list, instead of two, three, five, would be something like two, three, five, 82, which is not what we were going for. And that's a problem, because with C you're really bare-bones: you're looking at raw memory and manipulating it directly. So what's the alternative? I'm going to pitch that Rust is an excellent alternative. Rust has a lot to like about it. It is memory-safe by default, and it tends to use zero-cost abstractions for this, so you still get high performance without losing your memory safety.
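For contrast, here is roughly what that same off-by-one looks like in Rust, where bounds checking turns a silent out-of-bounds read into an explicit `None` or a panic (a sketch of the idea, not the talk's actual code):

```rust
fn main() {
    // A candidate array covering 2..=n has length n - 1, not n, because
    // it starts at two -- exactly the mistake described above.
    let n: usize = 5;
    let candidates: Vec<u32> = (2..=n as u32).collect();
    assert_eq!(candidates.len(), n - 1);

    // The buggy C loop iterated indices 0..n, so its last read was one
    // past the end. Rust's checked access surfaces that instead of
    // handing back whatever stray memory happens to be there (the "82"):
    assert_eq!(candidates.get(n - 1), None);
    // Direct indexing, `candidates[n - 1]`, would panic at runtime
    // rather than produce a garbage value.
}
```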
It also tries to enable fearless concurrency, which means that instead of using something like the GIL, the compiler is nitpicky about when things are allowed to be used from multiple threads, and you can feel confident that if you manage to get something past the compiler, it will work. And something that will really appeal to Pythonistas is that it has a broad community and a really big open source ecosystem. So where normally you would pip install something, here you just add it to your dependency list, and when you do your cargo build, it will pull things from crates.io and you can just use them. And there are already a ton of crates out there. I don't have time to get into all the benefits of Rust and how it achieves these things, but I thought I would cover one little topic just to give you a flavor of what it's like to program in Rust. So I figured I would talk about ownership, which is one of the most important concepts in Rust. This is, again, about handling resources. Variable bindings in Rust have ownership over the resource that they're bound to. So when I assign v to this vector, when v goes out of scope, we know that because we owned that vector, we can free up all the resources of the vector. And if you assign that variable to something else, it actually moves the ownership to the new variable. Here's an example where I have this function called take_ownership. All it does is take some variable, but it will take ownership of that variable. So when I assign v to this vector and then pass it to take_ownership, after that, take_ownership owns the resources in v, and at the end of take_ownership's scope, when its v goes out of scope, all those resources are freed, which means that normally this last line would be a use-after-free or something. But Rust has this very picky compiler.
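A compilable sketch of the take_ownership example just described (the erroring line is left as a comment, since the whole point is that it won't compile):

```rust
// Takes the vector by value, so the caller's binding is moved here.
fn take_ownership(v: Vec<i32>) -> usize {
    v.len()
    // `v` goes out of scope at the end of this function, and the heap
    // allocation it owns is freed.
}

fn main() {
    let v = vec![1, 2, 3];
    let n = take_ownership(v);
    assert_eq!(n, 3);
    // println!("{:?}", v); // error[E0382]: use of moved value: `v`
    // -- in C this pattern could be a use-after-free; here it's a
    // compile error.

    // To keep using the data, pass a clone (or borrow with &w instead):
    let w = vec![4, 5];
    assert_eq!(take_ownership(w.clone()), 2);
    assert_eq!(w, vec![4, 5]); // `w` is still valid; only the clone moved.
}
```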
It's also a very verbose compiler, and it'll tell you exactly what you did wrong, everything I just explained in all those words. If you tried to compile this, it would say: hey, this thing was moved from here into here, and then you tried to use it afterwards. So generally speaking, if you can get your Rust code to compile, that's not saying it's a good program, but it eliminates large classes of bugs, like memory safety bugs and some concurrency bugs. I'll also note that there are certain things, like dereferencing raw pointers, that you can do by bypassing these safety checks, by putting them in an unsafe block, and you'll see a lot more of that later. The value of that is that it makes your code a lot more auditable, in the sense that you can stop looking for these kinds of bugs in anything that's not in an unsafe block, and then you double-check or triple-check the unsafe blocks. Okay, so far we know how to write Rust programs, but how do we write Python programs using Rust? Here is my Rust implementation of the Sieve of Eratosthenes, and this first part is just pure Rust. It could be in a crate somewhere, or it could be in your file. I separated it from the part that converts it to Python just to show you what it would look like if you were actually wrapping a crate that already exists. So here I have the Rust version, which gives us a Vec of u32s, and to expose it to Python, I just put this little decoration on it, which is a procedural macro called pyfunction. What that's going to do is transform this function so that it exposes a C function in an .so that takes an integer and returns a list. And this is how it works: you can see that I can just import it like normal, it looks just like any other Python function, and you can see from the speed that it is comparable to the C extension.
In fact, in this case it's a little faster, but well within the noise, and I don't think that generalizes, so I would say it is the same order of magnitude of speed as C. How does it do all of this? Because the C API itself obviously involves a lot of unsafe code; almost everything in Python under the hood is just some mutable pointer to some memory somewhere. The way it works is that PyO3 is built in two layers. The lowest layer is the FFI layer, and here is an excerpt of the datetime bindings, which I actually added to PyO3. You can see that you have to recreate bindings to the functions, where you recreate exactly what the signatures are; you recreate the exact data structures, exposed as Rust structs; and the C macros we have to reimplement manually, because there's no symbol to bind against. But once that's all done, and all of this is done in the PyO3 crate with a whole bunch of unsafe code, our end users can use it wrapped up in a nice safe layer. So assuming we did all the FFI stuff correctly at the lower level, we now have this safe layer, which has constructors and function calls that use unsafe code but are not themselves unsafe. Essentially, as long as the stuff in the unsafe block is correct, you can build safe abstractions on top of it. And here is the implementation of the new constructor for PyDate, and it basically just passes some Rust values down to the C API. For each of these safe Rust wrappers, we have some constructors and various access traits. There's a lot to go into there, but I think you get the general gist: we have the safe layer and the unsafe layer. So that's how it's implemented; when you go to actually use it, this is what it looks like. You would use your pyfunction procedural macro.
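The safe-over-unsafe layering described above can be shown in miniature with plain Rust: one unsafe operation, wrapped in a function that checks the invariant first, so callers never write `unsafe` themselves (an illustrative sketch, not PyO3 code):

```rust
// A safe wrapper around an unchecked read. Callers get a safe API;
// the unsafe block is small, local, and easy to audit -- the same
// discipline PyO3's safe layer applies around the raw C API calls.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        None
    } else {
        // Safe because we just verified the slice is non-empty, so
        // index 0 is in bounds.
        Some(unsafe { *bytes.get_unchecked(0) })
    }
}

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
    assert_eq!(first_byte(b""), None);
}
```

If the invariant check inside the wrapper is correct, everything built on top of it stays memory safe, which is exactly the claim being made about PyO3's two layers.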
Here is an implementation of something that takes some number of seconds and gives us the date that was that many seconds ago. This first block is the Python function, and then I have to expose it in a module. So I create a function that initializes the module, decorate it with the pymodule procedural macro, and just add my existing function; then I can call the function, and it just works. It constructs the datetime. But you may note that the date constructor has a valid range and can throw exceptions, and I don't have any exception handling code here, right? Well, I do actually have this part that says PyResult. That's the return value: it can either be Ok or an error. And PyO3 does this nifty little thing where, if something can raise an exception, it returns one of these PyResults, and if you just let that PyResult bubble up to the Python layer, it will automatically be turned into an exception with a traceback. You can also make classes. You take a struct and call it a class, and then you can implement whatever methods you want, including certain special methods like new. Here I have implemented something that is just a point, and you can take the norm of it. I add it to this module that I call classy, and then when I construct this point, I can look at it and calculate its norm. Well, you'll notice that x and y, maybe I haven't shown you this, but x and y are not directly accessible on it. If I want them to be accessible at the Python layer, I need a little more code that will take those 32-bit integers and translate them into Python integers. Okay, so that is the PyO3 approach, the API approach, and that, generally speaking, is going to be a much more all-inclusive experience; it's really built to work for Python. There's another approach you can use, which is to write CFFI bindings.
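The point class just described would sit on top of a plain Rust struct, roughly like this (field and method names are assumed from the description above; the actual slide adds PyO3's class and method attributes on top):

```rust
// A plain-Rust analogue of the talk's point class. With PyO3, this
// struct would carry a class attribute and the methods would live in an
// annotated impl block; the underlying logic is the same.
struct Point {
    x: i32,
    y: i32,
}

impl Point {
    // The `new` special method from the talk, as a plain constructor.
    fn new(x: i32, y: i32) -> Self {
        Point { x, y }
    }

    // Euclidean norm: distance from the origin to (x, y).
    fn norm(&self) -> f64 {
        f64::from(self.x * self.x + self.y * self.y).sqrt()
    }
}

fn main() {
    let p = Point::new(3, 4);
    assert_eq!(p.norm(), 5.0);
}
```

Exposing `x` and `y` to Python is the extra step the talk mentions: each i32 field needs a getter that converts it into a Python integer.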
And FFI, I forgot to mention, stands for Foreign Function Interface. It's for exposing functions to other languages. Most languages speak C: under the hood, either they're written in C, or they can handle C memory structures and C function pointers, because at the lower levels, C is used as a lingua franca of programming. So you can use this approach, which says: we're going to take our Rust function and expose it in such a way that anyone who can call an equivalent C function can use it. So here I have this Rust function, and then I have this scary-looking function that is unsafe extern "C" and returns a mutable pointer. This is a super unsafe function. And in fact, this little mem::forget says: just forget that we ever owned this vector. So what's happening is, I take all this memory as a vector and then stop paying any attention to it, and I expose that to whoever calls the C equivalent of this function. And this is going to be a problem, because nothing except Rust properly knows how to deallocate the vector I have. So I also have to expose a similar C function that deallocates vectors: you give it the pointer, and it deallocates the vector. So this is obviously a bit more complicated in that sense, but it does bring one big advantage, which is that JavaScript, Ruby, Python, and a lot of other programming languages are already set up to take advantage of this: they have bindings for generic CFFI libraries. So what that means is that you write your low-level Rust library once, and you can have bindings to it in all kinds of other places.
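The ownership hand-off being described can be sketched in pure Rust with `Box::into_raw` and `Box::from_raw` (a simplified stand-in for the slide's `mem::forget` code; a real CFFI export would also expose the data pointer and length in a C-compatible layout):

```rust
// Give up ownership of the vector and hand out a raw pointer. After
// into_raw, Rust stops tracking the allocation -- the same effect as
// the mem::forget in the slide code.
fn export_vec(v: Vec<u32>) -> (*mut Vec<u32>, usize) {
    let len = v.len();
    (Box::into_raw(Box::new(v)), len)
}

// Only Rust knows how to free what Rust allocated, which is why the C
// consumer must call back in through a function like this.
// Safety: `ptr` must come from `export_vec` and must not be freed twice.
unsafe fn free_vec(ptr: *mut Vec<u32>) {
    // from_raw re-takes ownership, so the Vec is dropped normally here.
    drop(Box::from_raw(ptr));
}

fn main() {
    let (ptr, len) = export_vec(vec![2, 3, 5]);
    assert_eq!(len, 3);
    // A C caller would read the data through the pointer here...
    unsafe {
        assert_eq!((*ptr)[2], 5);
        // ...and must hand the pointer back to Rust for deallocation.
        free_vec(ptr);
    }
}
```

Forgetting the `free_vec` call is exactly the leak the talk warns about: no other language's allocator can reclaim that memory.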
On the Python side, the way you talk to this is a library called CFFI, which allows you to work with C interfaces. And then there's also this library called Milksnake, which comes from Sentry. What that'll do is, at setup time, it'll generate a bunch of Python code that wraps CFFI for you: it'll generate this ffi and lib object, basically a thin wrapper in Python. But you still have to do things like what I'm doing here, converting the result to a list and then deallocating the vector, and I'm doing this on the Python side, not the Rust side. So this makes your Python APIs a little more complicated, but there's nothing saying you can't just write this wrapper code for your users. And you can see, if we compare this to our other implementations, it's the same order of magnitude of speed; in this case it's a little slower, sometimes it'll be a little faster. But generally speaking, there's not too much more to say about this approach, because it's not everything and the kitchen sink; it's a very bare-bones approach, but it is very versatile. Okay, so far I've talked about the ways that we can use Rust, but I don't know that I've really made the case for why we should use Rust. I know I've made the case for why we shouldn't use C, but that's not all the other options. Probably the best contender here is Cython. What Cython does is let you write a Python-like language, a superset of Python, which it compiles down to C or C++ behind the scenes as part of your build process, creating a C extension for you. It has a lot of the same advantages as Rust, in the sense that memory safety is mostly guaranteed by the Python interpreter. You can do memory-unsafe things with it, and there are no unsafe blocks or anything, but for the most part, if you write things that look like Python, they'll probably be memory safe.
And it'll generate code that is pretty fast. In this case, the version I wrote, compiled via C++, is a little slower than the C extension, but it's still ten times faster than regular Python. So what should we choose? We have Cython, we have two different kinds of Rust bindings, and we can just use pure Python. For this little function that I wrote, I have this little chart of speeds, and most of this slide I'm going to spend talking about how not to use this chart. You could look at this chart and say: well, everything from Cython over is on the same order of magnitude of speed. In my experience, Cython is a little slower than these other options, but it has a much nicer interface; you can write something that looks like Python. But you should also note that I didn't go out and choose a set of functions that is a perfectly representative benchmark of all the things you might do. I picked a function that I thought would illustrate some of the difficulties of using Milksnake and PyO3. So, especially if you just want to wrap a Rust library, I would recommend picking a couple of things that you want to benchmark and trying out a couple of different approaches, if it's really important. But honestly, you could pick any of these and it wouldn't make much difference. So let's say you do want to use Rust. Between the CFFI approach and the API approach, what are the upsides and downsides? The CFFI approach, as I mentioned, is more portable. It also has a smaller Rust dependency, in the sense that it doesn't have to compile this huge set of safe wrappers for the entire Python C API; it just compiles in whatever it needs, so the binaries are going to be a little smaller, if you care about that. And then PyPy, the alternate implementation of Python, can do better optimizations when it's looking at CFFI than when you're using the C API.
So you'll probably get better performance on PyPy if you're using CFFI. The downside is that it pulls in runtime dependencies on both Milksnake and CFFI, so you're going to be pulling in third-party code at runtime, not just at compile time. It also has no support for Python-specific types like list, datetime, or tuple; you have to write your own wrappers for all of that. And you have to manage your memory in Python. Also, I'm not crazy about the fact that the public interface you have to maintain is all unsafe Rust; I prefer to hide that away. But again, if the other pros outweigh that for you, I think it's still a great approach. Then, on the API side, you're using safe Rust for almost all the code that you're writing. It has no runtime dependencies. It has native support for all the Python-specific types. It's actually easy to call back into Python from Rust. And it also manages the GIL and reference counts and so on for you. The biggest downsides have to do with its stability. It's still a somewhat immature library, so it's somewhat buggy. It hasn't been optimized for speed. And it requires nightly Rust. Also, the API is still changing a little. But I recommend just jumping in and getting involved, because this can also be cast as an upside, in the sense that you, as an early user of PyO3, can probably influence the direction it evolves in. Okay, so I think both the Milksnake approach and the PyO3 approach are early enough in the game that there are lots of opportunities for improvement. Which is to say, it's kind of hard to even make a choice for the long term at this point, because I think both of them will improve a lot.
So for the CFFI or Milksnake approach, a lot of those problems I talked about, how difficult it is to wrap these things and how much you have to manage the memory, could probably be fixed by taking some of PyO3's approach and writing procedural macros that automatically generate this kind of code. That would be a library-level implementation, and it doesn't have to go into Milksnake; it could just be a new crate that you write. Then you could similarly have an equivalent function on the Python side that says: given that you're using that procedural macro, we'll import this function and convert things correctly to the right Python types. And then you could get a pretty similar experience using the CFFI approach to what you get with PyO3, except without any of the CPython-specific stuff. As for PyO3, I think the biggest thing you could do to help improve it is just to contribute. It's a super active project, relative to how active open source projects usually are. This is now a somewhat dated screenshot, because I didn't want to make a screenshot right before I started this talk, but I think it had commits merged as of early today or late yesterday, so it's actively being developed. I am not a committer on this library, but I would be happy to review any pull requests you want to make, because I think both of these approaches could really turn out to be something special in the long run, and it could allow us to use more Rust and more Python, because I think there's a lot of synergy between... Did I just say the word "synergy" out loud? This is really going off the rails, people. All right, in any case, I think Rust and Python are natural good friends, and I'd like to see us foster that a little bit. All right, so that's the end of my talk. It looks like I do have a couple of minutes for questions. Right? Thanks for your talk. I just wanted to ask a small question.
I checked this thing out and I think it is an amazing project, but you can also think of a different approach, one that isn't taking all the C stuff and trying to put a nice layer on top of it. I just wanted to ask if you have any thoughts about the RustPython project, for example, which is intended to rewrite the whole of CPython in Rust. Well, RustPython is an interpreter written in Rust, so in some sense it's actually a completely different beast from these things. These are, to the extent that they can be, interpreter-independent. With something like RustPython, it may be easier to write Rust extensions targeting RustPython. And I really hope they continue, because I think that project is great. How many times, when I've been working on CPython code, have I thought: oh, I wish I were writing this in Rust? But yeah, I think they're sort of orthogonal, and both can work together. It may actually be easy to just add more stuff to PyO3 and/or Milksnake to say: hey, if you're using RustPython as your interpreter, you can take these shortcuts; you don't have to go through this whole C layer. Okay, thanks. Hey, great talk. So you said one of the cons of PyO3 is that it requires Rust nightly. In my experience using Rust, it seems like a lot of the ecosystem requires nightly, so how big of a deal is that? I don't know. One of the problems is that I don't write a huge number of Rust applications, so from my perspective it's never been any different; Rust nightlies do not feel unstable. And I think if you're using Rust at all, you may find it acceptable to use nightly. But I have heard from a couple of places that are using Rust regularly in production that they're a little uncomfortable with it; they prefer to use a stable build. So it's just something I've heard people worry about. They say: I would use PyO3 if it were on stable Rust.
But Rust nightly seems fairly stable to me, and they're super responsive. Thank you. Thanks for the talk. One question: how much business value do you see in this? Beyond optimizing, for example, this generator of prime numbers, what do you think is the scope of real-world applications you could optimize? Yeah, so obviously the Sieve of Eratosthenes thing is a toy problem; it's an algorithm that I could fit on the screen in three different languages. But there are two main ways I can see this being useful. One, and this is probably the bigger one, is that people are already writing things in Rust. If people are writing, say, cryptography libraries in Rust, hashing libraries in Rust, JSON parsers in Rust, those are going to be increasingly used as backends for other languages, and this is the way to get them exposed in Python: something that's already in Rust, where you just want to use the good implementation from Python. And then the other side of it is, in the same way that you usually just use NumPy if you're doing big number-crunching things, occasionally you have some very low-level bit twiddling that you want to do in a very tight loop, and if you can move that tight loop into Rust, that will be helpful. But generally, the kinds of things that go into hot loops are common enough, or admit a sufficiently good abstraction, that someone will just write a general library for them. So you may be right, but you may not be writing too many Rust-optimized hot loops in your own code. Thanks. Thank you for your talk. Are we at time? Should we stop? I'll be in the hall afterwards. Okay. Sorry, we have to go to the next speaker, so if you have any more questions, please ask them after the talk. Thanks. Okay.