Thank you all for coming. My name is Raphaël Gomès. I work at Octobus. We're a small consulting company specializing in Mercurial. I'm going to talk about how we use Python and Rust together in this big old project. And when I say old, I mean it's an old code base. I'll cover what the pain points were and how we fixed some of them. As a recap for people who don't know what Mercurial is, and there may be a few of you: it's a version control system, same generation as Git. It was actually started in the same month, April 2005, by a kernel developer. It's written mostly in Python, with a decent chunk of C extensions, mainly for speed. It handles huge repositories for companies like Facebook and Mozilla, for example, with millions of files and revisions. And it has a very interesting and powerful extension system, which I have very little time to talk about, so maybe I'll show a few snippets. It's very cool; check it out on your own. But we're here to talk about Rust, and why we chose Rust for Mercurial, because as I just said we have about 40,000 lines of C code in Mercurial, so why switch to Rust, why move? Most of you know what Rust is and why it's pretty good, so I'll keep this short. Basically, it has a better signal-to-noise ratio for us. You write fewer lines that are completely orthogonal to what you're trying to do, and you can focus on actually writing the algorithm you need to fix the issue. The compile-time guarantees are interesting for a VCS. Cargo, the formatter, the testing framework: all very nice. And the safe-by-default aspect is very reassuring. But I think that's no news to any of you, so I'm going to skip right to the performance aspect. There was an experiment by Valentin Gatien-Baron, who's a developer at Jane Street. He built a very small subset of the status command in pure Rust. It's not complete, it does just a little bit of what status does, but it was good enough for their purposes.
And as you can see, the performance is just miles better than the reference Python and C implementation. That sparked a lot of interest in the community. And although the plan to introduce Rust into Mercurial had been put in place maybe a year before that, it was around this time that we really started to put it to use and dig into it. There are many different ways of connecting Python and Rust together. I'm not going to go into detail as to why exactly we chose rust-cpython: it compiles on stable Rust. All right, that's it, there you go. It's composed of two crates. The first is a very low-level crate that just binds to the CPython C API and does all of the tedious work that you really don't want to be doing by hand. The higher-level crate is more functional, in the sense of exposing a module to Python that just looks like a Python module, creating classes, functions, that kind of stuff. You also get an eval function, which is pretty useful sometimes. That gives the following structure. For people who are not using CPython, I'm sorry, but there you go. Pure Python code, of course, talks to its backend. The C extensions also talk to CPython. And on the Rust side we chose, well, I should say this structure was chosen just before I came onto the project, two separate crates. The first one, hg-core, is a self-contained Mercurial library with no idea whatsoever that there's Python somewhere. The idea is to have it self-contained so it works on its own. And hg-cpython is one of the crates we have to bridge the Python code and the pure Rust code. It's way more developed than the others because it's the most common path, but it's only one of the possible bridge crates you could have. I was very excited to start working on this because, you know, you get paid to write Rust, so that's super cool. And open-source Rust at that. But it was not super convincing at first.
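Before going on, the two-crate split described above can be sketched like this. This is a minimal illustration with made-up function names, simplified to plain Rust modules rather than real crates; the actual hg-cpython layer uses rust-cpython's macros instead of plain functions.

```rust
// Sketch of the layering: "hg_core" knows nothing about Python;
// "hg_cpython" only converts data at the boundary. All names and the
// changed-file logic are illustrative, not Mercurial's real code.

mod hg_core {
    // Pure hg-core-style logic: no Python types anywhere.
    pub fn changed_files(files: &[(String, u64)], old_sizes: &[u64]) -> Vec<String> {
        let mut changed = Vec::new();
        for (i, (name, size)) in files.iter().enumerate() {
            // A file counts as changed if its recorded size differs.
            if old_sizes.get(i) != Some(size) {
                changed.push(name.clone());
            }
        }
        changed
    }
}

mod hg_cpython {
    // Bridge-style layer: convert incoming data once, call the pure
    // core, and hand plain data back across the boundary.
    pub fn changed_files_for_python(
        files: Vec<(String, u64)>,
        old_sizes: Vec<u64>,
    ) -> Vec<String> {
        crate::hg_core::changed_files(&files, &old_sizes)
    }
}

fn main() {
    let out = hg_cpython::changed_files_for_python(
        vec![("a.txt".to_string(), 10), ("b.txt".to_string(), 20)],
        vec![10, 25],
    );
    assert_eq!(out, vec!["b.txt".to_string()]);
    println!("changed: {:?}", out);
}
```

The point of the split is exactly what the talk describes: only the bridge module would ever mention Python types, so the core stays testable and reusable on its own.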
The first non-trivial program that I tried to write was about twice as slow as the reference implementation, even though it was written in Rust, and that was pretty sad. This is due to a couple of factors, the main one being friction in general. If you're trying to bridge two languages, you always have friction at the FFI layer. There's a boundary between the two languages and you have to interface them. They don't work the same way; Python and Rust in particular have very different ways of handling memory and thinking about ownership, that kind of stuff. So you pay two prices. You pay the cognitive price, the developer price I could say, of the complex interface code that you have to write. It feels useless, because you know you're just trying to exchange data and that's it, but you still have to write a lot of code to do it. It's not the main thing you're trying to do, but it's an engineering constraint, so you do it. It's complex, and it takes away some of the budget you have for thinking about everything else. And the fact of the matter is that exchanging data is costly in itself: allocating memory at all, moving memory around, looping over objects, and in general dealing with the GIL, the global interpreter lock, for those who don't know. You cannot do those things in parallel because Python has the GIL. You can do some of the work outside the GIL, but at the end of the day, if you're trying to communicate with Python, you still have to hold the GIL. So I have an example. On my laptop, if I stat 100,000 files in Rust in parallel with hot kernel caches, I get about 30 milliseconds of wall time, which is pretty cool.
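The parallel stat mentioned above can be sketched with plain standard-library threads. This is an illustration only; the thread count, chunking, and the existence-check are made up for the example and are much simpler than Mercurial's actual status code.

```rust
// Sketch: stat-ing many files in parallel with std threads.
use std::fs;
use std::thread;

// For each path, record whether a stat (fs::metadata) succeeds.
fn stat_all(paths: Vec<String>) -> Vec<(String, bool)> {
    let n_threads = 4;
    let chunk_size = (paths.len() / n_threads).max(1);
    let mut handles = Vec::new();
    for chunk in paths.chunks(chunk_size) {
        let chunk = chunk.to_vec();
        // Each thread stats its own slice of the path list.
        handles.push(thread::spawn(move || {
            chunk
                .into_iter()
                .map(|p| {
                    let exists = fs::metadata(&p).is_ok();
                    (p, exists)
                })
                .collect::<Vec<_>>()
        }));
    }
    // Collect the per-thread results back into one vector.
    handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
}

fn main() {
    for (path, exists) in stat_all(vec![".".to_string(), "no-such-file".to_string()]) {
        println!("{}: {}", path, exists);
    }
}
```

The fast part is exactly this loop; as the next paragraph explains, the slow part is handing the results back to Python.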
And then I try to give those results to the Python layer in any meaningful way, and I get an order of magnitude more on top of what I was trying to do, completely negating the usefulness of doing it in Rust in the first place. That's pretty frustrating. We have many possible solutions. One of them is to communicate with the C layer directly. There's a thing in the Python standard library called capsules that allows you to share an API of function pointers between C extensions, and you can just target the C ABI with Rust and use capsules to talk to the C layer of Python directly, which is pretty cool. You can exchange less data: in general, move up an abstraction layer, and maybe pass the file name instead of the file contents, that kind of thing. And in general, just do more in Rust. But to move up those abstraction layers, you need support for those abstractions. In rust-cpython, we had a few missing features that were quite important. The first was just the Python set. A set is a very useful collection that appears a lot in the Mercurial code base, and there was no way of creating or interacting with a set from Rust at that time. So we had to add that. Support for capsules: we added a macro that lets you get and create a capsule. And then a few hairier ones. The first being inheritance for classes written in Rust. You have a py_class! macro in rust-cpython that allows you to create a Python class backed by Rust data, and that's pretty cool. But if your Python code then tries to inherit from your Rust-backed class, it will just crash and say: no, this is not a valid base type, you cannot inherit from it. The reason the feature was not added is: what happens if you forget to call the initializer? What happens if you inherit and just don't call super? What happens to the memory?
It's a complicated problem: either you enforce some very strong runtime invariants, or you just don't allow it, and what was chosen was to just not allow it. So you have to use composition over inheritance, which sounds good in theory, but in practice it means writing a lot of boilerplate code that adds an indirection. And method calls in Python are pretty expensive, so if you're trying to shave off milliseconds, that's not really helping. That's one of the issues that we have. Then, properties and attribute setters: giving attributes and properties to instances of a class. It's a very common pattern in Python, so if you're trying to write a drop-in replacement for a class, you can't, because there's no support yet for properties. And the last one, which I want to talk a bit more about because I figure it's more interesting and it has to do with the terrible joke in the presentation title: iterators over Rust collections. What that means is you have any Rust collection, for example a Vec to take a simple one, and you want to share a reference to that object with Python and Rust at the same time, to allow for lower memory overhead, better performance, and in general not moving the entire object through the FFI layer at once. An iterator exposed to Python from Rust should behave the same way a Python iterator would, which means it has different rules about what you can and cannot do at runtime. Mutating, that is, calling next on a Python iterator when you already hold a reference to it, is valid as long as you don't try to use that reference afterwards. If you then try to read from the iterator, you get a runtime error saying that something changed between iterations.
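That runtime rule, valid until a mutation, then a runtime error on the next read, can be mimicked in Rust by keeping a counter on the collection. Here is a minimal sketch with made-up types; it is not rust-cpython's actual machinery, just the idea of invalidating stale references at runtime.

```rust
use std::cell::Cell;
use std::rc::Rc;

// The shared collection carries a generation counter.
struct Shared {
    data: Vec<u32>,
    generation: Rc<Cell<u64>>,
}

// A "leaked" reference remembers the generation it was born at.
struct Leaked {
    index: usize,
    generation: Rc<Cell<u64>>,
    born_at: u64,
}

impl Shared {
    fn new(data: Vec<u32>) -> Self {
        Shared { data, generation: Rc::new(Cell::new(0)) }
    }

    fn leak(&self, index: usize) -> Leaked {
        Leaked {
            index,
            generation: self.generation.clone(),
            born_at: self.generation.get(),
        }
    }

    fn mutate(&mut self, value: u32) {
        self.data.push(value);
        // Any mutation bumps the generation, poisoning older references.
        self.generation.set(self.generation.get() + 1);
    }

    fn read(&self, r: &Leaked) -> Result<u32, &'static str> {
        if r.born_at != r.generation.get() {
            // A runtime error, like Python's "changed during iteration".
            return Err("object mutated since this reference was taken");
        }
        Ok(self.data[r.index])
    }
}

fn main() {
    let mut shared = Shared::new(vec![1, 2, 3]);
    let r = shared.leak(0);
    assert_eq!(shared.read(&r), Ok(1));
    shared.mutate(4);
    assert!(shared.read(&r).is_err()); // poisoned after the mutation
    println!("stale reference correctly rejected");
}
```

The check happens at read time rather than at compile time, which is exactly the Python-style behavior the talk is describing.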
This is different from what Rust does, because Rust would simply not allow those two references, one mutable and one immutable, to exist at the same time in the first place, at compile time. So you have to use something called generational poisoning, which is basically keeping a generation counter on the collection: every older reference gets invalidated when something mutates. That's one way you can do it. Of course, you have to tell the Rust compiler that it really has to let go of the memory, because you're sharing it with something it has no control over, which is really not natural in Rust terms. It's basically sharing the reference between the two languages. There was a proof of concept by my colleague Georges, over here, in June of last year, which I upstreamed about a month later for a small but non-trivial data structure in Rust. And since then it has been upstreamed into rust-cpython by another Mercurial contributor. So it's something you can use if you're trying to bridge the two languages: you can use this system that we've put in place. It relies on the lifetime extension trick, which means you take the lifetime and you just say: this is 'static, and it's fine, everything is okay. It's unsafe, of course. It is FFI, it is unsafe, but the invariant that makes it not actually unsafe is pretty easy to uphold. The only thing you have to think about is to not move the reference out of the borrow scope. You don't have to do any manual dropping or anything complicated; you just have to not move it out of the scope, which is reasonably easy in the context we're working in, because it lives inside a py_class!-backed object, so it's pretty hard to move things out of it. And as I said, it uses generational poisoning. So we did some upstream work: the Python set support was upstreamed last year, and so was the capsule support. And this iterator mechanism is more general than just iterators.
It's technically any shareable data. I actually still have an issue with a Python class where I want to expose multiple parts of it as separate Python classes, and there's a ton of boilerplate involved if you try to do that, because you always have to go through the same indirection layer. So maybe this mechanism can help; I still have to try. Properties are being worked on, so that's going to happen. So that's our target, that's what we're trying to achieve, but it is the unrealistic target, because it does not do all of what it should do. Where are we now? I've been working on this for a few months, and there was a lot of work that was ungratifying, as you saw, but we're starting to gain a little ground. I have two cases. The first is a pathological case, one that is in our favor: a particularly bad repository with very terrible performance. As you can see, the standard status takes about six seconds, which is super slow, and the new version with some Rust takes about one and a half. So it's better. Definitely not perfect, but better. For a more realistic case, on the other repositories that I've seen, you get more of a 50% improvement in performance, which is definitely nice. 50% actually compounds a lot if you have a CI system that calls Mercurial a lot. But it's still so far from the 50 milliseconds that we had in the first place. So what can we do, and what can you do in general, to make it go faster? It's not that complicated from a high-level perspective. Do more things in parallel. We all know that building parallel code in Rust is a lot easier than in most other languages, especially C. And I think there's only, and I wrote the code so I should remember, one loop that has been parallelized, and it's not the hungriest one, you could say.
So the performance numbers we just saw came from optimizing just one of the three main loops, and not the biggest one, so there's still a lot of performance to gain from doing more things in parallel. Then, better conditional execution. That comes from the fact that when you're rewriting a 15-year-old code base, you don't want to do it all at once. That's a very bad idea, because you will introduce new bugs, and you'll have two complete implementations with basically just you to maintain them. It's just not a good idea. So you have to do it piece by piece, in places that make sense, that are very performance-critical. To start, you do some work that is simply better in some way; if it's good enough, you start upstreaming it, and then you work your way up. Better conditional execution would be to think about skipping some paths in some cases and doing a little better. This is absolutely not specific to our case: in general, try to move more incrementally. Rethinking the order of execution has a bit more to do with Rust, because Rust, with its very strong type system, allows us to define better constraints on what we're able to do than we could in Python. So maybe we can run two loops at the same time that we didn't really know we were capable of running together in Python, because it was just too complicated to establish. Of course, fewer exchanges between Python and Rust: as I just said, anything going through the FFI layer is additional time lost. And the usual suspects: fewer allocations, memory alignment, bit fields, all that kind of stuff. Right now I'm expecting to get much better performance without even going that deep. I mean, I'm not doing completely wild allocations or anything, but you don't have to go particularly far into performance work to gain a lot of it for something as IO-heavy as this. But also, you could just not start Python.
I know this can be controversial, maybe less in this room than yesterday in the Python room, but Python has a startup time that adds up. A third of the entire run of our test suite, and we have about 900 integration tests, is just starting Python and getting the imports through. And we have optimized imports in Mercurial; we've tried to make this go better. If you're aiming for a 50-millisecond status, you've already lost, because you have to start Python. I am absolutely not saying that Python is not a good language or that you should not use it, et cetera. I'm just saying that some use cases are better suited to faster, compiled languages. For example, we could start embedding Python. That means Mercurial could become a Rust executable that embeds Python and can fast-path some commands. For example, the hg version command has no reason to take more than a millisecond; it's just "give me the version". And it can actually slow down some CI systems considerably. Even if it only takes 60 milliseconds or so, it adds up. Tiny things make up a lot of difference. And most CI systems don't care that much about extensibility, about using extensions and making custom things. Usually they want diff, they want status, they want log, and they want them very fast, and they want to clone, that kind of stuff. These are not operations that are super complicated or user-driven; they're based on data structures and on executing them fast. So, there's PyOxidizer. Who's heard of PyOxidizer here? A few people. All right, cool. It's something made by a major Mercurial contributor. It's a Rust crate that allows you to embed Python applications. It's primarily a distribution tool, but you can do a little more than that with it.
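The fast-path idea mentioned above might look something like this sketch. Everything here is made up for illustration: the command names, the version string, and the stubbed fallback that, in a real build, would hand off to an embedded Python interpreter.

```rust
use std::env;

// Sketch of a Rust entry point that answers trivial commands itself
// and would otherwise fall back to the full Python implementation.
fn dispatch(args: &[String]) -> String {
    match args.first().map(String::as_str) {
        // No reason for this to take more than a millisecond:
        Some("version") => "example-tool version 0.0 (illustrative)".to_string(),
        // Everything else would go through embedded Python (stubbed here).
        _ => "falling back to embedded Python".to_string(),
    }
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    println!("{}", dispatch(&args));
}
```

The win is that the fast path never pays Python's startup cost; only commands that actually need the full implementation do.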
And we have a plan to start distributing Mercurial with PyOxidizer, for packaging reasons at first, but it also gives me a Rust entry point, so maybe I can just not start Python in some cases, which would of course be a major performance gain. Rust is a great language for writing a VCS. I don't think that's very controversial. I don't think Rust is a great language for every single application, in much the same way that I don't think Python would be very nice for every single application. But VCSs like data-structure abstractions, because a lot of a VCS is just graphs and append-only databases, that kind of stuff, and that works really well with the trait system, basically. It's been very nice to work with Rust, to have invariants checked at compile time instead of just finding out with asserts whether it's going to explode or not. A VCS needs to be both correct and fast, so you have different constraints than, say, the video game industry in some respects. I'm not saying games don't have to be correct at all, but it's not a huge deal if your game crashes sometimes, whereas if your VCS crashes, it crashes every second, because there are a million people using it. VCSs like to do things in parallel, because you have hundreds of thousands or millions of files, and mostly you just want to repeat the same operation on all of them and then collect the results. Rust is very good for that. And also, a VCS has to work on bytes. It doesn't just like to work on bytes; it has to, because encodings are not always Unicode, I'm sorry. And I know I would really love to just use Path from the Rust standard library, but we can't, because it's a VCS and people use encodings other than Unicode. So I like Rust, and I like it for writing fast VCS stuff. And yeah, thank you. Do we have any questions? Yes. Most of the... Yeah, sorry.
The question was: did we do any profiling of the startup time of Python? In general, yes, we do a lot of profiling. One of the reasons we started adding C and Rust and so on was that the profiler showed hot spots we had to address. The startup time of Python itself is somewhat out of our hands. Most of what we can do about it is to simplify imports, do less, and do things lazily, but most of that work was already done a few years ago. So it's just a cost: Python is a fantastic machine for doing very, very complicated stuff, so it has to have a cost. I hope that answers your question. Yes. Have I heard of Mononoke, which is the Facebook server implementation of Mercurial? Yes, and I actually know one of the developers of Mononoke. He's a very nice person who is way better at Rust than I am. It's very interesting, but it's very tailored to Facebook. I'm not saying the idea is not good in general; it's a good idea, but it works for Facebook because they have the problems that they have, and they just use Path because all of their paths are UTF-8, so they don't care. So it's very specific, and it's cool, and we can actually talk to them to maybe get some of their ideas inside Mercurial. But yes, I've definitely heard of it. Do I have... some more time? Yes, that's one of the issues that we have. So that... Yes. Sorry, I really have to. How do we plan to handle extensions that can change commands if we're just not starting Python, which holds the entire extension system? Good question, because we still have to figure that out. There's a need right now to get hg to go faster in general, so people will give up some of the extension system for their CI, for example. So I think a fast path is acceptable as long as it's configurable.
And after that, I really want to get extensions to go faster too, because repositories have been getting huge and extensions are slow. So that's another point. Should we rewrite the extensions in Rust? How do we interface? These are open questions right now. Thank you very much.
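To illustrate the earlier point about paths not always being Unicode, here is a small sketch. The Latin-1 file name is a made-up example; the helper just shows the check a VCS has to make before it can treat a raw on-disk name as text.

```rust
// Sketch: a VCS must treat file names as raw bytes, because on-disk
// names are not guaranteed to be valid Unicode.
use std::str;

// True if the raw file-name bytes happen to be valid UTF-8, i.e. could
// be handled as an ordinary String. A VCS has to cope with `false` too.
fn is_unicode_name(raw: &[u8]) -> bool {
    str::from_utf8(raw).is_ok()
}

fn main() {
    // "café.txt" encoded in Latin-1: 0xE9 is not valid UTF-8.
    let latin1_name: &[u8] = b"caf\xe9.txt";
    assert!(!is_unicode_name(latin1_name));
    assert!(is_unicode_name(b"cafe.txt"));

    // On Unix, std::path::Path can still carry such bytes through
    // OsStr::from_bytes, but Path::to_str() on it would return None,
    // which is why the talk says plain String/Path types are not enough.
    println!("Latin-1 name is valid UTF-8: {}", is_unicode_name(latin1_name));
}
```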