Thank you very much. Welcome to Python Interoperability, subtitled Building a Python-First Petabyte-Scale Database. My name is William Dealtry. I've been a C++ developer for more than 20 years, and for more than half of that time I've been a member of the C++ Standardization Committee. I've worked all around different areas of finance: inside an exchange putting data out, in risk analytics, on the buy side in a hedge fund, and on what's called the sell side in investment banking. Over that time I've worked on a whole range of proprietary time series databases, and I'm now the lead architect on a project called ArcticDB, which is a Python-first DataFrame database and the subject of an open source collaboration between Man Group and Bloomberg. Our aim is really to be nothing less than the fastest and most intuitive way to work with really big DataFrame and time series data from Python. About the image in the background: recently my son, who is 13, was playing around with Stable Diffusion, training LoRAs on images of forests, and particularly on images of forests and the mycorrhizal networks that live underneath the soil. These are the fungal networks that trees use to send chemicals to each other; they share nutrients, and arguably they also communicate and alert each other, although the science on that is not entirely settled. The choice of subject wasn't entirely accidental, because my father had just visited, and besides being an all-round cool guy he is also a biochemist, and he was one of the authors of the first paper that demonstrated that plants do, in fact, exchange nutrients and information using these fungal networks.
And it seemed to me that this is a really great analogy for the way the Python world works: we live in this Python forest with all these beautiful things you can see above ground, but there's also this really fascinating thing going on underneath, which is the C API and all of the C-based functionality that sustains and enhances it. The interesting thing about Python is that this was entirely deliberate and intentional; it's not an accidental feature. The very first Python release notes say that the Python interpreter is easily extended with new functions and data types implemented in C. That was true then, and it remains true today. You might think that was a natural thing to do; C is ubiquitous, and surely it's just the obvious thing for a language to do. But actually that's not the case. Look at Ruby, for example, a language I've also programmed in and, I must say, enjoyed programming in, and search its documentation. I won't read all of this out, but certain words jump out: huge, undocumented, unintuitive, clunky. Whereas the Python API for C is quite well contained, really; it's extremely well documented, I think relatively intuitive, and maybe only a tiny bit clunky. Beyond that, what I want to talk about, my area of expertise and the thing I would like to encourage you to do given my background, is to interact with that API via C++. And you might ask yourself, in 2023, why would I do that? There are clearly other languages around; they have borrow checkers and better haircuts. C++ is sitting there looking a little bit tired; maybe its jeans don't fit so well anymore. And it certainly has had a chequered history.
Standardized in 1998, for a long time it was really used in industry as a slightly enlarged object-oriented language: not so much C with classes, more Java with segmentation faults. And then in the last decade or so there has suddenly been this flurry of activity, some of it perhaps a little impenetrable from the outside: lots of argument about the type system, lots of new standard library features being implemented, lots of debate about what can and can't be resolved at compile time. What I would argue is that in amongst all that, a quite subtle and sophisticated modern language has emerged as a subset. For example, you can do generic programming without explicit templates. You can do lots and lots of the things you like doing in Python: it's very easy to return tuples of different types, bind to them with structured bindings, and assign them to different variables. And now with things like ranges we have zero-copy views, so you can construct very elegant data pipelines that express clearly what they want to do, and they're also very efficient. One thing we make a lot of use of is pattern matching on types, and pattern matching on values is also being standardized at the moment and is available today in a library by Michael Park. So C++ is starting to acquire these Haskell-ish elements that help you write very succinct, almost functional code, a long way from the bloated object-oriented C++ of the past. The thing C++ retains is that it is effortlessly close to the machine. Here is a website that will be familiar to most C++ programmers, and maybe to some of you as well, built by a man called Matt Godbolt when he was working at the trading firm DRW: Compiler Explorer.
So in real time, you can just write some code, choose a compiler, and see exactly which instructions it's going to emit. The same website also has microbenchmarking functionality, which is super useful, because one of the problems with microbenchmarking is that compilers are now very clever: if they detect that your microbenchmark doesn't do anything, they're likely to optimize it completely out of existence. That obviously gives you great performance, but not necessarily the information you were looking for. The main reason you might want C++, specifically for Python interop, is that C++ generic programming is not just present, it's actually pretty awesome. There is almost no limit to the sensible things you can do with it, and that's really important when you're going from a language with dynamic typing to a language with static typing. The risk is always that you end up with massive boilerplate that says: if Python gives me a uint64, I do this; if it's signed, I do this; and the same for all the sizes and all the floating-point types. You end up with something extremely brittle and extremely difficult to modify efficiently. What you can do with C++ is push all of that onto the compiler. You write generic lambdas, you have some kind of variant type over all the Python types, and you just let the compiler emit the object code for it; it will optimize that and emit the best object code it can. Where you do need to categorize types, you can now do that at compile time with things like if constexpr. And the really nice thing about that is, again, it gives you Pythonic behavior: if something isn't actually going to be executed, it doesn't matter whether it makes any sense. In the lower example, if the value is an integer, it doesn't matter at all that an integer doesn't have a substr function, because that branch will never be called.
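Here is a hedged sketch of that variant-plus-generic-lambda pattern. The type alias and function are hypothetical, not ArcticDB code, but they show how `if constexpr` discards the branch that would not compile for a given type:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// A sum type standing in for "whatever Python handed us".
using PyLikeValue = std::variant<std::int64_t, std::uint64_t, double, std::string>;

// One generic lambda instead of a wall of per-type boilerplate.
// `if constexpr` removes the string branch entirely for numeric types,
// so it doesn't matter that integers have no substr() member.
std::string first_char_or_digits(const PyLikeValue& v) {
    return std::visit([](const auto& value) -> std::string {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, std::string>) {
            return value.substr(0, 1);     // only instantiated for strings
        } else {
            return std::to_string(value);  // any of the arithmetic types
        }
    }, v);
}
```

The compiler instantiates one branch per alternative and optimizes each independently, which is exactly the "push the boilerplate onto the compiler" idea.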
We know at compile time that it's never going to be called, so the compiler can just exclude it. You get smaller object code and much more elegant branching over the types. Probably the best selling point for this particular audience: you can also do it in a notebook. Thanks to our friends at QuantStack, there's a beautiful C++ kernel for notebooks called xeus-cling that relies on the Cling interpreter and compiles down to LLVM. Between things like Compiler Explorer, microbenchmarking, and notebooks full of C++, there's actually quite a pleasant experience of profiling stuff: playing around with it, seeing how fast it is, looking at the instructions being produced. It's a very, very good way to optimize code these days. Primarily I want to talk about writing C++ extensions for Python, because that's the thing I know, and I think that if you want to write large amounts of code to really shift work out of Python, that's probably the best way to do it. But I did want to give a nod to the alternative ways of doing that. The most famous, obviously, is Cython; a lot of you probably know as much about Cython as me, if not more. It's used both as a way to accelerate Python code and sometimes as a sort of glue between Python and the world of C-like languages. The other two I'll mention briefly are ctypes and ctypesgen. These are really for code that you don't own: they target individual functions and whole libraries respectively, and they allow you to access them in a pure Pythonic way without having to write any module code. Obviously you could write a module that wraps the library, link it, and expose it yourself, but personally I like my programming tasks to be preferable to drinking bleach, so these give you pure Python access to things you don't have that degree of control over.
Another particularly honorable mention goes to cppyy; I never quite know how to say it, maybe someone can tell me. This again uses the Cling interpreter, and it allows you to write pure inline C++ from Python. I think it's a really, really exciting project and something well worth looking at. It's certainly a really lightweight way to get started if you just have a small function you want to optimize: there's no separate compilation step involved, it just does it all for you, so that's definitely worth a look. But assuming you do want to build some kind of Python extension, what are the different ways you can go about it? You can use the raw C API; it's actually not terrible. There's a big splurge of code here, and please don't feel obliged to read it, but if you do delve in, it makes sense: you override a few things and it's entirely workable. But in 2023 we can probably do a little better than that. One of the quite venerable ways of doing this is SWIG. SWIG involves an extra binary generation step that will be familiar if you've used things like protocol buffers or, in the Windows world, MIDL. SWIG is perhaps a little behind the others in terms of its C++ standard library support, but the huge advantage it does have is that it will generate bindings for a whole bunch of languages: you can do Perl, you can do R, a whole zoo of different bindings. So obviously, if that's part of your use case, that's pretty good. But then there are the really modern two, and as you can see the syntax is already much, much nicer: Boost.Python, and then the project we actually use, which was inspired by Boost.Python, started by a guy called Wenzel Jakob, and is now a major open source initiative: pybind11.
And really, between the modern ones, I think syntactically they're all fine; you can obviously get up and running. This is everything you need for a Python module that exposes a not particularly useful function to add two numbers together: you compile this and you can call it directly from Python, and that's all you need. The thing that really differentiates them, in my mind, is the marshalling to and from the C++ standard library types, because I'm going to try very hard to encourage you to use the C++ standard library, to use a sort of modern C++, and the thing you don't want to do is get lost implementing support for variants and the like yourself. If I have an object that might be a value or nothing, like a None in Python, I just want that to be a std::optional in C++ without having to worry about it. If I have a sum type that might be one of a whole bunch of C++ types or None, I want to effortlessly expose that to Python and say any one of these types is fine and anything else is an error, and pybind11 really does that extremely well. So after probably 25 years of writing C++, what's my opinion on how to do it right? Fundamentally, it is to use the standard types. This is a little bit of code from a fairly hardcore C++ database exposed to Python, but we're mostly using relatively common things: shared pointers, vectors, the standard mutex. I think it's very important to steer clear of the temptation to say, I'm writing lower-level code now, so it's got to be full of by-hand memory management and raw pointers everywhere. Incidentally, I once rewrote something to take out the vector iterators and use raw pointers over the contiguous data instead, and it actually got slower. That shouldn't really come as a surprise, because it turns out the compiler writers really know what's in the standard library, and the standard library authors really know how the compilers work.
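The marshalling point above can be sketched with plain C++ functions. The functions here are illustrative only, but the signatures are the real story: pybind11's `stl.h` type casters would expose a returned `std::nullopt` to Python as `None`, and a `std::variant` return as whichever alternative it holds:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// "A value or nothing": bound with pybind11, std::nullopt comes back
// to Python as None, with no extra code on our side.
std::optional<int> find_index(const std::string& haystack, char needle) {
    auto pos = haystack.find(needle);
    if (pos == std::string::npos) return std::nullopt;
    return static_cast<int>(pos);
}

// "One of several types": bound with pybind11, Python sees either an
// int or a str depending on which alternative is active.
std::variant<int, std::string> parse_or_keep(const std::string& s) {
    try {
        return std::stoi(s);   // parses -> int alternative
    } catch (const std::exception&) {
        return s;              // not a number -> string alternative
    }
}
```

The whole appeal is that the C++ side stays idiomatic standard C++ and the binding layer does the translation.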
Using the standard library is generally going to be the best thing. Having said that, there's another aspect of designing for performance, which is thinking about the high-level performance behavior: what Martin Thompson calls mechanical sympathy, and what I've come to call toaster-based design. Essentially, the computer is a physical device. It has a memory hierarchy; fundamentally, it's something that needs to be plugged into the wall. You want an awareness of how that works and an awareness of the penalties. For us, that often means fast scans happen over contiguous data: the prefetcher can do its thing and the processing unit is kept fed, so you're not hitting data stalls all the time. But the point is, you can do that with vectors. If you want to share data, have a shared pointer to a const vector. That way the compiler knows it's not going to change, you can share it around, you get that excellent performance behavior, and you also don't have segmentation faults left, right, and center. Now, a quick zip through things that are useful to know. Python allocates a huge number of small objects, because it tends to box things like integer types, and it has its own internal allocator for this reason. It's actually very performant, very good, but you do need to abide by Python's rules when you use it: you need to hold the global interpreter lock in the right thread while you're allocating, and we'll come to ways of debugging that. Python objects are reference counted, using intrusive reference counts, which is actually one of my favorite techniques from C and C++, and it's very good. It means the reference count lives in the object and the things being shared are just pointers; you go to the object to increment the reference count. This introduces a concept you'll probably run into, borrowed versus owned refs, which is kind of non-obvious.
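The shared-pointer-to-const-vector idea above can be sketched in a few lines; the names are mine, not ArcticDB's:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// One immutable, contiguous buffer shared by many readers. The const
// in the pointee means nobody can mutate it, so it can be scanned from
// several threads without locks or defensive copies.
using SharedColumn = std::shared_ptr<const std::vector<double>>;

SharedColumn make_column(std::vector<double> data) {
    return std::make_shared<const std::vector<double>>(std::move(data));
}

double column_sum(const SharedColumn& col) {
    double total = 0.0;
    for (double x : *col) total += x;  // contiguous scan: prefetcher-friendly
    return total;
}
```

The lifetime is handled by the shared pointer, the immutability by const, and the scan stays a tight loop over contiguous memory.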
But all it really means is this: if you're responsible for keeping an object alive, then you need to increment the reference count when you start with it and decrement it when you're done. Some awareness of how scope rules work in C and C++ is useful here, because obviously if you're in an enclosing function scope and the outer function is guaranteeing the lifetime of the thing, then you probably don't need to bother: you can just borrow that ref and forget all about reference counting. As soon as you move into an asynchronous world, as we do, and you're capturing things that are going to be executed somewhere else at some later point, then you suddenly need to be very careful that you really are keeping the object alive, so that any data you're going to work on is actually there when and where you need it. What can possibly go wrong? Like I said, memory management errors and reference counting mistakes, plus two other things to flag. Python has some global static objects, like None. If the reference count of None falls to zero, Python exits with a SIGABRT. You don't want that. pybind11 is fantastic in its function resolution: it allows you to overload functions in C++, and it does two passes, one without type promotion and one with it. The only thing to flag is that resolution works by trying the overloads in turn, so if you have a huge number of overloads and some of them rely on type promotion, it can get kind of slow, and it can be better just to give things different names. There's a huge world of tooling for C++ now, and I would encourage you to use all of it; writing this kind of code without these tools is a bit like riding a motorbike without a crash helmet, just not a very good idea. Start with the Clang sanitizers for addresses, threading, and undefined behavior: ASan, TSan, and UBSan.
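The intrusive counting scheme behind borrowed versus owned refs can be shown with a toy sketch in the spirit of CPython's `Py_INCREF`/`Py_DECREF`. This struct is hypothetical and nothing like the real `PyObject` layout; it only illustrates where the count lives and who must touch it:

```cpp
#include <cassert>

// Intrusive refcounting: the count lives inside the object itself,
// and the things being shared around are just raw pointers.
struct RefCounted {
    int refcount = 1;       // the creator starts out owning one reference
    bool destroyed = false; // stand-in for actually freeing the object
};

// Taking an OWNED reference: you bumped the count, so you are now
// responsible for dropping it when you're done.
void incref(RefCounted* obj) { ++obj->refcount; }

void decref(RefCounted* obj) {
    if (--obj->refcount == 0) obj->destroyed = true;  // real code would free
}
```

A borrowed reference is simply using the pointer without touching the count, which is only safe while some outer scope is guaranteeing the object stays alive; the moment the pointer escapes into asynchronous work, it has to become an owned reference.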
Then Valgrind: because Python manages its own memory, you get a lot of false positives from Valgrind, which is another memory checking tool, but the Python source tree actually ships its own suppressions file, which will tell you which reports are just Python doing what it does and which are you doing things you ought not to. It's well worth building your own debug version of Python: mostly for bragging rights, because it just sounds epic to say I built my own debug version of Python with all of the reference checking on, but also because it's actually really, really easy. You grab the code, you set the debug flag, you build it; I've done it on loads of environments and it never seems to have any problems. The one thing to be aware of is that, because of the reference count checking, which is one of the most useful features, it's not binary compatible, so if you use NumPy as well you'll need to build a debug NumPy, but that's also super easy. Then there are static code analysis tools; the Sonar people are here, and I think theirs is great, and there are a load of other really good ones you can use for free, like Cppcheck and Facebook Infer, and things get even more sophisticated if you want to go out and spend some money. So, the point is, if you want to do this, you fundamentally want performance; that's the reason you would bother. The first thing to be aware of is that, good as these frameworks are, marshalling data between Python and C++ is not free. These numbers are first- and second-run timings compared against the raw C++ speed, and you can see it gets better on the second run, but really what you want to do is take a large volume of data in Python, pass it over while copying as little as possible, do a big chunk of work, and then pass the result back.
On the way in, ideally what you want is not to copy but to steal pointers. This is a little bit of code from our public GitHub where we're doing some fairly black-magic stuff to steal NumPy pointers, getting the strides and the dimensions out. The whole point is that if we're writing data, we don't want to copy it and then pass the copy to the compressor; we just want to grab the pointer and feed it straight through, so that at every stage something useful is happening: it's being compressed, it's being written to the storage. The same applies on the way out. It took me a while to find this really interesting bit of code in NumPy: if you have an array and you pass in a base object from Python, you can manage the lifetime of the array's memory just by setting the base object pointer on it, completely zero-copy. If you don't do that, you go through the PyArray copy path instead, and that can be a problem: if you've got a million rows and 5,000 columns, that's a gigantic amount of memcpy-ing, and it could make two or maybe three orders of magnitude difference in the time it takes. The other thing you want to do, which is related, concerns the places where you do need to allocate Python objects, which is inherently serial because you have to acquire the global interpreter lock and hold it in the right thread for the duration of the allocation. You want to do that as late as possible: if you're aggregating, make the result set as small as you possibly can, and only at the point where you've really got an answer do you pass it back to Python. This is some code where I'm keeping an internal high-performance C++ map of Python object pointers so that we can manage their reference counts, managing memory and also increasing the speed at which we can allocate Python strings.
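The zero-copy path above ultimately rests on strided addressing of a raw buffer. As a sketch (the function is mine, not the ArcticDB or NumPy code), the address of element (row, col) in a NumPy-style array is base plus row times the row stride plus col times the column stride, with strides measured in bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Strided access into a raw NumPy-style buffer. Reading straight
// through the stolen pointer like this, instead of copying the array,
// is what makes the zero-copy path possible.
double element_at(const std::byte* base, std::int64_t row_stride,
                  std::int64_t col_stride, std::int64_t row, std::int64_t col) {
    const std::byte* p = base + row * row_stride + col * col_stride;
    return *reinterpret_cast<const double*>(p);
}
```

For a contiguous row-major array of doubles with three columns, the row stride is 24 bytes and the column stride is 8, which is exactly the kind of information pulled out of the NumPy array object alongside the data pointer.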
Beyond that, if you want to get into the really magic stuff, you can do Python arena allocation. I wish I could say I invented this; I actually thought of it independently, but then I discovered a really cool project called Quelling Blade, by someone with a completely unpronounceable GitHub name, which goes a little bit further: it allows you not just to arena-allocate PyObjects from C++, but actually to register Python objects as being arena-allocatable from within Python itself. It's a work in progress at the moment, but definitely one to keep an eye on; I have no idea who you are, but hats off. And I guess the main thing beyond that is that guesswork-based optimization almost never works well. You need to measure. These are Brendan Gregg's flame graphs: you can take the perf input, drill down, and really see what's actually happening. So what have we done with all that? ArcticDB is a columnar, Python-centric database. It stores DataFrames, and it's particularly adapted to time series. It really takes all of the knowledge I've gleaned over those 20 years of working on big server-based databases and creates something that lives just in the Python process, on top of a storage layer that can be local disk or the cloud; you can talk directly to S3. It's extremely fast at doing selections. In this demo there are 100,000 columns and 100,000 rows, 10 billion values, and what we want to do is pluck out three columns of 100 rows each from those 10 billion values. I'm doing this on a pretty rubbish Jupyter notebook instance with one virtual core and eight gigs of RAM, and I can still do it in around a fifth of a second. Beyond that, it has a lot of the features you might expect from a database. In addition, you can rewind: it keeps the whole history of everything you've ever written. It does filtering, and you can project columns on the fly with a sort of intuitive, pandas-like syntax.
You can do group-by aggregations, and time series aggregations are coming soon, so bucketed aggregations and things like that. The Python data science world is obviously changing. The Apache Arrow standard is becoming a de facto standard for data interchange; it allows you to go to things like Rust and Polars, and we're just finishing off our support for Apache Arrow. Pandas 2 is also coming along, and Pandas 2 has PyArrow support; that's likewise happening in ArcticDB at the moment. So yes, I hope that I've encouraged you to take a little look below the surface and maybe have a play around with it yourselves. If you'd like to have a play around with ArcticDB, it's source-available on GitHub; you can pip install it, you can conda install it. I encourage you to give it a look and tell us what you think. Thank you very much.

Thank you very much. Thanks for the amazing talk, William; he may have stolen some of the audience from CppCon. We have some time for Q&A, so please use the mic if you want to ask a question.

Hi, just vaguely related: I haven't written C or C++ since high school, so I wanted to ask for advice. If I want to go the other way around, what's the simplest way to embed Python in C++ code, like, without hassle?

Yeah, it's interesting. Actually, in those same original release notes, Guido wrote that that was also the intention. pybind11 will allow you to do that: you can embed the Python interpreter in C++ if you want to. It provides both directions, generally one or the other; doing both at the same time gets kind of messy, but embedding is easily done.

Thank you.

And now I can't walk normally with everybody looking at me walking up here. Thanks for the talk. One of the advantages, I guess, of using C or C++ from Python is that it's much faster. But how does it work when, as in the examples you showed, pure C++ code is embedded in the Python code? I guess it needs to be compiled. Does it compile every time you run it?
Or does it do something more clever? Because obviously that would slow it down, and it wouldn't get all the advantages.

Yeah, so one of the slides was actually comparing that; let me flip back, if I can find it. This compares the first and second run times, and the third line is cppyy. So you can see that there is actually a kind of upfront cost, and then that's amortized to some extent. But I think it's really going to depend somewhat on how complex the code you're compiling is. I'm not completely familiar with cppyy's internals, but I think there is some degree of caching there.

Can you pre-compile it?

I don't know; I would honestly have to say I don't think so. I think the just-in-time compilation is kind of the point, and if you wanted to pre-compile, then you would just use a C++ compiler.

Thank you.

Hi. You mentioned a weird language with borrow checkers early in your talk, and this was all about interoperability with C++. So do you have any experience with this weird language with a borrow checker and integrating it with Python? What is your opinion on that?

Sorry, can you repeat the question?

Well, I'll stop being so oblique: you mentioned Rust.

Yes, Rust.

And that it is not so easily interoperable. So have you tried it?

I've tried it a little bit. I think it's improving. I guess I still think that there is a place for C++, particularly in the maturity of the generics. I think Rust is a very elegant language, but my feeling is that the thing I like about C++ is that the lifetime of objects in memory is essentially coterminous with their lifetime within the application. And so yes, it's true that if you don't get that kind of stuff correct in C++, you tend to get segmentation faults and whatever, but actually those are quite well-handled problems.
And whilst ensuring memory safety is a brilliant thing, it doesn't necessarily guarantee that your program is correct; as I'm sure you all know, just because you have memory safety doesn't mean that what you're doing is right. It's still possible to get it wrong, and sometimes having those things coextensive makes it easier, because it's very easy to check memory lifetimes and things like that. Yeah, I'm sure that Rust is going to grow as a language, and I'm sure that the Python interop is going to develop; it's not bad at the moment. The other thing is that C++ is essentially a superset of C, and CPython is written in C, so there's an ease of working with the C API that is hard to rival, I think. That's pretty much my view.

Thank you.

Thank you, William. We're just out of time now, so if you have any more questions, you can ask the speaker later outside. Thanks again.

Thank you very much. Thank you.