Hello everyone, my name is Ivan and I'm a quant at a trading firm called Susquehanna in Dublin, Ireland, and I'm actually surprised to see so many people here given that there is a competing talk on Cython right now in the main room. So thanks for coming. Today I'd like to talk about C++, which may seem like a weird choice of topic for a Python conference, but I think it's pretty relevant and interesting, and there have been a few good talks this week on CPython and CFFI and PyPy, so I think it fits in quite nicely and I hope you'll enjoy it.

So let's talk about Python extensions. What exactly is a Python extension module? And by Python I really mean CPython here. A Python extension module is something that you can import from Python but that is not written in Python, and that normally means it's written in C or C++, because CPython provides a C API. These days you can also write extension modules in other languages like Rust or Go.

So why would you do that? The main reason usually is to be able to interface with other libraries. Maybe you want to use some part of TensorFlow that's not been wrapped in Python yet, so you can do it this way, or maybe you write your own code in C++ and you want to interface with that. Another pretty common reason is writing performance-critical code and numeric code: it does the number crunching very fast and then you can expose it to Python. And the two less obvious reasons that I found really important in my own work are these. If you have non-Python libraries, written in C++ for example, you can mirror the API in Python literally one to one, and this lets you prototype things very fast, in a Jupyter notebook for example, and then you can just translate it back to C++: just add the semicolons, basically, and change the for loop syntax. And if you do that you can also run the tests in Python.
It's not that you can't do that in C++, it's just a lot easier. There are very nice test frameworks, and if you're testing numeric code it's very nice to be able to use NumPy and pandas and all the other tools. And the last two points actually play well together: you can start prototyping things and then write tests in Python and then translate things back to C++, but your tests just stay the same, which confirms that you did it right.

So it's possible to write Python extensions in pure C, and there have been quite a few talks on this as well, but if you're going this way you need to have a few skills. You have to be good at ref counting, and I don't think anyone's good at ref counting if your name is not Larry Hastings. There's exception handling that you have to do manually, and by that I really mean C-style error handling. You have to be able to type fast and have a good keyboard, because you'll type a lot and you'll make some errors, and if you don't make any errors then the Python core devs will change the API and make them for you. So that's quite a lot of pain to go through.

So here's an idea: if we can translate Python into Python C API calls, then instead of running that, why don't we just translate it to C, and then we can augment Python with some fancy new syntax so we get pointers and references and all that. And it kind of works, and there are libraries that use that heavily, like scikit-learn and pandas: most numerically intensive routines are actually written in Cython. But there are a few problems with that as well. First, you're not writing C and you're not writing Python, and at times it's really hard to figure out what it is you're writing. And as I've just checked a few days back, a two-line Cython module generates 2000 lines of C, so it's quite a lot.
You have multiple build steps. The IDEs usually choke on that. It has limited C++ support, so it's like it's stuck in 2003: a few new features are supported but most of them are not, and there's limited support for metaprogramming. You have to create stubs for everything that you use from C. So I think it's good for wrapping a few functions, kind of like pandas does it; it's not so good for managing a huge codebase. But my biggest gripe really is this: debugging compiled Cython extensions is just a complete pain, and I just want to illustrate that real quick.

So here we have a function that does nothing, but it does this nothing n times, right? I'm pretty sure if you pass it to PyPy it would just not generate any code at all. If we run Cython on that, then nothing good happens. This is the code for just one line, for i in range(n), where we told Cython that n is an integer, so what could i be? You would expect a C for loop. I'm not going to zoom in on this; there's nothing interesting in there, it's just a bunch of Cython C calls really. So what's wrong? Turns out that I forgot to tell Cython that i is an integer; for some reason I had to do that. And then what it generates is still not nothing, it's something, and it looks pretty bad. You can actually see the for loop, but if you were to debug this, step through it in GDB, it would be just a complete pain.

So here's another idea: let's use Boost. Boost is just a humongous C++ library that does everything, so if C++ could make coffee there would be a Boost coffee library for it. And there's Boost.Python, written by Dave Abrahams, who's also the author of Boost MPL, the metaprogramming library. The problem with that is that you have to build it, and depending on your platform that may not be easy. It also requires Boost, which, the last time I checked, was towards a million and a half lines of headers, and it uses weird tools for building
and then, because Boost is compatible with everything, all the old compilers, it doesn't use the new language features, so it uses its own metaprogramming library. So it takes a very long time to compile and you end up with huge binaries, and it doesn't attract new contributors because it's really hard: you actually have to know the entire Boost to do anything with it. And just a disclaimer, this is not a talk on bashing Cython or Boost. They're both really good options: if you're already using Boost then Boost.Python may be a really good choice, and Cython is a really good choice if you're just wrapping a few functions.

So I just wanted to introduce yet another library that's sort of like Boost.Python but does things in a more lightweight fashion. It allows you to interact with the Python interpreter, or embed the Python interpreter in C++ code. It's header-only, has no dependencies, and doesn't require any build tools. It's very small, the core codebase is about 5,000 lines. It's optimized for binary size and compile time, so we've seen one of the big projects converted from Boost.Python to pybind11 go down by a factor of five in both binary size and compile time, so that was quite big. We support GCC, Clang, Visual Studio and the Intel compiler; Linux, Windows and macOS; CPython 2 and 3, and PyPy, and yes, I really said PyPy. We require C++11, but some new features from C++14 and C++17 are also supported. There is support for NumPy without having to actually locate and include the NumPy headers, there's support for embedding the interpreter, and there's a whole bunch of different C++ functions and features that we support. And here's a link to the GitHub repo.

So I'll just try to walk you through that using a few examples. There's not enough time to cover all of it, but I hope you'll have a good understanding of how it works. We'll start with a simple hello world example; that's what you normally do when you learn a
new language or framework. A few prerequisites: you need CPython or a modern PyPy, you need the pybind11 package installed, and some non-ancient compiler. In all code examples I will skip these three lines: it's basically including the pybind11 header, then aliasing the namespace to just py, and also defining the pybind11 module, which is the main extension module. So whenever there's a variable m, it just means the module that you're currently defining.

All right, so let's write a function that adds two integers, like in C or C++. You have a function add that takes a and b and returns a plus b, and we can call the def method on the module and tell it: hey, here's the function called add, here's the pointer to the function, and you can give it a docstring and a whole bunch of other things so it knows how to generate the Python signature. That's pretty much it. You compile it, it works. You don't have to tell it what the exact signatures are; it's all inferred from the types, so it's using the modern C++ type inference features. Or you can even write it like this: you don't have to define any functions at all, we can just use a C++ lambda. So just tell it: here's an anonymous function, takes a and b, returns a plus b, and this also works. So it's a one-liner, basically.

If you compile it, and we'll get to that in a second, it just works like a normal Python module. In fact it generates the docstrings and the signatures: we see there are two int arguments and it returns an int. You give it one and two, you get three back. If you give it some non-integers, it tells you that the signature is not compatible, so it does the type checking.

So how do you compile it? Well, there are a few ways. If you're a happy owner of a Linux box then you can just tell it where the includes are and that's pretty much it; you don't have to link it to Python or anything. So that's literally the
entire line to compile it. You can do the same thing on macOS, you just have to add one more flag so it doesn't complain. Windows, I heard it's possible, but I haven't tried; it's probably not fun. However, there are better ways. You can actually integrate it with setuptools: you just tell it how to find the pybind11 headers and that you have to compile in C++11 or C++14 mode, and that's it, setuptools will take it from there.

There's another thing. I've actually built this tool specifically for this conference and just for myself, but it turned out to be pretty useful. It's a Jupyter notebook extension that you can just load in a Jupyter notebook and it does all the bootstrapping: it will compile your module and actually cache it, it will enable C++ syntax highlighting in Jupyter, and it will forward input and output streams from C++. And that's literally the entire cell for this: you hit enter, it compiles it, it imports it back and you can just use your functions from it. This one is not released on PyPI yet, but I hope to do it this weekend, and you can find it on GitHub. Finally, if you use CMake, it's just the one line, since we provide a CMake interface as well. You just say: create a module my_add with these sources, and that's it.

All right, so how do we go about wrapping classes, because that's what C++ is supposedly all about? Let's create simple bindings for an HTTP response class in Python. This is sort of something like you would get from the requests library in Python: it has a status and a reason and text. So for example this class, you can create it with just a status and a reason, or you can also pass the optional text, which defaults to an empty string, or there is a default constructor which just initializes it with 200 and OK, which is the response for when everything went fine. So this is quite simple, and we want to mirror this
one-to-one in Python so we have the same API. First of all we bind the type itself. We tell pybind11: hey, here's a type Response, please create a type called "Response". Because there's no reflection in C++, we have to give it the name like this, as a string. This registers the type, and after this line you can use it in any function signatures: you can return it, you can take it as an argument, you can nest it within other types.

So let's bind the constructors. In C++ you can have multiple constructors with different signatures, different overloads; in Python you have only one __init__. So what we do here is, py::init is a shortcut: it's a template to which you give the signature, the input arguments of your overload, and then it creates this overload in Python. In fact in Python all three of these constructors will be merged into one __init__ and it will do a runtime dispatch. Obviously static dispatch is not possible because Python is not compiled, but it will look like one function, and the docstring will say that it has three different signatures.

Okay, the attributes. You can bind the attributes directly, and what it does here is create descriptors on the Python side. They could be read-only, so get descriptors, or get-set, so read-write descriptors. So we can bind these directly as well: in C++ we have std::string and int, and in Python you will have the str type, the unicode type, and an int as well. If you have a property, as in, in C++, a method that doesn't take any arguments, like here for example the ok property that tells you if your status is not an error, you can bind it as a read-only property in Python, or you can actually bind read-write properties as well, with a getter and a setter.

And you can overload operators. In C++ there's an equals operator that checks that all the fields are the same, and you can do exactly the same thing in Python. So you define the double
underscore __eq__, which takes self and other and returns self equals other. And because it's such a common pattern, there's actually a shortcut: you can just include the pybind11 operators header and say def of py::self == py::self, where this equals could be any operator, it could be left-shift-equals or anything like that. This makes it very easy to bind objects that may have 20 different operators, like a matrix or vector type or something like that. You can define any method like so. Here we can define a __repr__ so this type has a proper representation, and you do it just like you would in Python: we just create a string, format it and return it back.

Okay, and this is the full binding code, and it's not so much really: it's less than the initial implementation. If you were to do this in Cython it would be kind of the same, and maybe more, because I think Cython doesn't have proper overloads. And you can use it like any normal Python class, really. If you import the type, you have your properties, you have the attributes, you have docstrings for everything, the operator works, so it all works as expected.

A few other things on function signatures. In Python you can have all these named args and default values, star args, keyword arguments, and it's all doable here as well. First of all you can name arguments through py::arg. Because C++ doesn't have proper reflection you need to give it names. You don't have to, but if you want your argument to be called name, like in this case, you have to tell it that, and now if you look at the docstring it's actually called name, so that's nice. You can assign to py::arg: here we have the same function but it runs n times, where times is an optional argument that defaults to one, and this works as expected. Then you can call this function with one argument or two, or provide them as keyword arguments. And you can do
other things: you can take arbitrary Python objects as arguments. You can take a py::list, which is a wrapper around a PyObject. So here for example we count all the strings in the list, and if you were to do this in Python it would literally look the same, line by line: for item in list, if isinstance(item, str), increase n, and then return n back. So that looks very close, and it works as expected. You can take star args and keyword arguments as well, through py::args and py::kwargs.

As I already said, you can bind multiple C++ functions to a single Python name and then they would work as overloads, so it will do runtime dispatch on the types. Here we have a function that takes an int or a float, and you can pass an int or you can pass a float and it gets dispatched to two different functions, and if you give it something else it tells you it's an error. So that's pretty handy.

There's a bunch of other things that I would just like to quickly jump through. There are three ways to communicate objects between C++ and Python. The first one is where you created something in C++, you wrap it in a PyObject and then send it off to Python. You just store the pointer inside the PyObject, and then we record it in a registered-instance map, so if it ever comes back to C++ we know that we were the ones who created it, we quickly unwrap it, and it's very fast. Another one is the opposite, where the object is native in Python but it's a wrapper in C++, so it's like py::list, py::dict, py::int_, py::str, and on the C++ side we have a thin wrapper around a PyObject, so it's the other way around. The third one is like in the examples that I've just shown, where you have std::string in C++ and str in Python, and these, even an int really, because in Python ints are objects as well, have different memory layouts, so you can't really share them. So this would always involve a copy, but there are ways to work around that
if you need to share a vector or a map, for example. Some of the types that we support with built-in converters are scalars, strings, tuples, sequences, maps, dicts, sets, polymorphic functions, date-time types, and the new vocabulary types from C++14 and C++17 like std::optional and std::variant. You can also write your own type casters; it's fairly easy. So you can write, for example, a timestamp type that would work as an int in C++, and once you send it to Python it works as a pandas Timestamp, for example. That could be quite handy.

A few more things on classes. I will not go into this in detail because it would involve quite a lot of code, but you can do single and multiple inheritance. You can override C++ virtual methods from Python and it would work; it requires a middleware class to do that. You can have custom constructors, so you're not limited to these py::init shortcuts, you can do anything there just like in Python. You can define implicit conversions, so if the types are implicitly convertible to each other in C++ you can make it so that Python works the same way, so a function that expects an instance of one type can also take another. You can overload operators, and you can also define static methods, properties, attributes and all of that.

There's also the Python interface, so that's everything that starts with py double colon, like py::list or py::dict, and we try to wrap quite a lot of it. It starts with py::object, which is the highest-level object in the hierarchy, and py::handle, so it could be with or without ref counting. There are all the built-in types, like py::module and function and list and int. You can cast things back and forth between C++ and Python using the cast method. You can call Python functions just using parentheses, normally, and I think this is a pretty cool example, where we have a tuple of args and then a dict of kwargs and a function called engage, and then we
call that and we expand both the tuple and the dict, and we actually pass one other keyword argument, exactly like you would in Python, sort of. And this is still C++, it's just heavily overloaded. It looks pretty cool, I think. You can import modules. There's a bunch of built-ins that have been wrapped, like print and format and len, isinstance and all of that. You can run arbitrary Python code as a string if you want to do that. In fact we have to resort to doing that on PyPy to make a few things compatible, because if they don't have an equivalent for some CPython call, we have to do that. You can evaluate Python files as well.

One of the big parts is support for the Python buffer protocol, so you can interact with numeric code. You can wrap any of your custom types to support the buffer protocol, and then for example NumPy would automatically pick it up: you can just pass it into a NumPy array constructor and it will know what to do. You can build buffers and memory views directly as well.

There's support for NumPy. If you have NumPy installed you have to include the pybind11 NumPy header, but you don't have to go and start locating NumPy itself; we'll figure it out. There are a few types, like py::array, which is an untyped array, and py::array_t, which is a template for a typed array. There's a lot of functionality, but a few things I'll mention: there's bounds-checked and unchecked element access, and fast access to array properties like shape, number of dimensions, dtype, all of that, through the NumPy C API. There's support for registering structured NumPy dtypes, and if you've never heard what that is, it's kind of like pandas data frames but in NumPy, and that was my own contribution to pybind11. There is automatic function vectorization and broadcasting, so you can write scalar functions and then just wrap them so they work on any NumPy
arrays of any shape, and that's pretty handy. We also support Eigen, if you know what that is: it's a numeric C++ library that's quite popular in some scientific circles.

And a few other things that don't fit anywhere else, like different return value policies. For example, if you're returning a reference or a pointer, you can tell pybind11 that it's actually a pointer to an internal member, so it knows how to create weak references and garbage collect it. You can also tell pybind11 to keep one object alive while another is alive, like if you're iterating over a C++ container and you don't want it to die before you're done, that kind of thing. There's automatic translation of C++ exceptions to Python exceptions, and you can also register your own translators, sort of like you can do with Python. You can have custom holder types, and we support the default smart pointers like unique pointer and shared pointer.

One last thing I wanted to mention here is that pybind11 does have a runtime of sorts, but it's pretty fast. The way it works, it has a capsule, which is a CPython term for a block of shared memory within the interpreter. When you import a pybind11 module it looks for an existing pybind11 capsule, and if it doesn't exist it creates one, and then as you import other pybind11 modules they look for the same capsule, they find it, and they share the same map of registered types and registered instances. So that's kind of how it works.

In the last two sections I wanted to show a few examples of what's possible with this, and one is on callbacks, which is quite a common thing to do. For example, you have a fast WebSocket library in C++ and it takes an on-event handler: you can pass it a polymorphic function that would be called each time a message arrives, for example. And how do you wrap this in Python? Well, the answer is you can use the
polymorphic function type, std::function in C++, and it will be converted back and forth to a Python function object. This is quite cool, because a Python function may actually be a closure that has a scope captured, and a C++ function can be a closure that has other C++ stuff captured as well, and it works nicely together. For example, here we have a function for_even: you give it an n, an integer, and you give it a function that takes an int and returns nothing, which will be called for each even number from 0 up to n. And you can use it like so: if you compile that, you have a Python callback that just prints the number, and you pass that directly in, and that seems to work.

You can also do this kind of stuff: you can have a higher-order function, so you can make use of capturing closures in C++. For example, here int_fn is a type of a function that takes an int and returns an int, and apply_n is a function that takes a function and also a number n and applies this function n times, although it does it lazily, so it returns a function that does that, if that makes sense. So it's kind of like a decorator of sorts in Python: if f is multiply by 2 and n is equal to 10, the result will be multiply by 1024. And you can note that in the square brackets we have f and n. This is the C++11 notation for: we capture f and we capture n by value, so this is stored in the closure. And that's pretty much it. If you compile this, we can define a Python function and then pass it there, and what's returned back to us is a C++ closure which is converted to a Python function that we can call, and it still works like a decorator.

You can actually go one step further. The green one, the green apply_n, is the one from the previous example, and the blue one is a factory that creates the green one for a given n. So it's like: if you give me n, I will give you a decorator that decorates all these functions
in such a way, if that makes sense. And just for the fun of it we can bind it under the same name, because we have overloads in C++. So we have two different versions of apply_n: one takes an int and is a factory function, and another one takes an int and a function and returns a function, and we can use them at the same time. So this is the first example, where we have a function f and we say apply_n of f and 8, of 10, and that gives us 2560, or we can use it as a decorator, so we say @apply_n(8), so that's a factory that returns a decorator, we decorate a function, and it works the same way as well. So I think that's pretty cool. That's quite a lot of machinery going on here and I'm quite baffled myself that it actually works.

And last but not least, NumPy support was very important for myself, and as I talked to Wenzel, the original author of pybind11, just a few days back, he said that this talk is not hipster enough if there's no pandas and data frames and NumPy, so I figured I should provide one example. So here's the full example; it took me maybe 10 or 15 minutes to cook it up. Here we want to compute rolling stats on a data frame, or on a series, basically, of floats. So you have a rolling window, and if you don't know what that is: you have a fixed window size that just moves along through the series, and every time it shifts by one element you recompute some statistic like the mean or median or variance or standard deviation. Here we'll just compute the mean and standard deviation, and the type would be double, right? So we have this rolling stats function. It takes a py::array_t of double, so that would be a float64 NumPy array, and it takes a window. And what we do next: to make it faster, we don't actually recompute it in full each time the buffer moves. We can make use of the fact that to compute the mean, you
know, if you have the sum of the elements in the buffer and you have the sum of squares, then you can infer both the mean and the standard deviation. And to keep track of the sum and the sum of squares you can just add one element and subtract one element each time you move through the buffer, and it makes it a lot faster than actually trying to reevaluate the whole thing every time. I'm not entirely sure what pandas does, I haven't looked into its rolling API, but it's a little bit slower.

As you can see from the code, it's not overly involved. It uses this unchecked proxy access to NumPy arrays, so we disable the bounds checking because we know we're not going to run outside the bounds. The rest is just normal numeric code where there are some computations stored in the stats array. One thing to note is that stats is a struct here, and what we return back from this function is a py::array of a structured type. This is known as a record array in NumPy, or a structured array, and in our module we have to register it explicitly. So we say: hey, here's a stats type, it has a stats NumPy dtype, and it has two columns, mean and std, and they will be translated to Python with these names. And then we just bind the function.

So the way this works, if we compile this and try it out, we can pass anything convertible to a NumPy array, really, to this rolling stats function. Here I pass a bunch of ints and a window of two, and you get back a data frame that looks like this. So obviously this is the running mean, this is the running standard deviation, and if you were to use the pandas rolling .mean() or rolling .std() you would get the same result. In fact, let's check it: let's just generate 25 million values and do it both ways, and yeah, we can check that it's the same. And we can also check if it's fast enough. If you run this in pandas for 25 million elements with a window
size of 1000, it takes 1.1 seconds to compute the mean and another 1.18 to compute the standard deviation. In our case it takes 0.26 seconds to compute both. So it actually does make sense: if it starts taking minutes or sometimes hours to compute this kind of thing, if you have a lot of data, it may be worthwhile to spend 10 minutes and code it up yourself.

Finally, I'd like to say thanks to Wenzel Jakob, who is the original author of this project, and Jason and Dean, who are currently maintaining it, handling all the issues and adding a lot of features, and a lot of people, including myself, for contributing all the other stuff. Also Dave Abrahams for creating Boost.Python and Boost MPL. And I'd like to thank my employer, Susquehanna, for letting me hack on this at work time. And last but not least, I'd like to thank you for listening. Thanks.

Okay, we have some time for questions, if there are any.

Hi, thanks for the talk. Could I pickle a class?

Yeah, there's pickling support as well. I just didn't mention all the features because it would take too long.

Thank you.

Hi, thanks for the talk, it looks great. I'm just wondering if you're always using the heap for allocation, or if you can do any fancy allocation, say placement, or, if you're dealing with an array, whether you always allocate objects on the heap or can do different forms of allocation.

So the heap allocation for what exactly?

So say for example when you declare a class, and you have the example of multiple __init__ signatures, when you actually instantiate that object in Python, is it always happening on the heap? Can you do anything different?

Yeah, it currently happens on the heap, so it uses the new operator in C++. But I think, do you mean like the Python malloc, the PyMalloc API? We don't explicitly support that, but it should be possible.

Yeah, thanks.

The library looks
really, really cool. I'm wondering what the state of the documentation and examples is, and what's the license of the library?

The documentation is pretty good, it's pretty well maintained, I would say. It explains a lot more than this talk, and it walks you through the examples from really simple ones to the most complicated ones. What we're also planning to do is set up tutorial notebooks so you can run the whole thing in a Jupyter notebook as well. The documentation is hosted on Read the Docs, and you can find it from our GitHub repo. One thing to note is that there are a few syntax differences in this talk from the latest stable release, so if you try to compile this with whatever's on PyPI right now it may complain, but we'll push a version fairly soon that will be compatible with this.

And what's the license of the library?

The license is...

Is it GPL, is it MIT?

I think it's MIT.

You mentioned that the problems with Boost.Python are long compile times and the generated shared object size. Do you have any numbers on how pybind11 compares to that?

I'm sorry, numbers on what exactly?

The compile times and the generated code size.

Yeah, so the thing is, if you have a really small module, then the extension module generated with Boost.Python would be smaller, because Boost.Python has a precompiled part, right? So a pybind11 module would actually be bigger than both Cython and Boost.Python if you have a two-liner. Once it starts going up, as I said, we have an example, it's PyRosetta, a wrapper for a chemical framework that was initially written in Boost.Python, and then the developers tried to convert it to pybind11, which they did successfully, and I think it went down by a factor of 5.8 and 5.7 or something like this, both compile time and the
binary size. And in my experience as well; I don't personally use Boost.Python, but I did some benchmarks just to see that it's true.

So I assume the global interpreter lock is held when C++ code is called from Python. Can I drop the GIL?

Yeah, again, I skipped it here just for the sake of time, but you can. You can have a scoped guard for GIL release, for example. That's all there as well.

Any other questions? Okay, great. Well, thanks again, thanks a lot.
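
For readers following along without the slides: the rolling-statistics trick described in the talk (keep a running sum and a running sum of squares, add the element that enters the window and subtract the one that leaves) can be sketched in plain Python as below. This is only an illustration of the idea, not the speaker's C++ code, which does not appear in the transcript. The names rolling_stats and naive_stats are made up for this sketch, and the sample standard deviation (ddof=1, which is the pandas default for rolling std) is an assumption about what the slide computed.

```python
import math
import statistics


def rolling_stats(values, window):
    """Rolling mean and sample standard deviation via running sums.

    Returns one (mean, std) tuple per input element; positions where the
    window is not yet full hold NaN for both entries.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample std")
    total = 0.0      # running sum of the elements currently in the window
    total_sq = 0.0   # running sum of their squares
    out = []
    for i, x in enumerate(values):
        total += x
        total_sq += x * x
        if i >= window:
            # Drop the element that just slid out of the window.
            old = values[i - window]
            total -= old
            total_sq -= old * old
        if i < window - 1:
            out.append((math.nan, math.nan))
            continue
        mean = total / window
        # Sample variance (ddof=1); clamp tiny negatives caused by rounding.
        var = max((total_sq - window * mean * mean) / (window - 1), 0.0)
        out.append((mean, math.sqrt(var)))
    return out


def naive_stats(values, window):
    """Reference version that recomputes every full window from scratch."""
    return [
        (statistics.mean(values[i - window + 1:i + 1]),
         statistics.stdev(values[i - window + 1:i + 1]))
        for i in range(window - 1, len(values))
    ]
```

The running-sum version does a constant amount of work per element regardless of the window size, which is where the speed-up over recomputing each window comes from. The sum-of-squares formula can lose precision when the values are large relative to their spread, so a production version might prefer a more numerically stable update.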