Hello, everyone. Our next speaker is Stefan Behnel, and he will be talking about lifting your speed limits with Cython.

So, hi. Can you hear me well? I think it's okay, right? So, welcome to my talk. This is probably the biggest hall I've ever talked in, so I'm really happy to be here in Boston, my first time in Boston. My name is Stefan Behnel. I regularly give talks at Python conferences; this is my first non-Python conference in quite a while, in years. The talk will be on Cython, on the Cython compiler, so a Python topic. I'll have a quick poll first. How many of you are regular Python users? Okay, pretty much the majority. How many of you have used Cython before? Some of you? Okay. How many are in the big data or scientific computing area, doing data processing in some way? Okay, cool, that's a bit less than half of you, I think. Okay, cool, so I'll get started with my talk.

First of all, a bit more background on myself. I'm a core developer of Cython; I've been there since the early beginning. We started the project in 2007, actually as a fork of a different project at the time (Pyrex), which no longer really exists, so Cython is the thing you want to use today. That's more than ten years already; we had our tenth anniversary last year. That's quite a while to work on an open source project, but it's still alive and kicking, and it's doing very well. I give trainings and consulting on Cython, so if you have any interest in using it better than you currently do, or in starting to use it, please contact me; I can help you with that. But most of the time during the week, I'm actually working for a company called TrustYou, and this is what we do. You can search for a hotel on Google, and it's going to give you a lot of information about that hotel, including a rating. So you get something like: this is a four-star hotel, 3.9 out of five, something like that.
And if you click around a bit in what they show, they'll eventually tell you that this rating actually comes from us. What we do is collect hotel reviews, actual written text, from throughout the internet, from various places: booking portals, reservation sites, but also sites that collect reviews, like HolidayCheck or TripAdvisor, for example. We get those, collect them, and then do a huge amount of data processing on them, really at global scale. We collect them globally, analyze the data, analyze the actual text that people write, so we do NLP, natural language processing, and then statistics on top of that. Google is only one of our clients; what we actually do is sell these statistics and this information back to the hotels. Then we can tell them things like: in comparison to the hotel next door, you could improve your ratings by 10% if you kept your rooms cleaner, or if you renovated your pool, for example. That's what we do. And this is how we do it: this is how we show all this data to hotels, and how they can use this data to improve their own performance and get feedback in a unified and well-established way. And down there, you can see that this is really big data: we get something like 3 million new reviews every week, and we're sitting on a huge pile of reviews that we can do data processing on.

So, why do I tell you all this? Well, we do all of this in Python. Why do we use Python? It really works for us. It's a great programming language, really. It's very versatile, a very pragmatic language; it's concise, it's readable, so it's very nice. And it has a great community, which is very diverse, friendly, and very helpful. Things tend to be well documented, and if they're not, there's a huge amount of material on Stack Overflow, for example, where you can look things up and where you can ask questions.
And apart from that, it has an excellent set of libraries and tools that you can use for data processing, so it's really a great environment and ecosystem that we can build on. It has a great ecosystem for big data processing, for NLP, for building web services, for automation, for the data flow processing that we do, for testing, for all of that. And many of these tools are very well integrated, which is also cool. It's really not just that you have one great tool here and another one there; it's a great ecosystem that we can work with. Why is that the case? Because all the data processing is usually based on NumPy, so there's a data structure that all these tools can build on: SciPy, scikit-learn, pandas for data analysis. All of these tools are well integrated via the data layer. And it's not only the data layer that integrates them. Now we come to Cython, because Cython is a great way to integrate code. The data ecosystem works that well because many of these tools use NumPy as their data layer, and, perhaps surprisingly, a large number of these tools that we use integrate external libraries, integrate native code, via Cython.

Why Cython? Well, it's actually the fastest way to integrate native code. I'm the one giving the talk here, so I can say that. It's production proven, it's actually widely used, and it's really all about getting stuff done. It's a pragmatic programming language: it helps you keep your focus on functionality rather than having to care about boilerplate everywhere. And it allows you to move freely between Python and C or C++, which is something that makes it a really unique programming language at that level. You can write code in it that is as Pythonic as you want and as native as you need, and you'll see examples of that in a couple of minutes. Basically, we write the C code that you don't have to write. We write C, so you don't have to.

Okay, here's a demo. Who knows the Jupyter Notebook, the IPython Notebook?
Actually, not so many. You should be using it. It's great, a wonderful tool. What it gives you is a little web server that you run, which then gives you a web interface. It runs in your browser and allows you to program in your browser, run code, and do things like data analysis in very interactive ways. You type a line, you get feedback, and you can visualize data through it, so it has direct output for graphs, for example. Lots of tools support it; there are tools that provide an interactive way to visualize data, move things around, and try things out. So it's a really great tool. Remember that name: Jupyter Notebook. What I'm doing here is using it for my presentation. There are actual presentation tools for it as well, so I could swipe here and there, but I actually prefer it this way.

In order to make the Jupyter Notebook work with Cython, all you have to do is "%load_ext cython": that imports the Cython extension, and then the notebook just knows what Cython is and has Cython support built in. And just to give you an idea of the environment I'm working in here: you know that a core developer is talking to you when they're using some pre-release alpha version in a live demo, so let's see how that goes for me. I'm using NumPy, Python 3.6, a more or less recent GCC version, okay, and a bit of boilerplate here.

Quick intro to Cython. This is normal Python code: I take the Python math module, import the sin function from it, call it, and get the output from it. That's sin of 5. And now, all I have to do in order to use Cython with that is tell the Jupyter Notebook that this is a Cython-compiled cell, so this is no longer interpreted by Python: please run it through Cython for me. What it then does is compile the cell for me in the background, import the module, and run it. Okay.
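The whole round trip in the notebook is only a few lines. Here is a sketch of it, with cell boundaries shown as comments; it is my reconstruction of the demo, not a verbatim copy:

```cython
# --- cell 1: load the Cython magic ---
%load_ext cython

# --- cell 2: ordinary interpreted Python ---
from math import sin
sin(5)          # -0.9589242746631385

# --- cell 3: the same code, but compiled ---
%%cython
from math import sin
print(sin(5))   # same result, but the cell was translated to C,
                # compiled into an extension module, and imported
```

The `%%cython` marker on the first line of a cell is all it takes to switch that cell from interpretation to compilation.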
So, this is now compiled code, a compiled module. What Cython does for me is compile the Python code to C (also C++ if you want, but generally C), so into native code. What the Jupyter Notebook does here is run Cython to generate the C file for me and then start the C compiler in the background to generate a native extension module, a shared library, which then gets imported by Python. Okay so far? That's the general build process.

Now, since this is translated into C, instead of using the Python math module, which requires me to do a Python call and has some checking here and there and some special casing, more than I would generally need just to say sin of 5, I can use the libc math support, and this is how I do it. I say cimport, which is a static, compile-time import in Cython: I cimport the sin function from libc.math, and here I just assign it to a Python variable. What that does is auto-wrap it for me, so I get a Python function which internally calls the C sin function, and I can use that from Python. And it gives the same value as before: this is sin of 5. So what did I do here? I made Cython auto-wrap a C function for me, to make it callable from Python. Okay. That's a bit boring for the sin function, because there's Python support for it anyway. It gets a bit more interesting as soon as I do not only call some C function but do stuff along the way.
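In code, that auto-wrapping step looks roughly like this; again a sketch of the demo cell, reconstructed rather than verbatim:

```cython
# %%cython
from libc.math cimport sin   # cimport: resolved at C compile time,
                             # no Python import machinery involved

pysin = sin   # assigning the C function to a Python variable makes
              # Cython auto-wrap it in a Python callable

# back in a plain Python cell, pysin(5) gives the same value as math.sin(5)
```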
So I want to have a function in Python that I can use, which internally uses the sin function, but maybe I want sin of x squared or something, so there's a bit more computation involved now. I could compute sin of x squared in Python, sure, but I can do the same in C, and it's actually faster to evaluate in C, because the processor just sees what it does; it's no longer interpreted, it's native code running. So what I do here is write my own little Python function, and this is now a bit of extended syntax that we have in Cython: I can say that x, the argument that I pass in, is in fact a C double. So I'm typing my input argument here, my variables, and then I call the sin function in there. That's essentially the same thing as above: up there I get auto-wrapping, and down here I'm spelling out the function. So here I'm writing a Python function that calls C internally and still gives the same result, and now, if I want to do sin of x squared, I just spell that out.

Just to see the difference, I'll actually ask Cython to tell me what kind of C code it generated for me. For the function I just wrote, I don't just say %%cython for the Cython cell; I say %%cython -a, for annotation, and then it spits out an HTML snippet which the Jupyter Notebook can display for me here. It contains my source code and tells me how Cython interpreted that source code, how it compiled it, and you see there are a couple of shades of yellow in there. These shades of yellow tell me how much Python interaction is going on: wherever objects are being handled in some way, the more object operations there are, the darker the yellow gets. Okay, so there's obviously a lot of Python interaction, and this is the C code that it generated. The signature actually has to convert the Python input argument, some object that comes in, into the C double that I declared. So it has to take an object, unpack it into a C double, do that conversion, and do some general argument handling in the way that the Python semantics work; this is the place where the Python call semantics get mapped to C call semantics. Okay, there's a lot of object operation going on here. Then the next line is just the line that calls the sin function; you can see that in the C code it just says sin of x, so this is really a plain C operation. And since this is a Python function, the return value then also has to be an object again, so this function does object-to-C conversion on the way in and C-to-object conversion on the way out. You can see that here: the result of sin of x is converted, using a C API function of Python, into a Python float object. Okay, that's pretty much all there is to it.

Now, when I want to do sin of x squared, here's a Pythonic way of doing it: I'm squaring x and then calling sin on the square. And again, when I look at the result, cython -a shows me that down here there's a pure C operation, so no object operation is going on: x times x is the square, then sin of the square, and that gets converted into an object again. Cython does all these things automatically for me, so you don't actually see them in the code. It's the type system of Cython, which mixes Python and C, that allows it to see where object conversions are going on, where native code can be generated from your code, and where Python operations are needed, and Cython does all this internally. By default, it's Python: you can drop Python code in there, it compiles into C, but it has Python semantics. And whenever I say "I know better, you don't need an arbitrary object here, you can use a C double", some native data type, the compiler can say: you leave Python semantics, you get into C semantics here, which are faster, but different,
and it automatically does it for me: it generates different code when I tell it what the data structure or the data type is. Is that clear so far? Any questions on that, at this high level? This is generally how the language works: it mixes Python and C. Normal Python code just looks like that, and you can jump between Python operations and C or C++ operations, back and forth, and it changes over nicely.

This is how you do memory handling. Let me see how many of you actually have some kind of C knowledge... quite a number, two-thirds maybe. What's that, more than the Python users? I think the Python background was a bit bigger, but there's not a big difference. Interesting, so this is the right talk for you. C memory handling: malloc and free, you probably know those, and you can use them in just the same way in Cython. So the Cython code calls malloc, and if the malloc failed, then right here I raise MemoryError. This is totally C-ish code, you do malloc and free, and right in the middle you see "raise MemoryError". I love that, and it works. That's how you do memory error handling in Cython, the same way as you do it in Python. It's much nicer than in C, where you'd have to figure out: what do I do now? Here you can just use try/finally and catch an error, catch an exception somehow; it's just as safe as Python. So in the end I call free, and here I do a bit of memory processing. You see that I'm using a pointer, and I'm using a slice assignment here, so this is normal Python syntax, just with C data types. Okay, can I make this a bit smaller, is that size okay? Can I make it smaller for anyone? Okay, because there's a bit more code coming now.

Using third-party libraries. How many of you know the Lua programming language? Yeah, quite a number. Lua is also, by default, an interpreted language; there's a runtime for it, and there's also a JIT compiler, LuaJIT, which I'm actually using here. What I'm doing now is writing a little Lua interpreter wrapper that takes a string, a bit of Lua code, puts it into Lua, executes it there, and then returns the result of the processing back into Python. Okay, and I've implemented that in Cython. The first thing: you've seen this "cimport libc math" before; that's the simple way, because Cython comes with predeclarations for libc, so most of what you would use in libc can simply be cimported from Cython code. For external libraries, Cython does not ship declarations, and you need to copy them together from the header files. So what I did here is take the things that I needed from the Lua header file: create a new interpreter, clean it up, load code into it. Lua is a stack machine, so I'm doing some stack operations to get data in and out of Lua, and I call a function in Lua. That's basically the functionality that I need, so I copy those declarations from the header file to make Cython aware of the C API that I'm working with. Now Cython understands the C API and knows what functions I can call.

So I write a Python function, run_lua, which receives the code. The first thing I have to do: Lua doesn't have Unicode support, so I convert the code into UTF-8 if necessary. Then I create a Lua runtime; if that fails, I can look up in the documentation that the probable reason for that is not enough memory, so I raise MemoryError if it fails. And then, once the Lua runtime is created, I wrap all the rest in a try/finally statement, because I know that in the end I have to close the Lua runtime to free the resources; that's resource management using try/finally. The first thing I do is take the Lua code and load it into the Lua runtime; if that fails, it's probably a syntax error, so I raise SyntaxError. What I get back is then a kind of Lua code object, something I can call. So the next thing is, I execute the code in Lua; if that fails, I raise RuntimeError, probably something went wrong, some wrong input or something. And then, just for simplicity, what I'm expecting back from that code is some number output, and I convert that number into a Python object, here explicitly, because I have to extract the number from the Lua runtime. What I get back is actually a C double, and then Cython converts that C double into a Python object again; that's what happens here. In the end, some cleanup: clear the stack, close the runtime, and that's it. That's all I need to execute code in Lua.

As I said, it's a Python function, so I can call it from Python. Now I'm going to execute this, and it gets compiled. It actually uses LuaJIT in this case, which I'm configuring here: I'm telling the build, please look for Lua in this directory, and use this library to link against. That's all I need to do. Now I'm going to use Lua, talk to Lua. Here's some Lua code, a Lua implementation of Fibonacci, so I'm executing that code here: Fibonacci of 10 is 55. I'm probably going to leave it at that, and I can see how fast that runs: Fibonacci of 24 takes about 2 milliseconds to execute in LuaJIT. Okay, so this is all the code I needed to write in Cython to talk to an external C library, to use a separate runtime, and as you can see, I mixed Python code with C calls completely freely. I'm saying raise an error, raise an exception, a Python exception, if something fails, and I pass some Python data into it, and Cython does all the rest for me, because Cython understands: this expects a character pointer in C, and I'm passing a string in, okay, I can convert that, I can unpack the string and pass the data along. That's something that Cython cares about, and that I don't need to care about when I write my code. Really nice.

What time is left? Okay. Here's a real-world example of writing some data processing code in Cython. Everyone likes taxes, so this is an example that everyone understands. I actually stole this example from an Australian guy, Caleb Hattingh, who gave a talk, a Cython talk, interestingly enough, at PyCon Australia two years ago, a really nice talk, and so here's the
example he gave. I really like it, because it shows a couple of interesting properties. I looked up the average income in Germany, which is where I come from, and this is more or less it, and I looked up the number of earners that we have: 44 million. So the average income over the year for those is 44,000, more or less; that was in 2016. Now the question is: what's the average tax rate, how much tax does everyone pay? I actually tried to back these numbers with real data somehow and didn't find any. I didn't search very deeply, but I guess it's just for privacy reasons that they don't publish accurate data on this; those are the average numbers that they give. So, to get actual data to calculate the tax rate, I'll just make up some alternative facts here. And just to show you what data I'm using, I'm plotting a graph of the income distribution, which in this case is just a log-normal distribution. That's more or less what you would expect from an income distribution; it's not entirely incorrect. You can see the minimum and the maximum here, which are also not completely unrealistic, and I chose the parameters of the log-normal distribution so that the average also comes out as expected. So that's my data; let's calculate everyone's taxes.

When you look up the income tax for Germany on Wikipedia, what it gives you is an Excel implementation. Isn't that wonderful? And this is how it looks. It's even German Excel, so it's not "IF this THEN that", it's "WENN". Glorious, I love it. It's the German income tax, so it's okay if it's in German. You probably don't care about the exact details, but this is the formula for how we calculate our income tax. Basically, what it does is: if the income is greater than this, then this is the formula for it, and so on; there are certain boundaries between which there's a gradual increase of the income tax. Spelled out in Python, I think this is much more readable. This is what it is: the same formula, the exact same thing, just in Python. And then, to calculate the average income, I take the sum of all incomes divided by the number of incomes, obviously, and the average tax rate is calculated as the sum of the taxes on the incomes divided by the sum of the incomes. Okay, when I run this, there's the average tax on the average income, and I just execute some more cells here, and then you can see that the average tax rate is actually something like 24% on my data. Don't take this as actual real-world information; it's just fake data anyway. Okay, so we're expecting 24%, and when I execute the whole thing in Python, it takes about 3 seconds to calculate this for the data set. How many do I have? I actually didn't take the 44 million; I took something like a twentieth of that, just to make it computable during the talk. Okay, so for that number it takes about 3 seconds to calculate the average tax rate the brute-force way in Python. That reminds me: when you do measurements on a laptop, don't trust the numbers too much, because frequency scaling messes them up, but this is about what you would expect; it runs long enough to scale up the CPU. So I'll remember this as the baseline, and to make things comparable, I have a little function here that tells me factor one is Python. I'll have different implementations as I go along optimizing this, with factors to see how each compares to the baseline.

Here's a way to implement this in NumPy. How many of you think you understand this code? A few more hands over time. It's actually kind of straightforward NumPy code, but it's not that simple, because the problem isn't that well adapted to NumPy. Basically, what I do here is: I have a long, one-dimensional NumPy array with all the incomes in there, and I'm masking in the numbers that fit my current calculation interval, say those between 54,000 euros and 256,000 euros, then I apply the corresponding formula to them, and I do that for all four sections. So basically I'm spelling out the same formula we had there, just for different parts of the array, selecting sub-arrays by income range and then doing the calculation. This is quite usual in NumPy: it has to generate intermediate arrays, select stuff, mask out stuff, so it's a bit inefficient, but it's still pretty fast. When I do this in NumPy, I get the same 24% that we expected, and timing it says it only takes about 60 milliseconds, so it's way faster than what we had so far: already about 50 times faster than the Python version. That's NumPy.

NumPy has a second way of doing this: you can take the actual Python implementation that I had and pass it into NumPy to apply it efficiently to all values in an array. That's called a ufunc: you convert the Python function into a ufunc, a universal function, which means that you can apply it to the array and it basically runs on every element. So I do that, and timing it again: it's actually way slower than the previous implementation. The slicing in NumPy is much faster than running a Python function per element, but it's still faster than the plain Python implementation; here I can see it's 4 times faster than the Python version, whereas the NumPy slicing version is still way faster than that.

Now let's get to Cython. In Cython I can do the same thing: I can just take the Python implementation that I had, drop it into Cython, and compile it there. When I run this, it actually comes out a bit faster than the interpreted Python version, but still somewhat slow: 2.51 seconds, so about 20% faster than the Python version. Not too bad, given that we didn't actually do anything; all we did was take the code and add this one line at the top, and then it's 20% faster. For, what is it, 8 characters? I can do that. But there are ways to make it way faster, and that is by exploiting what the language provides for you
What it gives you is that your code gets compiled to C, so you can use C data types in your code, and that makes it way faster, because especially this kind of calculation can be done entirely in C. So the first thing we can do is take this function up here, the tax calculation function, and say: whatever comes in is certainly representable as a C double. That's a safe bet. Then the income variable here will be a C double, and the whole calculation can actually be done in C. I'll add that for now and ask Cython to tell me what it thinks about it, and what you can see here is that this line actually became plain C: the calculation in this line is done in C now, just by adding the typing for the argument. That's all I did. But the return value is still a Python object. I can change that, I can type the return value too, and for that I have to change the signature a tiny bit: I have to tell Cython that this is no longer a plain Python function but actually a C function now, by changing the "def" into a "cdef". Now this will compile the function into a static C function, so really low level, and that allows me to specify the return type. So now it's a function that takes a C double in and gives a C double out, and all the calling will use the C calling convention, which makes it a plain C function. And down here, where I call that function, you can also see, if you scroll a bit, somewhere down here, let me find it... you really don't want to write this C code yourself; you want a generator for that. I think it's down here: this is calling the calculation function now, as a C function. Okay, lots of code going on here. Okay, still the same result, 24%, but I can make it run even faster. Sorry, I'll first show you how fast it is now: run it once, and we're down to 199 milliseconds. That's already about 12 and a half times faster than the plainly compiled Python version, and compared to the interpreted Python version it's 15 times faster.

But I can get even more out of it, and that relates to something that Serge just presented in the Pythran talk. May I say it: Cython now has Pythran support, contributed by the Pythran project, and that allows you to take NumPy code and compile NumPy code from Cython, with Pythran as a backend. Okay, that as background. What I'm doing here now is unrolling the loop: I'll turn the nice Python expression into an explicit loop, saying "for i in range(len(incomes))"... actually, I can keep it a bit more Pythonic and just say "for income in incomes". The total starts at 0, the tax starts at 0; for each income, the total adds up the income, and the tax adds up the result of calling the calculate function on that income, and then I return tax divided by total, right? I think that's it. Okay, run that... looks better... now let's run here... okay, the total is not adding up properly, so I'll make sure those variables are understood: I declare my variables now, and this is how you declare variables in Cython, with cdef. The nice thing about it is that typing is optional: as you've seen before, everything was undeclared, and everything was considered a Python object by default. So I declare my variables, it'll understand them... but it still doesn't do the proper operation... oh, I'm not declaring income, that's the problem. So, cdef double income. Actually, I can leave that other one out, I think; you don't have to declare everything, that just gets annoying after a while. Yes, much better. And you can see that these lines here, the calculation, are actually done in C now, native C code, no Python interaction anymore, and returning the value then does the conversion back into a Python object. It still gives the same result, good, didn't break anything. How fast is it now? Well, 29 milliseconds, which is way faster than 200. That gives us a factor of about 100 compared to interpreted Python, and, sorry, 86 compared to the initially compiled Python version. Okay, that's much faster.
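Put together, the typed version at this point of the demo looks roughly like this. This is my reconstruction of the cell, not a verbatim copy; the names are illustrative and the formula body is elided:

```cython
# in the notebook, this cell starts with:  %%cython -a

cdef double calculate_tax(double income):
    # the bracketed tax formula from before, now compiled as a static
    # C function: C double in, C double out, C calling convention,
    # no Python objects involved anywhere in the calculation
    ...

def average_tax_rate(incomes):
    cdef double total = 0, tax = 0, income
    for income in incomes:        # each list element is unpacked into a C double
        total += income
        tax += calculate_tax(income)
    return tax / total            # the C double is converted back to a Python float
```

The `def` function keeps its Python interface, so the caller's code does not change at all; only the inner loop and the `cdef` helper drop down to C.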
Okay, I can do a bit more. Cython has direct support for NumPy, for memory buffers. So far I kept the data in a Python list, but I can also keep it in a NumPy array, which is much more memory efficient. A Python list has Python objects in it, so there's some object overhead involved, while a NumPy array is basically a flat memory block of C data types, and when I take a NumPy array of C doubles, it's just flat C doubles all the way, which are very fast to process. So I switch from a Python list to a NumPy array as the data structure, and all I have to do in my Cython code is tell Cython what the NumPy array looks like, what the memory layout is. I do that by saying: you can unpack this NumPy array that comes in, take it, and interpret it as a one-dimensional array of C doubles, and the way I spell this is like this. The incomes array will then be unpacked by Cython into a one-dimensional buffer, so only one column here; for those who know NumPy, this will be kind of obvious. If it were two-dimensional, I would say it like this, three-dimensional like this, and it just goes on like that. So this is one-dimensional, and I'm saying the data type in that array is C double. Kind of straightforward. Okay, I can compile that, that's all fine, ignore these warnings, it's just a demo, and then, instead of passing the list into it, I'm taking the NumPy array as input now. I still get the same result, and timing it says... 362 milliseconds, which is surprising; that is somewhat slow. Let me check where that comes from... okay, is my cython -a broken somehow? No, that should be fine, I copied it from above, so it should be okay from the calculation side. I think the data size might be bigger now; did I cut it at the top? I'll just run the timing again, just to be sure, you never know... yeah, that's slow, and the data size is the same, apparently. Ah, okay, sorry: what it does here, and that's what cython -a shows me, is that it does the iteration in Python space, which is a stupid thing to do, because iterating over the thing in C space is actually much faster. So what I have to do now is make the loop C-ish by doing this, that's a really plain C loop now, and then x equals incomes of i. That should be faster, if it compiles... actually, incomes of i, okay, fixed. So, still the same thing, and now it should actually be faster. What I did here was change the loop: initially it was running over the data using Python iteration, so it was actually taking the C double in the NumPy array, creating a Python object from it, passing that through the iterator into the function, unpacking it into a C double again, and doing the computation with it. So it was creating a Python object on each iteration, and by turning the loop into a C-ish operation, that goes away and makes it way faster. We're down to 11 milliseconds now, okay, and we're at a factor of 262 compared to the initial version.

Okay, so I'll stop here and just get back to the slides. A couple of ideas for future features: as I said, we have Pythran integration now, and I would like to extend that. We now have a more efficient way to deal with NumPy operations in Cython code, and I think people should start using it. There are also a couple of C++ ideas that we have; for example, instead of running a list comprehension into a Python list, you could do the same into a C++ vector. This is how it would look; it's probably easy to implement, and it would be a cool new feature. So if you want to contribute, come and talk to me somewhere at the conference, or contact me via email. It's certainly something that people would benefit from. Okay, that's it. I brought stickers, so grab some on the way out.
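For readers who want to replay the talk's central example without a Cython toolchain, here is a pure-Python and NumPy sketch of the tax computation. The bracket constants are my reading of the 2016 version of the German income-tax formula (§32a EStG), so treat them as illustrative rather than authoritative, and the distribution parameters and names are mine:

```python
import numpy as np

# German income-tax formula (the Wikipedia "WENN" spreadsheet from the
# talk), spelled out in Python. Constants assumed from the 2016 brackets.
def tax(income):
    if income <= 8652:
        return 0.0
    if income <= 13669:
        y = (income - 8652) / 10000.0
        return (993.62 * y + 1400) * y
    if income <= 53665:
        z = (income - 13669) / 10000.0
        return (225.40 * z + 2397) * z + 952.48
    if income <= 254446:
        return 0.42 * income - 8394.14
    return 0.45 * income - 16027.52

def average_tax_rate(incomes):
    # sum of taxes on all incomes divided by the sum of all incomes
    return sum(tax(x) for x in incomes) / sum(incomes)

# The NumPy slicing version from the talk: mask each income bracket
# and apply that bracket's formula to the selected sub-array.
def tax_numpy(incomes):
    incomes = np.asarray(incomes, dtype=np.float64)
    taxes = np.zeros_like(incomes)
    m = (incomes > 8652) & (incomes <= 13669)
    y = (incomes[m] - 8652) / 10000.0
    taxes[m] = (993.62 * y + 1400) * y
    m = (incomes > 13669) & (incomes <= 53665)
    z = (incomes[m] - 13669) / 10000.0
    taxes[m] = (225.40 * z + 2397) * z + 952.48
    m = (incomes > 53665) & (incomes <= 254446)
    taxes[m] = 0.42 * incomes[m] - 8394.14
    m = incomes > 254446
    taxes[m] = 0.45 * incomes[m] - 16027.52
    return taxes

# Made-up, log-normal income data, as in the talk (parameters are mine).
rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10.5, sigma=0.5, size=100_000)
print(f"average tax rate: {tax_numpy(incomes).sum() / incomes.sum():.1%}")
```

The scalar function is the baseline that later gets typed and compiled in the demo; the masked version trades a few intermediate arrays for vectorized bracket-by-bracket evaluation, which is exactly the "a bit inefficient, but still pretty fast" trade-off mentioned above.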