Let's see if it's working. Oh, hello. Thanks, thanks to all of you for coming to the first talk of today's EuroPython here in the Shanghai room. Our first speaker today is Christoph from SAP. He's an engineer on the team behind SAP HANA's testing infrastructure. In his spare time he likes to develop Django applications and write Rust. The topic of today's talk is "Is it me, or the GIL?" Please give a warm round of applause to Christoph. — Quite impressive that so many people are interested in the GIL. I actually thought the GIL wasn't a hot topic anymore, but apparently I'm wrong. Maybe I should first give some background on why I'm talking about the GIL in this context. The background is that I do quality assurance at SAP for one of SAP's major products, SAP HANA, which is an in-memory database, and it powers various enterprise applications. What we do is test every commit coming into the source code, and at our scale that means we are testing around 800 commits every day. For that, we operate a small infrastructure of physical hardware with around 1,600 machines, and since we are testing an in-memory database, we need a huge amount of memory. Overall, we are currently using around 610 terabytes of memory across our landscape. The problem is, with such an infrastructure and such load characteristics, you need optimized services and optimized tools to handle the incoming load, and that's the main part of what my colleagues and I are doing. We develop tools and services which are optimized for our workloads. One of these tools is our own task execution framework, which we built on top of Apache Mesos. Apache Mesos is somewhat similar to Kubernetes, but a bit more low-level: it provides us with an interface to the resources of our machines.
At the bottom, you can see that we have multiple data centers with our own physical hardware, but also various cloud providers with cloud instances. Every time an instance has resources available, it offers them via Apache Mesos to our task scheduler, and the task scheduler then has to decide what kind of task to schedule on these available resources. Now, the problem is that as you add more and more machines, this also gets more complicated. In our case, the task scheduler receives more and more incoming offers, together with more and more events about the changing states of tasks: a task has finished, so we now have an opportunity to start new tasks, or it has failed, so we have to reschedule it, and we have to handle all these events. At some point we hit a bottleneck: our scheduling system was not able to handle all the incoming events and use all the resources in an efficient way. And you can imagine that with such an amount of hardware, we would really like to use it as efficiently as possible. You can also see that around our task scheduler there are various other services and databases, so the task scheduler is constantly interacting with all these services and performing a lot of operations. The initial design of the task scheduler is a big multi-threaded application, with various threads handling and processing the incoming data and finding the best way to schedule a certain task on a certain available resource. There are also other threads in the process itself which are required for our observability stack: for example, a thread responsible for sending exception data to Sentry. Distributed tracing is also in place for our applications, so we have a thread responsible for sending spans over to a Jaeger instance.
Now we had the problem that we didn't have the best performance, and our resource utilization was going down. The nice thing about our observability stack is that we can actually inspect each thread and each part of the system and find out where the bottleneck is, and in the same way, find out why we cannot utilize all the available resources. So we have to take a look inside the offer-handling thread. What you see here is a visualization from the distributed tracing system we are currently using, where we can see the time required for certain operations: for example, how long it takes to select an offer for a task, or how long it takes to prepare the task so that it can be scheduled. The first strange thing is that the same function has very different runtimes. For selecting an offer, for example, we see runtimes varying from 700 milliseconds down to 30 milliseconds, which is actually quite strange. The next interesting thing is that we also have increased latency in a way that is not expected. What you see in the highlighted boxes is, first, the span captured on the scheduler level: we measure that we need around 200 milliseconds for this API call. The service we are calling also transmits its data into the same tracing system, so we know that the service itself only took around 30 milliseconds to process the API request. That is also a bit strange, and we would like to investigate it. The last thing which is strange in this capture is that there are gaps between operations, and if you take a look into the code, there shouldn't be any gaps: inside the prepare-task operation there are two sub-operations, these two API calls, and nothing in between. So why is there a latency of multiple milliseconds in between?
And then we started to assume: okay, we are using threads, so it must be the GIL. We are hitting global interpreter lock contention, and that is our problem. But that's no problem, right? You just open a browser, do some research, and you will find a lot of ways to mitigate GIL contention. We could start replacing all the multi-threading with multiprocessing or asyncio, or we could pinpoint a certain function which is CPU bound and migrate it to Cython, which can actually release the GIL in certain sections. Or let's rewrite everything in a faster language; that would also solve the problem of the global interpreter lock. In general, you have to say that such rewrites are major efforts and super expensive. I mean, we're talking about a production system handling a huge amount of workload, and we would rather invest our time in new features and in making it even more efficient. Handling such performance problems is important too, but we have to use the right tools to actually solve the performance problem. Therefore, we took one step back and decided: okay, let's first analyze the problem. Let's find out whether it actually is GIL contention, and if we detect that it is, we can decide even better what kind of mitigation is the best one for our application. So, let's take a look at the GIL. It's probably easy, right? You just import a module, for example sys, and then you ask the Python interpreter how it's going with the GIL. Sadly, there is no simple function you can use to get GIL statistics, which means that we have to think about some key points. So I thought about what I would actually like to know about the GIL. The first thing I would like to know is the basic, fundamental metrics of a lock.
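To illustrate that gap: the only GIL-related knob the `sys` module exposes is the switch interval, while a statistics function is pure wishful thinking. A minimal sketch (the `get_gil_stats` name is made up purely to show that nothing like it exists):

```python
import sys

# The only GIL-related knob sys exposes is the switch interval:
print(sys.getswitchinterval())        # 0.005 seconds by default

# ...whereas a per-thread statistics API like this is hypothetical,
# it does not exist in CPython:
print(hasattr(sys, "get_gil_stats"))  # False
```

So wait and hold times per thread are simply not recorded by the interpreter, and we have to measure them from the outside.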
I mean, at the end of the day, the global interpreter lock is a lock. So I would like to know how long a thread actually waits for the lock, and how long a thread actually holds the lock. And you all know that plain numbers on their own are often not so useful, so I would also like to have some additional context: which thread it is, and which Python function is actually suffering from that contention, maybe. Is it possible to get a trace ID or request ID in there? Because then I can correlate it with other systems. And I would actually like to use this in our production environment, because I can hardly reproduce the problem on my local machine; my local machine is not connected to a cluster with multiple hundred machines. And in the best case, it would be integrated with our existing observability stack, so that my colleagues and I don't have to learn yet another tool and how to interact with it. So, with that wish list, I went through the internet and looked at what was available. I'm probably not the first person thinking about this problem, so there is some related work. There is, for example, a super interesting talk by David Beazley from PyCon 2010. It's actually still quite up to date, because it already talks about the new GIL implementation, which we have had since Python 3.2. In his talk, he explains how he measured GIL contention and stored the data so that he was able to generate some really nice graphs. The problem is that the instrumentation only works with Python 2.6, and there are some other issues that make it not so usable in our production environment. The main problem is that you have to shut down the interpreter at the end to dump out all the data, and then you generate the visualization.
I actually don't want to shut down our scheduler, because every minute it is not running, we are basically losing resources. The next thing that came to my mind is the thread concurrency visualization in PyCharm, which is actually a very nice tool to visualize lock contention in your application. The problem is that PyCharm doesn't take the GIL into account. In this example, I ran an application that is heavily GIL bound, basically always burning CPU cycles, but you don't see anything. Then there is a newer project called gil_load, which is quite interesting: it's a profiler that you can easily integrate into your existing Python application, and during the application run it periodically prints out a load number, comparable to the load average of a Linux system. The problem is that in the end you only have a number; you don't know what the cause of the GIL contention is. So if I run that on an application that is heavily GIL bound, I just get the information: yes, you have a problem. No further information about how to solve it. A very promising approach is py-spy, a newer profiler for Python written in Rust, which is very handy: you can easily attach it to a running application, and then you get a nice overview of which functions take what amount of time, and so on. It also shows the GIL utilization, which is quite nice, but again, you don't get a breakdown to find out what the actual problem is. The truth is: there is no magic GIL contention analysis tool out there, and that means you probably have to build it yourself. Okay, so let's build a tool that actually analyzes the GIL, and hopefully it is able to fulfill all my wishes for such a tool. For that, we can use an existing instrumentation framework, SystemTap, which is available on Linux machines. It allows you to analyze applications by attaching event handlers to certain events.
So it allows you to attach handlers to certain events which are emitted by applications or by the Linux kernel itself. Within these event handlers you can do certain calculations and print out the measured times. The nice thing is that CPython 3.6 actually introduced support for SystemTap and DTrace, and there are already some markers, some event emitters, that you can use to analyze your Python application with SystemTap. For example, there are function entry and function return markers, which are fired every time the interpreter enters a new Python function or returns from a function. I can highly recommend the documentation on this topic; it is very thorough and really helps in understanding the concepts of SystemTap and how you can use it. I also saw in the schedule of this conference that there's another talk about low-level profiling which will also cover SystemTap. The problem with this approach is that most prebuilt Linux packages don't include a Python interpreter compiled with this support, so the markers are not there. Also, there are no markers for the GIL-related areas, where you would actually like to know when a thread acquires the GIL or when a thread drops it. But especially that last part was actually not so complicated to implement. This is just one part of the patch that introduces some markers for the GIL: every time a thread drops the GIL, it emits an event about that, and every time a thread tries to claim the GIL, it also emits an event. As you can see, we can also add arbitrary attributes to these markers, which are then accessible in SystemTap. In this case I'm using the thread identifier, so that I have an idea which thread is actually performing this action. Now you may ask why I don't use the thread names.
I mean, every application should have nice thread names so that you can actually recognize the threads. The problem is that at the moment there is sadly no C API to get the thread names without holding the GIL, which is a bit complicated if you would like to measure the time until you actually have the GIL, but you need the GIL to get the thread names. Complicated. But I think that's also something we can fix at some point in time. Okay, now we have these markers; our Python interpreter is emitting these events at the relevant points in time. Let's measure the time at these points. We attach probes, so-called event handlers, to these events. For example, we instruct SystemTap to look inside the shared library of the Python interpreter, where the markers are located. If we now hit the GIL claim marker, meaning a thread would like to acquire the GIL, we take the current time in nanoseconds and store it in a SystemTap hash map, keyed by the thread identifier. When the thread then actually acquires the GIL, we can calculate how long it took until the thread acquired the GIL, and store that in another SystemTap data structure, an aggregate, which allows us to get statistics out of SystemTap: for example, distributions in the form of a histogram, or an average, and so on. We do the same thing for the GIL drop marker, so we can also calculate how long we actually held the GIL. Now we are collecting these numbers, and what we would like is to print them out in some kind of report. For that we can use the handlers which are invoked at the startup and termination of the SystemTap tracing session: when the tracing session stops, we can print out a summary over all the measured threads with their respective timings. Let's do an example.
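The probe logic just described might look roughly like the following SystemTap sketch. Note that the marker names (`gil_claim`, `gil_acquired`, `gil_drop`), the thread-identifier argument, and the library path are assumptions modeled on the speaker's custom CPython patch; they are not upstream CPython markers:

```systemtap
# Assumed markers from a patched CPython; adjust the library path.
global claim_ts            # claim timestamp per thread identifier
global hold_ts
global wait_times          # aggregate: time spent waiting for the GIL
global hold_times          # aggregate: time spent holding the GIL

probe process("/usr/lib/libpython3.so").mark("gil_claim") {
    claim_ts[$arg1] = gettimeofday_ns()   # $arg1: thread identifier
}

probe process("/usr/lib/libpython3.so").mark("gil_acquired") {
    if ($arg1 in claim_ts) {
        wait_times <<< gettimeofday_ns() - claim_ts[$arg1]
        delete claim_ts[$arg1]
    }
    hold_ts[$arg1] = gettimeofday_ns()
}

probe process("/usr/lib/libpython3.so").mark("gil_drop") {
    if ($arg1 in hold_ts) {
        hold_times <<< gettimeofday_ns() - hold_ts[$arg1]
        delete hold_ts[$arg1]
    }
}

probe end {
    # Summary when the tracing session terminates.
    print(@hist_log(wait_times))
    print(@hist_log(hold_times))
}
```

A per-thread report, as shown in the talk, would key the aggregates by thread identifier as well (e.g. `wait_times[$arg1] <<< ...`).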
In the first example, I have a Python process with two IO-bound threads. How do I actually make an IO-bound thread? By simulating the IO with time.sleep, because it has basically the same behavior as, for example, a read request on a socket. Before the thread goes into sleep mode for a certain number of seconds, it releases the GIL, and after the sleep is complete, it tries to reacquire the GIL. The same thing happens on a socket: you would like to read something from a socket, you define the number of bytes you would like to read, and as long as there are not enough bytes, or no data at all, available, it will block and wait. But before that, Python releases the GIL for you. If we now measure this, we can see that we actually don't have any problem with this application. You can see that the main thread only had to wait a bit more than one millisecond on the global interpreter lock, which is quite nice, especially if you know that this application runs for 15 seconds in this simulation. Looking at the hold time, we also see that the main thread is the thread holding the GIL most of the time, which also makes sense: it has to import the libraries, the threading library and so on, so most of its time probably goes into initialization. The IO threads themselves are super lightweight, so they don't have to wait long for the GIL, and when they have the GIL, they don't consume much time in the Python interpreter either. Overall, we can say that we held the GIL for only 0.2% of the full runtime, and we also had a wait time of less than 0.01% of the full runtime. So basically no GIL contention at all; in this situation the GIL is not a problem. We can now change that easily by introducing a CPU-bound thread. How do we simulate a CPU-bound thread? It's actually quite simple: we just need an endless loop that does nothing.
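The IO-bound setup described above can be sketched like this; the exact loop counts and sleep durations are illustrative, not the speaker's actual benchmark:

```python
import threading
import time

def io_bound():
    # time.sleep releases the GIL before blocking, just like a blocking
    # socket read would, so the other thread can run in the meantime.
    for _ in range(5):
        time.sleep(0.01)

start = time.perf_counter()
threads = [threading.Thread(target=io_bound) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Both threads sleep concurrently, so the total stays close to one
# thread's 5 * 0.01 s, rather than doubling.
print(f"elapsed: {elapsed:.3f}s")
```

Because each thread gives the GIL up before blocking, neither thread waits for the lock in any meaningful way.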
The loop with a pass inside will do the trick for us, and we now have a CPU-bound thread. If we now take a look at the timings, we can actually see GIL contention. We see that the main thread shows the same behavior as before, the same hold time, but an increased wait time. Okay, that's probably already a problem. The more important problem is that our IO threads now have a much higher wait time to actually get the GIL: we are now waiting more than 700 milliseconds just to get the GIL, while holding it for only around 100 milliseconds in our application. The hold time is still the same as before, and the CPU-bound thread is consuming all the valuable CPU time, therefore holding the GIL for nearly the full runtime. So overall we now see that the GIL was basically always held by at least one thread, and we also had significant wait times. Even more interesting is the latency for the IO threads. If you take a look at the histogram for the IO threads, you will notice that after some time the latency is quite stable, between 4 and 8 milliseconds. It's quite interesting that it's so stable, but it also shows us one main disadvantage of this GIL contention: it already affects the overall performance of the application by introducing around 5 milliseconds of additional latency to every attempt to acquire the GIL after a blocking IO operation. That is actually quite a problem in some cases, because normally you run multiple IO operations during, for example, an HTTP request. Why is it so stable? Because of the internal switch interval, which is an implementation detail of the GIL, but it's actually quite interesting to see it in action. Every time a thread would like to acquire the GIL, it checks whether someone is holding the GIL. If someone is holding it, the thread waits on a condition variable for up to 5 milliseconds.
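The effect is easy to reproduce with a sketch like this: a pure-Python busy loop plus one sleeping thread. The exact extra latency varies by machine, so no specific number is claimed here:

```python
import sys
import threading
import time

print(sys.getswitchinterval())  # 0.005 -- the 5 ms switch interval

stop = False

def cpu_bound():
    # Pure-Python busy loop: it never blocks, so it only hands over the
    # GIL when a waiting thread requests it after the switch interval.
    while not stop:
        pass

t = threading.Thread(target=cpu_bound)
t.start()

# A thread returning from a blocking call now has to wrest the GIL
# back from the busy loop, adding up to ~5 ms per wakeup:
start = time.perf_counter()
time.sleep(0.01)
elapsed = time.perf_counter() - start
print(f"slept for {elapsed * 1000:.1f} ms instead of 10 ms")

stop = True
t.join()
```

On a typical machine the measured sleep comes back noticeably above 10 ms while the busy loop is running; this is exactly the stable 4–8 ms latency floor visible in the histogram.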
If the GIL is still held after that, it sends out a request to drop the GIL, so that the other thread which is still holding the GIL will please release it and the new thread can take over the global interpreter lock. That alone can add this additional latency: if you only have pure Python instructions, you will basically always get this stable latency of around 5 milliseconds. Worse is a thread that holds the GIL even longer than those 5 milliseconds, because single bytecode operations take longer, or because you're calling out into an external C function which doesn't know anything about the GIL and also doesn't release it. Okay, now we have a toolset that we can use to analyze our application, our production application. So here's the plan for how we do that: we deploy a new container with our custom CPython version, including SystemTap support, on our cluster; we go to the machine which is running the scheduler; we attach to the process; and we get some nice insights about the GIL contention. In reality, it was a bit different. It was quite easy to deploy the container with the custom CPython version, but it was a big deal to install SystemTap on the host, because it's actually not that easy to get SystemTap running inside a container. That is related to the architecture of SystemTap: SystemTap actually transforms your script, which I showed a few slides ago, into a real kernel module and loads it at runtime. It is then that kernel module which runs and measures all the timings, and in the end the data is printed out. That's possible. I also did that in our production environment. It was interesting, but I don't recommend it. Especially if you're talking to a security guy, he will probably not be happy if you start loading custom kernel modules in your production environment, and if you run some processes as root, your full isolation is basically gone.
Nevertheless, I measured some nice results. Over a two-second observation of our process, I found out that we actually hold the GIL around 88% of the measured time frame. In the same time frame, there were so many threads waiting for the GIL that we had an overall wait time on the GIL of nearly 300%, which actually proves that our application really suffers from GIL contention. That's not great, but it actually raised even more questions. The main question that first came up: are there threads holding the GIL for longer than 5 milliseconds, or do all threads give up the GIL quite fast, and we are just hitting the limits of switching between that many threads? That would be one possibility, but let's take a look at whether we actually see that in our infrastructure. If we see it, I would like to know which function is taking so much time and which function is not releasing the GIL. Another question that came up: is there maybe some kind of clustering in these measurements? Is it possible to identify certain clusters for certain operations, certain patterns in the measurements, which are particularly intense in terms of global interpreter lock contention? Overall, the biggest problem was that with 31 threads it was actually quite hard to read this text report; with so many threads, such a report is simply not well suited, and I would not recommend it. So one thing that came to my mind is that timelines are much easier to understand. If you take a look at distributed tracing systems, they are actually quite good at visualizing how long a certain operation takes; it's much easier to recognize how long an operation takes based on the size of the span in the visual representation. So let's have the same thing for the GIL contention, and maybe find out what's going on in our application.
Therefore the idea is: let's still use SystemTap, because at least it has proven that we can collect the data. We collect the data with SystemTap and print out a text file, which we can then load into a Jupyter notebook, where we can do various analytics and create some nice charts and visualizations, all the things we can do with our data science tools. I think that's one of the nicest things about Python these days: we have all these nice visualization libraries at hand to make data easy to grasp. What I did is visualize this with Bokeh. This is now the timeline visualization of our 31 threads over two minutes. In the areas with dark blue boxes, we have GIL usage showing that a thread is holding the GIL for longer than 50 milliseconds; please remember the switch interval, 5 milliseconds. In the red boxes, the red areas, we have threads that are waiting for the GIL for longer than 50 milliseconds. If you think about optimizing web applications, you normally aim to be as fast as possible, and any web developer who says 300 milliseconds is totally fine for a web page... I'm not so sure about that. Okay, let's zoom in a bit and take a closer look. These are now 20 seconds, then 5 seconds, and now we come to the interesting part of one second. What we saw is that there is actually a clustering of big blue boxes. For example, this part is taking more than 500 milliseconds: for 500 milliseconds this one thread is doing all the work and the other threads can't do anything. And the first question was: what is this thread?
It's actually a thread that collects metrics from our scheduler and sends them to a central system, so that we know how many tasks are running, how full our queues currently are, and so on. After adding additional visualization, I found out that this thread is actually responsible for 75% of the whole GIL hold time, which is super expensive if this thread is only collecting some metrics from the application. Okay, so now we know what the problem is, and the nice thing is: now that we know the problem, we can actually fix it. What we did is start replacing the C extension we were using inside this thread to calculate all these various metrics. We found out that this C extension never released the GIL, because internally it makes heavy use of Python objects. Normally, when you use a C extension, and at least that was the intention behind choosing this C extension, you assume it will release the GIL fairly early on; that's not always true. The probably simplest fix was to just change the interval of our metrics collection: we changed it from 10 seconds to 120 seconds, so we now collect metrics less often. But if that can solve the GIL contention, it's not a big deal. Before, we saw a huge amount of CPU time where the GIL was required for processing, and also a huge amount of time where threads were waiting for the global interpreter lock. After applying these simple fixes, these simple changes, it was actually possible to cut those times roughly in half: we now hold the GIL only around 43% of the time, and we only wait 80% on the GIL. The wait time is still not great, but nevertheless, with these simple fixes we were able to speed up our task execution scheduler enough that we can now actually use all the available resources. And it basically saved us a huge amount of refactoring of our application, of replacing multi-threading with multiprocessing or asyncio,
or where we would basically have had to think about whether it might be easier to rewrite the full application from scratch. And that is probably the most important takeaway for performance-related work: please first try to find out whether you actually have that problem, and whether it makes sense to invest in that area; you'll probably find an even better solution once you have more insight. In particular, this will now help us figure out the next best evolutionary step for our application: does it make sense to introduce multiprocessing, or does it make sense, for example, to rewrite some functions which are more CPU bound than others and move them to Cython? This whole topic also revealed multiple additional ideas for how we could improve the Python ecosystem in this area. So, the next things I would like to do, although, as you know, time is always a limit: I would like to bring this toolset into a more reusable state, so that it's easier to use for people who were not involved in writing these scripts. I also think it would be possible to extend CPython in some areas to make it even easier to build such toolsets. For example, a C API for thread names would be super helpful: every time I read the SystemTap report, I had to map thread identifiers to human-understandable names; otherwise you have no chance with 31 threads. Another question I had in mind: would it be possible to move this metrics collection directly into CPython, like the profiler itself? It wouldn't have to be active all the time; maybe we could enable it on demand when we actually need it, collect the statistics, and get them out via a C API. It would probably also be easier to integrate that into existing observability tools, for example distributed tracing, which could then show: yes, this operation actually took 700 milliseconds, but overall it only held the GIL for 30 milliseconds. At that point you would already know: I probably have a problem
with my GIL. If you are interested, and the room is quite full, then please reach out to me; I would really like to talk about this and hear some feedback, things that could be improved, or things that would be interesting for you, because with that feedback I think we could actually start some work in this area to make performance measurement in Python much better. And I mean, yes, we have the GIL, and we will probably never remove the GIL from CPython, because it is not so easy; otherwise people would have done it already. But if we learn to live with the GIL, then we can also improve overall Python application performance, and we don't have to rewrite everything in, I don't know, C++ or C or assembler, if you really like to do that. Okay, then thank you very much, and as we have 5 minutes, I think we can do one or two questions. Thank you. Okay, try again. Okay, does anybody have any questions? There are microphones at the back; I can come to you if you are at the front, please show a hand sign. Okay, it seems that everybody is stumped for questions, so I will thank Christoph for the talk. I hope you enjoy the rest of the conference. Thank you very much. — All right, let's start. Hi everyone, I'm Sebastian. I write Python code for a living, and I also teach people how to write Python code. You can find me on Twitter, so if you have some questions or comments, that's probably the easiest way to find me and get in touch. I'm also sometimes blogging at this URL; I have a few posts about IPython, and I'm planning to write a few more. Two technical remarks before we start. First, there will be a lot of features that I want to talk about, and when I speak publicly I tend to get nervous, and when I get nervous I tend to speak fast. So if you miss something and want to come back to it, or maybe you're sitting far away and cannot see, here is the link to the slides; I will also display this link at the end. I'm using IPython 7.4 and Python 3.7, so if you try to reproduce some of the
things and they won't work, just make sure to update IPython and Python to those versions. So why am I giving a talk about IPython? Well, I've been using IPython for over six years, and I thought that everyone else in the community was doing the same, which apparently is not true; some people don't know about IPython. And IPython is much more than just syntax highlighting, so I decided to gather all the most interesting features and show you how they can be used to boost your productivity. We'll start with the basics, and then we'll move to more advanced stuff later. So what is IPython exactly? For those of you who have never heard about it: IPython is the father of the Jupyter Notebook project; no, just the Jupyter project. IPython was initially created as 259 lines of code by Fernando Perez in 2001. This code was simply executed at Python startup, and all it did at that time was display a numbered prompt, store the input of each command in a global variable, and import some libraries for mathematical operations and plotting. So it's been around for over 18 years. Initially it was just an interactive prompt for Python; later it was turned into IPython notebooks to make data analysis easier; then Project Jupyter was born. The idea behind it was to decouple the notebook part from the engine part, so people could use the notebooks with different programming languages. Today, Project Jupyter is probably the most well-known form of IPython, but this talk is not about Jupyter. If you're interested in learning more about what Jupyter can do, there is a great three-hour-long talk that was given at PyCon US in 2017 by the core developers and long-term users of IPython, so you can check it out. And even though I won't talk about Jupyter or notebooks today, most of the stuff that I will mention will also work with Jupyter. So, IPython is a REPL. For those of you not familiar with this term, it stands for read-eval-print loop: it's a type of shell that reads a command, evaluates it, prints the result,
and waits for the next command. IPython is basically a Python REPL on steroids, like a massive dose of steroids. It has syntax highlighting. It has tab completion, and not only for keywords, modules, methods and variables, but also for files in your current directory or for Unicode characters. It has smart indentation, so if you start writing a function or a loop and you press enter, it will automatically indent the next line. You can search the history, either with arrow up and down, or by typing part of a command and then using the arrows, or by pressing Ctrl+R, then typing some text, and then pressing the arrows to switch between the results. But that's just the tip of the iceberg. IPython is also fully configurable: you can swap kernels, debug, many, many things. What I really love about IPython is how easily you can access the documentation of basically any object you can think of: classes, variables, functions, modules, you name it. All you have to do is append or prepend a question mark to the name of the object, and if you want to see the whole source code of an object, you use two question marks instead of one. Also a nice trick: if you're not sure what's the name of the function that you want to call, you can use stars as wildcards to see the functions matching certain strings. So here I want to run a function from the os module, and I vaguely remember that it has something to do with "dir", so I'm just listing all the functions containing "dir" in the name. IPython stores the input of each command, including from previous sessions, and if you enable it in the settings, it will store the output as well. If you want to access the cached input for a given cell, there are many ways you can do this: IPython will create a new global variable for each input command, or you can use the _ih or In lists to access the previous commands. Just keep in mind that those two lists are indexed from 1, not from 0. The
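What `?`, `??` and the wildcard search do can be sketched with the standard library's `inspect` and `fnmatch` modules. This is only an approximation of IPython's behavior, not its implementation:

```python
import fnmatch
import inspect
import os

# Roughly what "os.path.join?" shows: the signature and docstring.
signature = inspect.signature(os.path.join)
docstring = inspect.getdoc(os.path.join)

# Roughly what "os.path.join??" adds: the full source code. This works
# for pure-Python objects; C builtins have no retrievable source.
source = inspect.getsource(os.path.join)

# Roughly what "os.*dir*?" does: wildcard-match names in a namespace.
matches = fnmatch.filter(dir(os), "*dir*")
print(matches)  # e.g. ['chdir', 'curdir', 'listdir', ...]
```

In an actual IPython session you would just type `os.path.join?`, `os.path.join??` or `os.*dir*?` at the prompt.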
same goes for output caching: you can access the result of a given cell through one of the global variables, or using one of the two dictionaries (_oh and Out) that store them. You might be wondering, why do I care about input and output caching? Well, did you ever run a command that returns a value, just to realize later that you actually want to do something with this value? I did, many, many times. And if it's a fast command, then no problem, you can rerun it. But if it was a slow one, or you just can't rerun it because you had an authentication token and now it expired, then you have a problem. Unless you're using IPython: everything is cached, so you can just go back and retrieve the value from the cache. On the other hand, if you don't want to cache the input and output for a given command, you can put a semicolon at the end of the line; IPython won't print or cache the result. Now, some of the coolest features of IPython are the magic functions. Magic functions are a bunch of helper methods that start with one or two percentage signs. Why the percentage sign? Well, to distinguish them from standard Python functions, as they behave slightly differently; for example, they don't require parentheses when you're passing arguments. Just keep in mind that in Python there are also the so-called magic (dunder) methods, but those are something completely different from IPython magic functions. There are two types of magic functions: line magics and cell magics. Line magic functions are similar to shell commands; they don't require parentheses when you're passing arguments. And if a function starts with two percentage signs, then it's a cell magic. Cell magics can accept multiple lines of input: you press enter and type the input code that the magic function will run on, and to let the cell magic function know that you are done typing the input and it should run now, you press enter twice. As of version 7.4 of IPython there were 124 magic functions. Now I'm going to discuss all of them, one by one... I'm just kidding, I
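The caching model described above, 1-indexed input cache, output cache only for expressions, semicolon suppression, can be illustrated with a toy sketch. The names `In` and `Out` mimic IPython's globals; this is an illustration, not IPython's actual implementation:

```python
# Toy sketch of IPython's input/output caching (illustration only).
In = [""]   # input cache; index 0 is a placeholder so real entries start at 1
Out = {}    # output cache; only commands that return a value get an entry

def run(cell_text, value, suppress=False):
    """Record a command the way IPython's prompt does."""
    In.append(cell_text)
    n = len(In) - 1
    if value is not None and not suppress:  # a trailing ';' suppresses caching
        Out[n] = value
    return n

run("x = 2 + 2", None)               # assignment: nothing to cache as output
run("x * 10", 40)                    # expression: result lands in Out[2]
run("x * 100", 400, suppress=True)   # "x * 100;" suppresses the result

print(In[2])   # 'x * 10'  (1-indexed, like IPython)
print(Out)     # {2: 40}
```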
don't have time to discuss them, especially since the documentation of those methods is pretty good, so if there are some functions here that you don't recognize, I suggest you take a look and maybe you will find them useful. I will just quickly show you a few interesting ones. As I said before, IPython keeps track of the commands that you run, and the %history magic can be used to print those commands back, from the current session, or with a number specifying which line of the history you want to print. I'm actually showing you %history because it's one of a few functions in IPython that can accept a range of lines as a parameter, and the range parameter is quite interesting, so let's take a closer look at how it works. There are a few ways you can specify a range of lines in IPython. The simple one is to use a dash between two numbers, and you can also mix ranges with single lines. So in the first example I'm selecting lines 2, 3, 5, 7, 8 and 9. It's also fine if the ranges are overlapping or duplicated. And if you want to reference lines from previous sessions, you specify the session number, then a slash, and then the line number or a range. That's great, but usually you don't remember the session numbers; I had to check them when I was preparing this slide. So IPython accepts a different notation: you can use the tilde character prefix to say "I want to print history from that many sessions before the current one". So in the third example I'm printing line number 7 from two sessions ago. Also, you can provide the session number and skip the range parameter; that way IPython will print the whole session. And it supports ranges across multiple sessions, so in the last example I'm printing the history from the first line 8 sessions ago until the fifth line 6 sessions ago. So, even though writing multiple lines of code in IPython is easier than it is in the default Python REPL, because you have the smart indentation, you can make it even easier
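The within-session part of that range syntax is simple enough to sketch. This hypothetical parser handles the `2-3 5 7-9` example from the talk; it deliberately ignores the cross-session `~n/` notation:

```python
def parse_ranges(spec):
    """Parse an IPython-style history range like '2-3 5 7-9' into line numbers.

    Simplified sketch: handles plain lines and dash ranges within the
    current session only, not the full cross-session syntax.
    """
    lines = []
    for part in spec.split():
        if "-" in part:
            start, end = part.split("-")
            lines.extend(range(int(start), int(end) + 1))  # inclusive, like IPython
        else:
            lines.append(int(part))
    return lines

print(parse_ranges("2-3 5 7-9"))  # [2, 3, 5, 7, 8, 9]
```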
with the %edit magic command. It will open a temporary file in your favorite editor, where you can type the code, and after you save and close that file, IPython will execute it. And by "favorite editor" I mean the one that's defined in the EDITOR or VISUAL environment variables, so if you don't set it up, you will probably end up with the greatest text editor of all time, or nano, or vim. Each time you run the %edit command, IPython is going to open a new file, so if you want to go back to the previous one, you have to pass the -p parameter. And to save yourself typing the %edit thingy, you can just press F2; it's a shortcut. What's really cool about the %edit command is that it can accept an argument, and depending on what this argument is, %edit will behave differently. If it's a file name, IPython will open that file. If it's a range of the input history, IPython will open a new file and copy the lines from the history to that file. If it's a variable, IPython will open a new file and copy the content of that variable to that file. If it's an object, but not a variable, for example if it's a function name, IPython will try to figure out in which file you defined this function and open that file exactly on the line where the function definition starts, which is super cool, and you can use it, for example, to monkey patch some functions. And finally, if you recorded a macro, you can pass the name of the macro to the %edit command to edit the macro. Next, the %run magic. %run will run a Python script and load all its data into the current namespace. Seems pretty straightforward, but I find it very useful when I'm writing a module, or just a bunch of functions in a file, and I want to test them. If there is a bug in my module, I can't just re-import it; I would have to import the reload function from the importlib module and use that to reload my module, which is a bit of typing, it's not 100% reliable, and to be honest, I keep forgetting
the name of the importlib library. So instead of importing my modules, I'm usually re-running them. I can run a module as many times as I want, and each time I do this, IPython will update the current namespace with the latest code from my module. Also, as a bonus point, there is a configuration option of IPython called autoreload; if you enable it, IPython will always reload the whole module before running a function from that module. There are many other magic functions that you can use, for example, to re-run some commands from the past, or edit them and then re-run them, to save commands as a macro, save them in a file or in a pastebin so you can share them with someone, to save them in a database and retrieve them back in another session, or to just print a list of the variables or functions that you have created, in a nicely formatted way. So far, all the magic functions that I mentioned were line magics. As for the cell magics, there is a whole collection of functions that you can use to run a piece of code written in a different programming language. One of the most interesting use cases, at least until the end of this year, is when you want to quickly test a piece of Python 2 code: you type %%python2, then write the code, press enter twice, and IPython will execute it with no problems. It works with other languages like bash or ruby or javascript out of the box, and also notice how in the last example, I don't know if you can see it, IPython is actually correctly highlighting the ruby syntax. So what if those 124 magic functions are not enough? Well, you can very easily create your own magic function. All you have to do is write a function and decorate it with either the register_line_magic or register_cell_magic decorator. Let's see an example. Here I'm creating a magic function that will reverse any string that I pass. First we start with writing a function that takes an argument and returns the reversed version. Each line magic function should accept at least one
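The reload dance that `%run` lets you skip looks roughly like this. The module name `my_module` and the `ANSWER` variable are invented for the demo; a plain second `import` would not pick up the edit, because the module object is cached in `sys.modules`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True          # keep the demo free of stale .pyc files
tmp = Path(tempfile.mkdtemp())
(tmp / "my_module.py").write_text("ANSWER = 1\n")
sys.path.insert(0, str(tmp))

import my_module
print(my_module.ANSWER)  # 1

# Edit the file. "import my_module" again would NOT pick this up,
# because the module is already cached in sys.modules.
(tmp / "my_module.py").write_text("ANSWER = 2\n")
importlib.reload(my_module)
print(my_module.ANSWER)  # 2
```

Inside IPython, `%run my_module.py` achieves the same effect with less ceremony, re-executing the file and refreshing the namespace each time.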
parameter: the string that will be passed to that function when we call it. Next, we import the register_line_magic function and use it to decorate the function that we just created. I'm passing a parameter to the decorator that will be used as the name of the magic function; if I don't do this, my new magic function will be called the same as the function that I'm decorating, in this case %lmagic, so I want to change the name to something more descriptive. Finally, after I run this code in IPython, my new magic function is ready to use. It will reverse anything that I pass, and since all arguments to magic functions are passed as strings, I don't really have to worry about checking the types to see if I can reverse it or not. Creating cell magic functions is pretty similar, and you can even create a function that will work both as a cell and a line magic. If you want to learn more, the documentation of IPython has some pretty simple examples, and I also wrote a very short step-by-step guide on how to create a cell magic that will run the mypy type checker on a block of code. So creating magic functions is easy, but to be able to run our magic function, we had to copy and paste our code into IPython. If you want to run your magic function often, then each time you start a new session you will have to paste this code into IPython, which sounds terribly inconvenient. So we might want to turn our magic function into an extension. Extensions in IPython are an easy way to make your magic functions reusable and to share them with the world, and they are not limited only to magic functions: you can, for example, write some code that modifies any part of IPython, from custom key bindings to custom colors, and you can very easily turn that into an extension. To create an extension, you need to create a file that contains the load_ipython_extension function; this is the function that will be executed when
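The decorator pattern behind register_line_magic can be shown with a stand-alone sketch: a decorator that stores the function in a registry under a chosen name. Real IPython registers the function with the interactive shell instead of a plain dict, so treat this as conceptual only:

```python
# Conceptual stand-in for IPython's register_line_magic (not the real API).
MAGICS = {}

def register_line_magic(name):
    def decorator(func):
        MAGICS[name] = func   # real IPython registers with the shell instead
        return func
    return decorator

@register_line_magic("reverse")
def lmagic(line):
    # Every line magic receives its arguments as a single string,
    # so there is no need to type-check before reversing.
    return line[::-1]

print(MAGICS["reverse"]("hello world"))  # 'dlrow olleh'
```

In real IPython the import would be `from IPython.core.magic import register_line_magic`, and the magic would then be invoked as `%reverse hello world` at the prompt.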
you load the extension. You can optionally add the unload_ipython_extension function if you want your extension to be unloadable. OK, that's a pretty vague explanation, so let's see an example. Let's say we want to turn our magic function into an extension. That was the code of our magic function; all we have to do is take this code and put it inside load_ipython_extension. Keep in mind that this function should always accept one parameter, the IPython object, so even though we are not using it in our example, we have to accept this parameter, otherwise IPython will complain. Then you save the function in a file inside the extensions directory of IPython. Now, if we start IPython and load our extension, the magic function %reverse will be available in our session. All the %load_ext magic does is find a file with a matching name and call the load_ipython_extension function from that file. There is a deprecation warning here, and you might be thinking, why am I showing you something that is deprecated? Well, it's not really deprecated; it's just a subtle way of IPython telling you: hey, I see you have created an extension, how about you share it with others and publish it on PyPI? Now, I don't think there is any point in publishing such a silly extension on PyPI, but then again, I don't think there was a need for a package that pads text on the left either, yet some language communities think differently, so of course we're going to publish it. You can find the package here, you can install it with pip, and voila, now you can reverse strings with a magic method in IPython. This package contains just the absolute minimum of code that you need to publish an IPython extension on PyPI, so if you want to publish your own, you can go check it out. So that was how we can write extensions. If, on the other hand, you want to see the extensions that other people have created, there are two places. The first and biggest place is the extensions index; it's a wiki page on the IPython GitHub repository that contains a huge list of extensions. Just
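The extension contract can be sketched without a running IPython by using a minimal stand-in shell. `FakeShell` is invented for the demo; real IPython passes its `InteractiveShell` object, which does have a `register_magic_function` method:

```python
# Minimal stand-in for the shell object that %load_ext passes to extensions.
class FakeShell:
    def __init__(self):
        self.magics = {}

    def register_magic_function(self, func, magic_kind="line", magic_name=None):
        # The real InteractiveShell method has this name; the stub just
        # records the registration in a dict so we can inspect it.
        self.magics[magic_name or func.__name__] = func

# This is the shape of the extension file itself.
def load_ipython_extension(ipython):
    def reverse(line):
        return line[::-1]
    ipython.register_magic_function(reverse, "line", "reverse")

def unload_ipython_extension(ipython):
    ipython.magics.pop("reverse", None)

shell = FakeShell()
load_ipython_extension(shell)           # what %load_ext does after finding the file
print(shell.magics["reverse"]("abc"))   # 'cba'
```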
keep in mind that some extensions there can be quite old and you might have problems installing them, but if you see an extension that you really like and you cannot install it, just copy the code and paste it into IPython, and it will work. The second place to find extensions is PyPI. The IPython developers are actually recommending to put your extensions there and tag them with the IPython tag, but not everyone is tagging their extensions properly, so simply searching for "IPython" or "IPython magic" on PyPI can return you some more results. So what kind of extensions are people creating? Well, for example, there is ipython-sql that lets you interact with SQL databases from IPython, there is ipython-cypher that lets you interact with Neo4j, or django-orm-magic that lets you define Django models on the fly. The popularity of those extensions is not very high; many of them are below version 1.0 or have been abandoned a long time ago, but sometimes you can actually find something useful. So what else can you do with IPython? You can, for example, run shell commands: any command that starts with an exclamation mark is treated as a shell command, and some of the most common ones, like cd or ls, work even without the exclamation mark. You can create aliases. Aliases in IPython are basically the same thing as aliases in Linux: they let you call a system command under a different name, and in IPython they can also accept positional parameters. Speaking of aliases, there is actually a cool magic function called %rehashx. It will load all the executables from the PATH variable into the IPython session, which basically means that now you can call any shell command right from IPython, which is pretty cool. A little curiosity: here I'm starting a node REPL inside the IPython REPL. I wanted to go deeper down and start more REPLs, but I failed. IPython has four different settings for how verbose the exceptions should be, and you can change between them with the %xmode magic function. You can select
the lowest amount of information, a bit more verbose one, context, which is the default one, and the most verbose, which will also show you the local and global variables for each point in your stack trace. And starting from version 7 of IPython, you can execute asynchronous code by just using await. If you try to put await in a top-level scope in the standard Python REPL, you will get a syntax error; however, IPython has implemented some hacks to make it work. So if you're playing with some asynchronous code and you want to quickly await an asynchronous function, this is a great way to do it. Just keep in mind that this is not actually valid Python code, so don't do this in production. And there is a demo mode in IPython. If you want to use it, you have to create a Python file with some simple markup in the comments, and then you need to load that file into the demo object. This is how it works in practice: each time you call the myDemo object, IPython will execute the next block of code from the demo in the current namespace, so you will have access to all the variables and functions that were created in that block of code and you can play with them. It's pretty similar to what you can do with Jupyter notebooks, and to be honest, for a presentation I would stick to Jupyter notebooks, so people can actually see what code you are executing. But if you live in a terminal and you want to impress your colleagues with a pretty cool coding demo for your next presentation, this is a great tool. So, IPython comes with a lot of good defaults; in fact, I never actually felt the need to modify them. The default configuration lives in the IPython config file, and this is where it's located for the current user. Actually, when you first install IPython, the file is not there; you first have to run the "ipython profile create" command that will generate the file with default values. And if you look inside that file, you will see a huge amount of options that you can change: you can load extensions, change
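The top-level await trick has a portable equivalent in plain Python: wrap the coroutine in `asyncio.run`. The `fetch_answer` coroutine here is invented as a stand-in for real async work:

```python
import asyncio

# In a plain Python script, top-level "await" is a SyntaxError;
# IPython >= 7 papers over this at its prompt ("autoawait").
async def fetch_answer():
    await asyncio.sleep(0)   # stand-in for real asynchronous work
    return 42

# IPython >= 7 prompt:   await fetch_answer()
# Plain Python:
result = asyncio.run(fetch_answer())
print(result)  # 42
```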
the color schema, change the exception mode, select a different editor to use with the %edit command, stuff like that. If you look at what else is inside IPython's profile_default folder, you will see a bunch of directories. Most of them are internal to IPython, so there is nothing interesting for us, but there is one that is particularly interesting: it's called startup. It contains a README file that explains the purpose of this directory: basically, any file with a .py or .ipy extension that you put in that directory will be executed when IPython starts. So we can use this folder to define some helper methods or maybe magic functions. Remember when we wrote our magic function and had to paste its code into IPython? Instead, we can create a file in the startup directory and put the code of our magic function there. Just keep in mind that whatever you put in that folder gets executed each time IPython starts, so if you put a bunch of slow functions there, it's going to make your IPython startup very slow. In that case, it's better to create a separate profile for those slow functions. On your computer, each profile is a separate directory in the IPython folder, so each has its own configuration and startup files. You can create a new profile by running the "ipython profile create" command with a name, and then you can start IPython with that profile by running "ipython --profile=" followed by the name; if you don't specify which profile you want, you get the default one. For example, I used to have a profile for when I wanted to debug and profile my code: exceptions were set to be as verbose as possible and I was loading a few extensions for profiling. But I was not debugging or profiling my code all the time, so instead of putting all those things into the default configuration, I had a separate profile for that. So, we talked about magic functions and extensions before. Another thing that you can do is register some callbacks to IPython events. IPython defines a set of events, like "before you run the code", "after you run the code", "after you start IPython", and you can
very easily plug in custom functions that will be executed during those events. In general, to be able to add a callback to an event, you need three things. First, you need the callback function itself. Then you need to define the load_ipython_extension function and register the callback inside, pretty similar to what we did with the magic functions. And finally, as with all extensions, you need to load it to make it work. So let's see how it works in practice. Let's say we want to make a function that will print the variables after each command. First, I'm defining a class that will store our callback function; I'm using a class to store the reference to the IPython object that I will use inside my callback function. Then I'm defining the callback function. The result parameter will be passed from the event, so even though I'm not actually using it in my function, I still have to put it in the function signature. Inside my callback function, since it has to be valid Python code, I can't just use %who, as this is going to give me a syntax error; this run_line_magic function is actually a way to call IPython magic functions from valid Python code. And finally, I'm registering the callback inside the load_ipython_extension function. Now I'm saving the file in my extensions directory as var_printer, and once I load it in an IPython session, it will automatically start working and printing the variables after each cell. Speaking of events, there is also something quite interesting, similar to events, called hooks. IPython has a set of default hooks that are executed in certain situations, for example, when you're opening the editor with the %edit magic. The difference between events and hooks is how they are intended to work. With events, you can have a bunch of callback functions that are independent from each other, and all of them will be called when an event happens. Hooks, on the other hand, will call only one function: if you have multiple functions attached to the same hook, IPython will call the first one,
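The "every callback fires" behavior of events can be sketched with a minimal event bus. The event name `post_run_cell` mirrors IPython's; the `EventManager` class itself is a simplified stand-in, not IPython's implementation:

```python
# Minimal sketch of IPython-style events: every registered callback runs.
class EventManager:
    def __init__(self):
        self.callbacks = {}

    def register(self, event, func):
        self.callbacks.setdefault(event, []).append(func)

    def trigger(self, event, *args):
        for func in self.callbacks.get(event, []):  # all callbacks fire
            func(*args)

log = []
events = EventManager()
events.register("post_run_cell", lambda result: log.append(("printer", result)))
events.register("post_run_cell", lambda result: log.append(("logger", result)))

events.trigger("post_run_cell", "cell-1")
print(log)  # [('printer', 'cell-1'), ('logger', 'cell-1')]
```

Note that both callbacks ran, independently of each other; that independence is exactly what distinguishes events from hooks below.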
and if it fails, the next function, and the next, and the next, until it finds one that's actually successful. So let's see an example of a hook. Here we are registering our own function that will be executed when the editor is opened; this function will try to use the jed editor instead of the default one. An interesting piece of code is this TryNext exception: it's used to indicate that this hook failed, and IPython will try to open another editor instead of failing. Moving on to the next feature: debugging. IPython is my default debugging tool. It all started because I was using Sublime Text for a very, very long time, and I only recently switched to Visual Studio Code, which has a pretty good debugger, but using the one from IPython still works for me in most cases. So how can I use IPython as my debugger? The first thing you can do is embed IPython anywhere in your code. To do that, you need to import the embed function from IPython and then just call it. I like to put those two statements on one line, so I can remove them with just one keystroke, and also all the code linters will complain about it, so I don't forget to remove it when I'm done. Now I can run my script, and when the interpreter gets to that line, I get access to all the variables set at that point, so I can poke around and see what's going on with my code. When I'm done, I just exit IPython and the code execution will continue. Also, if I change some variables from IPython, those changes will persist after I close the embedded session. So embedding is nice, but it's not really debugging. To actually run the debugger, you can use the %run magic function with the file name; IPython will then run the file through the ipdb debugger and put a breakpoint on the first line. The ipdb debugger is just a wrapper around the standard pdb debugger that adds some features from IPython, like syntax highlighting, tab completion and other small improvements. And now, my favorite part of IPython: the post-mortem debugger. Imagine you are running a long script, it's almost there, and suddenly it crashes, because
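The hook chain and the TryNext convention can be sketched stand-alone. IPython ships a real `TryNext` exception; this self-contained version mimics it, and the editor hooks are invented for the demo:

```python
# Sketch of IPython's hook chain: hooks are tried in order, and a hook
# raises TryNext to pass control to the next one.
class TryNext(Exception):
    pass

def jed_hook(filename):
    raise TryNext()          # pretend jed is not installed

def nano_hook(filename):
    return f"opened {filename} in nano"

def call_editor_hook(hooks, filename):
    for hook in hooks:
        try:
            return hook(filename)   # the first hook that succeeds wins
        except TryNext:
            continue                # fall through to the next hook
    raise RuntimeError("no editor hook succeeded")

print(call_editor_hook([jed_hook, nano_hook], "notes.py"))
# 'opened notes.py in nano'
```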
that's what programs do. And you're probably sitting there thinking: man, I wish I had run this script with a debugger enabled; now I have to enable the debugger, run the slow function again and wait to see what's the problem, right? Well, no, you don't. At least not when you're using IPython. You can run the %debug magic command after the exception has happened, and it works just like the standard debugger: you can inspect variables, move up and down the stack trace, the same stuff you can do with the standard debugger. Finally, if you want to automatically start the debugger whenever an exception happens, there is a magic function called %pdb that you can use to enable this behavior. So that was debugging. IPython also has an interesting set of profiling tricks up its sleeve. The first magic function is called %time. It's the most simple way to measure the execution time of a piece of code: it will just run your code once and print how long it took, according to the CPU clock and the wall clock. Kind of boring, so there is a much more interesting function called %timeit. By default, it will automatically determine how many times your code should run to get a reliable measurement: a fast function might run a few thousand times, and a slow one might just run a few times. There is also a cell magic version of the %timeit function, which is more convenient if you want to profile code that has multiple lines. One nice thing about the cell magic version is that after the arguments you can pass some setup code that will be executed but won't be timed. Now, say you want to see why exactly your code is slow, what's taking so much time. We can run the %prun magic function, and it will show us a nice overview of how many times a given function was called, what was the total time spent on calling those functions, where a given function is located, etc. Here we can see that our slow function is running for 12 seconds and performing 50 million function calls. So now we can go there and check what's wrong with this function and whether we can make it better.
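Under the hood, `%timeit` builds on the standard library's `timeit` module. This sketch picks the repeat and number values by hand instead of letting IPython auto-tune them; `slow_sum` is an invented example workload:

```python
import timeit

def slow_sum(n=1000):
    return sum(i * i for i in range(n))

# timeit.repeat runs the snippet number times, repeat times over,
# and returns the total per repetition; take the best, like %timeit does.
per_run = min(timeit.repeat(slow_sum, repeat=3, number=100)) / 100
print(f"best of 3: {per_run * 1e6:.1f} microseconds per loop")
```

In IPython the whole thing collapses to `%timeit slow_sum()`, with the repeat and loop counts chosen automatically.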
Another interesting type of profiler is the line profiler. %prun will report how much time each function took, but the line profiler, or %lprun, will give you even more detailed information and show you a line-by-line report of how your code was executed. The line profiler is not included by default with IPython; you have to install it from pip and then load it as an extension. Once you do this, you can use the %lprun magic command. Now, to run this profiler you need two things: you need a statement, so a function or a piece of code that will be executed, and then you need to specify which functions you want to profile. Let's see an example. Here I'm telling the line profiler to check two functions: the long-running function itself and the one called important_function. The line profiler will generate this nice report for each function that I specified, where I can see how many times each line was run, how much time Python spent on this line, and what percentage of the total running time was spent on that particular line. The last profiler I want to mention is called the memory profiler, which measures the memory usage of your program. Again, to be able to use it, we have to install it from pip first and then load the extension. You run it basically in the same way as the line profiler: you specify which function you want to profile and then a statement that needs to be run. And then you get output that is, again, similar to the one from the line profiler: you see how the memory usage changed line by line. Now, kernels. The evaluation part of the REPL happens in a separate process. It means that the process evaluating your code, called the kernel, can be decoupled from the rest of IPython, and it has one great advantage: IPython is not limited just to the Python programming language. You can easily swap kernels and use a completely different language; the interface stays the same. So how can we change the kernel? Well, first we have to find the kernel that we want to use on the list that is published in the Jupyter GitHub repo.
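What `%prun` does can be approximated with the standard library's `cProfile` and `pstats` modules; the `long_running` and `important_function` names below are invented to echo the talk's example, not taken from it:

```python
import cProfile
import io
import pstats

def important_function(n):
    return sum(range(n))

def long_running(calls=50):
    return [important_function(10_000) for _ in range(calls)]

# Roughly what %prun does: run the statement under cProfile and print
# per-function call counts and times, sorted by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
long_running()
profiler.disable()

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
print(buffer.getvalue())  # ncalls / tottime / cumtime per function
```

In IPython, `%prun long_running()` produces the same style of report in one line.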
It will contain a link to the documentation explaining how to install the kernel; since each kernel has different dependencies, there is no one standard way to install kernels. So let's try to install the IJulia package and start IPython with the Julia kernel. As you can see, the REPL still looks the same, but now you can use Julia syntax, and if we try to write Python, then we're going to get a syntax error. The new kernel will work with both the IPython REPL and Jupyter notebooks. And while installing a custom kernel to use with the notebooks is a pretty good idea, installing a custom kernel just to use it with the REPL is questionable: most of today's programming languages have a very solid REPL of their own, so it's probably easier to use that instead. Unless you really, really want to use IPython all the time. And if you really, really love IPython, there is still a bunch of crazy stuff that you can do, but I don't have time to discuss it all, so I'm just going to quickly show some of it. You can enable autocall, so you don't have to put quotes around the parameters when calling a function. You can enable the auto-reloading that I mentioned before, so you can change the imported modules on the fly and then you don't have to re-import them each time. If you're writing doctests, you can turn on the doctest mode to make copying code from IPython easier. You can use IPython as your shell, although it wouldn't be very easy to use; if you want to try, you can enable autocall and run %rehashx to get all the aliases. Or you can add custom keyboard shortcuts, or input transformations, or, if you're brave enough, the AST transformations. And since this is already a talk about replacing the default Python REPL, it wouldn't be fair to not at least mention the rest of the contenders for a replacement of the default Python REPL. bpython took a more lightweight approach.
It has a lot fewer features than IPython, but it has the essential ones, like syntax highlighting, smart indentation, auto-completion and suggestions as you're typing. And it has a very interesting feature called rewind, which basically lets you remove the last command from the history like it never existed. Here it is. Next, there is ptpython, a Python REPL built on top of the prompt toolkit. It's slightly more advanced than bpython, as it contains a few more features. The obvious ones are syntax highlighting, multi-line editing with smart indentation, auto-completion, shell commands. But there are some more innovative ones, like a syntax validation tool that checks your code before executing it, Vim or Emacs keybindings, or those nice menus for configuration and the history. And finally, there is xonsh. Xonsh is quite different from IPython, bpython or ptpython, because it's not really a Python REPL, it's a shell: a shell that adds Python on top of bash. There are a couple of good talks about xonsh: the first one is from Anthony, the creator of xonsh, and the second one is from Matthias, who is actually a core developer of IPython and a user of and contributor to xonsh, so you can go check them out. That's all. So thank you for coming, thank you for listening, and I would also like to say thank you to the creators of IPython for making such an awesome tool, so if you can, give them a big round of applause. Okay. Do we have a question? Okay, then again, thank you for your great talk, and for providing a link to the slides so that we can review them at a slower pace. Thank you very much, and give another round of applause for Sebastian. We'll start the next talk here in five minutes. Let's get started with our third talk for today, the last one before lunch, which is from Peter. He works for one of our sponsors, Kiwi.com, and he is giving us a lessons-learned talk on the do's and don'ts of task queues. Please give a warm welcome to Peter.
Hi, thanks for the warm welcome, and I just want to ask: can you hear me at the back quite clearly? Okay, perfect. Well, thanks. So once again, my name is Peter Stehlich and I work at Kiwi.com; I'm specifically a Python developer, and I'm here to share with you how we solve all the money problems in Kiwi.com, so all the money that goes through us. The outline of my talk is quite short. First we will define and talk about what a task queue is. Then I will tell you a short story. I will show you examples versus reality, then the final setup of how we do it, and lessons learned, and if we have time, we will do questions. First, let's define what a task queue is. I was trying to find a simple sentence to define it, and I failed miserably. So: a task queue is parallel execution of discrete tasks without blocking. We are here with Python, so what mostly comes to mind is usually Celery or RQ or something high-level like that. But that's usually just the tip of the iceberg. Task queues can be found in hardware, in CPUs, in GPUs, everywhere down low, so we can generalize quite a lot. The major parts of a task queue are the queue itself, the task that needs to be done, the producer of the task, and the consumer of the task. You can imagine it like a market stall: you have the queue, you have the task "buy a banana", and you have the producer and the consumer, the customer and the one who is selling the bananas. So now we know what a task queue is; what is it for? The most usual use case is to decouple long-running tasks from a synchronous call. Say your job takes a couple of seconds, but the client always gets the same response; you know the result beforehand, so you just reply that you accepted the task and you're working on it. And then you can work on it, and eventually you can, for example, call a webhook to mark it as done, or send an error to the external service.
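The queue/task/producer/consumer anatomy above can be sketched with the standard library alone; the banana strings and the sentinel-based shutdown are invented for the illustration:

```python
import queue
import threading

# Minimal producer/consumer sketch of the market-stall example.
tasks = queue.Queue()
sold = []

def producer():
    for i in range(3):
        tasks.put(f"buy banana #{i}")   # the customer places orders
    tasks.put(None)                     # sentinel: no more tasks

def consumer():
    while True:
        task = tasks.get()
        if task is None:                # shut down on the sentinel
            break
        sold.append(task)               # the seller handles the order

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sold)  # ['buy banana #0', 'buy banana #1', 'buy banana #2']
```

Celery and RQ follow the same shape, except the queue is a broker such as Redis and the consumer is a worker process on another machine.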
With task queues we can also replace low-level cron on Linux: we can do the same thing at a higher level, with more monitoring, more control, more granularity, and so on. At Kiwi we also used them to break software down into more isolated pieces — to break monolithic applications down into microservices, or, if a microservice is too big, to break it down into tasks, queue them, and work on them. With the decoupling we can also minimize the wait time, as I mentioned, and the latency and response time altogether. It all comes together, and when you combine everything, it increases the throughput of your system. So that's what a task queue is good for. Now, the story. We have this small, let's call it, microservice. If you have watched older Italian movies, you will know that Fantozzi is a series of films about an unfortunate accountant. That's where the name came from. Imagine it like this: we have a REST API in the front, and then we have the handler, the tasks and the queues. The initial design counted on quite a lot of queues. There was a webhook library, we were going to use JSON Web Tokens — all these fancy names and fancy technologies that you usually hear about in tutorials. Well, there were two of us developing it, and we spent three weeks working on Fantozzi 2.0 while deciding what task framework we should use. Sorry, wrong slide. And during the whole development, some quite dangerous sentences were said. The first one: "new is always better." You know — we got this old piece of software, it was created two years ago, so we definitely need to rewrite it, because it definitely doesn't work. We would need to understand it, so we will rewrite it — and maybe in two years we can do it again, because new is always better, right? "Think outside the box." Because you just know better. You can imagine yourself as the super-duper programmer who knows everything and will do everything better.
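The "high-level cron" idea boils down to a task that re-enqueues itself after a delay; in Celery this is what the beat scheduler does for you. A stdlib-only toy version, just to show the mechanism (the names here are made up for illustration):

```python
import threading
import time

def run_periodically(interval, fn):
    """Toy cron replacement: run fn now, then every `interval` seconds."""
    def tick():
        fn()
        timer = threading.Timer(interval, tick)  # re-schedule ourselves
        timer.daemon = True                      # don't keep the process alive
        timer.start()
    tick()

runs = []
run_periodically(0.05, lambda: runs.append(time.monotonic()))
time.sleep(0.26)
print(len(runs))   # several runs fired during the 0.26 s window
```

Unlike crontab entries, a scheduler built on a task queue gives you per-task logging, retries, and monitoring — the advantages mentioned above.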
"I know everything I need." You know — how to architect the application, the best practices, how to set the application up. And: "I can do it better." That's one of the most dangerous ones, because usually, when you're using a framework, it was built by a couple of really experienced people, and they know what they are doing — sometimes. You don't always have to do better, you know. Sometimes do a bit worse, and it will turn out a bit better in the end. With these sentences said, there was this small three-week window of two developers developing — and at the end of the three weeks we suddenly realized: it doesn't work. It's a really bad application, it won't scale, it won't be maintainable, and the setup would actually be harder than the usual one we have at Kiwi. So we basically lost three weeks of development time: we had used Redis Queue — the RQ framework — to implement it, and then we changed it to Celery. Changing to Celery took us around 16 hours, compared to three weeks of development time. So we effectively wasted six weeks of person-time, and that's why I'm here: to tell you how it all happened and what the best practices would be for you. The first reason it happened was examples versus reality, because in both RQ and Celery you have these beautiful examples of a simple app — just how to scaffold the app, like five lines, and that's it, right? Yeah, that's easy, let's do it. And RQ is lightweight, so let's use RQ instead of the giant Celery that handles everything for you. But in reality, we suddenly needed a repeater for the tasks. In RQ that's not included, so you have to write it yourself. And then this kind of ugly mess was created to actually do repetition, without much configurability in it. Yeah, don't try to read it — it's just to scare you off. But surprise, surprise: in Celery, it's included.
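To illustrate what "included" buys you: Celery's `max_retries`, `autoretry_for`, `retry_backoff` and `retry_jitter` options (all covered later in this talk) roughly correspond to the behaviour of the hand-rolled decorator below. This is a stdlib-only sketch of the semantics, not Celery's implementation — writing and debugging something like this yourself is exactly the work the framework saves you.

```python
import functools
import random
import time

def autoretry(exceptions, max_retries=5, backoff=1.0, jitter=True):
    """Toy version of Celery's autoretry_for/retry_backoff/retry_jitter."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise                        # give up: never retry forever
                    delay = backoff * (2 ** attempt)  # exponential backoff
                    if jitter:
                        delay = random.uniform(0, delay)  # spread retries out
                    time.sleep(delay)
        return wrapper
    return decorator

calls = []

@autoretry((ConnectionError,), max_retries=3, backoff=0.01)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("API is down")
    return "ok"

print(flaky())  # fails twice, succeeds on the third attempt
```

In Celery, the equivalent is a handful of keyword arguments on the task decorator; everything above collapses into configuration.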
So you just need to put some things in the decorator — five lines and you're done, instead of the 50 you'd have to write yourself. It's all parameterized, it's all explained, it's all documented, and you can be sure it will work. But also be careful: when we were implementing Celery and saw the five-line example of how easy it is to integrate, we ended up with over 250 changes across the whole repo, which at that time was around a thousand lines. So almost a quarter of the project was implementing Celery. Be mindful about that too. And suddenly we had a working application that's maintainable. It runs on Celery, which we use throughout the whole of Kiwi, so we can get help anytime, anywhere, from any of our colleagues — some are more experienced in some areas, some less, so we can brainstorm together. And with this we finally came to the final setup of how we actually develop these applications. First, we are using, of course, Python and PostgreSQL. On top of that we have Flask, or currently aiohttp. Together with that we have Connexion, which takes care of the REST API. Of course, we have Celery. As the broker we are using Redis on AWS, so it's managed. We are using multiple deploy targets in our continuous integration pipeline, and we are using Lux.io and Datadog for monitoring — we are slowly shifting everything to Datadog. And when something goes bad, really bad, we are using Sentry and PagerDuty to notify us. So that's how we do it, and that's how the Fantozzi application was developed as well. I will break down all the points here so you know them a bit better. With Python, we are always trying to shift to Python 3.6, and when we are starting a new project, we always do it on 3.6 or newer — usually 3.7 now.
As I mentioned at the beginning, we are also trying to break everything down from a monolithic architecture into a microservice architecture, using task queues and asynchronous processing. Flask and aiohttp are some of the best frameworks for us, because we have boilerplates for them and can scaffold them quite quickly thanks to cookiecutter templates. On the right you can see the Flask example of how we basically instantiate the whole Fantozzi application, with all the monitoring, all the central exception catching, and everything. Just a quick question: who knows what OpenAPI 3 is? Okay, not many, so I will explain a bit. Connexion is like an extension — a framework, actually — for Flask, aiohttp, and a couple of others, and it implements the OpenAPI 3 specification. So basically you specify a YAML schema of your API, and it generates documentation and validation for your API. You get a beautiful Swagger UI that's useful for other developers: you can actually test the API there, you have examples, and it's generally useful. So take a note — Connexion with OpenAPI 3 is the way to go. And just a side note: OpenAPI 3 is the successor of the Swagger specification; it was basically just renamed. For auth, it's token-based authentication, and authorization where it's needed. We don't do JSON Web Tokens because they are too complicated — you just need a secret as a bearer token and you're good to go. So we follow the best practices, which I will present shortly. And Redis on AWS — we use it because it's managed, it's reliable, and it's easy to deploy, so we don't lose any time when, for example, something goes wrong, really wrong. Multiple deploy targets: we usually deploy the HTTP API — the REST API itself — and together with that we also deploy workers, periodic workers, and so on.
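A minimal OpenAPI 3 document of the sort Connexion consumes might look like the following. The title, path, and `operationId` here are made-up placeholders, not the Fantozzi schema — the point is that Connexion routes each operation to the Python function named in `operationId` and validates requests against the declared schema:

```yaml
openapi: "3.0.0"
info:
  title: Example task API        # placeholder name
  version: "1.0"
paths:
  /tasks:
    post:
      operationId: api.create_task   # Connexion dispatches to this function
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [name]
              properties:
                name:
                  type: string
      responses:
        "202":
          description: Task accepted for asynchronous processing
```

From this one file you get request validation, generated documentation, and the Swagger UI mentioned above, without writing any of that by hand.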
And the beautiful guys from the platform team created a really useful thing for us called Crane, to deploy to Rancher via GitLab CI; it can also message channels or relevant people when you are releasing. Lux.io and Datadog we use to log everything extensively — if it isn't logged, it didn't happen — and with Datadog's newest developments we are slowly moving there, because we can join the tracing and the logs together and stitch everything up with the APM they provide. So that's a thing to consider as well. Sentry is for when something goes wrong: an exception happens, it's logged, the stack trace is logged, and we can reproduce the problem itself. And when something really goes wrong, we use PagerDuty to wake our developers at 3 a.m. — for nothing, because you're on call, right? Now, lessons learned — why we are all here, mainly you. First thing: use Redis or an AMQP broker for Celery, never a database. You may ask why — you already have a database in your system, so why not use it? But it's very simple. Let me just wait for the camera. Never a database, because: imagine you have 20 or 50 workers in your setup, and each of the workers needs to ask the database, "hey, are there any new tasks I can take?" The database usually replies no — you have 50 workers — and only sometimes does it reply yes. So imagine you have 50 queries hitting the database every second. Then the database goes to production, it's used by millions of people, and suddenly it starts failing, it starts to time out, it starts to, you know, underperform. Why?
Because it's serving as a broker. It's overwhelmed by the workers, you have sessions open, and generally you're going to crash. Redis and AMQP brokers are designed for this, and they are independent systems — if they crash, it happens, but you definitely have backups or replicas of Redis. Here's a small example of how to set up brokers, for AMQP and — sorry — for Redis. For Redis you need to install an extension for Celery, and then you can easily use it; just install Redis and you're good to go. That's easy. Second lesson: pass simple objects to the task. Say you have an ORM database model populated with data; you update it, you commit it, and then you pass the object itself to the task, work on it again, do the query, and commit it again inside the task. I see where this might go: the object is quite complicated, and it can go stale quite quickly. When you hand the object over to asynchronous processing, it can go stale without you knowing it, and then you create a conflict in the database, and so on. It's much better to pass just the primary key of the object, query it again in the task, and have fresh data you can rely on. With that you avoid this kind of problem, which is really hard to debug because it's basically a race condition. Third: do not wait for tasks inside tasks. I will talk a bit more about this and explain it later on. When you're waiting for tasks inside tasks, you are creating an endless loop — if you have retries without a retry limit, you end up with a stuck task that is endlessly trying to do something and basically blocking everything, so your system can go quite haywire. This goes together with: set a retry limit. It tells Celery how many times the task can be retried before it just gives up — raise an exception and handle it yourself. It's really easy; it's in the decorator itself, just `max_retries`
and you're good to go. Use `autoretry_for`. This is a really handy feature, because you can specify an exception on which the task will be retried — but again, don't forget `max_retries`, otherwise you can end up with an endless loop of a single task occupying one of your workers. You just define the exception you want to be retried on, and again you're good to go. We are slowly building up the decorator, as you see — it's multi-line now. Use `retry_backoff=True` and `retry_jitter`. With backoff you specify that the wait time between retries will keep growing while the service is still down — there's a beautiful formula for exponential backoff on Wikipedia, but don't bother; it just prolongs the periods of time. For example, you rely on an API and it returns a 500 error. You wait one second and retry — it's still down. You wait four seconds — still down, never mind. And suddenly the server is up, your task is done, and you're happy again. Retry jitter is very useful when you have lots of the same task happening at the same time: when the retry happens, the jitter adds or subtracts a small amount of time from the backoff, so the retries don't all hit the other service at exactly the same moment. And again, `retry_kwargs` — always set a limit. Set hard and soft time limits: a soft time limit basically tells the task that it should end gracefully, while the hard time limit will kill it without mercy — and then, again, exception and error handling happens. Use `bind=True` for a bit of extra oomph in your task: it means you get a reference to the task itself, so you can log more and actually retry with contextual info. For example, if you have a network error, you retry five more times, but if you have an integrity error, you just give up, or you just log it and
give up, because it's the fault of the data, not the API itself. You can use logging, for example — as you can see here, we log to standard out, and we use it to get the stats for a task, whether it was successful or not, in quite an easy manner. Next: separate queues for demanding tasks. Say you have a task that talks to a very, very slow API — it takes like 10 seconds to get a response — and another task that uses a super-fast API that takes milliseconds, and you have a single queue for both. You can imagine that the long-running tasks will eventually starve, because the shorter tasks come in more often and will always be preferred, and the long-running tasks will go stale. So it's always better to separate these kinds of tasks into their own queues — for example, here you have a "fast" and a "slow" queue. That's a generic example, really; it's always better to name them a bit more precisely. Then with `apply_async` you just specify the queue and you're good to go. It will help you tremendously. And of course, when you have multiple queues, don't forget to deploy multiple workers, each handling only its specific queue. Finally, prefer idempotency and atomicity. Because I'm a lazy developer and didn't remember the full definitions of idempotency and atomicity, I asked good old Wikipedia to help me here. Idempotency basically means that when you call one resource multiple times, it will always produce the same result. Atomicity means that when you call the task, it appears to the system as atomic — it happens instantly and without side effects. To sum it up: use Redis or AMQP, pass simple objects to the task, don't wait for tasks inside tasks, set a retry limit, use autoretry, use backoffs, use jitter, use time limits, use bind, use separate queues, and always prefer idempotency and atomicity. Those are the lessons learned. And there are also things to consider
with Celery, because it's a really powerful framework, and things can go wrong there as well. With Celery you are sharing the code base between producer and consumer, so you need to be really careful about circular imports — the way the imports work, what will load when the worker is starting and what will load when the server, the producer, is starting. To use Celery to its full potential, read the Celery docs. It's a nice evening read, you know, when you have nothing to do — come on, let's read about workers today. You don't have to read it carefully and remember everything, every param that's there; just remember that something like this exists and you can use it, because eventually it might come in handy. So be mindful, read the docs. And always bear in mind that you are using third-party APIs, and they don't have to scale as well as your application — the developers of that third-party API might not be happy when you shoot them down. That's most of my talk done. Thanks for listening, and I would like to invite you to our party today after EuroPython. It's an invitation party, so visit our booth — it's somewhere over there on the left, I was trying to pinpoint the location. You can definitely talk to us; I will be there after lunch, so we can talk and find out more about the party. More info about the party is also at meetQE.com. One small thing there — there's a small error: I'm still a Python engineer, not an engineering manager; that position is still waiting for me. Thank you. — Okay, we have about three more minutes for questions, if there are any questions. — Yes, of course, Flower. Definitely a nice thing to have, but you need to know how to actually use it. So yes, for monitoring, for more granular monitoring, I definitely recommend it — but honestly, personally, I prefer my own monitoring, where I can get alerts and everything for exactly what I need. If you design it well, you don't need it. I'll
talk about it later. There was one more. — Thanks for the talk. You said that you are migrating to aiohttp — what's your experience using aiohttp with Celery, and have you investigated something like arq or other async systems with aiohttp? — We are still at quite an early stage, so we don't have long-term experience with it, but basically, if you understand the async paradigm, it's kind of okay, I'd say. We didn't have big problems with it yet, so no lessons learned yet — no expensive things to learn. — Thank you. I would like to know how you do your health checks on the Celery workers. — You mean health checks? Do you do health checks? — We have quite a few, recently put up, but the health checks are taking a lot of processing, so we were wondering if we are doing it right. I would like to know how you do it. — We don't do very elaborate health checks. We do logging, and through the logs we can see what is happening. Also, we deploy quite often, so if you are asking, for example, about memory consumption and memory leaks, we don't care about them, because we restart the workers regularly — Rancher itself, or any container management, can be set up to restart things regularly to return to a healthy state. So there are health checks: we have a lot of information to see whether the database is stable, whether the connection to the database is stable, whether everything is communicating properly — but we can talk about it later; stop by the booth. — Okay, that's all we have time for now. Thanks again to Peter, and thanks all for coming. And now, I think, there will be lunch at one. — Let's start in a couple of minutes. I want to welcome you to the first sessions of the afternoon — in just a couple of minutes; let's wait for a few people to get in. Alright, well, let me start by introducing Mark — while Mark needs no introduction, you all know him — and he will be talking about compressing, shrinking down the Python runtime to less than 5 megabytes.
Well, let's welcome Mark. — So, thanks for the introduction. Can you all hear me? Do I need to speak up a bit? Yeah, it's okay? Good, great. So, PyRun. This is something that I do for my day job, not EuroPython. Just to give you a little bit more background on what I'm actually doing all the time during the day: I have my own consulting company, I'm a senior software architect, and I've done lots and lots of things in the Python community — you can read up on all of that on my blog. I don't want to go into too much detail here, but jump directly into the talk. So, what is PyRun? PyRun has a long history. It started — I don't really remember — in the late 1990s, I think it was 1998-ish, and it started as a completely different project. The project was called mxCGIPython at the time; I named all my tools "mx"-something because of a naming conflict I had with the Zope Corp packages, over mxDateTime. The idea was that I wanted to use Python on the typical FTP web hosters that you had at the time. In those days, you couldn't just upload a script and run it if it was Python. You could do that with Perl — they all supported Perl at the time — but Python was not really a thing there. So what I wanted to do was sneak Python in on the web hosters' machines. It was kind of like hacking an executable in there, and it worked really well, because I found that you could upload it into the cgi-bin directory, and then you could also upload a shell script which turned your uploaded file into an executable. So I thought, okay, let's try this: take Python, make it really small, and upload it — because FTP space in those days was expensive — and it would be really easy to do. So I created a single file out of Python, uploaded it over FTP to the hoster, ran the script, and then I had an executable Python interpreter, and I could upload my Python script to the cgi-bin directory and run it. And I was not alone with that wish to run Python on
one of these hosters — there were actually quite a few other people. (I'm sorry about that — nah, just press here.) So there were quite a few contributors who created these single-file Python binaries for lots and lots of different systems. At that time it was not like today, where Unix basically means Linux, maybe FreeBSD, and maybe a Unix system like macOS — there were lots and lots of different Unix systems: Solaris, HP-UX, all kinds of variants. So you first needed to figure out what system was running at the FTP hoster, and then you could upload the correct binary. That lasted a couple of years, and then it petered out in the early 2000s, because the web hosters started to support Python, and I basically dropped the project. Now, in 2009, my company wanted to introduce a product, and one part of that product is a server application written in Python that needs to run on Linux, and I needed some way to ship it to clients. The problem I ran into was that if I were to just use the OS-based Python installation, there were so many variants of those OS-based Python installations — and I was basically a one-man show as a company — that I needed something with a stock configuration, something where I knew exactly what kind of Python to expect. So I remembered that I had this mxCGIPython project, revived it, and turned it into a bit more than just a single executable. For Windows, the solution for us was very easy: we could just use py2exe. This is why PyRun currently does not run on Windows — we don't have a need there. But on Unix there was no appropriate solution, so I started to revive mxCGIPython and beefed it up a bit. I had a business requirement, so let's add some more business aspects to this. The business requirement for me was to create a single executable that has the complete, or more
or less the complete, Python runtime, including the standard library, in a single file — so that installing Python on a machine literally becomes a single copy operation. I wanted it to work on Linux, FreeBSD and macOS; those were the ones I was interested in at the time. It probably works on other systems as well. Now, how does this work? How many of you know the freeze tool in Python? Okay — I bet you didn't know about this five years ago, or maybe let's say ten years ago; Python 3 is a bit older now. What the freeze tool does is take Python modules, compile them to bytecode, and store the bytecode in a C struct, an array. Then it puts everything into C files, compiles everything as Python C extensions, and puts that into a module file that you can then link. This is how you can get Python code into an executable or a library. The tool has been around for ages — it was written by Guido himself, and later Mark Hammond extended it to also work on Windows. I don't exactly know why Guido wrote it, but he probably had some use case for it. Nowadays it's used for importlib, because with importlib you have a bootstrapping problem: if you want to run Python, you first need to get the Python code from somewhere, and importlib is itself written in Python. The issue is that if you want to import something, the import has to use importlib — so you need to somehow figure out how to do this. The way it works is that a small, let's say core, part of importlib is frozen into Python as well, and this is why the tool started to be used again. When I started to write PyRun, the freeze tool was not maintained anymore, so I had to do some fixes to make it work again. So how does it work? Essentially, I wanted to take the standard library, which is mostly Python modules plus a few C modules. You have to do two things. One is to get all the C modules — the extensions that are being built in the standard build process
of Python — to not be compiled as shared libraries, but instead as static libraries, so that you can link them directly into the interpreter. The second step is taking all the Python modules in the standard library, converting them to C extensions as well, and also linking them statically, so you get everything into a single file. Of course, you can do this for a single application, but it's a bit tedious to redo everything for every single application you want to run — say you have a release cycle in your product and you always have to run all these steps again — which I didn't really like. So what I decided to do was take the standard Python, turn that into a single executable, and then ship the application code — the Python code — as a zip module. You essentially get two files: the PyRun executable, and the zip file with the packages, the Python code. That was relatively easy to do, but then I wanted a little bit more, because I thought: well, we almost have something which is more or less identical to Python, and it's tiny, and it would be really nice to use it pretty much everywhere — instead of virtualenvs, for example. You just copy your PyRun and you're done; you don't have to have a separate installation for a virtualenv. So I thought I'd add the Python command line as well. Now, the problem is that the way PyRun works, it cannot use the C command-line parsing that we have in Python; it has to use Python code for this. So I had to rewrite most of the command-line parsing in Python, and then, again, do the same thing: wrap everything, put it through freeze, and put it into the executable. I managed to do that — okay, that's very nice, I managed to do that. It is a bit slower, of course, because it's Python running to do the command-line parsing, but there's a trade-off there, because when importing things from
the C extensions that freeze builds, the import is a lot faster than going to the file system — file system I/O is always very slow; if you load everything into RAM, it's much faster. So I could make it a little bit slower in one place, gain all the flexibility, and have it work. It even supports interactive use now: you can actually start it and it comes up with a command prompt, so you can use it just like regular Python. This is essentially where I am now. PyRun is open source, it's free, and it's a drop-in replacement for a standard Python runtime. It doesn't use hundreds of megabytes in the file system, and you don't have to install it anywhere. It works with Python 2.7, 3.6 and 3.7 now; it also supported lots of older versions — 2.4 is the oldest version that I still support, not in the current version of PyRun, but in previous ones. The executable size is between 3.7 megabytes for Python 2.7 and 4.8 megabytes. Of course, I'm cheating a bit — I'm using UPX compression for this — but anyway, it still works; the startup time is just a bit slower. So this is what it looks like, and because I wanted to not only talk but also show some stuff: this is the project. Let's go here — for example, 2.7, in case anyone is still interested in that. And here you go, 3.7. Can you read that, should I make it bigger? No? It's okay, good. So, just to demonstrate how this works, let me just do this so you can actually see it. This is the UPX version — where is it?
2.7, UPX. This is what you get when you run it, and it works in the standard way: you can import stuff, you can do all kinds of things — basically, it's a standard Python. You can also run pip with it. So I can do this — actually, let me see whether it's already installed; it is already installed — so I can install setuptools and then also run pip. Pip will then use — this is a bit annoying — I can then install something, let's say this one. Okay, install something, and then I can go here and run this — this is the Game of Life in Python. So that was 2.7; let's go to 3.7. 3.7 is a little bit — you can see down here, 4.7 megabytes; the original, uncompressed one is 14.4 megabytes, which also is not that big, but it's amazing that you can actually compress it down to that size. And it works the same way as the 2.7 one I just showed. So let's go back here. By the way, if you have questions, please feel free to ask, and I can just answer them right away. These are some use cases of eGenix PyRun. I'm pretty sure there are lots more; these are mostly the ones that I came up with. I know there are other projects that try to do similar things, and they are better at marketing than I am, so they come up with more use cases. But the one we really care about is being independent of the OS Python installation — the most important one. We want something small to easily ship to clients. We want to easily make it available as a download without too much bandwidth use. It's extremely good for Docker containers, because it's so small you can just put it into a container image, and then loading the image is very fast — much faster than having a regular Python installation in there. And it's very easy to build single-file applications out of it. I'm going to go into that in some more detail, because what I've added is integrated zip file support for PyRun. The way it works is very simple: you create your Python application, and — very important — you have to add
a `__main__` module. How many of you know what `__main__` does? Some of you, okay. The way it works is: when you have a zip module and you have Python run that zip module, then if it finds a `__main__` module at the top level of that zip file, it will execute it. It works basically like the typical main execution part that you put into Python scripts. So if you have an application like that — if you have a script — you just add this `__main__` module to your zip file, you concatenate PyRun and that zip file, you produce a new file — let's say `hello` — you make it executable, and you're done. You don't need compilation anymore. This basically made my day, because I did not have to send zip files around anymore for, say, application updates; I could just create a new executable and send that around. So, to show you how that works, I prepared a little something here. You can see there's the `__main__` module — it's just a very simple hello world, right? And then I put that into a hello.zip file. Actually, I don't want to unzip it, I just want to read it — forget it, let me just do that again. So I put the `__main__` in there, and then I can concatenate the two. I can choose the slightly bigger one — the 14-megabyte one, which is faster to load — or I can use the UPX one, which is smaller. So let's use the UPX one: I simply put the hello.zip behind it, I create this one file, `hello`, plus UPX, I make it executable, and I run it — there you go. As you can see — thanks — it's like magic, huh? You turn 100 megabytes into 4.7 megabytes. So it's really, really easy to create these single-file apps. These three steps — or two steps — that I have here, I'm probably going to turn into a small tool as well, so you can just run that. Okay, so: customizing PyRun. Of course, this was the easy way to do things — you just get the PyRun for your platform, or maybe you have to compile it yourself; normally we provide binaries
For these, we haven't done that in a while, because the build farm that we had basically crashed, and I haven't had time to fix it and get it running again. The nice thing about this is that you don't actually have to be a core developer to do this; if you know the right places to fix and the right places to tweak, then it's not that hard. Of course, you should know a bit about how the freeze tool works. I put a special file in here, this pyrun_extras.py; in that file you just import whatever packages you want to have, and then freeze will automatically find them for you and integrate them into the package. So it's very easy to add new modules. It's a bit harder to exclude modules. Let's say you have a Python package that has test modules, and you don't want to ship the test modules together with your product; then you typically want to exclude those, and in order to exclude things you actually have to go into the makefile and then into the excludes variable and put in whatever you want to exclude, let's say your test sub-package, so it doesn't get integrated into the package. And then the next step, let's say, is if you want to add custom C extensions that you have, or maybe you have dependencies that have C extensions; for those, of course, you have to tell it to compile and add those to the executable that comes out, and in order to do that you have to use the Modules/Setup file. How many of you know, or have ever edited, this file in the Python installation when compiling it? Extremely few, okay. So here's some fun. Let's see, what's the name again... so this is the standard distribution of Python; it has a few patches, because I need to do a few tweaks for PyRun, but not really that much. And the way that Python determines whether to compile any of these modules that you see here into the executable that you're building, or into a shared module, is that it looks into the Setup file, which looks
like this. This is an extremely old file; it still references the Makefile.pre logic that Python used in the very early days to build C extensions. In the very early days you did not have anything like distutils; you had to basically do everything yourself: you used this Makefile.pre.in concept, and then you added your configuration into one of these Setup files, and this would then make Python compile your extension into the executable or into a shared library, and then you could put it into your package. So there's lots of description here; it looks a lot like a makefile. Then you come down to this section. This section down here tells Python which modules to actually integrate into the executable, and then there is... let me see where I can find it... all these modules that you see here, those are statically compiled... right here it is... statically compiled into Python, and everything that comes below this *shared* indicator in here will be compiled into a shared library. And as you can see here, I commented out that *shared*, so everything that comes below is still going to be compiled into the executable, because that's what I wanted. As you can see, these are just, you know, C modules that Python itself uses. Then I had to make a few fixes, because I wanted to add some of the modules that typically don't get added as a static version into Python. This file is not, let's say, well maintained anymore, because nowadays everything normally gets compiled as a shared library, and so some parts are missing; various modules here, for example, were not in that file, and I had to add them. Some things are also removed; for example, tkinter I don't use, so I did not put that in. And then if you want to add other stuff, you can just go down here and append it. The way it works is you just have to tell Python where the C code is, and whether you need any build parameters.
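The static-versus-shared split being described is driven by lines like these; a hedged sketch of the Modules/Setup syntax, where the module names, source files, and flags are illustrative and the real entries vary by Python version:

```
# <module-name> <source-files...> <compile-flags...> <link-flags...>
#
# Everything above the *shared* marker is linked statically into the
# interpreter binary; everything below it is built as a shared library.
# Commenting the marker out, as described in the talk, keeps everything
# static.

zlib zlibmodule.c -lz

#*shared*

# (entries after an active *shared* marker would build as .so modules)
```

With the marker commented out, running make links every listed module into the single executable that freeze then works with.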
For SQLite, for example, these are the things you have to do: you just add it to the Setup file, and essentially you just let make run and everything works out by itself. So that was a short tour through the Setup file. I don't think I have time to actually show the compilation, but let me just maybe go through the changes that I had to make to go from 3.6 to 3.7, and this is interesting. I don't know, is Victor here? No, Victor's at dinner. He made some changes between those two releases, and because the import logic in Python sometimes changes from release to release, or there are new ways of integrating into the Python build process, of how to configure certain things, you always have to touch the code base a bit. There was a lot to do from 3.5 to 3.6; the path from 3.6 to 3.7 was basically just a few hours' work. This is just to give you an idea of how you port PyRun to new Python versions. It's actually quite easy: you just take what's there already for the existing Python version, you have to tweak the patches a bit, you have to then adapt your Setup a bit, because new modules get added, and of course others may need some tweaks in terms of definitions that you have to add, and then basically things just work. And then the freeze tool itself also sometimes needs some fixes, because, like I said, I had to tweak freeze again; what I did is I basically copied the freeze tool into PyRun, into the source code package, so that I can apply changes to it as well, and I had to make some fixes there as well. And that was done and everything worked. So this is where you can get PyRun. Like I said, the released source and binary versions are a bit older; the release ones only support 3.5 as the latest version, and 2.7. What I will do is put the current version that I have already working up on GitHub, so you can download the sources and compile it yourself. Compiling it yourself is pretty easy: the package comes with the makefile, it comes with some documentation, you just have to run make
and then basically you are done; everything should be done for you, and then you can just pick up the release package of PyRun from the distribution directory and use that. That's it, thank you for your attention. We do have time for a couple of questions; there is a microphone there, you can just ask away. Is it possible to use PyRun to embed Python in binaries, or if not, how much work would that be, to create a statically linked library to be used by executables? You want to take PyRun and put it into some other executable? I would like to have a library that I can then use to embed Python, for example, so that the library doesn't have any external dependencies. That would require some work, but it's possible, yes, definitely. Essentially what you have to do is change the way that the main function works in PyRun; you have to remove that and turn it into a library. But I suppose you could do the same trick and, let's say, call it libPyRun, and then get everything integrated. You'd probably have to define some entry points there for the library, some new ones, to get everything working, but it should be possible. Time for one more question. Anybody? I have a question, but let's see if there is a question from the audience first. Thanks for the talk. With this PyRun, it looks great, but I'm just wondering, are there any limitations, some Python features that you cannot use? Are there any downsides, or is it a silver bullet for all the problems? There's one downside with this whole approach, with PyRun, and it's for example why the Python test suite does not fully run: some packages put extra files, let's say text files or symbol files or whatever, into the packages themselves, and because of the way those packages are written, they try to actually access the file system to get to that particular extra file that they have put into the package, and of course PyRun doesn't have a file system, so
the files are not available. I had that issue with PyRun because, for example, the Python grammar is one of those files: it gets put into a special file that gets installed in the file system, and in order to make that file available I had to basically take the file and, in the process of building PyRun, integrate it into PyRun as well, and then write some extra code to make it available inside PyRun. So I had to do some of those tweaks, and I only did those tweaks for things that Python itself needed; if you have something like that, then you would have to do those tweaks yourself. But that's actually pretty much the only limitation I know of. I could imagine that some packages that have external C extensions might be hard to compile in a static version, because sometimes the way that those shared libraries are built is very complex, and if you want to turn everything into a static library then you can run into problems; but it's just a matter of effort, you can get this working pretty much in any case. All right, let's thank Marc again. So, if the next speaker, Tom, is in the room... All right, welcome to the second talk of the afternoon, folks. Okay, we have Tom here, and he's going to tell us how he wrote a Python auto-reloader. Join me in welcoming Tom. Can you guys all hear me? Good. So yeah, my name's Tom, and I'm going to talk today about writing an auto-reloader in Python. I've broken the talk down into four sections: we're going to talk about what an auto-reloader is, we're going to talk about Django's implementation, we're going to talk about how I rebuilt it, and we're going to talk about the aftermath of what happened after I rebuilt it. Sounds good? So firstly, what is an auto-reloader? Like all good programmers, I googled this, and nothing came up, which surprised me: there was no definition of an auto-reloader, even though it's a common development term. So I wrote this definition, which sounds sufficiently technical and vague: it's a component
in a larger system that detects and applies changes to source code without developer interaction. So raise your hands here if you use an auto-reloader in your day-to-day life, in some kind of framework. So yeah, pretty much everyone, right? Raise your hands if you could write one, or you know in detail how it works. So this is why I find them interesting: they're really common, every developer, or most developers, use them; they're a critical part of frameworks like Django, and if the auto-reloader doesn't work, as we'll find out later, it's kind of a big deal; yet even though they're not a production thing, they're not really well understood; and they're really language-specific, an auto-reloader in Python is very different from an auto-reloader in JavaScript. As an example of an auto-reloader, a really simple one would be automatically refreshing your tab every time you change an HTML file or a JavaScript file; that's an auto-reloader. A special case of an auto-reloader is a hot reloader, and this is the holy grail of auto-reloaders, because they're really fast and really efficient: it reloads the changes to your code without restarting the system. A really simple example of this is changing the stylesheet on a web page. This is kind of hot reloading: the browser can take the changes to the stylesheet and apply the new styles to the page without refreshing the tab; you can hot reload CSS. And these are impossible to write safely in Python in the general case, and I'll tell you why. And a special shout-out to Erlang, where you hot reload code while deploying; that's how you deploy code in Erlang, you hot reload it in production. I wouldn't suggest doing that in Python. So you might say: Tom, Python has reload, isn't that a hot reloader, isn't this implementation hot reloading a module? Well, reload does nothing but re-import the module; all it does is you give it a module and it just re-imports it. So yes, this is technically hot reloading a single module, but you need a lot more before this is a hot reloader. I don't know how well that translates into English from other languages, but what I mean is, reloading a single module is very different from hot reloading an entire system, or components within an entire system, and the reason for this is that dependencies are the enemy of a hot reloader, and Python modules have lots of interdependencies. All hot reloaders have one thing in common: they all leverage language or framework features that manage dependencies between things. So in Erlang, the example, everything uses message passing, so if you want to hot reload a component in an Erlang system, you can just bring it down and bring it up again; there are no dependencies between things, the dependency is message passing, which is quite easy to reload across. CSS is not really a programming language, so you can just take the stylesheet off the page and add a new one, and the browser takes care of the rest. React.js has a hot reloader, and it leverages how React components themselves work: React is all about removing components from a page and adding them again, and having React take care of laying out the page for you, or rendering the HTML, so hot reloading a component in React is just deleting the component and adding a new one, because that's how React works. So imagine that you could write a hot reloader in Python. It's a little bit wordy: you import a function inside your module, so you have your_module.py, and from another module you import some_function, so you have a reference to that function in your module. You then replace the code in some_function with some new code: you've rewritten it, you've fixed a bug or something. After your hot reloader kicks in, you've got a some_function reference, and if it references the old code, then your hot reloader hasn't worked properly, so it's not right. So you could go through and find all the places that reference the some_function function, you could then hot reload those as well, and you could cascade.
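The stale-reference problem being described can be demonstrated with a throwaway module on disk. A minimal sketch, with `mod_b` and `some_function` as hypothetical names standing in for the talk's example:

```python
import importlib
import os
import sys
import tempfile


def demo_stale_reference():
    """Show that `from mod_b import some_function` survives a reload."""
    d = tempfile.mkdtemp()
    sys.path.insert(0, d)
    try:
        path = os.path.join(d, "mod_b.py")
        with open(path, "w") as f:
            f.write("def some_function():\n    return 'old'\n")
        import mod_b
        from mod_b import some_function  # our module's direct reference

        # "Fix a bug": rewrite the function, bump the mtime so the
        # bytecode cache cannot mask the change, and reload the module.
        with open(path, "w") as f:
            f.write("def some_function():\n    return 'new'\n")
        st = os.stat(path)
        os.utime(path, (st.st_atime, st.st_mtime + 10))
        importlib.invalidate_caches()
        importlib.reload(mod_b)

        # The module attribute now runs the new code, but the direct
        # reference still runs the old one: the cascade problem.
        return mod_b.some_function(), some_function()
    finally:
        sys.path.remove(d)
        sys.modules.pop("mod_b", None)
```

Every module holding such a `from`-import would need the same treatment, which is exactly the cascade the talk goes on to describe.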
You'd then find all the modules that reference the module that references that one, and you can go through the whole tree of objects. This just sounds complicated; it sounds really complicated, and it's really impossible to do in the general case: for any given Python program, it's impossible to do that safely. For limited, smaller cases it may work: for example, IPython has a hot reloader that works in a lot of cases, but it leverages how IPython is just a shell, so you don't hot reload an entire program, you kind of hot reload parts of the REPL that you're using. And similarly, if you have a single reference to something, then you can hot reload that safely: if you have one reference to one module, you can call reload and replace the reference; that's hot reloading, that works. But to do it in the general case, you will end up with bugs, and what's worse than having an auto-reloader that doesn't work is an auto-reloader that you can't trust. If you end up with some bugs in development, hard-to-track-down ones, and it's because the hot reloader hasn't worked properly, that's a terrible development experience; you're going to be spending time chasing bugs that don't exist. So how do we reload code in Python? We turn it off and on again: we restart the process on every code change, over and over again. This is kind of like refreshing the browser window every time you make a change to a JavaScript file: you lose all the state in the process, so you lose any connections that are open, and it starts again fresh. This ensures that the system, the program, is right; it pretty much just works, rather than a hot reloader where you might have some kind of bugs or you can't reload code properly. So this is how the Django auto-reloader works: when you run manage.py runserver, Django re-executes manage.py runserver again with a specific environment variable set. The child process actually runs Django: it runs the entire framework, it imports all your modules and does all the stuff that you
want it to do, and it watches for any file changes. When a change is detected, it exits with exit code 3, and the parent Django process restarts it; if it exits with another code, it's an unexpected error, and it terminates, or shows you a useful message. So it's quite a simple loop: you have a process that's kind of a supervisor, and it will restart the child process when it exits, and this is the core of the Django auto-reloader. A little bit of history of the Django auto-reloader: the first commit was in 2005, with no major changes until 2013, when inotify support was added; kqueue support was added in 2013 and removed one month later, which is never a good sign. I'll talk about what inotify and kqueue are later on, but the point here is: Django code is usually very high quality, and there's lots of emphasis on testing and readability, but the auto-reloader, to me, was definitely a part of Django where the code was very different, purely because it was something that wasn't well understood: it kind of worked, don't touch it, leave it alone. The code was definitely not idiomatic and very hard to extend; it was append-only code, and everyone's seen this: you kind of just chuck features on, you bolt them on, and you hope it works. So there were some new features that we wanted to add to the auto-reloader that just wouldn't have worked with the current implementation, so we needed to rewrite it. So, to sum up so far: an auto-reloader is a common development tool; hot reloaders are really hard to write in Python; Python auto-reloaders restart the process on code changes; and the Django auto-reloader was old and hard to extend. So, to the fun part: we're going to rebuild the auto-reloader. I like breaking things down into sections, so there are three or four steps: first, we need to find files to monitor, because we can't reload on code changes if we don't know what code to watch; then we need to wait for the changes; and we need to trigger a reload.
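The supervisor loop described at the top of this section, restarting the child whenever it exits with code 3, can be sketched with `subprocess`. `RUN_MAIN` is the environment variable Django actually sets, used here purely for illustration:

```python
import os
import subprocess
import sys

RESTART_EXIT_CODE = 3


def supervise(argv):
    """Restart the child process whenever it exits with code 3.

    Any other exit code is passed through, mirroring the loop the
    Django auto-reloader runs around `manage.py runserver`. Returns
    the final exit code and how many times the child ran.
    """
    runs = 0
    while True:
        # Mark the child so it knows it is the "real" server process.
        env = {**os.environ, "RUN_MAIN": "true"}
        code = subprocess.call([sys.executable, *argv], env=env)
        runs += 1
        if code != RESTART_EXIT_CODE:
            return code, runs
```

The child is then free to watch for file changes itself and call `sys.exit(3)` when it wants a restart.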
We also need to make it testable, of course, especially if you're refactoring an old implementation; and, bonus points, make it efficient. You shouldn't prematurely optimize stuff, so get it working and then optimize. Cool, so, finding files to monitor. Everyone here knows sys.modules: it contains all the modules that are currently loaded by Python, and there are quite a few; this example has 642 modules loaded, while Python itself, just importing sys and printing len(sys.modules), has 42 modules loaded. So there are quite a few modules, and sometimes things that are not modules end up in sys.modules: sys.modules is effectively a dictionary, and it can be modified by arbitrary Python code, so some libraries do some crazy things, especially in development. For example, typing.io isn't a module even though it's in sys.modules, and this was actually a bug in the Django auto-reloader implementation: I naively assumed that things in sys.modules are modules, which isn't true. And Python imports are really dynamic as well; it's one of the most flexible and best parts of Python: you can import from zip files, you can import from .pyc files, you can write arbitrary loaders in Python to do random things on imports. This guy here wrote a 60-line importer that imports code directly from GitHub, so you can do from github_com.username import project, and it will download the code from GitHub, install it, or make it available to Python, and it's there. Don't do this in production, but there's a lot of magic that can go into imports; they're not as simple as a file in the file system and a module in memory. The more common use cases for these kinds of loaders: pytest rewrites the bytecode of your test files, so it changes the assert keywords that you use into a function call that pytest can do things with; Cython as well, which is a library for letting you write C extension modules in a nicer syntax than C, can compile the module on import, which is quite handy in development, I guess.
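The sys.modules walk can be sketched with the guards just mentioned: entries that are not modules at all, and modules with no real file behind them. This is a simplistic sketch in the spirit of the talk's slide, not Django's 40-line implementation:

```python
import os
import sys
import types
from typing import Iterator


def iter_module_files() -> Iterator[str]:
    """Best-effort list of files behind the currently loaded modules."""
    for module in list(sys.modules.values()):
        if not isinstance(module, types.ModuleType):
            continue  # arbitrary objects can be planted in sys.modules
        spec = getattr(module, "__spec__", None)
        if spec is None or not spec.origin:
            continue  # e.g. namespace packages, no origin to watch
        if spec.origin in ("built-in", "frozen"):
            continue  # compiled into the interpreter, no file on disk
        path = os.path.abspath(spec.origin)
        if os.path.isfile(path):
            yield path
```

The final `isfile` check also quietly drops origins that are not plain files, such as modules loaded out of zip archives, which a fuller implementation would need to handle separately.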
So yeah, there isn't always a mapping between a module and an actual unique file; you could have two modules with the same file, etc. So what can you do if someone wants to import code directly from GitHub in development? You can't really do anything; the point here is that imports are very dynamic and not all changes can be detected, so we just try our best to detect them. And this is a really simple implementation of something to list the files behind all the modules that are loaded: each module has a __spec__ attribute, and that object has an origin, which is the path to the location, which can be a zip file, etc., etc. All of these code samples are really simplistic; the actual implementation in Django is over 40 lines long, and it wouldn't fit: I actually was going to include a slide with it, but it was just too big. But this is conceptually what you want to do: iterate over sys.modules and return a list of all of the file paths we want to monitor. So, we've found the files we want to monitor; next we want to watch for changes and trigger a reload. All file systems report the last modification time of a file: there's a function, os.stat; you give it a file path and it returns a structure, and one of the fields in the structure is st_mtime, the last modification time of the file, and we can use this to detect changes to a file. The important thing to know here is that the modification time is pretty abstract; it can mean different things on different platforms and operating systems, and file systems can be weird. HFS, which was the default file system on macOS before the latest version, had a one-second time resolution, so there were no nanoseconds; in the previous slide, that's the timestamp including nanoseconds, and HFS would just be to the second. Windows has 100-millisecond intervals, so files may appear in the future. On Linux, it depends on your hardware clock: the current time in the Linux kernel is cached in memory
and it's updated by some kind of clock every 10 milliseconds. Normally Python does a great job of abstracting operating system specifics away, but you really can't escape from the realities of the file system that you're running on; case in point, macOS has a case-insensitive file system by default, which isn't something that you can abstract away. So there could be different system calls, or different ways that you find the last modification time of a file on different platforms, and Python can abstract that away; what the actual modification time means, it can't. Network file systems can be even weirder, and they mess things up completely: os.stat is generally really fast, except on a network file system, where it could require a network access, so if you're for some reason developing on a network file system, for whatever reason you want to do that, the stat call could have huge latency. Clocks might be out of sync as well: if you have two developers working on it, one clock might be completely wrong and one clock might be right, so you end up with one developer writing a file, the other developer reads the file, or the auto-reloader kicks in, and the times are different, one year in the future, one year in the past. If you change the last modification time of a file arbitrarily, it doesn't mean that the file has been modified, and the mtime not changing doesn't mean the file hasn't been modified. The reason we use this, despite all these limitations, is that it's really easy to implement, it's generally efficient, unless you're running on a really weird network file system, and it has pretty good cross-platform support. So here's a really simple implementation of an auto-reloader that uses stat: a function called watch_files. We have a dictionary that maps the file paths that we've seen to the modification time as reported by the file system, we have a while-true loop, and we go through and iterate through each of the files
returned from the previous function that we wrote. We call os.stat on the path and we get the modification time, we get the previous modification time, and if they differ, then we exit with exit code 3; otherwise we sleep for one second. Okay, so really simple. Obviously there's a lot more to this, if the file doesn't exist, if it's been deleted, etc., etc.; this again is a simplistic implementation. So we've found files to monitor and we can watch for changes; how do we make it testable? When I was researching this talk, I went through and looked at a bunch of projects that use an auto-reloader, and it surprised me: there are not many tests for auto-reloaders in the wider ecosystem. The Tornado project has two or three, and Pyramid has six, and most of these are high-level integration tests; they're like: spawn a process, touch a file, the process exits with exit code 3. The point here is not to shame these projects, to say, oh, it sucks, they don't have any tests; the point here is that it's a hard thing to test. Obviously these auto-reloaders work very well, and more tests doesn't always mean that it works, but it is a hard thing to test, and the reason is that an auto-reloader is an infinite loop that runs in threads and relies on a big ball of external state, which is the file system, and each of these things is hard to test by itself, but they're even harder when you combine them. So how do we make things testable? This isn't some crazy idea that I've had, it's just: use generators. If we make our auto-reloader implementation a generator, the only modification we make is to add a parameter telling the function how long to sleep for, and we yield after each iteration of the loop, and it lets you write slightly better tests. So this is a simple test: we create a reloader, which creates the generator; we call next on it, which ticks, so it has one tick of the loop, then it hits the yield and returns to the test; we fiddle with a file somehow, we mutate the state of the disk, and we call next again, and it should exit with exit code 3.
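A sketch of the generator idea just described. To keep it driveable from a test, this version yields the set of changed files each tick instead of exiting directly, a small deviation from Django's exit-code-3 behavior; the caller decides what to do with a non-empty set:

```python
import os
import time
from typing import Callable, Iterable, Iterator, Set


def stat_watcher(
    watched_files: Callable[[], Iterable[str]],
    sleep_time: float = 1.0,
) -> Iterator[Set[str]]:
    """Yield the set of files whose mtime changed since the last tick.

    Yielding between ticks lets a test drive the loop one step at a
    time instead of fighting a `while True` running in a thread.
    """
    mtimes = {}
    while True:
        changed = set()
        for path in watched_files():
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # deleted or briefly unavailable
            previous = mtimes.get(path)
            mtimes[path] = mtime
            if previous is not None and previous != mtime:
                changed.add(path)
        yield changed
        time.sleep(sleep_time)
```

A test can now call `next()` once to record baseline mtimes, mutate the disk, and call `next()` again to observe the detected change, exactly the pause-and-resume pattern described above.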
So we have a way to pause the auto-reloader, essentially, and it allows us to make changes to the file system and then resume it. You can extend this test to work with symbolic links, permission errors, files being intermittently available, etc., etc. So we've made it a little bit more testable; now, how do we make it efficient? Surprisingly, there were two slow parts to the auto-reloader in Django: the first one was iterating the modules, which surprised me, and the second one was checking for file system modifications. On an SSD that's really quite fast; iterating the modules every second was the slowest part, especially if you have a really large Django app with maybe 5,000 modules loaded. So how do we make it efficient? We can just use lru_cache. We have a function, the one we wrote before, which gets the files to watch; it calls another function with a frozenset of all of the modules that we have currently loaded; that function, sys_modules_files, takes the modules, has an lru_cache on it, and returns the same implementation that we had before. In reality, sys.modules can change, but after an app is booted it doesn't really change that much; you might import something in a function, so it can mutate, but in the happy path it doesn't, so you can just cache the results of all of this and skip all the processing of checking if it's a zip file, resolving symlinks, etc.; it can all just be cached into a single list, and then you iterate through them. In the Django implementation, on this MacBook with a solid-state drive, this took up 30% of the time of each auto-reload tick, which was quite a lot of time. So, can we skip the standard library? Raise your hands here: has anyone, during a debugging process, edited a standard library file? So it happens, but not very many people; maybe a specific type of developer would; in the general case no one really does that, the average developer won't need to. So it would
be quite good if we could just skip watching them; we could skip all of the system packages, all of the standard library; we don't need to watch them, they don't really change. This is actually a lot harder than it sounds, because how do we know where the standard library is? I googled it, I got to a StackOverflow answer, and I was like, okay, good, this is going to be simple. There were 20 answers and each of them was different, which is never a good thing. The first one was this getsitepackages, that's cool; but it's not available in a virtual environment, so that's no good. We can call this other function; that works, but it returns a single path, and some Linux distributions have more than one site-packages directory. So I went to IRC and I asked, and I was like, okay, I feel like I'm pretty experienced with Python, I've never needed to do this before, why is it so hard, am I missing something? Someone linked me to a project, I think it was related to coverage; I couldn't find the code snippet for this, but it used five or six different ways to try to detect the standard library, and it fell back to checking whether site-packages is in the path of the file. So at this point it boils down to risk versus reward: it might not be safe to do this in all cases; what happens if your project is called site-packages, for whatever reason? And if you make a mistake, then it's going to frustrate users: the auto-reloader won't work in all cases, and that's just not a nice place to be in. So no auto-reloader I could find does this. The gains could be huge, you could reduce the number of packages or modules you're searching by 70 or 80%, but it's not safe to do in the general case, so it doesn't get done. But what you can do is use file system notifications. Calling stat repeatedly is kind of wasteful; you're just asking, are we nearly there yet, are we nearly there yet. It'd be nice if the operating system could tell you when a file is modified, and then you just
wait, and the operating system will tell you. Each platform has different ways of handling this: Watchdog is a Python library, and it implements five different ways, and it's 3,000 lines of code. And all file system notifications on operating systems are directory-based, whereas we care about files, which makes it a little bit harder, because you get notifications for any file in a directory which has changed and you need to filter them out. Notifiers are also potentially expensive; they're generally designed for longer-term monitoring: a daemon that's watching a bunch of files and performs an action when a change is made. In our flow, we're going to create and destroy them quickly: every time a Python process shuts down and Django restarts it, it has to create a new watch with the kernel, and it's going to use more resources than it should. So this is the actual feature that we wanted to add to the Django auto-reloader: using a system called Watchman, from Facebook. Watchman is a daemon that runs on your machine, and it handles all of the icky differences between platforms for you; you register watches with it, it does the right thing, and it returns changes to you over a socket. And it handles git changes, which is one of the reasons you want to add Watchman in the first place: if you check out a new branch in git, you're going to have hundreds of notifications flying at your process, everything's been changed, but with this it will wait until the checkout has finished and then send one single bulk update telling you that the checkout is finished. Otherwise, what might happen is your process receives "one file has been changed" mid-checkout, it restarts, and it's in an inconsistent state because the checkout is still happening after the Django process has restarted. And the daemon can be shared with other projects: if you have, like, a JavaScript project that also uses Watchman, and quite a few of them do, they can share the watches. So this is
how we do it in pseudo-code with Watchman: we connect to some kind of Watchman server, we tell it what files to watch, and in the while-true loop we just tell it to wait; this waits on a socket for a message from the Watchman daemon, and if there are any changes, we exit with exit code 3. This way we don't write any platform-specific code, and we don't have any issues regarding weird OS X versions that don't support a particular library or something like that. Cool, so we made it efficient as well. So, the aftermath: the code was much more modern and easy to extend, it was faster, and it can use Watchman if available; there were 72 tests. This is in Django, and it's no longer a dark corner of Django; I might be a little bit biased in saying that, seeing as I wrote it, but it was certainly, in my opinion, a little bit better. So it's all good: I'm a genius, it worked first time, tests were green, ship it, etc., everyone's happy. Well, these are all issues from the Django ticket tracker after we released the new auto-reloader in version 2.2; there were quite a few, unfortunately, so more tests doesn't always mean that it works. This is my favourite issue, and the issue is: it doesn't work on Windows, essentially, without using Watchman; it fails intermittently. I want to highlight this because it's a great example of how you can make what seems to be a really simple optimization that makes sense, and have it completely backfire in a way that you don't know why. In the Django implementation that I discussed before, we might not be watching a file that doesn't exist yet; for some Python files in Django, if they appear, that counts as a change, for example models.py. If you were to create a directory with a models.py and add that file, the stat reloader, on the first tick, the first time, sees the file is there but doesn't pick that up as a modification, because it's the first time it's seen it; only on a second modification, where it can compare the modification time of the previous time
to the current time, does it reload. So I was like, okay, that's a corner case, need to fix that. So we store the time of the last loop, and if we've seen the file before and the modification time of the file is greater than the time of the last loop, then we reload. Okay: this doesn't work on Windows 25% of the time, and I could never work out why. So you would restart, the process would restart, and it just wouldn't work; but then you would restart it manually and it would work. And on all other platforms it worked fine. If you know Windows and you want to tell me why this is, please do, because it keeps me up at night. But the point here is you get all kinds of strange behavior across different operating systems, across different disks, different configurations, and simple optimizations can bite you. So keep it simple if you're writing your own, and keep it really simple. In conclusion: don't write your own autoreloader, use this library. This is the library from the Pylons Project called hupper, and it's a fantastic library. In the abstract of this talk you may have seen I was going to present a library that I wrote myself that took all of this knowledge and distilled it into a library. This is that library, written by someone else, probably better than I could have. So check it out if you are writing your own framework and you want to add an autoreloader; it's really good. Cool. I'd just like to thank Onfido, which is the company I work for; they're paying for me to come here and give this talk. We are in the business of identity verification. It's a really interesting problem space, from the theoretical, like what is an identity, to the more practical: how do you handle millions of identity checks as fast as possible with as little fraud as possible? So if you're interested in any of this, or Onfido specifically, come talk to me afterwards, send me an email, or check our careers page. Any questions? We have time for a couple of questions. Does your autoreloader handle it properly if editors do weird
stuff when saving files, like creating a copy first and then renaming it? Can you say it again? I'm sorry. Many editors nowadays do safe saving, so they don't overwrite the file but rather create a new one. So, Watchman handles that for you quite nicely; it looks at common patterns where you create a separate file and then do an atomic move. The stat reloader handles that as well, because it doesn't watch for the new file, the .new file, which is then moved; so as far as it knows, the original path has been changed. So I'm going to go back to the other one. Okay, thank you. Hi. If restarting the process isn't really an option, let's say you have plugins for an application, and you can still kind of control how the code in the plugin looks because you're defining the API, would you say reloading without restarting is possible, or just don't? So the exception for a hot reloader is a plugin system where you have access to that plugin, or you control the plugin, and you know that you can safely delete the reference and re-import it. It doesn't always work: if you have, for example, a C extension module that the plugin relies on, it might have some initialization code, and you can't really safely hot reload those at all. So it depends. You can write a hot reloader in some specific cases, a plugin one in particular, but if you run into weird issues it's safer to just restart the process. So a good implementation might be both: if you can detect a change, if you can somehow diff the changes and work out what needs to be updated, you could hot reload simple changes and then fall back to a restart if needed. Thank you. One super quick question. There you go. I tried Watchman years ago when it just came out; is the API better now, or easier to use?
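The answer above can be sketched in a few lines. This is a hedged illustration, not code from the talk: `reload_plugin` is a hypothetical helper name, and the caveats mentioned (stale references, C extension modules) still apply.

```python
import importlib

def reload_plugin(module):
    """Naively hot-reload a pure-Python plugin module in place.

    Caveats from the talk apply: references other code already holds
    to the old objects are NOT updated, and extension modules written
    in C cannot be safely reloaded at all.
    """
    importlib.invalidate_caches()    # pick up files created since startup
    return importlib.reload(module)  # re-executes the module's source
```

In practice you would combine something like this with the restart fallback suggested above, hot-reloading only the simple cases.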
The API is better, but it's still a little bit harder to use than I would have liked. The simplistic code that I showed, where you register a file, is nothing like what you actually need to do. It's directory based, so you need to work out, from the set of files, which common directories you want to watch, minimizing the number of directories that you do watch; it doesn't take care of any of that for you. But in general it's quite nice: you say watch this, and you just get notifications on a socket, and it provides utilities for filtering out specific files, regular expressions on the files, etc., in a way that's quite cross-platform and takes a lot of code off you. But it's definitely more complicated than I would have liked. All right, let's thank Tom again, and I think it's coffee time.

Welcome to the second half of the afternoon talks. Okay, we're almost ready to start. Okay, we have Mark here, and without a kilt. Sorry? Without a kilt. Oh yeah, I know. And he's going to tell us about how to publish, this is difficult, publish a perfect Python package on PyPI. Let's welcome Mark. Thanks very much. This is a slight joke: I started writing a relatively normal title and then I realized it had a lot of P's in it, so I thought I would add a few more just to make it difficult for the session chairs to pronounce. So, sorry. Okay, there's a lot to get through, so I will hurry on quite quickly, but in a moment I'm going to show you how to build perfect Python packages from scratch. I won't be taking questions because, honestly, this is a tight squeeze for 30 minutes anyway. But I still have to talk about something first that's more important than Python packaging, and that's me. So my name is, well, not judy2k, but this is my handle more or less everywhere online; feel free to follow me on Twitter. I tweet about Python, and about Brexit occasionally, angrily. So yeah, my real name is Mark Smith. I'm a developer advocate for Nexmo, one of the conference sponsors, and I would be remiss if I didn't at least briefly talk about Nexmo.
We are a software-as-a-service company providing REST APIs that allow you to do telecommunications and various other forms of communication, including video streaming online to mobile devices and web apps. If that sounds interesting, come and talk to us at the Nexmo booth, or send me a tweet, or come and talk to me around the conference; I'm happy to talk about that another time. So, now that I've talked about Nexmo, let me talk about JavaScript. In March 2016, a developer removed a library called left-pad from npm, the Node.js equivalent of PyPI. It's a big web service that holds packages; when you need to pull down those packages and install them into your JavaScript application, you run npm and it downloads them and installs them so that you can access them from your program. Removing left-pad broke lots of packages that depended on it. It turns out that quite a lot of packages on npm depended on left-pad. In fact, it had been downloaded 2,486,696 times in the month before it was removed from the npm repository. It was just one function, 11 lines of code, that padded a string to a certain length by adding characters to the start of that string. And lots of people thought that this made the JavaScript community look kind of silly. Why would anybody publish a package that only consisted of 11 lines of code? Why would anybody use a package that only consisted of 11 lines of code? I don't think it made the JavaScript community look silly. I think, actually, it made the JavaScript community look pretty awesome. Obviously, there are some problems with the fact that people can remove a package that everybody depends on and kind of break the entire JavaScript community's software, but in general, why was this package published? So if you don't agree that this made the JavaScript community look awesome, feel free to fight me on Twitter. So left-pad is not a problem. left-pad was a solution.
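For a concrete picture of what those 11 lines did, here is a rough Python analogue (my own sketch, not the npm source; note that Python already ships `str.rjust` for this):

```python
def left_pad(value, length, char=" "):
    """Pad `value` on the left with `char` until it is at least
    `length` characters long -- roughly what npm's left-pad did."""
    if len(char) != 1:
        raise ValueError("pad character must be exactly one character")
    text = str(value)
    while len(text) < length:
        text = char + text
    return text
```

For example, `left_pad(17, 5, "0")` gives the same result as `str(17).rjust(5, "0")`.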
So because the JavaScript standard library is relatively small and doesn't contain lots of the useful functions that the Python standard library does contain, like the ability to left-pad a string with some characters, a developer wrote a solution for it himself. And because npm makes it easy to publish small items of code, he did. He made that code available to other developers so that they then didn't have to reinvent the wheel. And that means that people can bug-fix in one location, they can submit PRs to improve these 11 lines of code, and then ultimately people can, well, ideally, depend upon it. Now, the alternative to this is copy and paste. Copy and paste, I think you can agree, is not how you should share your code. Because it's really easy to share with npm, people do, even with really small libraries. And I don't think that's really the case with Python, because people find Python packaging slightly fiddly. There are some slightly odd things, which we'll go through, about pushing your first Python package to PyPI. So I made it difficult for myself to say some of these sentences as well. So people are a bit afraid of setup.py. There is good documentation out there, but lots of it conflicts with each other. But there is, I think, a growing movement to bring some of this stuff together and make it easier to find current best practices; hopefully this talk will help a little bit. So what I really want everybody in this room to feel at the end of this talk is confident to publish packages to PyPI. Relatively small packages, ideally pure Python packages, which is what the example is going to be. So I would like you all to make a package. What we're going to do in this talk is: I am going to show you how to make a package. So the first thing, the assumption, is that you already have some code which is general enough, or close to general enough, that you think it would be useful for other people.
So let's take some useful general-purpose code, like a function that prints hello world. Now, who in this room hasn't written this program in some form? Exactly. That is totally wasted time: everybody in this room has written this program. Wouldn't it be so much better if you could pip install it and call that code from somewhere else? So that's what we're going to do. Also note there's an f-string here. F-strings are awesome; you should use them. So the idea is you've written some code that you're proud of, and you would like to share it with the world. The first thing to do is extract it from your code base so that it is independent of that code base. That's your problem; I'm not going to show you how to do that. In this case, we're going to extract this code into a file called helloworld.py, as a Python module. Next, we're going to put that module in a src directory, and I will explain towards the end of the talk why we did that. For the moment, just assume that Hynek is excitedly clapping for the second time today. Awkward. So I'll explain why later, but for the moment, just take this as correct practice. And then in the same directory as the src directory, the directory containing the src directory, we are going to create our setup.py file. You can open that up in your favourite text editor or Python IDE, and we will enter something like this into the file. The first thing to note here is that we're importing from setuptools, not from distutils. You will still find documentation online that recommends importing from distutils. Don't do that: it's not as powerful as setuptools, and pip is already distributed with setuptools. So if you're installing packages with pip, you have setuptools; it's not a third-party dependency anymore. And then we have this function call underneath it: we call setup, which we've imported from the setuptools module. It's a bit weird; for now, just don't think of it as a function call. Think of it as configuration.
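The slide's code isn't reproduced in the transcript, so here is a plausible guess at the module being packaged; returning the greeting as well as printing it is my addition, to make it easy to test later.

```python
# src/helloworld.py -- the f-string is why this package needs Python 3.6+.

def say_hello(name="World"):
    """Print (and return) a greeting; the name parameter is optional."""
    greeting = f"Hello, {name}!"
    print(greeting)
    return greeting
```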
Each of those parameters is essentially a line of configuration that you are giving to pip to tell it how to install your package. So this is pretty much the bare minimum setup information you need to provide. We start with a name. Name is what you pip install; this is the name on PyPI that it will be uploaded under, so people will pip install it. It doesn't have to be the name of the Python module that people will import; it's a separate thing. Usually they're the same, sometimes they're different. We need to pick a version number; here I've just started with 0.0.1. 0.0.x version numbers imply that it's unstable. There is a good chance that the first few times you upload this to PyPI, there will be a minor packaging mistake. So this is a good stage to start uploading packages to PyPI, while you've still got this unstable version number, so you're not worried about people seeing instability. It's not actually your code base that's unstable, it's your packaging configuration. Then we have a description. This is usually a one-liner at this point. "Say hello" is not a very useful description, but we'll leave it at that. Then we have py_modules, which is a list of the actual Python code modules. We have a file called helloworld.py, so in here we're saying this is the code that we want to distribute. That's what people import, not what they pip install. And then finally, again, this is a kind of cargo-cult copy-and-paste setup config: we have this package_dir line, which is a mapping from the empty string to src. All that is doing is telling setuptools that our code is under a src directory. So don't worry: put it in your code and forget about it after that. So now we've defined a package, so let's build it; then potentially we could distribute it. We run the setup file we've just created with the bdist_wheel command. This tells it to create a wheel file, which is something that is appropriate for uploading to PyPI.
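Collecting the parameters just described, the minimal setup.py might look like this. It is configuration, and the package name and details are illustrative, not the actual names from the talk:

```python
# setup.py -- bare-minimum configuration, as described above
from setuptools import setup

setup(
    name="helloworld-example",   # the name people pip install (illustrative)
    version="0.0.1",             # 0.0.x signals "unstable"
    description="Say hello",
    py_modules=["helloworld"],   # the module people import
    package_dir={"": "src"},     # our code lives under src/
)
```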
It will spit out a load of output, most of which I've deleted from here, but the line that I've highlighted in bold up here is the one that's important. What that's saying is that it's just copied our helloworld.py code file into the lib directory, which means that it will end up in our wheel. If that's not there, then essentially there will be no code: it will be an empty wheel file, and it won't work. So we can now look at what's been created as part of that bdist_wheel command. Here are the directories and files. Remember, we've only actually created two files so far: helloworld.py and setup.py. So everything else here was created by setuptools. There are a few things here. It's created an .egg-info directory in our src directory. You'll want to gitignore this; I'll show you how to do that in a moment. This is horrible, I wish it didn't do this, and I'm going to ignore it from now on. Then we have a build directory. This is where setuptools kind of moved our files to in the process of building our wheel file. You will see the helloworld.py file in there, so again we've validated that our code is actually going to be in our wheel. And then finally we've got the actual wheel file here, which it's put in our dist directory. That is our final distribution. So now we can install it locally. This is just effectively testing our packaging: it's not testing our code, but it's testing our setup.py file. So here we are going to pip install with the -e flag and the full stop, period, dot, whatever you want to call it. This can be confusing to people if you haven't seen it before. Just out of interest, who hasn't seen this before? Yeah, I thought so. This is actually an essential command if you are building Python packages, or rather these are essential flags. So the -e: normally, when you install a Python package, it installs it into your site-packages folder inside your Python distribution.
It copies the code into your Python distribution. We don't really want to do that. While we're working on our project, we want to just work with the code that's in our src directory. So that's what -e does: it essentially links to the code that you're working on instead of copying the code into another location. That means once we've installed this package, we can continue to work with it, continue to run it, continue to write code against it, without having two copies of that code, which is just going to cause us problems further down the line. The full stop at the end means install the package in the current directory. So it's looking at the setup.py file, and it's saying install this package by linking to the code that I'm working on. You run this every so often; every time you change your setup.py file, you essentially run this again to make sure that your package is installing correctly and that your Python code is available to you. So bear in mind our code is under a src directory at the moment. If we just run Python in our current directory, we can't import helloworld yet, because it's not on our path. So let's, in theory, test it. Here we run Python, the REPL, and then we do our `from helloworld import say_hello`. Because we've just installed our code into our current virtual environment, this will work now. Even though our code is under the src directory, Python has been told where our code is by the setup.py file. So we can execute the say_hello function, and we can execute it passing the optional parameter. Everything works as we would hope. That's some rough testing; we'll get on to better testing in a moment, but it's a confirmation that our code is installing correctly. So at this point we have a working package with some useful code in it. We could upload that to PyPI immediately, but I would say that there are a few things that we really need to do before we get to that point.
That's documentation and testing, but also just a little bit of housekeeping, which I'll run through now. As I said, there are some files created that you really don't want to add to your Git repository, so it's useful to have a .gitignore file. This website is fantastic: it makes it easy to get hold of GitHub's standard .gitignore files, which they publish for different language and operating system communities. So you write Python in that text box, hit create, and it will just spit out a text file into the web browser that you can copy and paste into a .gitignore file. Now we're ignoring all the main files that Python creates, so it will stop you from uploading .pyc files and a bunch of other artifacts of a Python project. If we're going to publish this code, we also need a license. If we don't have a license, we haven't given people permission to run our code. They can look at it, but they can't copy it or use it, which is not greatly useful. So we need a license.txt file. And a good way, if you don't know the ins and outs of the different licenses and the restrictions and freedoms they grant the software that you're publishing: this website, choosealicense.com, is incredibly useful. It essentially asks you lots of questions and then gives you your options and how they compare to each other. So it's a good way, a human way, for non-legal people to understand the differences between different software licenses. We need to add some classifiers to our setup.py file so that people can find the project on PyPI by searching or filtering on common criteria. Here we say that this is Python 3 code and that it runs under Python 3.6 and 3.7. We haven't really tested that yet, but we know that it only runs under those versions of Python, because there was an f-string in the code, as I pointed out. I chose the GPL v2, so I put that in there; that was a bit of an arbitrary choice. These can all be looked up in the URL at the bottom.
On pypi.org's classifiers page there's a bunch of them. Try and apply all the useful classifiers to your project, so that you're describing what this project is for and how it's used. Then you need some documentation. But before you write documentation, you need to work out what format you're going to write it in. You basically have two choices at the moment. One is reStructuredText, which is implemented in Python and used widely in the Python community. All of the Python core documentation is written in it, and a whole bunch of the libraries you use have documentation written in reStructuredText. But it is a Python solution, and if you're working on a project that has some Python code and maybe some Rust code or some C code or something like that, those people will probably not have encountered reStructuredText before, but they will probably have encountered Markdown. Markdown is a valid choice. It is simpler but also less powerful, so you're making some compromises here. Both of them allow you to use tools, like Sphinx for reStructuredText or MkDocs for Markdown, to compile a directory of Markdown or reST files into a set of documentation that's all linked together. Both of these are supported by Read the Docs: you can publish either kind of documentation site to Read the Docs and then not have to worry about hosting it yourself. Once we've decided, and I've chosen Markdown, again kind of arbitrarily, we need to write a README. That's pretty much essential for any modern project. Here we have a title for the project and a small paragraph describing what the project does. We should have a section describing how to install the project, with some sample command line code for pip-installing it. We should have some sample code, just to show people how to use the useful code that we've published to PyPI. And then, once we've written this, it's nice to have it also published on PyPI.
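A README along the lines described might look like this; the contents and the package name are placeholders of mine, not the slide's text:

```markdown
# helloworld

A tiny package that greets people.

## Installation

    pip install helloworld-example

## Usage

    from helloworld import say_hello
    say_hello("EuroPython")
```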
So as well as publishing on, say, GitHub or GitLab or wherever you're publishing your code, it would be really nice if we could make this essentially the official description of our project. And we can do that. Even if you've published packages before and used reStructuredText to write your README: this is a new feature in PyPI as of about a year ago. PyPI supports Markdown directly, so you don't need to convert your Markdown to reStructuredText before pushing it up to PyPI. Here we're taking advantage of the fact that the setup.py file is code and not configuration, by opening the README file and reading in this block of Markdown. And then we apply that string to our setup call: we use the long_description parameter and just provide the string value that we put into a variable. And then, very importantly, we need to tell PyPI that this is Markdown and not reStructuredText, which we do by providing the MIME type in the content-type parameter at the end. I wanted to talk about dependencies. I've cut this talk down a bit, so we won't actually show any code that uses blessings. But, for example, if we used this terminal coloring library called blessings, this is how we would add it to our setup.py file. We have an install_requires parameter: it's a list of specifiers that describe the library and the versions we're prepared to accept. I will talk a little bit more about those in a moment. If we change the library dependencies or anything else, as I said, we should run our `pip install -e .` command again, just to reinstall the package and make sure it actually pulls down the dependencies and that these things work together. And then we should run some tests. But we don't have tests, and we shouldn't just keep opening up the REPL and randomly calling functions to make sure that things work. I would recommend you write your tests with pytest. Pytest is awesome.
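The setup.py additions described here would look roughly like this; the version specifier on blessings is a guess of mine, not from the talk:

```python
# setup.py (additions) -- README as the PyPI description, plus a dependency
from setuptools import setup

with open("README.md") as f:
    long_description = f.read()

setup(
    # ...name, version, py_modules, package_dir as before...
    long_description=long_description,
    long_description_content_type="text/markdown",  # tell PyPI it's Markdown
    install_requires=["blessings~=1.7"],            # version is illustrative
)
```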
But in order to write tests with pytest, again, we need more dependencies. This time, though, we're not talking about a dependency of our library, like blessings. We're not saying this is needed to run; we're saying this is a development dependency, something people need to install in order to develop code with our library. And to declare dev dependencies, I recommend you add them as extras in your setup.py. Now, a lot of people here, I suspect, are using requirements.txt for this. If you have a setup.py file, I would argue you do not need a requirements.txt: you can do all of this within Python's standard packaging framework, and you get some advantages, because again, this is code, not configuration. So the way this works: it looks a little bit like our install_requires, but it's got an extra layer of indirection. You'll see that it's a dictionary rather than a list, but you still see that list in there as the first value. The key is the name of your extra, so in this case we're saying dev. We will tell people that they need to install the dev extras in order to work with our project. And then after that, it is just a list of dependencies; in this case, we're saying pytest at version 3.7 or above. And then we can tell people how to use it. So again, we update the README. We have a section saying: if you would like to help develop helloworld, this is how you install the development dependencies so that you can run the tests. And it looks very similar to before, but you'll see we have the word dev in square brackets afterwards. It's saying we're installing our current module with the dev extras. You may have used this with other packages, and maybe not seen how it was specified. I stole this straight from attrs, which I think is why Hynek is here. So yes, if we install the extras, you'll see that it installs a whole bunch of other stuff, basically dependencies for pytest.
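The dev extra described here would be declared with something like the following fragment (configuration; exact versions illustrative):

```python
# setup.py (additions) -- a "dev" extra for development-only dependencies
setup(
    # ...other parameters as before...
    extras_require={
        "dev": [
            "pytest>=3.7",  # needed to develop, not to run, the package
        ],
    },
)
```

Installing it then looks like `pip install -e ".[dev]"`, the square-bracket syntax mentioned above.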
So the difference between install_requires and extras_require is that install_requires is for production dependencies, things like Flask, Click, NumPy, pandas, and the versions should be as relaxed as they possibly can be. You should be testing against multiple versions of each dependency; this way you're not locking your users into a specific version of a shared dependency. So if both you and your user are using attrs, ideally you need an overlap there: if you pin version 3 and they're using version 4, they're not going to be able to use your package unless somebody makes some changes. extras_require is different. It's for optional requirements, for development or testing or whatever groups of extras you want to create. And the versions should be, in my opinion, as specific as possible, because you're trying to get your developers up and running as quickly as possible, and creating an identical environment to yours and the other developers who have been working with the code is just going to make everybody's life easier, rather than trying to debug minor variations in your development dependencies. Requirements.txt still has a place, but I would argue it's for apps that you deploy onto machines that you control. In that case you're pinning every single production requirement to a specific version, so that you're deploying a well-tested collection of code onto a destination machine. So you use fixed version numbers with the double-equals operator, and you can use pip freeze to spit out all of the things that are currently installed straight into your requirements.txt. So here we write some tests. I'm going to zoom through this a bit because I'm running slower than I would like. But yes, we write tests for our code, and now we just need to run pytest to actually test our code each time. It's much easier than actually executing the code by hand.
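The tests being zoomed through might look something like this. It's a sketch: say_hello is inlined here so the example stands alone, but in the real project you would import it from the helloworld module instead.

```python
# test_helloworld.py -- pytest collects any functions named test_*.

def say_hello(name="World"):
    # Inlined copy of the packaged function so this sketch runs standalone;
    # in the project you would write: from helloworld import say_hello
    greeting = f"Hello, {name}!"
    print(greeting)
    return greeting

def test_default_greeting():
    assert say_hello() == "Hello, World!"

def test_greeting_with_a_name():
    assert say_hello("EuroPython") == "Hello, EuroPython!"
```

Running `pytest` in the project directory picks these up automatically.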
It will spit out a bunch of stuff to say what version of Python you're using and things like that, and then it will spit out, hopefully, that your tests passed. So now, this is what we've produced so far: we've got a license file, our README file, the setup file, a src directory with our code, and a test. You can obviously stick your tests in a tests directory if you have more test files. It's good to distribute source distributions as well as binary distributions, for various reasons. People can check the code before they run it; they may not have access to GitHub to get at the code; they may just need to verify the code before they run it. When we run sdist against our setup.py file, we actually get some warnings saying it would like some more data. For some reason, sdist would really like to know the maintainer and maintainer email, or the author and author email. So it's told us that, and we can just add those in: that's three lines. We add the URL of the project, a link to GitHub in this case, my name, and my email address. Excuse me. So now we need to test that, to make sure that the source distribution contains all the files that we want it to. When you run sdist, it creates, in this case, a gzipped tarball, and we can use the tar command to unpack that and have a look at the stuff inside. And when we have a look at it, we notice that it hasn't got our license.txt file or our test_helloworld.py file. Ideally, our source distribution should contain everything that is in this snapshot of code: everything we're distributing, everything that gets built into the binary distribution. In order to add those missing files into our source distribution, we need to write a MANIFEST.in file. They are fiddly and annoying. Fortunately, there's a tool called check-manifest that does pretty much all of this for us, or at least will get us started quickly. So you can pip install it, and you can add it to your development dependencies if you like.
You run it for the first time with the create flag, and it will create you a MANIFEST.in. I recommend having a look at it. It's just things like include and exclude lines for various files that it's found in the project; it tries to make sure that everything you have in Git ends up in your source distribution. So it finds these files and adds them to the MANIFEST.in file. Then, if we build our source distribution again and unpack it, we see that now, just out of the box, check-manifest has created a manifest that includes the files we were missing. So now let's publish it. It's good to publish earlier rather than later. If you try to perfect everything, you will really never publish the package. So as soon as you have something useful, not necessarily perfect, try and get it up. Apart from anything else, it will register your package name on PyPI to your project, so you're not letting somebody else just kind of come in and take it in the months while you're working on your project. You used to be able to register your project before you uploaded code; now you need to actually upload the code in order to register a name. So here we run setup.py with the bdist_wheel and sdist commands, and in our dist directory we will now have our wheel file and our source distribution. In order to push to PyPI, you should use Twine, for various reasons. It separates the build step from the upload step, which means that you can do these manual checks of your distribution files before you upload to PyPI; otherwise it's a single command to kind of build and push up your code, and if you get it wrong, it's going to mess things up for you. It also uses HTTPS, whereas for a while setuptools didn't, so it's safer. So here we install Twine and use the twine upload command. If you get to PyPI, the website, quickly enough, you will see the name of your project on the home page as the most recently updated package.
And that's kind of cool. If you click on it, you will then get to the project page. You can see our README file is essentially duplicated here, and there's a GitHub link that I've just cut off at the bottom. I had to change the name of the project, by the way, because there is already a Hello World package on PyPI; obviously somebody has done that. So there's still some more stuff that we need to do. I would recommend using tox. I really am running out of time, so I apologize for running through these. I recommend using tox for testing against different versions of Python and different versions of the libraries that you depend on. Here we're just testing against Python 3.6 and 3.7. You install tox, you have the tox configuration file, and it spits out loads of output when you run it. Hopefully, at the end, for each one of your targets, you get a "commands succeeded" and a little smiley face, which I always think is rather lovely. Here's why we use the src directory. Our root directory is the directory I've been working in. If our code was in this directory, then when we import helloworld while running the tests, it will run the code in our current directory. But we don't want it to do that: we want to test installing the package and using the code from there. By having a src directory, you are forcing it to use the version that was just installed into the virtual environment. You should also build on test machines. In the past, I've used Travis for this; I am probably moving my stuff to Azure Pipelines, depending on when Hynek gets his stuff stabilized. Yes, I won't talk about that anymore. For extra credit, you can add badges to your README, for code coverage, for quality metrics. You can manage versions with bumpversion; that's quite nice. You can test on different operating systems. You can write more documentation. You can always write more documentation and tests.
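The tox configuration for the two targets mentioned can be very small. This is a sketch of mine, not the slide's file; the `extras = dev` line assumes the dev extra with pytest described earlier:

```ini
[tox]
envlist = py36,py37

[testenv]
extras = dev
commands = pytest
```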
You can add a contributing section to your readme. You can implement a code of conduct. There's lots that you can do, but I recommend that you don't do any of the stuff that I've described in this talk. There's a project called Cookiecutter that generates sets of files from templates, and people have already created template projects for Python packages. So if you install Cookiecutter and then you run this command to download ionel's template — there are a few of them out there; I quite like ionel's, it's similar to my own way of thinking about these things — it will download the template from GitHub. It will ask you lots of questions, because it's much more flexible: where I've given you one option for each step, it offers you lots of options, different testing libraries and things like that. And then, at the end of it, you're done. In theory, you will probably have to go and tweak some of these files because they won't be quite the way you want, but it took me five minutes to get up and running using this process. And then it created all of this. So you will recognize some of the stuff in here from the tutorial that we've been through. There is extra stuff in there. There is a Sphinx directory of documentation with just boilerplate documentation in there at the moment, just waiting for you to fill it out. You copy in your code and then you're done. So that took me about five minutes. I could have cut this talk down to the last two slides if I'd wanted, instead of wasting all of your time for half an hour. But hopefully this gives you an overview of good packaging practice and all the things that you need to do to build a well-rounded package. There's obviously different directions you can move in, but this is a really good core understanding of the things you should do for a professionally released package.
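Going back to the src/ layout point from a moment ago: the shadowing problem it prevents can be reproduced with nothing but the standard library. In this sketch the module name is made up, and two temp directories play the roles of the working checkout and the installed site-packages:

```python
import os
import sys
import tempfile

# Two copies of the same module: one standing in for the working
# checkout, one for the copy installed into the virtualenv.
checkout = tempfile.mkdtemp()
site_packages = tempfile.mkdtemp()
for directory, origin in [(checkout, "checkout"), (site_packages, "installed")]:
    with open(os.path.join(directory, "helloworld_demo.py"), "w") as f:
        f.write(f"ORIGIN = {origin!r}\n")

# Without a src/ layout, the project root sits first on sys.path
# (like '' when running tests from the checkout), so it shadows
# the installed copy -- the tests never exercise what users install.
sys.path.insert(0, site_packages)
sys.path.insert(0, checkout)

import helloworld_demo
print(helloworld_demo.ORIGIN)  # checkout
```

Moving the code under src/ means the checkout directory no longer contains an importable `helloworld_demo`, so the installed copy is the one that gets tested.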
There are other projects for distributing libraries these days that are very interesting, but don't use setup.py or use it in different ways. I would really recommend having a look at them if you're struggling with setup.py. Poetry is getting a lot of mindshare at the moment. I haven't really used them, so I can't really recommend them; I'm trying to push the current most common best practice. If you are interested in the slides or the code for this talk, they are available on this bit.ly link. Follow me on Twitter. If you have any questions, feel free to grab me at the conference, come to the booth, tweet at me. Preferably not abuse. But thank you very much for coming to my talk. Well, actually, we have time for one question. OK, go on. We have less than 60 seconds. So very quick question, anybody? Otherwise, I will ask you to just put up the thing about Cookiecutter, the slide with Cookiecutter. Yeah, I can do that. That's a good kind of question. So which slide was it? You wanted this one? Yeah, that's it. Have I used Cookiecutter before? It's awesome. It's really easy to fork other people's templates as well. So if one of the templates out there isn't quite the way you want to do things, or you're just changing things as you go along, you can just modify it and then commit it back to Git. It's that simple. It's just a directory of Jinja2 files. It's seriously easy. All right, we can start. Well, we have Christian here, and he is going to tell us about C++ with Python — unleashing the power of C++ in Python. Let's welcome Christian. Hello, everyone. This is one room more than I was expecting, because I know that there are also amazing talks that I would like to go to at this moment. So if you want, we can all go to the other talks, or we can stay here for some fun with C++, of course. And I know it's the last talk of all the tracks, so bear with me. There will be no weird things here.
And we will not be trying to understand obfuscated C++ or something like that. So it's a really light talk. With advanced topics. So let's go for it. As you heard, my name is Christian. Currently, I'm working as a software engineer at The Qt Company, which is one of the companies that supports the Qt project. Maybe you heard about it. Later, I will have one slide to explain the motivation behind this talk and why I'm talking about C++ and Python. So first of all, don't worry about the examples or the slides or whatever. Just go to that GitHub repository. There is a link to the slides, and you can get all the source code. Oh, my God, I feel like a star — everyone taking pictures of it. Stop it. You can check it out, and at the end of the talk my GitHub username is there as well. It's called Unleash CPP. OK. So first of all, Qt in general is a C++ framework — a huge framework. And when you start to do some Python stuff, you need to have the mandatory comparison between the two languages. First of all, we know that both languages are general purpose. Also, both are multi-paradigm; you can do many things with them. And then we start to notice some differences that I'm pretty sure most of you are aware of: dynamically typed versus statically typed, compiled versus interpreted. And then memory management, and also code readability. So, just a quick show of hands: who is currently writing C or C++ code? OK, good. So I feel your pain too. I know that sometimes C++ can be complicated, but I just want to show you that it's not so bad, and you can even make things in Python better. So first of all, everyone knows that Python is beautiful. I remember the first time that I typed that on a console when I was studying computer science and I was, whoa, everything works. But if you think about it, I mean, C++ is not so different, right?
I mean, you can write something similar — maybe you need to run an external compiler or something, but you achieve the same thing, right? So, OK. After all, C++ is simple. It's not. Because then you have things like metaprogramming, and you start, you know, templates over templates. And this is one of the reasons that people get scared of C++. But C++ is not so ugly, and these are some extreme cases from people with a lot of time and a lot of effort, trying to do complicated things like compile-time if. But let's talk about a couple of the goodies that C++, in the latest standards — C++11 onwards — has. If you're familiar with C++, there is now this amazing auto type that deduces the type of the variable you have, which feels kind of similar to a more dynamic language — but it's not. You can even use decltype to take the declared type of one variable in order to declare another one, and so on. And then again, we have the for loops, which have been improved over time. This is the dumb, old version that we have in C++ to go through a vector — I know that you can do it with an iterator too, but it's going element by element, and it's too complicated. So then you can do more fun things like this. And if you remember the earlier Python one, then you can write something like this, which is not so complicated to understand. Also, if you're a fan of Python lambdas, you can do lambdas in C++ with the latest standards, and so on and so forth. And you can have this for_each taking a nice lambda there to sum up some values. But the most interesting things about C++ are coming, in my opinion, in C++20. I have this love-and-hate relationship with C++20: there are some things that are really nice and some things where I have no idea why people are doing them. But one of the things that I like is this one: the type of ranges that you can use now.
And if you look at the for loop there, the style is kind of similar to what you can have in Python. So after all, it's not so crazily different, right? The same with one more example. The first line is commented out, but you can write the function maybe like the second line that you see there, which is also quite similar to what you can achieve in C++20. So after all, it's just adding some semicolons, and that's it, right? So the idea is that now it's not about which language is best, but about how to make both languages improve each other and collaborate in a more efficient way. So I want to propose a new language that is... no, no, it's not that; I'll just talk about extending Python with C++. And for this, we will go through a couple of examples, really simple ones. First of all, who here didn't know before the conference that Python was written in C? We had some — okay, we have some people, okay. It is something that was kind of shocking to me too, because I thought that it was some black-magic assembly stuff, but it was written in C. So if you learn some C, you can understand it. And then you can understand other things, like this joke, for example. I know that it's really bad, but it's the only joke that I have on the slides, so I'm really sorry, but I needed to do it. So let's take a look at how we can create an amazing module based on the pure CPython C API. Okay, let's go for it. The first thing that you need to know is that you just need a C file. And we will go through these examples with the simplest thing that I could think of: a hello world function that just returns a char* — a string, whatever. So I hope that everyone can read it. If you don't know C or C++, you still get the idea: it's a function declaration that returns some message, which is "Hello, EuroPython 2019", okay. So for building it for Python, I will just zoom out.
Don't be afraid — just to show you, that's all the code that we require to create a new Python module written against the CPython C API, with one function that is just returning hello world. It's not so scary, right? I mean, I need to confess, the first time that I started to play around with this, I was just copying all this weird structure that you have around, all these entries that are NULL, NULL, zero and other things. You just copy some stuff and it works. So if you are brave enough after this talk, just go to the CPython source and play with it. And maybe you can write other, more sophisticated modules with it. So this is a simple case, right? This is just one example, one simple function, and we need all this. So what's the motivation behind this Qt thing? Qt is a C++ framework that is huge, and one of its main purposes is to offer a way of developing user interfaces — graphical user interfaces. And Qt also implements in C++ many of the things that we get for free in Python: a lot of abstractions, different classes for interacting with databases, or for doing command-line tools and scripting-like things in C++. And even other advanced things: different ways of creating user interfaces, actions, notifications, threads, all these things. So they are providing a lot of goodies to the C++ world. And at some point, the project said, you know what, these things need to be available in other languages too. So people started to wonder about Python. So maybe you are aware, there is already a set of bindings which is really old; it's called PyQt. It was developed by a different company — some people doing a lot of work there, and it was fantastic. But in the Qt project, they wanted to have an official adoption of these things. So they decided, maybe we can create a new set of bindings and, you know, just put it out there — it's an open market, so people can decide which to choose.
At the moment, they are at about the same level; one is really old and there are a lot of examples and tutorials for it. But the Qt project at that point had a couple of options to do this with this huge C++ framework. They could write raw CPython extension code, they could use SWIG — maybe you have heard about SWIG — or they could use Boost.Python. So if you take a look at the SWIG implementation: with SWIG — just so you believe me, we have here our very difficult-to-understand function — the only thing that we need to do is to create an interface. This interface goes in an .i file, but again, nothing really scary so far. We have a couple of instructions to build it: we run SWIG, then we compile some stuff, and then we can just use a normal, simple import of the module and execute it. And then we have the message — nothing too fancy. With Boost.Python, it's kind of a similar story. We go there, again we have the simple .cpp file with our function, and then we just need to define a Boost.Python module — the name, the namespace — and then we define the function. Nothing too scary so far, right? So this was looking like a really nice solution, right? We were achieving the same thing. But the problem was that many things that we do in C++ need to be specially treated in Python. For example, ownership of objects. If you are wrapping an interface that is in C++, who owns the objects? In some cases it will be Python, in some cases it will be C++. So you need to deal with all these things, okay? So the option the developers had at that time was: okay, this is just code from a generator, so maybe we can modify the generated wrapper code. So let's take a look at the code that SWIG generates, just for you to have an idea. So this is the kind of code that it generates, and I will start to scroll down. We are at 2%, 3%... okay, we'll just keep paging down... four, five, seven... so you get the idea.
It's a little bit too much for this small function, right? So that — okay, that's all of it. And to be fair, I know that they need to set up many things, take a lot for granted and automate many processes, and it's a really smart solution. But if you want to modify this generated file, it's unmanageable — I mean, you cannot really do it. So there was a lot of motivation here, beyond raw CPython, SWIG and Boost.Python. I don't have the generated source code from Boost.Python here, but the shared library is roughly the same size, so you can imagine that there is a lot of black magic there too. So what did they do? They started the development of a new tool, based on Boost at first; then they said, no, it's too heavy, let's write our own thing. Okay, everything was released properly, then there was a project to continue the development, and the good news is that last year this new set of bindings of Qt in Python was officially released, and the name of the new project is Qt for Python. So maybe you have heard about this, or about the PySide2 module — there was all this story behind it. But okay, that is just the story; what is the important part, at least in my opinion? How do we do it? So this tool that you see here in the center, called Shiboken2, is our answer: a code generator tool of our own to expose this huge C++ framework to Python. There is a module inside that extracts all the API information from Qt, based on Clang — of course, if you want to do smart things with C++, you need to use Clang. And then we have a supporting typesystem, which is nothing more than an XML file where you can define all these things that I told you about: ownership, or, for example, what do we do if a function has a void* argument? In Python we need some special type or treatment, or to do some casting or something around it.
So this tool grabs the typesystem and the information from the framework and generates a wrapper like the one that you saw from SWIG, but of course more reduced — and it makes more sense and it's cleaner, in my opinion. And then with this, we can just compile it and have the same Python module that we had before. So this tool is called Shiboken; the documentation is there. And I don't know what is happening — I haven't been doing anything and the slides keep advancing, okay. And the Japanese kanji in the name doesn't mean anything — if we have some Japanese speakers here, it's just three characters that make no sense. But let's take a look at the Shiboken thing. Okay, so again, first, just so you believe me, we have here the implementation — and we are cheating a little bit, because of course we are working with strings from the standard library. Then we define a header for this, which is just that. And the typesystem that I just told you about is nothing more than this: we have the XML, we say this will be a package called simple, it will have a primitive type, std::string from the standard library, and a hello function. After all the compiling and stuff, we will get something like this. Let's go into the build directory — it's inside simple, and here you have the module wrapper. I will zoom out just to show you that I'm not lying. It's long, but it's not as long as the other one. So here you have it. I don't expect you to read the code; I just want to show you the magnitude of it. So this has all been code automatically generated, calling this C++ function and exposing everything through the CPython API so it can be used from Python. So it's way shorter than the option that we had before. Of course, this translates into more lightweight shared libraries, and you can achieve the same things.
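For reference, the typesystem file described above has roughly this shape. This is a sketch following the tag names documented for Shiboken's typesystem format; treat the details as illustrative rather than exact:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical typesystem for the 'simple' module from the talk -->
<typesystem package="simple">
    <primitive-type name="std::string"/>
    <function signature="hello()"/>
</typesystem>
```

The point of the file is exactly what the talk says: it declares the package name, the primitive types the generator may pass through, and the functions to expose; anything needing special treatment (ownership, void* arguments) gets extra entries here rather than hand-edits to the generated wrapper.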
So there are other nice options out there, and I tend to recommend that people take a look at all the solutions available, because it's unfair to say "this is the best solution" — no, it's not. There are many ways they can complement each other. I don't know if you have had the chance to use, or maybe know, one of these. The last one, for example, is the one that the other set of bindings, PyQt, uses, which is called SIP. There is also pybind11, which is a really nice project that appeared, I think, a couple of years ago — even two EuroPythons ago, I think, there was a talk about pybind11. So let's take a look at how you achieve the same things with the other options. I think I am good with the time at the moment. So this is the case of pybind11. Let's open the file again. You can see the inspiration here is clearly from Boost.Python — it's again a kind of macro that defines a module. Then you have the definition of the function, some documentation, and so on and so forth. And then you can do the same thing: after you compile it and everything, you have your simple main and you can achieve the same. So it's the same idea as Boost.Python, but they are doing way more things and there is a lot of nice support there, so I encourage you to check it out. CFFI — well, this is again not a really fair comparison, because CFFI — we have had some talks already that explained this thing — is not really about generating code; it's more about loading something. So we can have — this is the one here — this is the code that is being generated. Yes, so we go to the simple build, and then you have an inline raw string that contains the function that we want to expose, and then we just compile this thing into a shared library that we can easily load from Python. So if you check here, for example, the main is the same thing that we were seeing in the other examples.
Just calling this nice function — and in this case we are casting it to a string, and so on and so forth — but it's the same idea: you write inline C code, or load a shared library, to expose it this way. The other option that I had there — ah, okay. CFFI is focused on C; they support C89 and, I don't remember which one, the other standard. But for playing with C++ you have a similar idea, which is cppyy — funny name, right, cppyy. And it's the same idea: you declare everything in a string, and then you can use it from cppyy.gbl. Again, same idea, and nothing too scary. And the last one is similar: SIP, the one I showed you, the one that the PyQt bindings use. I will just show you — this is just here. So yeah, here we have the simple example, again the same idea, but the only difference is that they require you to define a .sip file — and it's nothing crazy either; it's also quite simple. You just need to define a module, then you have some includes, and then the signature of the function. So there are many options out there to achieve the same thing, right? But what's the point of doing this? I have had conversations with people where they say, ah, yeah, but I use Python for everything — you don't need C++. And I say, okay, what do you use? "Oh, I do everything with numpy, because it's amazing and it's fast, and it's Python, you know?" And then it's like, yeah, I mean, you know, it's written in Python... oh no, it's not. So okay, there are even some people who are not aware that many of the popular libraries out there are using C++ or C underneath. So here you have the case of numpy; I just downloaded the source code.
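As an aside on the load-it-at-runtime approach that CFFI and cppyy take: the standard library's ctypes works the same way, calling into an already-built shared library with no wrapper generation at all. A minimal sketch, assuming a Unix-like system where `CDLL(None)` exposes the process's libc symbols:

```python
import ctypes

# Load the symbols of the current process (which include libc on
# Linux/macOS) and call a plain C function directly -- no generated
# wrapper involved.
libc = ctypes.CDLL(None)
libc.abs.restype = ctypes.c_int
libc.abs.argtypes = [ctypes.c_int]

result = libc.abs(-42)
print(result)  # 42
```

The trade-off is the one the talk keeps circling: ctypes and CFFI speak C, so C++ classes, templates and overloads still need a generator like SWIG, Shiboken or pybind11 (or cppyy's Cling-based approach) in front of them.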
The first line is all the Python files that you have that are not tests, and then you have the C files there, which is 96 — still a lot of things that they are doing. And if you want a more modern module, the case of PyTorch is a little bit more extreme: if you look there, we have 547 C++ files inside PyTorch, because of course it's based on Torch, and Torch is C++, so everything is C++ inside. So it's a nice motivation for using all these tools to start writing things to improve Python. If we know C or C++, we kind of have some responsibility in our hands to improve things. And one of the examples — just out of curiosity, and please don't blame me for the things that I will show you — is this simple case: I was helping someone who was listing some files in Python and getting all the different absolute paths. And what do you use there? glob, right? You use glob, or if you are a more up-to-date person, you use pathlib, and there you also have glob access. So, you know the deal — writing this with glob is kind of simple. This is a hard-coded thing just to play with the recursion: you import glob, and then you need the double star there and to specify whether it's recursive or not. In the same way, with pathlib it's the same idea. And then I thought, let's look at the implementation and see if we can do something smarter in C++. So the first thing that came to my mind is that in C++17 — maybe you know — there is a new addition to the C++ standard, the module called filesystem. It's based on the Boost filesystem library. So I thought, okay, I will just copy-paste the hello world example there and put in the filesystem calls to list directories. So again, same idea — I just replace all the "simple" and "hello" parts, and the only difference is this.
If there is any CPython core developer in the audience, please forgive my CPython code; this is just to show how simply we can achieve things. Of course it's unsafe — there are memory leaks in there, because when you append, that increases refcounts, and I am not taking care of anything. I'm just taking one argument, which is the recursive flag, true or false, creating an empty list, and then just doing some dumb appends. So the magic here, on the C++ side, is that I am using the filesystem module from C++17, and luckily there is a directory iterator that you can use recursively or not, and that's it. I mean, if you don't know the CPython API: this is just a function that takes some arguments, we create an empty list, and then, recursive or not, we are appending — well, you know how append works — some stuff, and then we return this list. Nothing else. So I thought, okay, let's see how this thing works, and I wrote a really, really simple bash script — because yeah, I grew up with bash — and I set up this scenario where I have 1,000 directories with 1,000 empty files inside each, just to list them. So I created a shell script to measure this. As you can see here, the non-recursive option, which is just listing the directories inside this fake environment, is 0.05 seconds. I wasn't expecting glob to take so long, but I think it should be something like 20-something seconds. In the meantime, I can tell you that I was afraid that maybe I was using /usr/bin/time and maybe it's not the proper way of measuring. So okay, there you have it: recursive, 33 seconds, for all these files. Then I tried pathlib. Again, it was a little bit slower than glob in the non-recursive case; the recursive way, it will be roughly the same, I guess, if I'm not wrong — I should have had this prepared. Okay, there you have it, 20 seconds.
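For reference, the pure-Python variants being timed boil down to calls like these — shown here on a tiny stand-in tree (3 directories with 4 files each) instead of the 1,000 × 1,000 one from the benchmark:

```python
import glob
import os
import tempfile
from pathlib import Path

# Build a miniature version of the benchmark tree.
root = tempfile.mkdtemp()
for d in range(3):
    sub = Path(root, f"dir{d}")
    sub.mkdir()
    for f in range(4):
        (sub / f"file{f}.txt").touch()

# Recursive listing with glob: note the double star plus recursive=True.
via_glob = glob.glob(os.path.join(root, "**", "*.txt"), recursive=True)

# The pathlib equivalent: rglob handles the recursion for you.
via_pathlib = [str(p) for p in Path(root).rglob("*.txt")]

print(sorted(via_glob) == sorted(via_pathlib), len(via_glob))  # True 12
```

Both walk the same tree and return the same paths; the timing differences in the talk come from how each implementation traverses directories under the hood, not from what they return.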
And then the fast glob implementation that I have there — it's less than a second. I know I'm not taking care of caching, or, I don't know, releasing the GIL to use parallel computation or something fancy; it's just the simplest thing you can think of. And there you have it. And then I thought, maybe it's me, maybe I am doing the measurement wrong. So let's do the same performance test in Python with the time module or something — and after all the tries, it was more or less the same result. It's not that I want to say, yeah, this should be the new glob, but this is how easily you can improve things there. And maybe you are thinking of the next numpy, the next pandas, the next PyTorch or something like that. Well, I will not wait for the results there, but it's roughly the same, believe me. So yeah, this is a summary. It's okay if you cannot read everything, because you can check the slides afterwards: the versions that you can use, the licenses, the compatibility with Python. Here I just want to highlight Shiboken, the thing that I am working on, and also SIP, because there is something called the stable ABI. Maybe you have heard about it? Yes? Good. This is a way for developers to release wheels that are compatible with Python 3.5 onwards, so you don't need different wheels for different Python versions. It's really tricky — it's a different, more dynamic way of creating objects, and it's really hard — but it's implemented in those two options. So there you can find information from the Python project. You can find me here; you can check my social networks and all the information, and you can just type make — I added some Makefiles there, just so you can see the process of building all these wrappers and start playing around with it today. And just as a PSA: it's always good to support your local groups.
So if at any moment you come to Berlin — I mean, we have, I think, one of the most amazing Python communities out there. We have PyLadies, Python Users Berlin, PyBerlin and PyData Berlin. So that's it. Thank you very much. All right, thank you very much. We have time for some questions — actually five minutes, so a couple of questions for sure. And if you don't like your voice on a microphone, you can find me outside. I will be here until Friday, or we can hang out over a couple of beers and discuss the truth behind C++. So if you have questions — yes. "Did you ever look into performance overhead differences between those options?" No, I haven't, and I wanted to, but then I thought, that's not a half-an-hour talk. I was really considering doing this kind of thing for a meetup or a conference, because I want to measure code length and performance, which is also critical. I mean, I have noticed some differences between SIP and Shiboken; for example, there was an issue where Shiboken was way slower because we were using lists in C++ and SIP was using vectors, and in some cases like this they were kind of different. But yeah, it's a good question, and maybe next year I will have a performance comparison. Thank you, you're welcome. Any more questions? Oh, excellent. You can shout if you want. I'm just going to take it one thing at a time. Yes, lately there was something that fixed that, because I wanted to implement — without breaking the Qt API — for example, for QSettings, having explicit casts in types, and I needed the named arguments. So now it's implemented and it's out there, so you can do it. There's a new notation, which is the at sign: you use the at between the names, and then you have named arguments and you can pass it like a normal dictionary for the args. So yeah — you can ask me later and I can show you how to do it.
Okay, any more questions? "Do you hear me? Yes. Does Shiboken have some CMake integration?" Yes, it's CMake-based, so you can make a C++ project and use the CMake integration in your project. "And can you, for example, override __len__ or some other Python features? Can you override it in the XML file for the bindings?" Can you name one? I mean, the only override that comes to my mind is that, for example, you can remove functions and replace them with a whole new Python equivalent. So signature-wise, or functions — we can remove arguments sometimes. "Yeah, but things like square brackets, overriding operators?" Yes, we have some special treatment for a few operators. I don't know how far you can modify things, but at least I know that you can manually modify, for example, the less-than operator to do different things in C++ and in Python — so you kind of override Python's default behavior. "Yeah, so for example, I have a C++ method size() and another one, an operator for accessing elements, and I want Python to just use square brackets for accessing it." As long as you define this operator inside the typesystem entry for that type, it will be possible, because underneath it will call the specific rewriting that you have there. "Okay, thank you." You're welcome. Any other question? Any other question? I will ask you a question then. Okay. "So how is it to wrap a C++ class?" It's kind of the same thing that you saw there. The class will be an object-type entry, so you declare a new object type, and inside it you put any function that you want to modify. If you don't want to modify anything, you just close the tag, and it will automatically go to your header, see the class, take all the signatures and expose everything — provided everything is normal, in the sense that there is no void* or weird pointer stuff going around. "And can you then subclass that class from within Python, as you would normally?"
Yes, totally, because you are exposing a new type in Python. So, for example, the simple thing that you have in Qt is a QWidget; that is C++, you expose it to Python, and the recommended way of using it is as a base class for your new class in Python called MyWidget or something like that. So you can do it. Oh, very nice. Any question? If not, we can thank the speaker again.