As part of that process, over the last three or four years I've become a contributor to aiohttp and to RQ, and built a bunch of libraries of my own, in particular arq, which is an asyncio successor to RQ, and pydantic, which isn't really relevant here but is quite popular. So I wanted to give this talk because I got a long way as a developer without really understanding the landscape of how parallel programming works, within Python and also in general. So I wanted to give a high-level introduction. I'm going to talk about the four levels of concurrency, or the main four levels of concurrency that I see. I'm going to demonstrate each of them with Python, I'm going to try and explain why you might use them and why you might not, and I'm going to try and keep it mildly entertaining. What I'm not going to do is try to prepare you for a computer science exam on distributed computing, or read a spec to you, or talk about the protocols. This is going to be quite high level; you're going to have to bear with me on that. So why is parallel processing important? I think this graph demonstrates it. This is the spread and average speed of CPUs over the last 25 to 30 years. What's interesting is that Python was conceived pretty much on the left-hand axis of this, in about 1990, when most computers had one CPU and when CPU speeds were increasing really quickly. In around 2005 that effect plateaued, and suddenly we started getting multiple CPUs in computers, both in servers and in desktops.
I guess that was partly because people wanted it, but really it was because the CPU manufacturers needed something new to sell. And so at that point Python had to adapt and had to implement parallel processing. The interesting thing is that it didn't start off with that; it had to retrofit it later, and you can still see a few of those bugbears now, in the GIL and so on, as I'll talk about later. The other thing to mention is that the right-hand side of this graph, the pickup recently, may or may not be right. I think they might have run benchmarks on higher-end processors more recently than earlier on, so that might be why there's that uptick at the end, but I'm not sure. So no talk would be complete without a metaphor, and so Tom, who works with me, has kindly built this metaphor in Minecraft. The principle here is that we're thinking of a factory as a computer, a process as a conveyor belt within a factory, CPUs as the individual people working on those production lines, and networking as the trucks arriving at and leaving the factory. So the highest level of parallel programming is multiple machines, or computers, or virtual machines, or containers: anything where the code sees itself as running on a specific computer. This is demonstrated here with multiple different factories. So instead of building one factory bigger, you have multiple factories all running independently, but perhaps networking between each other. In this case they're not networking between each other, they're just doing their own thing. You can imagine scenarios where we do that quite a lot. For example, front-end servers on a web platform would generally talk to the database and talk to the client, and use things like cookies for state, but they wouldn't actually talk to each other.
They wouldn't know how many other machines were running around them. But quite often they do have to communicate, and that is where the communication comes in. So, to get to an example, we're using RQ here. I promise you this is the smallest text we'll go to at any point in this presentation; I hope you can all read it. RQ is a queuing library built with Redis, as the name indicates. In particular, it uses Redis lists to do the enqueuing: to enqueue a job you push it into one end of the list, and to execute a job you pop it out of the other end and then execute it. The code we're going to use for most of our worker examples here is in the top file, worker.py. It's very simple: it downloads a web page, in this case a EuroPython page for one of the last few years, takes the text and counts the number of words. So it splits the text and then counts the number of elements in the resulting list. Very simple; in reality you wouldn't need multiple machines, or even multiple anything, to do this, but that's our example. Below you see the code we would use to enqueue those jobs using RQ. We take a completely vanilla Redis connection. I'm demonstrating this on my laptop, so I'm not actually running it on multiple machines; I'm running it on multiple processes to demonstrate the principle, but bear with me. Then, for each of the last four years, we enqueue a job that runs count_words. One of the interesting things to see here is that even our enqueuing code, which is running on the main machine, not the worker, has to have access to the count_words function, so it can import it to enqueue the job. So if we look at running that below: the first thing we do, on the right here, in our two workers, is call rq worker, which starts a worker doing a blocking pop from Redis, waiting for jobs to execute.
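The worker module and enqueuing script described above might look roughly like this. A hedged sketch: the URL pattern, year range and Redis setup are illustrative assumptions, not the talk's exact code, and the enqueuing part is only defined, not run, since it needs a live Redis server and `rq worker` processes.

```python
# worker.py -- sketch of the job function described in the talk
def count_words_in(text: str) -> int:
    # split on whitespace and count the resulting elements
    return len(text.split())


def count_words(year: int) -> int:
    import requests  # third-party; only needed when a worker runs the job
    r = requests.get(f'https://ep{year}.europython.eu')  # assumed URL pattern
    return count_words_in(r.text)


# enqueue.py -- push one job per year onto the Redis list
def enqueue_jobs() -> None:
    from redis import Redis
    from rq import Queue

    queue = Queue(connection=Redis())  # completely vanilla Redis connection
    for year in range(2016, 2020):
        # note: the enqueuing side must be able to import count_words too
        job = queue.enqueue(count_words, year)
        print(job.id)  # handle we could use to fetch the actual result later
```

The key point the sketch shows is the shared import: both the enqueuer and the worker need `count_words` on their Python path.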
To enqueue those jobs we simply run our enqueuing script, which bangs those four jobs into the queue and prints out the job objects RQ gives us, which we could use to get the actual results later. Then you see those jobs being executed by RQ here, and if you look closely enough you can see the years and the word counts. It's not very interesting, but there we are. So, the advantages of multiple machines: scalability is the big one. You can add machines very easily, and adding machines has a linear cost: if you have 10 machines and you want an 11th, it gets 10% more expensive. And lastly isolation, which is demonstrated here with our factories: if one of your factories were to blow up, you can simply add another factory on the side, or in the case of my graphic, pan to the left, because adding a new factory in real time was too hard. The disadvantages of multiple machines? Well, mainly complexity. You have to set up all the networking between your machines. That's made very easy by platforms like Heroku and others, but it can still be a problem, particularly during development. And so, as you saw earlier, quite often we use multiple processes to simulate multiple machines. So, on to multiple processes. In our analogy this is a single factory running multiple production lines. Processes are an operating system concept, designed to keep different programs isolated from each other whilst running at the same time. They were developed, I guess originally, for desktop applications where you were running two completely separate things, but you can use them to run the same code in parallel. So here's our example. You see immediately it's quite a lot longer, and the other main difference is that we're not using an external library for the queuing; we're using Python's standard multiprocessing library, with its Process and JoinableQueue. We have exactly the same code here for counting the words.
Then we have our very rudimentary worker, which is just a loop taking jobs out of a queue and either executing them or, if the job is None, taking that as our cue to quit. To enqueue those jobs we have to create our processes. The really interesting bit is happening on line 20. What Python's doing in the background there is forking the main Python process to form multiple sub-processes, which at that point share memory, but any further changes in memory are copy-on-write. So we now have completely separate processes, and each new process is set off to run the worker function we just saw. The argument in this case is just an ID to tell us which worker is running. Enqueuing our jobs is as simple as calling put on the queue object Python has helpfully given us. We can then wait for that queue to be empty, for all of the jobs to be finished. Then we go about putting the None job into the queue once for each worker to stop them, and then we wait for our processes to finish. And you see it there printing out our word counts as before. Again, not very interesting. So, the advantages of processes: they're really easy to run, with no networking required. You get an OS-level guarantee that your different processes are isolated; they can't share memory after they've been forked. And they're pretty fast to communicate, either by doing networking within a machine or by inter-process communication, all very quick compared to multiple machines, I should say. But the disadvantages of processes are quite significant. You have very fixed limits to scaling. If we go back to our factory analogy and we want to add another production line into our factory, there's nowhere to put it. If we want to have four production lines we need to build a whole new factory, I guess decommission our old factory, and start running our new factory.
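The multiprocessing version just described can be sketched as a self-contained example. The worker loop and None-as-poison-pill shutdown are as described in the talk; counting words in local strings stands in for the network request, and the explicit 'fork' context is my addition to match the forking behaviour described (Unix only).

```python
import multiprocessing


def count_words_in(text: str) -> int:
    return len(text.split())


def worker(worker_id: int, jobs, results) -> None:
    # rudimentary worker: loop taking jobs off the queue; None is our cue to quit
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results.put((job, count_words_in(job)))
        jobs.task_done()


def run_workers(texts, worker_count: int = 2) -> dict:
    ctx = multiprocessing.get_context('fork')  # fork, as described (Unix only)
    jobs = ctx.JoinableQueue()
    results = ctx.Queue()
    # each Process is a forked copy of the interpreter; memory is
    # copy-on-write after the fork, so the processes are isolated
    procs = [ctx.Process(target=worker, args=(i, jobs, results))
             for i in range(worker_count)]
    for p in procs:
        p.start()
    for text in texts:
        jobs.put(text)
    jobs.join()                    # wait for every job to be marked done
    for _ in procs:
        jobs.put(None)             # one "stop" job per worker
    for p in procs:
        p.join()
    counts = {}
    for _ in texts:                # one result per job was put on the queue
        job, n = results.get()
        counts[job] = n
    return counts


if __name__ == '__main__':
    print(run_workers(['a b c', 'd e']))
```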
If we want to go back to having three, then I guess we have to ignore our new four-production-line factory and go back to the old one. And secondly, it gets really, really hard to build a really, really big factory. We can make it five times bigger or ten times bigger, but it gets prohibitively expensive to have a thousand-core machine. So it's not linear to scale, whereas you saw with multiple machines it was linear. And again, we don't have isolation: if our machine breaks, the whole thing's broken. So next we go down to multiple threads. Threads are a way of achieving concurrency from within one process. They come in two variants: kernel threads, and user threads or green threads. When we talk about threading in Python we're talking about kernel threads. It's important to remember that kernel threads are the only way, from within a single process, to run tasks on two different CPUs at the same time. We can do lots of things that look parallel, but unless we have kernel threads we can't be running things on two different CPUs at the same time. In our analogy here the production line has changed shape, and we see we have three of these boxes that technically have faces; they're supposed to represent the workers. So we're running multiple things in the same process. Our threading example looks suspiciously like our multiprocessing one. That's not a coincidence: Python has tried quite hard to keep the interfaces the same between multiprocessing and threading. We have the same function as before, and exactly the same worker except that we say quitting thread instead of quitting worker. The difference is in our imports: we're importing here from queue and from threading, to use those versions rather than the multiprocessing variants. This is all basically the same except, obviously, line 21, where we create the thread: we're creating a secondary thread within the same Python process instead of creating multiple processes.
Again we bang the years into the queue to run the workers, wait for them to finish, stop the threads and wait for them to have stopped. And we get the result again. So the advantages of threads: they're even lighter than processes, they're faster to create and faster to switch between, and they share memory, which can be an advantage but can also be a big disadvantage. The disadvantage is exactly the same thing: they share memory. Memory locking is horrid. To use a Go proverb: do not communicate by sharing memory; share memory by communicating. We can do that with Python threading: Python provides some primitives for communicating between threads safely. But if you're not careful it can all go wrong, and you won't get a nice warning; it'll just burst into flames. The second and bigger problem, which is specific to Python, is the global interpreter lock. From the Python wiki: the GIL protects access to Python objects, preventing more than one thread from executing Python bytecode at once. What? The whole idea here was that we would run stuff in parallel, and now we've heard about this lock thing that prevents us from doing that at all. Let me try to demonstrate that with another example. We've taken pretty much the same code, but instead of doing a network request we're now doing something CPU-bound, in this case summing a bunch of numbers, using standard Python sum. And we're going to do that in two ways: one, in a normal for loop, just doing the task four times; and in the other case, going through all the palaver of creating our threads and running them in parallel. What happens? Well, it's not very exciting: we get exactly the same time. In fact it's even slightly quicker to do it without multi-threading, because we don't have the overhead of creating the threads. But all is not lost.
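The disappointing comparison just described can be reproduced like this (a sketch with assumed sizes and repeat counts; on a standard CPython build with the GIL, the threaded version is no faster for pure-Python CPU-bound work):

```python
import threading
import time


def count_numbers(n: int = 2_000_000) -> int:
    # pure-Python CPU-bound work: the GIL means only one thread
    # can be executing this bytecode at any instant
    return sum(range(n))


def timed(fn) -> float:
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def serial(repeats: int = 4) -> None:
    # plain for loop: just do the task four times
    for _ in range(repeats):
        count_numbers()


def threaded(repeats: int = 4) -> None:
    # all the palaver of creating threads and running them "in parallel"
    threads = [threading.Thread(target=count_numbers) for _ in range(repeats)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == '__main__':
    # on a stock (GIL) CPython build these come out about the same;
    # threaded is often slightly slower due to thread-creation overhead
    print(f'serial:   {timed(serial):.2f}s')
    print(f'threaded: {timed(threaded):.2f}s')
```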
You can do this same task with multiple threads and make it quicker. Here we're using NumPy, so NumPy's sum function is going to do the summation in C. C code in turn can release the global interpreter lock, and so here we can get the advantage of multiple threads. It's going to be quicker anyway because it's done in C, but we also see here that we nearly halved the time by doing it in multiple threads. I guess the not-quite-half is the overhead of creating those threads. Anywhere we can release the global interpreter lock, because we're executing in C, or doing file IO, or doing networking, threading can help; but for pure-Python CPU-bound tasks it doesn't really help. So lastly we come to the fourth level of parallelism within Python, though not unique to Python, which is asyncio. I think this is really cool. I am obsessed by asyncio; I think it's wonderful, and I will try to persuade you that it's the way to go for lots and lots of things. The idea here is cooperative scheduling. We have one kernel thread, but within that we have some wonderful tools that make it seem like we're doing things at the same time, while in the background we're actually only executing one bit of code at a time. To do this we have an event loop that's effectively scheduling tasks in a way that keeps something happening all the time. I promise I won't carry on pushing the metaphor any longer after this, but without asyncio, you see here in our top example, when we're doing networking our thread has to stop because it is waiting for the network to come back and give us a response; that thread, and perhaps that whole process, has to stop and wait for the networking to finish before it can go on and do something else. With asyncio, on the other hand, our thread can carry on processing while networking tasks are going on, because our event loop is doing a clever job of scheduling tasks to fill in the gaps.
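NumPy may not be installed everywhere, but the same GIL-releasing effect can be demonstrated with the standard library alone: CPython's hashlib releases the GIL while hashing large buffers, so this stand-in for the talk's NumPy example really can use multiple CPUs at once. Data size and thread count are illustrative assumptions.

```python
import hashlib
import threading
import time

DATA = b'x' * 20_000_000  # 20 MB of dummy data


def digest() -> str:
    # CPython's hashlib releases the GIL while hashing large buffers,
    # so several of these calls can run on different CPUs simultaneously
    return hashlib.sha256(DATA).hexdigest()


def timed(fn) -> float:
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def serial(repeats: int = 4) -> None:
    for _ in range(repeats):
        digest()


def threaded(repeats: int = 4) -> None:
    threads = [threading.Thread(target=digest) for _ in range(repeats)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == '__main__':
    # on a multi-core machine the threaded run should be substantially
    # faster -- not quite 4x, because of thread-creation overhead
    print(f'serial:   {timed(serial):.2f}s')
    print(f'threaded: {timed(threaded):.2f}s')
```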
So, an example. First of all, you immediately see it's already shorter than our examples before; we don't have to do half as much faff and setup. We do, however, have to call our coroutine using, in this case, asyncio.run. If it was JavaScript you could just set off your coroutine and hope for the best, and it would finish in the end and no one seems to mind. In Python you have to either await a coroutine or set it off like this if you haven't got an event loop running. Our main coroutine simply calls our count_words coroutine, which I'll get to in a moment, putting the result of that, which is a future, into a list; then lastly we use the special coroutine asyncio.gather to wait for the results of those four coroutines and, once they've finished, proceed. So how does count_words work? Here we get to the big problem with asyncio: we can no longer use requests. We've had to rewrite this function entirely, in this case using aiohttp. We have to create our session explicitly; requests created it implicitly. Then we do our get request; this is a context manager, or rather an asynchronous context manager, and we get our response. We can then await reading the text of that response off the network. Finally we can do the same thing as before and count the number of words on the page. And you see the result again. So, the advantages of asyncio: even lighter than processes and threads. We can quite happily have, say, thousands of websockets connected to a single host, processing all of them without enormous amounts of CPU or memory usage. It's a lot easier to reason about, because you are explicit about where you're going to go and do some networking, where your current piece of code is going to release with an await so that other code might get executed, and where it's not. And there's technically less risk of memory corruption when you're only ever running one bit of Python at a time.
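In outline, the asyncio version looks like this. Since running it for real needs aiohttp and network access, asyncio.sleep stands in here for the awaited HTTP request, and the year range and fake page text are illustrative; the shape (coroutines, a list of them, asyncio.gather, asyncio.run) matches what the talk describes.

```python
import asyncio


async def count_words(year: int) -> int:
    # stand-in for the aiohttp request in the talk: the await hands control
    # back to the event loop, so the four "downloads" overlap in time
    await asyncio.sleep(0.1)  # pretend network latency
    page_text = f'pretend page for {year} with some words'
    return len(page_text.split())


async def main() -> list:
    # kick off all four coroutines, then gather waits for them all to finish
    coros = [count_words(year) for year in range(2015, 2019)]
    return await asyncio.gather(*coros)


if __name__ == '__main__':
    # with no running event loop, asyncio.run sets one up for us;
    # total time is ~0.1s, not 0.4s, because the sleeps overlap
    print(asyncio.run(main()))
```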
The disadvantages? Well, we don't get any CPU speed-up at all by using vanilla asyncio, but the real problem is that it's a whole new way of thinking, and in general you have to rewrite applications. It's possible in theory to adapt them, but in general I think you basically have to abandon an existing project and start again if you're going to start using asyncio all over the place. You might be able to get away with using it in a few places, but in general it's a whole rewrite. The point is that the brilliant thing about asyncio is that it's explicit, but that means it can't be implicit. You can't have some library that wraps around asyncio. Someone was asking on python-ideas recently, can't we make it implicit? The whole point is that it's not. Where it gets really tricky is where all four of these levels of parallelism interlink with each other. Machines: in the RQ example I showed you, RQ actually does forking in the background to run its worker. The multiprocessing JoinableQueue that I showed you was in fact using a thread in the background to put things into the queue. Asyncio has ThreadPoolExecutor and ProcessPoolExecutor, which I'll show you in a minute. When machines are communicating with each other, because it's networking, you then want to go into the asyncio world, and arq and aiohttp do that. All of these things interact, so it can get a bit confusing where we are. I want to talk about one of the uses of asyncio that I don't think enough people are talking about, which is as a sane way of doing multiprocessing and multi-threading: particularly multi-threading for file operations, and multiprocessing for CPU-bound tasks. You get all of the performance improvements from threading or processes, but from the comfort of asyncio, and it's much easier to reason about. We have an example here.
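An example of this pattern, sketched so it is self-contained: stdlib hashing (which releases the GIL) stands in for the NumPy task, and the pool size and inputs are my assumptions. The structure — create an executor, call run_in_executor for each task, gather the results — is as described.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data: bytes) -> str:
    # CPU-ish work done in C that releases the GIL, so it can genuinely
    # run in parallel inside the thread pool
    return hashlib.sha256(data).hexdigest()


async def main() -> list:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # run_in_executor returns an awaitable that completes when the task
        # has finished inside the pool -- threading from the comfort of asyncio
        futures = [
            loop.run_in_executor(pool, digest, bytes([i]) * 1_000_000)
            for i in range(4)
        ]
        return await asyncio.gather(*futures)


if __name__ == '__main__':
    for d in asyncio.run(main()):
        print(d)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor gives the multiprocessing variant with the same interface.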
We're using the same NumPy sum task as we had just now, so we know it's suitable for multi-threading because it releases the GIL. But instead of just calling our coroutine, we now have to create a ThreadPoolExecutor, a pool of threads in which to run our tasks, and the clever bit is run_in_executor, which returns an awaitable that completes when the task finishes within the thread. There's also a ProcessPoolExecutor, which has exactly the same interface, just a different name, and obviously creates multiple processes and does it that way. And so we create this list of awaitables and again gather them, wait for them all to be completed, and hey presto: we get the speed-up of multi-threading, but from the comfort of asyncio. So, in summary. I think I've probably not taken up half enough time, have I? I don't have a clue. We've talked about the four levels of concurrency, and we've said that they're all possible with Python; none of them are unique to Python, but they're all possible. With asyncio I think today Python is probably leading the way: at least, I think it's accepted as one of the best implementations of the idea. I definitely think it's cleaner than what's going on in other languages, except arguably JavaScript, but that has its own problems.
They all have their strengths and weaknesses, and the key thing is to work out which one you want to use for a particular application. They often interact with each other, so they don't get to stand alone in their high castles; they interlink. But the real point I'm trying to get across today is that there's this landscape of different tools out there, and you need to have a bit of an understanding of what they're doing. Just taking the first working example off the top of the page, putting it into an editor, pressing run and seeing what happens gets you a long way; it got me to a company that pays my salary. But it's not always the best way, and it becomes a problem when everything goes wrong and you're trying to understand what's happening and you have no grounding, because you've just taken the example and got it to work, which is definitely what I did the first time round. So thank you very much, and I guess we've got lots of time for questions. Having said that, since we've got a couple of minutes, I'll do a tiny bit of advertising for some packages I've built. arq is a successor to RQ, but it uses asyncio: it uses the async bindings for Redis, and it allows you to enqueue tasks from an aiohttp application or similar. It also has some other useful features. It has this principle that every job has to be finished, so it might be run multiple times, but it has to be executed. It doesn't actually use a list; it uses a sorted set, which means you can enqueue tasks to be run at some point in the future, and if jobs get stopped it automatically reruns them when the worker comes back up. devtools I think is the most interesting thing I've ever written, and no one seems to care at all, so I'd love your feedback on it. It's basically a better print command that tells you the line where it happened and what you printed, and prints it in a pretty way. I
use it all the time, but I've failed so far to persuade anyone else it's interesting. And pydantic is quite widely used: it does data validation using Python type hints. Thank you; now, questions. We have lots of time for questions, and we also have two microphones over there, you can probably see them, so please line up behind the microphones and we'll be able to take a number of questions. I see we have one question, so go for it. Maybe I didn't understand you well, but you said there is no good tooling to do machine-level parallelism; as I understand it, Celery is exactly the tool you can use to run your parallel workers, either on a single machine or on a lot of machines. Built into the standard library there's no way of doing cross-machine communication over HTTP or some other protocol. There are some great libraries, but they're not built into the standard library. That's actually one of the reasons they've been so successful: external libraries have to compete on being really easy to use, on iterating quickly and taking advice, whereas the standard library has to be slow-moving, has to be sure, and can't respond to advice half as quickly. So actually I think it's in some ways a good thing. Maybe multiprocessing would be way easier if there was the equivalent of requests: one library everyone used that was designed to be super easy. OK, thanks. So I have two questions regarding RQ. I'm already using RQ, and I was going to ask: does it make sense for me to switch to arq as a drop-in replacement if I don't use any asyncio? That's the first question: I'm using RQ now; can I just switch to arq for my completely synchronous code, and does it make any sense? I mean, you could, for advanced features like running tasks some time in the future and re-enqueuing the job if the worker shuts down. You might want to go and use ThreadPoolExecutor or ProcessPoolExecutor
from within a particular job, to do that job in parallel. But in general, arq, actually the same as RQ... well, RQ is running one job at a time per process, and it expects you to run another worker in another terminal to run multiple workers in parallel. arq will run up to 60 jobs at the same time using asyncio, but obviously if your task is not networking, or otherwise suitable for asyncio, then only one is actually going to be running at any one time. So, actually, that was my second question: if I still have just synchronous code, it will still be running one job at a time unless I fork multiple workers anyway, right? Yes: either you run multiple workers, or from within a job you use ProcessPoolExecutor or ThreadPoolExecutor. OK, thank you very much. Maybe we're finishing early? Never. Any further question, folks? Oh yeah, go for it, I can give you the microphone. Is there any advantage to having a flattened list of coroutines that you want to run, as opposed to calling a couple of different coroutines which themselves gather a list of coroutines? That doesn't matter, as long as it's running on the same event loop; it basically doesn't matter. I guess there is some overhead to running a coroutine, but it's so small that if you're worrying about those kinds of costs you need to go and write it in C or something. In general, don't worry about it; that's not going to be your problem. I mean, at some limit it will become one, but at that point you're going to do it another way, probably. Any further question? Otherwise I have questions I can ask you. A sample use case, a real-world use case for, say, arq: do you use it in your job? If you can talk about that, or how you're seeing people use it. So, we use it for sending emails. TutorCruncher, the company I run, sends I guess about a million emails a month; not a great deal, but at points it gets to quite a high load. We are currently tethered to Mandrill, although they are awful
and I hate them, because our 300 customers have all set up their DNS records to send emails from Mandrill, so moving over is going to be hell. For about 5% of emails you try to send via Mandrill, you get back a 502 or 503 or just a broken HTTP response, and so we're using arq both to go and send lots of emails quite fast, but also to back off and retry those jobs when they inevitably fail, quite a lot. And does arq, for instance, have the facility to re-queue the failed jobs for you, that kind of thing? Let me try and get on the internet. So here's an example of arq, which is not especially different from what we were looking at earlier. We have some tooling ready for setting up the things we're going to need when we're running jobs. A bit like in something like aiohttp, where you have startup coroutines for setting up, say, your database connection, you can do the same thing here: you have startup and shutdown, where we can add to this context, which is the first argument to any job function we set up. And I'm trying to remember here if I have an example of retrying jobs. Basically there's an exception you can raise which will retry the job, and that is what is raised if you shut down the worker and the jobs haven't had time to finish. So any job that gets stopped because the worker shuts down will automatically be re-enqueued next time, because the job is not removed from the sorted set until it's been finished. The background to arq is that for RQ I rewrote the Heroku worker, which basically deals with shutdown behaviour on Heroku, because Heroku workers shut down unpredictably, and that was killing us when raising invoices, for example, which is one of our slower jobs. And so when I built, or rebuilt, arq, I built in this principle that your job might run twice, but it will always run at least once. So your job has to take care of the fact it might happen more times; but if the worker shuts down, the job will get re-enqueued. OK, and in case your job runs multiple times, I
guess you get only one result? If it runs twice, it's your job then to use an idempotency key, to use a transaction, or to do something in Redis to say "has this job already started". That's your problem; but there's a principle that you can never have exactly-once, so arq prefers multiple times over zero times. Take the example of sending your customer an invoice each month: they would get a bit confused if they received it three times, but that's still better than them not receiving it at all, and that's normally the case. Any question, anybody? OK, so one last question, and then I'd ask the next speaker to please come up slowly and set up... there is no next speaker. Do you have any experience releasing the GIL in C extensions and writing those? If you can speak about how you do it, whether there are some tools one can easily use. So, I haven't released the GIL in C extensions myself. With pydantic we've just had a big effort as a community, a lot of people did a lot of work on it, which is really cool to see, and the whole thing is now compiled with Cython, which made it, I think, about 50% faster for lots of stuff. That requires some tweaks to the Python, but it's still valid as normal Python, so on environments like Windows, where we don't have those binaries available, it still just works exactly the same. OK, that's cool. If there are no more questions, we can thank the speaker. Thank you very much.