So we will get started with the talk; we will take up questions towards the end if time permits. This talk is about serialization in Python in the context of multiprocessing and multithreading — basically, parallel computing with Python. My name is Saim Raza. I work at D. E. Shaw India, which is part of the D. E. Shaw group, a global investment and technology firm known as a pioneer in quantitative investing. It is based out of New York, and I work in Hyderabad.

This is an overview of the topics we will cover today. We will see the limitations and advantages of both multithreading and multiprocessing in Python. We will then move on to inter-process communication and how it generally happens in Python. Then we will look at the serialization part, which in Python is called pickling, and how it is achieved. There are two kinds of pickling: one is by name reference and the other is by value. Lastly, we will move on to some features of the DESCO pickle — the pickle module we use at the D. E. Shaw group for quantitative research and general development as well.

So what is parallel computing? Basically, you have a big problem, you break it down into smaller problems, and you try to reduce the wall time — by putting more CPUs to work, or by reducing the time spent waiting on IO. You don't want to wait on anything; you want your program to finish faster. For that, we can leverage various techniques: using multiple cores on a single machine, using multiple machines in a grid, reducing the IO waiting time on various responses, or using asynchronous programming with an event loop. We will focus mainly on two techniques here: threading and multiprocessing. On multithreading — there is a general notion that Python threads are of no use, but that's not entirely true.
We do have limitations with threads in Python, and the main one is that only one thread executes Python code at a time. This is because every thread needs the GIL (global interpreter lock) in order to execute. What the interpreter does is switch between threads, and that achieves concurrency — but not parallelism, because it cannot leverage multiple cores to actually do computation in parallel. That's why it is generally said that Python threads are good for IO-bound tasks and not for CPU-bound tasks. We will see examples of both cases.

Consider an example where we just want to open 16 URLs, and we do this with a single thread. We start the Python interpreter — nothing special — and open these URLs serially in a loop. As you can see, one task starts only when the previous one ends, and everything happens in serial. This takes around six seconds.

Now let's use threads for the same problem. In Python 3 we have a very good class called ThreadPoolExecutor, with which you can spawn four threads in as little as three lines. Here we are using four threads. As you can see, as soon as any thread completes one task it starts another, and the wall time is actually reduced: it has come down from around six seconds in the serial case to around two seconds.

But what if we do the same task using multiple processes? As you can see, the time is slightly higher than with four threads. These are the results: serial took around six seconds, threads 2.1 seconds, and processes 2.8 seconds. The reason four processes took more time than four threads is the overhead of spawning new processes, distributing the work, and getting the results back from all those processes. So a speedup for this task is achieved with both multithreading and multiprocessing, but there is a slight overhead with multiprocessing.
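The slides' code isn't reproduced in the transcript, so here is a minimal sketch of the three-line ThreadPoolExecutor pattern described above. The URL fetch is simulated with `time.sleep` so it runs offline; a real version would call `urllib.request.urlopen`. The names `fetch` and the URLs are illustrative, not from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Simulated IO-bound fetch; a real version would call
    # urllib.request.urlopen(url) and read the response.
    time.sleep(0.1)
    return url

urls = [f"https://example.com/page/{i}" for i in range(16)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start
# 16 waits of 0.1 s overlap across 4 threads: roughly 0.4 s, not 1.6 s
```

While one thread is blocked waiting, the GIL is released and another thread can start its request — which is exactly why threads help for IO-bound work.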
So threading is the correct solution there. Now consider another task where we actually want to use the CPU: counting to a very large number, just incrementing a counter. The results of this task: in serial, it takes around 4.2 seconds; with four threads, it takes more than the serial version; but with four processes, it takes just 1.9 seconds, and we get the benefit.

So why did using threads actually deteriorate the performance? The reason is, as I said earlier, that only one thread executes at a time. When you have a CPU-intensive task, the GIL has to be passed between the various threads as they execute on the CPU. Because of that switching overhead and the contention between the threads, the performance actually worsens. But if you use multiprocessing, you have separate Python interpreters, there is no GIL contention, and you achieve the required parallelism.

So how does multiprocessing in Python actually work internally? You have a master process, and you spawn multiple worker processes to take advantage of n CPUs. In the diagram, when the master process communicates with its workers, it sends them Python objects; it then gets results back from all the worker processes, combines them, and gives you the final result. This is achieved using serialization: the Python objects are serialized into a byte stream and sent to the worker processes, and the workers send byte streams back that are deserialized into Python result objects in the master process. The module used to achieve this is called pickle.

Here, what we are trying to do is just read the process ID, and we launch a single worker using ProcessPoolExecutor. As you can see, the current PID is different from the PID of the subprocess we launched. Everything works fine — multiprocessing is working great.
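The counter benchmark can be sketched as follows; the talk's exact code isn't shown, so the function names and the count of four million are illustrative. Splitting the count across a ProcessPoolExecutor gives each chunk its own interpreter, and therefore its own GIL.

```python
from concurrent.futures import ProcessPoolExecutor

def count(n):
    # CPU-bound work: increment a counter n times.
    c = 0
    for _ in range(n):
        c += 1
    return c

def run_parallel(total, workers=4):
    # Each worker is a separate interpreter with its own GIL, so the
    # chunks really run in parallel on separate cores.
    chunk = total // workers
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count, [chunk] * workers))

if __name__ == "__main__":
    # The guard matters: on spawn-based platforms the workers re-import
    # this module and must not re-launch the pool themselves.
    print(run_parallel(4_000_000))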
Now, when we try to do the same thing with a lambda, it breaks. The error it throws is that it can't pickle function objects. That basically means the Python interpreter is not able to serialize this lambda function. What went wrong is that the pickle module has a limited set of Python objects it can serialize. It doesn't work for lambdas, and it doesn't work for classes and functions defined in your IPython or Jupyter notebooks, or any code that lives outside of a module.

How does pickle work internally? It takes the fully qualified name of the Python object and dumps just that name. On the deserialization side, it reads that name and re-imports the object. Here you can see that when I dump the os.getpid function, which we used earlier, it dumps `cposix getpid` — meaning this getpid function lives in the posix module. Both of these things — the posix module as well as the getpid function — are available in the subprocess. That is why it is able to launch, and why multiprocessing does not break.

But when we try the same thing with the lambda, we see the same error again, because lambdas are unnamed functions and there is no name reference to them. Even if you put the lambda in a module, as in `func = lambda: ...`, you will still get the same error, because lambdas are not serializable with the pickle module by default.

Here is another example, with something we commonly do in IPython or Jupyter notebooks. If you just write a function there and try to use multiprocessing on it, things will again break. Here I wrote a function called test_func, and when I try to execute it using the process pool, I get an error that this function is not found in the module `__main__`. When you start IPython, the module's name is `__main__`, and when a subprocess is launched, its module name is also `__main__`.
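The by-name behaviour is easy to see for yourself. A hedged sketch (the exact slide code isn't in the transcript): protocol 0 keeps pickle's opcodes human-readable, so the module and function names are visible in the byte stream, and the lambda failure can be caught directly.

```python
import os
import pickle

# Pickling a module-level function stores only its qualified name.
# Protocol 0 keeps the opcodes readable: 'c' means "load this global
# by module and name" -- on Unix, os.getpid lives in the posix module.
payload = pickle.dumps(os.getpid, protocol=0)
print(payload)

# Deserialization re-imports that name, so both processes must have it.
restored = pickle.loads(payload)

# A lambda has no importable name, so pickle refuses to serialize it.
lambda_failed = False
try:
    pickle.dumps(lambda x: x + 1)
except Exception:
    lambda_failed = True
```

Because only the name travels, deserialization gives you back the very same function object — which also means the receiving interpreter must be able to import it.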
But you defined test_func in the master process; it is not defined in the subprocess. That's why the pickle module says: I was not able to find this function in the `__main__` module of the subprocess. We have this function only in the master — the IPython process we launched.

So what is the solution? If pickle stores just name references, then to get around that we can actually serialize the code itself. That is generally termed serializing objects by value. And not just the code — we also have to recursively gather all the variables in the local scope and the global scope, the closures, the module name, the qualified name — everything — and take all of that to the subprocess to recreate the exact function we defined in the master process. Open-source libraries like dill and cloudpickle mainly cater to this particular problem.

This is the sort of information — the minimum information — we need to serialize the lambda that broke multiprocessing earlier. We have the byte code, which is basically the compiled source in serialized byte form; we have its name; and we have its module. We can take these to the subprocess, recreate the exact lambda, execute it, and work around the pickling problem.

OK, so we have a solution — but why isn't it implemented in vanilla Python? There are mainly two points. One is the security flaw: you are sending a byte stream, but there is no security mechanism around it, and if malicious code is injected into that byte stream, it becomes a security issue. When pickle serializes by name, during deserialization it actually goes to a file and tries to find that Python object by name — in our example, it was trying to find the function test_func in the module `__main__`. That is safer, because people can provide better security around their source files. The second point is stale code.
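A minimal sketch of serialization by value, using only the standard library — note that dill and cloudpickle do this far more carefully (closures, globals, default arguments, and more), and that `marshal` output is specific to the CPython version that produced it, which is one concrete face of the compatibility concerns raised here.

```python
import marshal
import types

# The function we could not pickle by name.
square = lambda x: x * x

# Capture the minimum information: byte code, name, module.
payload = {
    "code": marshal.dumps(square.__code__),  # compiled code, by value
    "name": square.__name__,                 # '<lambda>'
    "module": square.__module__,
}

# "Receiving side": rebuild an equivalent function from the bytes.
code = marshal.loads(payload["code"])
rebuilt = types.FunctionType(code, globals(), payload["name"])
print(rebuilt(7))
```

The rebuilt function is a new object, not the original — it merely behaves the same, which is exactly what a worker process needs.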
So we said we will serialize the source code. But what if the source code changes after we serialize it, yet before we deserialize it? That is also an issue. So, given these limitations, should we really do it?

There is a trade-off between using multiprocessing to speed up your programs and writing only pickleable objects. The pickle module gives you a limited set of objects you can use. If you use just those, well and good. But if you want to use lambdas, or if you want to write your code in IPython sessions, it won't work. As for the second point, the stale-code problem: our multiprocessing framework assumes that the round trip — sending code from the master to the worker process and executing it there — is short-lived. Most of the time in our IPython and Jupyter notebooks we just want to develop something quickly, and we don't want to hit these pickle errors — because once we eventually move that code into a module, it will start working anyway. If I move that function test_func into a module, pickle will not complain about it. So this is a real problem, and solving it is very valuable for quick development. Also, if you can ensure a secure network — in a data center, or your company's internal network — then the security flaw becomes a lesser concern.

As I told you, towards the end we would look at the DESCO pickle. The DESCO pickle is the pickle rewrite that we use at the D. E. Shaw group. We got these performance numbers: as you can see, the pickle module used at D. E. Shaw can be 100 times faster — even more — than the open-source solutions available, like dill and cloudpickle. In the highlighted row it takes around 30 seconds for a big list, while dill and cloudpickle take around 30 minutes to serialize the same Python object. So it is far more performant than any open-source solution. What have we done specially to achieve those performance numbers?
We actually leveraged the C implementation of the Pickler class in CPython. There are two implementations of pickle in CPython: one is pure Python, and one is implemented in C. We leveraged the C implementation, while the open-source solutions like dill and cloudpickle just subclass the pure-Python implementation — the looping and everything happens in Python, and that's why they are slower.

Another point is that importing the DESCO pickle has no side effects as such. When you import dill, it overrides the pickle dispatch registry. This dispatch dict is a mapping from the type of an object to the function that should be used to pickle — serialize — that type. If you import dill, as you can see, it overrides the entries with its own functions — for dict, for function, and for other types too; it injects all of its own pickling functions. This doesn't happen with the DESCO pickle.

Another major advantage is that the DESCO pickle uses its special serialization functions only when required. It doesn't override the functions in the dispatch table by default — only when needed. This has a big benefit: the byte stream you generate carries references to the functions used to produce it. With dill, you will always get some reference to a dill function in the stream; with the DESCO pickle, if native pickling works, it uses native pickling, and only otherwise does it fall back to the special serialization functions — so the probability of such references is much lower.

We have been doing this for over a decade now, and before Python 3.8 there was no public way to hook into the C implementation of the Pickler class like this — so this was a sort of foresight on our part. I think we are done with the talk; I would be happy to take any questions.

Hi, Saim. Thanks for joining. Hey, hi. Sorry, how would I pronounce your name? Just say Sam — it's a bit like "sign".
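Since Python 3.8, `pickle.Pickler.reducer_override` lets you subclass the fast C Pickler and customize it — the hook the talk alludes to. Below is a hedged sketch of the "fall back only when native pickling fails" idea (this is an illustration of the technique, not D. E. Shaw's actual implementation; `_rebuild` and `ByValuePickler` are invented names, and the by-value fallback is the minimal marshal trick from earlier).

```python
import io
import marshal
import pickle
import types

def _rebuild(code_bytes, name):
    # Runs on the deserializing side: recreate the function from its code.
    return types.FunctionType(marshal.loads(code_bytes), globals(), name)

class ByValuePickler(pickle.Pickler):
    # Subclasses the fast C Pickler -- possible since Python 3.8 thanks
    # to reducer_override; before that only the pure-Python Pickler
    # could be customized this way.
    def reducer_override(self, obj):
        if isinstance(obj, types.FunctionType):
            try:
                pickle.dumps(obj)       # does native by-name pickling work?
                return NotImplemented   # yes: keep the default behaviour
            except Exception:
                # No importable name (lambda, notebook function):
                # fall back to serializing the code object by value.
                return _rebuild, (marshal.dumps(obj.__code__), obj.__name__)
        return NotImplemented           # everything else: default pickling

buf = io.BytesIO()
ByValuePickler(buf).dump(lambda x: x + 1)
restored = pickle.loads(buf.getvalue())
print(restored(41))
```

Because the pickler only intervenes when native pickling fails, byte streams for ordinary objects stay fully compatible with plain `pickle.loads` — the "no side effects" property described above.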
You can say Sam. OK, hi, Sam. A quick question here. At times different teams use different versions of Python, because the remote servers they work on are different. Say I have a code segment, I have already pickled it, and that pickle is used somewhere downstream. Does this have anything to do with versions — would it break? Say one team has 3.6 and the other 3.7. Banking domains do not usually upgrade their versions as soon as they are released — my team was still working with 3.5, and it was a headache to get them to upgrade to 3.7; they are still not moving to 3.8 or 3.9. So that's what I am asking: if my code is written in 3.6 and I pass it on to a team using 3.7, would the code break due to this pickling stuff?

Yeah, Satyam — pickle handles it well. Because it is the native CPython implementation you are talking about? Right. Since we are actually serializing the code, the stream will contain opcodes and other internals, and CPython doesn't guarantee that those opcodes stay the same. Across major versions — between 2 and 3 — we were not able to achieve interoperability. But within minor versions, like 3.5 to 3.7, it should work fine; we have moved code from 3.6 to 3.7 to 3.8, and that has not been an issue for us within minor versions.

OK — because I was using the plain pickle package Python provides, and it used to take a lot of time to pickle and then load again on initialization. So maybe I'll try this out and see how much it improves my performance. Sure. Thank you for sharing this. Thank you, Satyam.

Sunder Raj is asking: how does the DESCO pickle work with CPython? There are actually multiple ways you can make it work with CPython.
You can write C extensions with the same code, or you can even patch the interpreter. We are considering open-sourcing our DESCO pickle, and I would invite you to follow the D. E. Shaw page on GitHub for further updates.

I'm getting some questions around async programming. With async programming in Python, you can achieve concurrency — you can switch between various tasks — but if you really want to leverage multiple cores, you will again have to come to multiprocessing. And serialization comes into play with multiprocessing, as well as when you want to launch your tasks on a grid of computers. In the chat I have posted the link to our GitHub page; there you can find more updates.

OK, so Vignesh is asking me to explain more about async versus parallel processing. Basically, Vignesh, from the user's perspective you just want to speed up your programs, and you have to decide what type of problem you have. If you have IO-bound problems, you can just use threads, and Python will work fine. If you actually want to leverage the CPU, you have to go to multiple processes. With asynchronous programming, if your tasks are event-driven — say you want to checkpoint your state every 30 minutes or so — then you can register your callbacks with an event loop, like the asyncio library in Python 3, and have a lot of these event-based tasks leverage the event loop. With multiprocessing, or computations on a grid, you will have to use serialization to achieve the performance boost.

Yeah, that's a good question again. On the parallelism side, I would also like to share: I moved from threading and parallel processing to async recently — it's been quite a while now — and I'm in love with async frameworks in Python. What I feel is good about using async is that it is very easy to handle.
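The asyncio approach to the earlier 16-URL example can be sketched like this — a hedged illustration, with the IO wait simulated by `asyncio.sleep` (a real version would await an HTTP client such as aiohttp):

```python
import asyncio
import time

async def fetch(i):
    # Simulated IO wait; a real version would await an HTTP request.
    await asyncio.sleep(0.1)
    return i

async def main():
    # All 16 waits overlap on one thread and one core:
    # concurrency via the event loop, not parallelism.
    return await asyncio.gather(*(fetch(i) for i in range(16)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
```

Sixteen 0.1-second waits complete in roughly 0.1 seconds total, on a single thread — which is exactly the concurrency-without-parallelism point made above.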
When you're using threads or parallel processes, you need to take care of everything yourself. With naive multiprocessing or naive multithreading, you have to manage the threads and maintain the state of every thread. Also, threads are heavier compared to coroutines. That is one of the major benefits: you have a single entry point, a loop, which is responsible for running everything and maintaining state. It takes care of everything — what has to stop, what has to pause, what has to run. You just sit back and enjoy the flow. And with event loops, you don't have to worry as much about concurrent modification of global objects and such things; with threads, you have to consider those aspects as well.

Right — you get race conditions more often, and you need to take care of them. But when you're using coroutines, you rarely have to worry about that, because it is all handled within the one process and no two coroutines are running simultaneously. Right, right — that is concurrency, not parallelism. Right.

Yeah, and it turns out to be much faster than threads, I feel, because I've seen the performance improvement when I moved to async — I was using concurrent.futures before. I am still using it, but now with async runners. You also have the option to use executors for blocking code. Yeah — for non-blocking code, if you're using aiohttp or aiofiles, it is way, way faster. But if you're using blocking code, then you obviously have to use executors. Even so, using executors won't hurt your performance; they work very seamlessly and very fast. One bit I would like to — yeah, please go on.
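The "executors for blocking code" pattern mentioned here is `loop.run_in_executor`. A minimal sketch, with the blocking call simulated by `time.sleep` (the function names are illustrative):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(i):
    # A blocking call like this would stall the whole event loop
    # if it ran directly inside a coroutine.
    time.sleep(0.1)
    return i * 2

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Offload the blocking calls to executor threads; the event
        # loop stays free to run other coroutines in the meantime.
        tasks = [loop.run_in_executor(pool, blocking_io, i) for i in range(8)]
        return await asyncio.gather(*tasks)

results = asyncio.run(main())
```

The executor threads do the blocking waits while the loop keeps scheduling other coroutines — blending the threading and async approaches from earlier in the talk.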
Yeah, the only drawback is that I think the framework is quite difficult to grasp at first — but once you get used to it, it is a cakewalk. I would like to point out one important concept you just touched on: parallelism versus concurrency. With Python threads, as you can see in this graph, the GIL is switching between these four threads. Whenever there is a wait on some URL, the GIL switches away, letting that thread wait while other things execute. That is concurrency. Right. So you can achieve concurrency with Python threads, but not parallelism on multiple cores — which is different from other languages like C++ and Java. And in Python too, if you really want to achieve that parallelism, you can write C or C++ extensions, release the GIL in that layer, and spawn native threads. This is what NumPy and other libraries do to give you good performance in Python.

All right. Initially I was working on this, but my processes required multiple things to work together, so I migrated the same code to Golang for that purpose — Golang is built for concurrency. Now I'm using a mix of Python and Golang: I have written a few services in Python and a few in Golang. The services that need to run more concurrently I migrated to Golang; the others are in Python, because Python is very easy to write — it is quick to write a service and bring it up. Yeah, Satyam, there is always a trade-off of some sort. Right, right.

We have a minute or so if anyone else would like to ask more questions. OK, I will end the talk. Thank you all so much for joining. I hope you had a good learning time with this talk. Thank you.