Yeah, thank you. Thank you, everybody. So this talk is about multi-processing, multi-threading, concurrency, and parallelism; it is basically about threading and processing. When we program with threads and processes, we make mistakes, and different programming languages and compilers have different conditions and methodologies for executing threads and processes. In this talk I will start from the basics, like how to start a thread and how to start a process, and from there we will slowly move to advanced topics and cover some CPython internals as well. A bit about myself: I started my career as a software engineer, and currently I'm working as a senior platform software engineer at Substance. I have experience developing large-scale, fault-tolerant, mission-critical systems, and some ETL as well. I'm passionate about web backends and infrastructure, and I'm a Pythonista and a Gopher. So, what is parallelism and what is concurrency? People usually mix up these terms. Many people think parallelism and concurrency are the same, but in reality they are different. Parallelism is adding more processors, more CPU cores, to make computation faster: you are executing some task or computation, and you add more cores, more workers, to make it faster. Concurrency is related to parallelism, but concurrency is about permitting multiple tasks to proceed without waiting for each other. For example, I have four CPU cores. How do I utilize them fully?
How do I divide the work between them? For example, I have four CPU cores and I want to spread the workload across all of them; concurrency deals with that. So here is an example of parallelism. There are eight boxes on the left-hand side, two paths, and two workers, whose task is to take one box at a time from the left-hand side and put it on the right-hand side. Both workers pick up a box, walk down a path, and set it down on the right. Parallelism is adding more workers: I add two more, so now there are four workers in total. But in this example there are still only two paths, so even though I have four workers, I cannot get a speedup. Only two workers can move at a time; four cannot work at once. This is where concurrency comes in: concurrency is creating more paths so that more work can proceed at the same time. So that is what parallelism and concurrency are about. The next terms are multi-threading and multi-processing. These are very simple terms: they refer to your operating system's ability to run multiple tasks in parallel using threads or processes. Threads are nothing but parts of a process, so we can call them lightweight processes, and a process describes a program that we are executing. Multi-threading and multi-processing are like a litter of puppies all eating at the same time, each dealing with its own bowl: they are doing the work in parallel.
Okay, so let's have a demo of how to start a thread. In Python it is very easy to start a new thread; the API is really simple. threading is the module that deals with threads in Python, and Thread is the class. First, let's create a function that does something very simple: it prints "hello world". Now I will create a new thread. In target we specify the function name, and in args we specify its arguments, here an empty tuple because there are no arguments. t.start() starts the thread, and t.join() is a method that waits until the thread's execution finishes. Now I'm going to run this. The thread started, it printed, and that's it. That was a very basic example of threads. The threading module is a high-level module built on top of the low-level thread module, which we should not use directly; we should always use threading. In Python 3, the thread module was renamed to _thread, which is another hint that it is not meant for direct use. Another module added in Python 3 is dummy_threading: if the _thread module is not available, importing threading raises an ImportError, so dummy_threading can be used as a fallback. Python threads are system threads: whenever we start a thread, Python requests the operating system to start one, and the operating system manages everything from there. So when I call the start() method, it calls the operating system's API to start a new thread. On a Linux machine, these are POSIX threads.
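Reconstructed from the demo described above (the function name and the extra results list are my additions, so the main thread can check that the worker ran), the minimal version looks roughly like this:

```python
import threading

results = []

def hello():
    # The demo just printed; appending to a list as well lets the
    # main thread verify that the worker thread actually ran.
    results.append("hello world")
    print("hello world")

# target is the callable the thread will run; args is its argument
# tuple (empty here, since hello takes no arguments).
t = threading.Thread(target=hello, args=())
t.start()   # asks the OS to create and start a new thread
t.join()    # blocks until the thread's execution finishes
```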
So basically, POSIX exists because in the old days there were many hardware manufacturers, each providing their own APIs, and that created trouble for developers, who had to write against different APIs for different vendors. So a standard was defined for this, and that is POSIX. On Linux, Python uses the pthread library for its threads; on Windows, it uses the threading API that Windows provides. All the scheduling, like when to switch a thread and how to schedule it, is managed by the operating system; Python does not deal with it. Okay, let me show you another example of threads. I have a function that loops, decrementing n until it is no longer greater than zero. I specify it as the target, and I define a second thread, because I'm going to start two threads at the same time. I start thread one, then thread two, then wait for thread one to complete and wait for thread two to complete. The idea is: if one call of this function takes two seconds, and I run it in two threads, it should run in parallel. But in Python, when we start more than one thread, they do not run in parallel. Parallel execution of Python threads is effectively forbidden: no matter how many threads you open, they cannot run in parallel, because there is a lock at the interpreter level, even on a machine with multiple processors.
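The two-thread countdown demo just described might have looked like this (the workload size is my assumption); timing it shows that on CPython the two threads do not finish in half the time of running the work twice sequentially:

```python
import threading
import time

def countdown(n):
    # Pure-Python CPU-bound loop: decrement until n is no longer > 0.
    while n > 0:
        n -= 1

N = 2_000_000  # workload size, chosen arbitrarily for the demo

t1 = threading.Thread(target=countdown, args=(N,))
t2 = threading.Thread(target=countdown, args=(N,))

start = time.perf_counter()
t1.start()
t2.start()
t1.join()   # wait for thread one to complete
t2.join()   # wait for thread two to complete
elapsed = time.perf_counter() - start
# On CPython this is roughly the time of two sequential runs,
# because only one thread executes Python bytecode at a time.
print(f"two threads: {elapsed:.2f}s")
```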
However many cores or processors you have, only one thread runs Python bytecode at a time. That is the GIL, the global interpreter lock. It is implemented in CPython: whenever one thread is running, it holds the lock and does not allow any other thread to execute until it releases it. The GIL is tolerable for I/O-bound operations, because whenever a thread performs an I/O operation, it releases the lock and hands control to another thread, which continues executing. That is why threads are a bit better for I/O-bound applications: CPython releases the GIL around blocking calls like read, write, send, and receive. It is bad for CPU-bound applications, because one thread can take too much CPU and never give another thread a chance. There is handling for CPU-bound threads as well, although threads are still not a good fit for that workload. So the global interpreter lock works like this: whenever there is I/O, the GIL is released and another thread runs. For the CPU-bound case, a calculation that takes a long time and holds the CPU, Python has special handling too: the interpreter periodically forces the running thread to give up the lock.
Every few milliseconds (5 ms by default in Python 3), the interpreter makes the running thread release the GIL and lets the operating system reschedule; then some thread, possibly the same one, re-acquires the lock. So it is like sending a signal to release the lock of that specific thread even though it is still running, so the operating system can schedule another thread. You can change this interval using sys.setswitchinterval(). Thread pools are for when you want to restrict the number of threads you open: you have too many tasks to do, but you don't want more than an allowed number of threads. A thread pool is essentially a queue: you add your tasks to it, and it assigns them to the running worker threads. When you create the pool, you tell it how many workers to start, and it queues all the submitted tasks and arguments and distributes them to those started threads. It's pretty good, and there are various methods on it; I'm sorry, because of the time constraint I cannot show a demo. Multiprocessing is very similar to threading: it is a module to interact with processes, to start them, stop them, and do various operations on them. Python creates system-level processes: whenever we start a process, it creates a child process under the Python process. And the good news is that it bypasses the global interpreter lock that constrains threads. So if you start more than one process, your calculation or program will actually run in parallel, which with threads will not happen.
If you start four threads, only one runs at a time; with processes, they genuinely run in parallel, and this works on both Linux and Windows. Like a thread pool, you can start a process pool as well: if you want to restrict the number of processes, say at most four running at a time, you can do that, and tasks will be distributed across them. There can also be situations where many processes interfere with each other over shared variables and memory. How do we deal with that? One case is deadlock, where more than one resource is involved; a resource can be anything, a network resource, a file, any kind of resource. Thread A holds resource one and wants resource two, while thread B holds resource two and wants resource one, and both wait forever. A similar situation can occur when thread one and thread two both want the same object at the same time. For these situations there are locks and semaphores; the semaphore was invented by the Dutch computer scientist Edsger Dijkstra. Semaphores come in three flavors: binary semaphores, counting semaphores, and mutexes. In Python, a binary semaphore and a mutex behave the same, and a counting semaphore is provided as well. So let me show you. First, locks and reentrant locks. Locks are for when you are executing something and want to restrict some code so that if one thread or process is executing it, no other executes it in parallel. A Lock and an RLock differ in execution: an RLock can be used in recursion, because it can be acquired again by the thread that already holds it, so whenever you are doing recursion you should use an RLock; plain locks are for normal code. So here is the code; there is a part one and a part two.
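I can't be sure exactly what the slide's code was, but a lock guarding a critical section, plus an RLock for the recursive case, can be sketched like this (the counter and recursion depth are made up for illustration):

```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    lock.acquire()        # part one: take the lock
    try:
        counter += 1      # only one thread at a time runs this line
    finally:
        lock.release()    # part two: always give the lock back

rlock = threading.RLock()

def count_down(n):
    # An RLock may be re-acquired by the thread that already holds it,
    # which is what makes it safe to use in recursion; a plain Lock
    # would deadlock on the second acquire.
    with rlock:
        if n > 0:
            count_down(n - 1)

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
count_down(5)
print(counter)  # 10
```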
I can acquire the lock and release it whenever I want, and the code that lies between the acquire and the release runs exclusively: while that part is running, no other thread or process runs it; the others wait until that execution finishes. So it is for synchronization between threads and processes. Semaphores are another way to deal with these situations. With a semaphore we define a number, a maximum count. If I define it as nine, then acquires and releases must balance against it: if I acquire nine times, the current value of the semaphore becomes zero, and any further thread that comes to acquire it has to wait until someone releases. This kind of semaphore is suitable when you want to enforce some limit, a network limit for example: say you want to send only a small number of HTTP requests, at most 10 at a time; you can set the count accordingly and use it. A BoundedSemaphore is similar, but if a release would push the counter above its initial value, it raises a ValueError instead of allowing it; it suits the same applications but catches programming mistakes. One more thing is events. There are many situations in programming where we want to wait for some condition. For example, I want to say: only when I get some flag from the network, or an HTTP response, should all the threads start their processing. Until then, they should pause.
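Before going further with events: the request-limiting use of a semaphore just described can be sketched as follows. The limit of 3, the sleep standing in for a real HTTP call, and the bookkeeping variables are all my assumptions:

```python
import threading
import time

limit = threading.Semaphore(3)   # at most 3 "requests" in flight at once
in_flight = 0
max_seen = 0
state_lock = threading.Lock()    # protects the two counters above

def request(i):
    global in_flight, max_seen
    with limit:                  # blocks while 3 other threads hold it
        with state_lock:
            in_flight += 1
            max_seen = max(max_seen, in_flight)
        time.sleep(0.01)         # stand-in for a real network call
        with state_lock:
            in_flight -= 1

threads = [threading.Thread(target=request, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_seen)  # never exceeds 3

# BoundedSemaphore turns an extra release into an error instead of
# silently raising the limit.
caught = False
b = threading.BoundedSemaphore(1)
try:
    b.release()                  # one release too many
except ValueError:
    caught = True
```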
So in this kind of situation, we call event.wait() wherever we want to pause, and event.set() when the condition arrives. Once the event is set, wait() no longer blocks; after we call event.clear(), waits block again. So whenever the program reaches that point, it will wait until we call event.set(), and it will block again once we call event.clear(). There is also Timer, which executes a function after some interval: if we want to run a function after 30 seconds, we can use it. Some delay is possible; I may say I want this executed after 30 seconds and it actually runs after 31 or 32 seconds, because Timer uses a thread internally, so with the global interpreter lock and scheduling, a delay can happen. Pipes are basically data channels that can be used for inter-process communication. A pipe returns two file descriptors, one for writing and one for reading: whatever you write to the write end is buffered by the kernel, and you can read it from the read end. Python provides two kinds of pipes: os.pipe and multiprocessing.Pipe. os.pipe is an interface on top of the kernel: when we request a pipe, it asks the operating system to open one, and the module is just the interface for creating the pipe and dealing with it. A pipe has restrictions: on Linux its buffer is 64 KB by default, and it carries raw bytes, so you handle encoding and decoding yourself when sending and receiving data. On Linux it is implemented with the kernel's pipe system call; on Windows, with the CreatePipe API that Windows provides. And multiprocessing.Pipe is a socket-based implementation.
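Before moving on to multiprocessing.Pipe, the os.pipe flow described above as a minimal sketch:

```python
import os

r, w = os.pipe()                  # two file descriptors: read end, write end
os.write(w, "hello".encode())     # bytes only: encode text on the way in
os.close(w)                       # closing the write end lets the reader see EOF
data = os.read(r, 1024).decode()  # decode on the way out
os.close(r)
print(data)  # hello
```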
It is built on sockets, in-memory file-descriptor-based channels, and it is full-duplex: it gives you two connection objects, and communication can go both ways. It uses pickle to send the data; pickle is nothing but object serialization, so you can send an object by pickling it. Queues are also there: Python supports three kinds, first-in first-out, last-in first-out, and priority queues, and they are process- and thread-safe. Shared state can be used as well: if you want to share a variable directly between processes, you can use shared memory, which lets you place a value or a data structure where all the processes can see it. Through pipes we can only send serialized data; a pipe is a simple file-like object, you write something and it is received on the other end. But if you want an actual shared data structure between processes, you can use shared memory; Python provides that too, and these structures are thread- and process-safe. Value and Array are the multiprocessing classes for this, and you can use them. An Array is basically a Python-array-like implementation with locking added and support for inter-process sharing. There is another thing we can use: the Manager. A manager forks a new process whenever we start it. The benefit a manager gives is that we can take a dictionary object or a list object, so it is very good if you want a shared dict or list, but it is slower than the shared-memory Value and Array I mentioned. Still, we can use it. The manager starts a new server process whenever we create one.
The manager gives a proxy object which supports Namespace, dict, list, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value, and Array. A manager can also be used to share state with processes on different computers: for example, I am on computer one and I want to share data with a process on another computer; I can use a manager to do that. But again, it is slower than shared memory. And in the end: don't do it. Avoid shared state as much as possible, because it will decrease your speed: these objects are all thread-safe and synchronized, so when you write to one, others are blocked until the write completes. One more thing should be taken care of: whatever object we share through multiprocessing shared state should be picklable, picklable in the sense that it can be converted to pickle format. With pipes we should also take care of this; whatever object we send should be picklable. Zombie processes: we should take care of these too. They arise when, even though my program stops, something I started is still executing: I started multiprocessing under a master process, I stopped that master process, and the child processes kept running. Those are the zombie processes. How do we deal with them? Whenever we stop a program with Ctrl+C, Linux or Windows sends a specific signal to Python; we can handle that signal and, in the handler, terminate the child processes, or forward the same signal to them. And about terminating processes: instead of terminate(), try to close them gracefully, using events or conditions, because terminate() stops them abruptly.
It's like pulling the plug on your computer, so it can give you very bad results. For example, if you are writing a file or doing some other work, you may end up with corrupted data. And if you are using global variables, it is possible that in the child processes you will not get the same values as in the parent. Yeah, that's it. Okay, thank you very much, Hitul. Unfortunately, we don't have time for questions today, but just catch Hitul afterwards if you have something to say. Thank you very much, and I look forward to seeing you for the rest of the week.