Anyway, so next up we actually have a double header, as it were: Lidi Zheng and Pau Freixes. Lidi is a software engineer at Google in the Tech Infra network systems area. He's an active member of the gRPC repo, contributing mostly to gRPC Python, with a focus on API design and distributed systems tooling. Prior to Google, he got his master's degree from CMU and had several years of experience at tech startups in Beijing. And Pau, who's also speaking in this slot, has spent the last four years as a software engineer at Skyscanner, where he had the chance to build, run, and own many Python services at scale in production, mainly on asyncio. That gave him the opportunity to contribute to gRPC, as well as several other projects in the Python ecosystem and the open source community. So they're going to be speaking on, unsurprisingly, gRPC Python, C extensions, and asyncio. Welcome to you both.

Hello. Okay, let's get started. Hello, everyone, and welcome to our talk. Today we're going to discuss how we build gRPC Python, how it utilizes Python extensions, and how it integrates with asyncio. If you have any questions, feel free to post them in the Discord channel; it's called #talk-grpc-and-asyncio. Please allow us to introduce ourselves again. This is Lidi Zheng, and I'm a software engineer at Google. I have been a maintainer of gRPC Python since 2018. Pau, do you want to introduce yourself?

Yes, thank you. This is Pau. First of all, I want to say a big thank you to Skyscanner for funding my time to work on this project over the last year. Sadly, I'm no longer a member of the Skyscanner family; I'm currently working at Onna.com. Like most of you, I'm a Python enthusiast, but what I like even more is solving any kind of engineering problem, and over the last few years I've tried to do my best to contribute to many different open source projects. Passing over to you, Lidi.

Okay, so let's get started. What is gRPC? As the name suggests, gRPC is an RPC framework built on HTTP/2 as its transport protocol. It's meant to be fast and lightweight, and it's designed for distributed systems. It carries some highlight features: for example, streaming RPCs, various load balancing policies and ways to do load balancing, and interceptors, so you can inject logic at any stage of an RPC. We also integrate well with Protobuf, so it enforces your API contract. Currently we're seeing around 400,000 downloads per day. On the right side is our cute new logo: a golden retriever, and her name is Pancake. She's cute, so there's another reason for you to try out gRPC today.
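[Editor's note: to make that overview concrete, here is a minimal sketch of a unary gRPC Python service. It assumes the helloworld_pb2 and helloworld_pb2_grpc modules generated by grpcio-tools from the canonical helloworld.proto example; this code is not from the talk itself.]

```python
# Minimal unary RPC sketch, assuming stubs generated from the canonical
# helloworld.proto example via grpcio-tools (not shown in the talk).
from concurrent import futures

import grpc
import helloworld_pb2
import helloworld_pb2_grpc


class Greeter(helloworld_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        # Unary-unary handler: one request in, one reply out.
        return helloworld_pb2.HelloReply(message=f"Hello, {request.name}!")


server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
server.add_insecure_port("[::]:50051")
server.start()

# Client side: a blocking unary call over an insecure channel.
with grpc.insecure_channel("localhost:50051") as channel:
    stub = helloworld_pb2_grpc.GreeterStub(channel)
    print(stub.SayHello(helloworld_pb2.HelloRequest(name="PyCon")).message)

server.stop(grace=None)
```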
So before introducing gRPC Python, we have to introduce gRPC Core. Core is the component that handles all the HTTP/2 frame processing, compression, security, load balancing — all this complex stuff — and Python is just a thin wrapper over it. As you can see, it handles so much functionality that it would be unwise to build it again and again for each different language. So many languages — C++, Python, Ruby — are just thin wrappers over it, and in total we have 14 supported languages. That gives us not only better performance but also a lower maintenance burden.

However, it also creates some friction. For example, segfaults: Python developers are really not familiar with segfaults, and we've seen a lot of complaints about not knowing how to debug them and how complex they are. Also memory leaks: Python's memory management model is very different from C++'s, and it is very error-prone to manage the lifecycle of a C++ object from Python space. And then there's compilation across platforms: compiling on Linux and macOS is probably easy, but how about compiling on Windows? To solve the compilation part, we not only distribute our source code but also distribute binary wheels, so gRPC Python users don't need to worry about that.

What does a Python C extension actually look like? C extensions are modules written in C or C++. On the right side is a short example: all this code does is create a module that prints "hello world". On the first line you can see the header Python.h, which includes all the API you need for manipulating Python objects from C++ space. It's quite complex to write, and the API itself varies from version to version. To make things easier, people have come up with simpler ways to do it — for example, better C++ frameworks and glue-code generators that ease the pain of writing glue code. There are many ways to write C extensions out there, and I'm going to talk about only three representative ones.

First is PyCLIF. PyCLIF is a templating language, and it works really well when you're just trying to expose a C++ interface into Python space. But if you're trying to do something more complex than that — meddling with the threading model or object lifecycles — it won't be sufficient. Next is pybind11, which is a portable, lightweight, header-only C++ library, and I can't really complain about it. But it requires people to code in C++; since this is PyCon, I put that on the drawback side, though if you're a C++ fan, maybe it's a plus for you. And finally, Cython. Cython is a language very similar to Python. It's very easy to develop in, and it's been adopted by NumPy, SciPy, and TensorFlow. However, even though the language claims to be a superset of Python, it's not a strict superset; you will see some caveats, some weird quirks of the Cython language itself, when you try to use it. Eventually, the gRPC Python team decided to use Cython because it is so similar to Python: when gRPC Python users want to take a step further and help improve the library, they can.

So let me introduce how Cython works in a minute. For example, say we have a prime checker. It does a very simple math computation: you loop over numbers and check whether each one is a factor of the number you're checking. If we put it in the right place, you can import it just by saying import primes and then use it. What about Cython? First, Cython doesn't only compile Cython-specific code; it can also compile entirely plain Python source code. But down below is a Cython version of the checker: you can import libraries from C or C++ with cimport, and you can also define statically typed variables, like cdef double. After that, Cython provides tooling to compile it into thousands of lines of C or C++ code, and setuptools will help you compile those into a shared library object. Once you put the shared object in the right place, you can just import it like any other Python module and use it.
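[Editor's note: the slide code isn't reproduced in the transcript. A rough sketch of the kind of Cython prime checker described — names and bounds are illustrative, not the exact code from the slides — might look like this:]

```cython
# primes.pyx — an illustrative sketch, not the slide's exact code.
from libc cimport math        # cimport pulls in C declarations (libc's math.h)

def is_prime(int n):
    cdef int i                # cdef declares a statically typed C variable
    cdef double limit = math.sqrt(n)
    if n < 2:
        return False
    for i in range(2, int(limit) + 1):
        if n % i == 0:        # i divides n, so n is not prime
            return False
    return True
```

Compiling it is then roughly a one-liner in setup.py — setup(ext_modules=cythonize("primes.pyx")) with cythonize imported from Cython.Build — after which import primes works like any other Python module.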
If you have ever tried to debug a Python extension, you'll find it's a hard challenge, because when you run GDB against Python — say Python 3.7 — and look at the backtrace, all you can see is evaluate-a-frame, evaluate-a-frame, evaluate-a-frame. It tells you nothing; it makes people mad. However, this is a problem that troubles the CPython maintainers as well, so they came up with the python-gdb.py script built for GDB. It ships with every single CPython source release, and you can find it in your downloaded source code folder. Once the Python GDB mode is turned on, GDB suddenly understands all the Python variables: you can see the Python backtrace, you can see the Python code, and when you print a variable it doesn't just say PyObject — it tells you what's inside that PyObject.

Next: with all this effort to make Python work with C++, we finally got gRPC Python working on top of gRPC Core, but there was still one problem troubling us. Say we have a gRPC server calling methods from gRPC Core, and it's running in a POSIX thread — all Python threads are POSIX threads, unless you're using gevent or eventlet, which monkey-patch the standard library, including threading, and swap the POSIX threads out for coroutines. So we have a server thread, and then to make it a server we also need a polling thread and a bunch of executor threads. Each RPC consumes an entire executor thread, so with a limited number of executor threads, our concurrency is limited too. To make things worse, there is the global interpreter lock (GIL) restricting all the POSIX threads. It not only hurts performance; it also causes deadlocks when we jump from Python space into C++ space and back into Python space. However, we found a solution. Next, Pau will discuss how we solved this challenge using asyncio. Let's leave the stage to Pau.

Thank you so much. Could you stop sharing, please? I'm going to share from my own screen. Where's the button? Okay, got it. Can you see my screen? Yeah, we're good. Perfect.

So I think it was almost a year, a year and a half ago, that we started this initiative together with Lidi and other members of the gRPC community. At that time at Skyscanner we were attempting a transition from HTTP to gRPC. We had some of our main services running on Python and asyncio, and we had no way of moving from HTTP to gRPC without giving up asyncio. That's when we started working with the gRPC team on this problem.

So what was the first headache? As most of you will know, in an asyncio application, when you use the await syntax, what happens behind the scenes is that you return control to the loop, providing a future that will be resolved later to wake up the task that was put to sleep. This pattern — the main pattern asyncio is built on — was not reproducible with the primitives exposed by the C++ interface of gRPC, because the main interface for polling events was a blocking one. Not only that: we also needed a kind of I/O manager that would let us implement all of the socket network operations using asyncio.

We were facing two different problems here. The first one, the I/O manager, was addressable, since gRPC provides a way to plug in a custom I/O manager; but there was no non-blocking primitive for polling events from gRPC Core. Other frameworks like gevent and Node.js, due to different circumstances, were able to work around that issue. But then the gRPC C++ team implemented a new completion queue based on callbacks: instead of calling something that blocks you, you configure a callback that gets invoked when the event is there, which would allow us to surface the event through an asyncio future.
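[Editor's note: the essence of that bridge — a callback fired on a foreign thread resolving an asyncio future — can be sketched in pure Python. Everything here (the names, the background thread standing in for gRPC Core) is illustrative, not the actual gRPC internals, which are written in Cython.]

```python
import asyncio
import threading
import time


async def wait_for_completion() -> str:
    loop = asyncio.get_running_loop()
    fut: asyncio.Future = loop.create_future()

    def on_event(result: str) -> None:
        # Invoked on a non-asyncio thread (standing in for a gRPC Core
        # completion-queue callback), so hop back onto the event loop.
        loop.call_soon_threadsafe(fut.set_result, result)

    # A background thread plays the role of gRPC Core producing an event.
    threading.Thread(
        target=lambda: (time.sleep(0.1), on_event("RPC finished")),
        daemon=True,
    ).start()

    # Awaiting the future parks this task; the loop stays free to run
    # other coroutines until on_event fires.
    return await fut


print(asyncio.run(wait_for_completion()))
```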
So yeah, we had pretty much what we needed to start the initiative, and we said: eureka. After some days we managed to have a first solution, which used a new custom I/O manager — a complete replacement of all the network operations — together with the callback completion queue. And the results were quite good. In the graph we can see the performance difference between the synchronous implementation, which is the version we already had, versus the new native asyncio implementation. For the client, for example, we went from no more than 10k queries per second to almost 20k queries per second, and on the server we also got a really nice boost, moving from almost 5k queries per second to more than 15k. So we were happy with the results.

But then another headache landed on our plates. As I said, we already had one client implemented — the synchronous one — and now a new one, the asynchronous one. That presented some challenges, because the synchronous stack will be there for a long time, maybe forever. One of the issues we could face in the future would be, for example, an asynchronous application — a server — that calls a third-party library which, behind the scenes, uses the synchronous version; with the solution we had on the table, that wouldn't work. So we had to address that situation, and we considered different ways of doing it. The first idea was: let's rewrite the whole synchronous stack on top of the asynchronous one, so every synchronous function would end up making an asynchronous call underneath. But we weren't sure it would actually work in the end, and the amount of work it would take was not negligible. The second option was changing the C++ implementation — quite substantially — to allow two stacks to coexist, running at the same time. But that didn't seem to be an easy solution either. Finally, we found the option whose amount of change seemed affordable: have one thread where all of the gRPC I/O operations are executed, so that a synchronous client running on top of an asynchronous application would not block it. The first results we got with that technique were quite good: the impact on performance, one of our main concerns, was almost negligible.

But then we started experiencing deadlocks — random deadlocks, for example the moment we tried to run the whole synchronous test suite in an asynchronous context. That's when we realized we had problems between some GIL acquisitions and some mutex acquisitions inside the gRPC Core library. To fix it we had to put the proper GIL releases in place, and that's when performance went down substantially. The problem we were facing before adding those GIL releases was this: we had two threads. One thread called a gRPC Core function, which acquired a mutex and then invoked a callback that triggered a Python function, which wanted to acquire the GIL. At the same time, another thread had already acquired the GIL, then made a call into gRPC Core and tried to acquire the mutex — but the mutex was already locked by the first thread. A classic deadlock. The only way to address it was: whenever we call into gRPC Core, release the GIL first. But by releasing the GIL we surfaced a lot of contention around releasing and re-acquiring it, and that's where performance took a big hit.
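[Editor's note: in Cython terms, the fix looks roughly like the following. This is an illustrative sketch, not the actual gRPC Python source; grpc_do_work() and its header are hypothetical stand-ins for a Core call that takes an internal mutex.]

```cython
# Hypothetical declaration of a gRPC Core call that locks an internal
# mutex; "nogil" marks it callable without holding the GIL.
cdef extern from "grpc_core.h":
    void grpc_do_work() nogil

def call_into_core():
    # Release the GIL before touching Core, so a Core callback that
    # needs the GIL on another thread can never deadlock against us.
    with nogil:
        grpc_do_work()
```

Every such release and re-acquire of the GIL is a synchronization point, which is exactly the contention being described.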
Then we went for a second solution, which we call the poller thread. In this second solution we gave up on the callback completion queue and on the custom I/O manager — it was like starting over. What we did is implement a solution with one new thread: this thread polls gRPC Core, and every time there is a new event from Core, it wakes up the asyncio event loop thread. To do so while avoiding as much GIL contention as possible, we avoided Python code as much as we could: even though the thread is implemented in Cython, most of the code executed for polling and for waking up the event loop is basically C++. This solution was really good. It had all the nice properties — for example, we removed the burden of maintaining a new I/O manager — and the performance degradation was affordable. So: eureka again. For the client, we managed to keep the same performance we got with the first solution; there was a significant impact on server performance, but it's still a good boost compared with the synchronous one. Passing over to Lidi.

Okay, thank you Pau for introducing our asyncio integration. The gRPC AsyncIO API has been released as an experimental API since 1.25, so feel free to give it a try. It already passes all the interop tests, which means it interoperates well with all the other gRPC languages and with historical versions of gRPC. We're also integrating it with Google Cloud Platform clients, and we expect some of them to be released in Q3 2020. The API reference can be found on the grpc.io website. Thank you.
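[Editor's note: for reference, a minimal sketch of the AsyncIO API just described — again assuming the stubs generated from the canonical helloworld.proto example.]

```python
import asyncio

import grpc
import helloworld_pb2
import helloworld_pb2_grpc


async def main() -> None:
    # grpc.aio mirrors the synchronous API, but channels, calls, and
    # servers are awaitable and run on the asyncio event loop.
    async with grpc.aio.insecure_channel("localhost:50051") as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        reply = await stub.SayHello(helloworld_pb2.HelloRequest(name="PyCon"))
        print(reply.message)


asyncio.run(main())
```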
I'll take questions.

All right, so does anyone have any questions here? I'll give it a minute, because sometimes these creep in. Meanwhile, while we're waiting, anyone who is in the — hold on, my brain just left me. After the Q&A, everyone can keep talking in the channel. What was the name of your talk again? See, that's the thing, it's spelled a little differently — it's "grpc" there. Yeah. So in #talk-grpc-and-asyncio you can go to that room for further discussion afterwards. I'm just looking to see if anyone has any questions; post them here in Zoom or on Discord if you have them there, and we'll hang tight for a moment. And, like Camilla in the last talk, my speech teacher used to say: if you don't get any questions, it usually means you presented really well. Thank you. So yeah, it looks like you're good. Hop over to #talk-grpc-and-asyncio on Discord and enjoy the conversation there. Thank you both for being here. Now I just have to make this work for me.