Hello everybody, I am Aditya and I work at Plivo. In this talk we will be seeing what gevent has to offer to help us build highly scalable web services. I work as an infrastructure engineer at Plivo, and part of my work is optimizing and scaling our various internal services. gevent is used in nearly every project at Plivo; it has worked quite well in helping us scale up, and it is our async framework of choice.

Let's go to the core problem of what happens when we try to run a web server in Python. The problem is simply blocking IO. In any typical web server, handling a request involves making multiple requests to other services: a different API service provided by some other company, a database, memcached or Redis. The problem is that while your web server process has a request outstanding to one of these external servers, it is blocked; it can't respond to any more requests, it's simply waiting. Standard Python web servers like Gunicorn run multiple processes, where each process responds to one request at a time, and the problem with a server like Gunicorn is that while it's processing one request, if it's not using some async framework, it's going to block on remote procedure calls like accessing a DB or accessing memcached and so on. The Gunicorn tuning guide suggests that you tune the number of processes you run to get a higher amount of throughput. But the problem is that while you can do some tuning on a test platform and try to find the optimal number of processes for the best performance, once in production, with varying workloads, different requests take different amounts of IO and CPU, and it doesn't turn out that well: if you have a large number of processes serving requests, they all either compete for the CPU or they all just wait on IO, so your load average keeps going up and down.

The solution, of course, is to use a non-blocking framework, one that lets you do async IO. So why gevent over other popular async IO frameworks? It gives you a synchronous API that works with your existing code: there's no need to rewrite a lot of code that has already been running. A rewrite is always expensive, whatever kind of work you're doing, whether you're a big company or a small one. If you're trying to accomplish something fairly complex, redoing code that you have already tested over a long time takes a lot of time and is fairly expensive; gevent gives you the advantage of not having to rewrite it. Frameworks like Twisted, Tornado, or even the Python 3 asyncio framework have the problem that you have to rewrite a fair bit of code. In an event-driven framework like Twisted you need to rewrite your code in terms of pure CPU and pure IO calls: you have to register your IO calls and then write callbacks so that the framework knows what to do when an IO operation completes. In other async IO frameworks you need to decorate a lot of your code, marking it as something that needs to run inside an event loop, and so on. The main advantage of gevent is that you have to rewrite very little.

So what is gevent? Taking a line from its documentation, gevent is a coroutine-based network library, which means that functions have to be cooperation-aware: when a function does an IO call, it needs to be able to switch to another coroutine. For the vast majority of Python libraries, this is done through something called monkey patching.
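To make that concrete before digging into how it works, here is a minimal sketch (mine, not code from the talk; the URL is just a placeholder). After patch_all(), network calls made through the standard socket module cooperatively yield, so several greenlets can wait on IO at once:

```python
# Minimal sketch: monkey patching makes stdlib IO cooperative, so these
# five fetches overlap instead of running one after another.
from gevent import monkey
monkey.patch_all()  # patch before importing modules that use sockets

import time
import gevent
import urllib2  # Python 2, which is what gevent supported at the time

def fetch(url):
    return urllib2.urlopen(url).read()

start = time.time()
jobs = [gevent.spawn(fetch, "http://example.com/") for _ in range(5)]
gevent.joinall(jobs)
# With patching, the total time is close to one round-trip, not five.
print("fetched 5 urls in %.2f seconds" % (time.time() - start))
```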
What happens in monkey patching is that blocking IO calls in standard library modules, for example the socket module, are replaced by functions which are non-blocking and event-loop aware. So when your process starts up, you just write a line to import gevent's monkey module and call patch_all, and all the blocking calls in your standard libraries become greenlet-aware and non-blocking.

How gevent does this is that each process has a gevent hub, which is an interface to a fast event loop provided by a C library called libev. Each greenlet, when it needs to do an IO operation, registers its IO call with the gevent hub, and when the response comes back, the hub wakes up the waiting greenlet. The best part is that all of this happens completely under the hood: your code doesn't look like it's doing any of this, but it is, and the performance improvements are quite large.

So how would you use it with a standard web framework like Flask or Django? Here's a small code snippet that shows how to make your blocking web server non-blocking. In this snippet there's just one view function, at the URL /hello, that does time.sleep(1) and returns "hello1". The initial line that imports patch_all and then calls it makes sure that the time.sleep function is replaced with a non-blocking equivalent from the gevent library itself, so when time.sleep(1) is called it doesn't block the whole interpreter and make your web server just wait; it can process other requests.

One way to deploy a non-blocking server is to use gevent's WSGI server. You can just plug your standard WSGI application into gevent's WSGI server and you'll get a web server that spawns a greenlet for each request and works in an asynchronous manner. One small disadvantage of this is that you can only run one process if you use that WSGI server. A better way is to use the uWSGI web server or the Gunicorn web server with the gevent worker type. This again makes sure that each request runs in a separate greenlet, and additionally you can spawn multiple worker processes. What it will look like is: you spawn, say, four worker processes, where four should be roughly equal to the number of cores you have on your hardware, and each process spawns some number of greenlets, which you can configure. Each greenlet can run concurrently with the other greenlets, because the process switches greenlets during IO.

But if you have an existing application, you have to consider: will it work for my app? Some things to consider: is your application CPU bound or IO bound? One thing to remember is that if it's CPU bound, gevent is not really going to do much for you, because it's an asynchronous IO framework: it'll make your IO calls non-blocking, nothing more. And what about libraries that do IO in, say, a C layer, that is, without going through a standard Python library like socket? DB drivers, for example, do that. You need to check whether a greenlet-aware alternative is available. Sometimes nearly everything will be fine except for this one library that does IO in the C layer; in those cases you'll have to consider whether you can work around it somehow.
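Before moving on to those C-layer cases, here is a reconstruction of the kind of Flask snippet described above (a sketch; the module and route names are mine, not necessarily the slide's exact code):

```python
# Make a blocking Flask view non-blocking with gevent.
from gevent import monkey
monkey.patch_all()  # replaces time.sleep, socket calls, etc. with green versions

import time
from flask import Flask
from gevent.pywsgi import WSGIServer

app = Flask(__name__)

@app.route("/hello")
def hello():
    time.sleep(1)  # cooperatively yields instead of blocking the process
    return "hello1"

if __name__ == "__main__":
    # gevent's WSGI server spawns a greenlet per request, but runs in a
    # single process; for multiple cores, use uWSGI or Gunicorn with the
    # gevent worker type instead.
    WSGIServer(("0.0.0.0", 8000), app).serve_forever()
```

For the multi-process route, the Gunicorn variant is a one-liner on the command line, something like `gunicorn -k gevent -w 4 app:app`.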
The problem with a Python library that does IO in its C layer is that it can't be made greenlet-aware at the Python layer. Say a fourth greenlet calls a function that does IO in the C layer: that blocks the whole interpreter, and you've lost the benefits of an asynchronous IO framework. So we'll go over what we can do about such situations.

Here is one case. A DB driver for Python usually does its IO in the C layer, like, for example, the Postgres driver psycopg2. But it has a green alternative: you just add another library called psycogreen, call its patch function, and your DB calls are made greenlet-aware and therefore non-blocking.

Just to show you that it works, because I've heard that a lot of people have problems getting their DB driver to work well, I'm going to show you a small demo. The demo basically runs two kinds of requests multiple times. The first request calls a function, sleep_python, which just calls sleep for five seconds and then returns something from the DB; so we'll see whether time.sleep blocks or not. The other request pretends that the database is doing a lot of processing; to simulate that, we just ask Postgres to sleep for five seconds using SELECT pg_sleep(5). The demo runs in Docker. I'm just starting the database here and populating it with some dummy data. This client program is going to run five requests of the Python sleep and five requests of the Postgres sleep. What we see is that although all five requests run simultaneously, the overall time to process all of them is still very close to five seconds, which means it's non-blocking. Okay, the screen is too small, but I wanted to show you that in this case the web server running under uWSGI has three processes, so I'm going to increase the number of requests we make it execute. You can inspect the code at my GitHub account; I'll show you the link shortly. Again, you see that it takes just five seconds to do both the Python sleep and the Postgres sleep.

Moving ahead. It's not really visible here, but let me just describe what this slide is doing. Like I said, sometimes all your standard libraries are working fine, but you have this one C library. For example lxml, which is, I think, the fastest XML processing library you can use in Python. It uses a C layer and can sometimes make blocking calls: for example, its parse function is able to accept a URL, which it will fetch, then parse, and return the result. But while it's fetching the URL, it's going to block your whole process. One way to work around this is to fetch the data from the URL out of band: first fetch the data in a separate call to urllib's urlopen, and then ask lxml to parse it and give you the XML object. The actual parsing component is CPU intensive and will still occupy the interpreter while it runs, but the fetch no longer blocks. Note the contrast with OS threads here: if you were using, say, the standard Python threading library to build something like this, it would switch between threads whether they were doing CPU-intensive or IO-intensive work, whereas with gevent you usually want to let a CPU-intensive function run to completion and switch only on IO.
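A sketch of that out-of-band workaround (my reconstruction; urllib2, lxml.etree and their calls are real, the URL handling is illustrative). The psycogreen fix mentioned above is similarly small, typically just `from psycogreen.gevent import patch_psycopg; patch_psycopg()` at startup:

```python
# Workaround sketch: fetch over the (monkey-patched) socket layer, then
# hand the bytes to lxml, instead of letting lxml fetch the URL inside
# its C layer where gevent cannot switch greenlets.
import urllib2
from lxml import etree

def fetch_and_parse(url):
    data = urllib2.urlopen(url).read()  # yields to other greenlets during IO
    return etree.fromstring(data)       # CPU-bound parse, runs to completion

# The version to avoid: etree.parse("http://...") does the HTTP fetch in
# C and blocks the whole process while it waits.
```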
Another complication that you might have to consider is if, in your view, you're calling one CPU-intensive function multiple times. Say you pull multiple objects from the DB and you need to serialize each one as a JSON object or parse XML out of it; my expensive_func here could be any of those. If you need to call it multiple times, it will block other greenlets, that is, other greenlets won't get a turn to run. So what you do is call gevent.sleep(0), which tells the greenlet to sleep for zero seconds but effectively gives the gevent hub a chance to switch greenlets (there's a short sketch of this below). This way you can avoid starvation, as long as you don't have too many CPU-intensive functions like this.

So that's about how simple it is to use gevent to make your synchronous IO asynchronous without changing your code much. But what else can you do with gevent? The greenlets, the threads in gevent, are really cheap: you can spawn hundreds or thousands of them and your machine won't even get warm. So you can use them, along with some standard synchronization primitives that gevent provides, to build complicated multi-producer, multi-consumer queues and various other things. Some of the synchronization primitives available are events, queues and locks. I'm not going to go over queues, because those are quite simple, but I'll go over an interesting use case we have at Plivo that uses events and locking.

Plivo is a call and SMS API service. In the component that runs phone calls, we have a soft switch called FreeSWITCH that streams audio to the various endpoints that are in a call, and it's controlled via a socket interface, which is called the event socket layer in FreeSWITCH terminology. The "event" here refers to events internal to FreeSWITCH. To give you an idea of what it's doing: a call manager process, which we've written, sends commands to FreeSWITCH, commands like start a new outgoing call, hang up this call, play some audio on this call, connect this call to another call, various things like that. FreeSWITCH sends responses to these commands, saying okay I'm doing it, or no I can't do that, or some error, and additionally it sends events on the same socket. The events can be things like a call heartbeat that keeps getting generated every few seconds to show that the call is alive, so that you can do billing; notifications of newly incoming calls, so that your call manager process can decide how to control them; hangup events; various kinds of events. All of this happens through a single socket.

So let me show you the threading model inside our call manager process. The command-sending greenlets, and there are many of them, usually one per call, send commands to control a particular call, and they send those commands to the FreeSWITCH event socket layer. All responses from FreeSWITCH, both responses to commands and newly generated events, are caught by a single greenlet that is simply an event listener. When I say event listener here I mean FreeSWITCH events, but it also listens for command responses. The event listener greenlet's job is to route command responses to the greenlets that made those commands.
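Two sketches before the code walkthrough. First, the gevent.sleep(0) pattern from the start of this section, as a minimal illustration (expensive_func is a stand-in for whatever serialization or parsing you do):

```python
import gevent

def expensive_func(obj):
    # Stand-in for CPU-heavy work such as JSON serialization or XML
    # parsing of one object (obj is ignored in this toy version).
    return sum(i * i for i in range(100000))

def handle_view(objects):
    results = []
    for obj in objects:
        results.append(expensive_func(obj))
        # Sleep for zero seconds: this yields to the gevent hub so other
        # greenlets (e.g. waiting requests) get a turn between chunks.
        gevent.sleep(0)
    return results
```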
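Second, a condensed reconstruction of the call manager pattern just described (a sketch based on the walkthrough that follows; the transport object's methods are placeholders, and the real, runnable version is in the pycon-2014 repo linked below):

```python
import gevent
from gevent.event import AsyncResult
from gevent.lock import Semaphore
from collections import deque

commands = deque()   # (AsyncResult, command_id) pairs, in send order
lock = Semaphore()

def commander(t, cid, cmd):
    # Run by each command-sending greenlet. t is a transport object
    # wrapping the event socket, cid the command id, cmd the command.
    result = AsyncResult()
    with lock:
        # Keep append + send atomic so responses, which come back in
        # order, pair up with the right waiting greenlet.
        commands.append((result, cid))
        t.send_command(cid, cmd)
    # Blocks only this greenlet until the listener sets the result.
    return result.get()

def event_listener(t):
    # The single response-handling greenlet.
    while True:
        resp = t.read_response()
        if t.is_event(resp):
            # A FreeSWITCH-generated event: handle it in a new greenlet.
            gevent.spawn(handle_event, resp)
        else:
            # A response to the oldest outstanding command: wake its sender.
            result, cid = commands.popleft()
            result.set(resp)

def handle_event(event):
    pass  # placeholder for per-event logic (billing, incoming calls, ...)
```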
So you have to send the responses to the right greenlets, the ones that made those commands, and for events generated by FreeSWITCH, it spawns new greenlets that handle those events. So let me walk through this code. The function called commander is executed by every greenlet that is trying to send commands; those would be the greenlets on the right of the diagram. The argument t is just a transport object; it encapsulates the socket. cid is the command ID and cmd is just the command. We use a couple of shared structures here. One is called commands and it's a deque, which is just short for double-ended queue: you can insert and pop from both ends. There's also a lock.

In the code, as you can see, the first step is to create an AsyncResult object. This is like a future or a deferred, if you're coming from a different set of jargon: it's an async result, and what it can do is hold a result that will arrive later on. What we do is append the AsyncResult and the command ID as a tuple into the commands deque and then call send_command, which basically sends the command over the event socket. This we need to do inside a lock. After this is done, we call asyncresult.get, which blocks the current greenlet, the command-sending greenlet, while it waits for the response. The response is delivered when asyncresult.set is called; we'll see where that happens. The lock is to ensure that the greenlet that sent a command gets the right response, not the response to some other command; that's what we're trying to synchronize here.

Then there's the response-handling greenlet, which handles both responses to commands and events. It's just an infinite loop that reads a response from the socket object and checks whether it's an event. If it is an event, it just spawns a new greenlet to handle that event. If it isn't an event, it's a response to a command, so now you need to wake up the greenlet that is blocked on that command. That will be the first entry in the commands deque: we pop it and call asyncresult.set, which causes the sending greenlet to continue from the point where it blocked. You can see a fully runnable example at my GitHub PyCon 2014 repo. I'm not sure how I can make the screen brighter, so let me read the link out: it's github.com slash donatello, which is d-o-n-a-t-e-l-l-o, slash pycon-2014.

There are some caveats that you have to look out for when you're working with gevent. I've already mentioned a couple of them, but another one is that it breaks standard profiling tools like cProfile. If you already have a lot of tooling set up to profile your code with cProfile, it's going to break. There are alternatives, but you need to try them out and see what works. There's a greenlet-profiler project whose author, I think, was planning to speak at this PyCon, but I don't think that happened. As for services like New Relic that say they handle gevent projects: at least in my experience, sometimes it works and sometimes it doesn't; for some projects it works and for some it doesn't.
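For reference, GreenletProfiler's usage looks roughly like this, as I recall it; it is built on yappi, but treat the exact names as an assumption and check the project's README:

```python
# Rough usage sketch for GreenletProfiler (API names from memory;
# verify against the project before relying on them).
import GreenletProfiler

GreenletProfiler.set_clock_type("cpu")  # count CPU time per greenlet
GreenletProfiler.start()
run_my_workload()                       # placeholder for your own code
GreenletProfiler.stop()

stats = GreenletProfiler.get_func_stats()
stats.print_all()
stats.save("profile.callgrind", type="callgrind")  # view in KCachegrind
```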
The other two caveats are simply what I've already mentioned: you need to make sure that libraries that do IO in the C layer have greenlet-aware alternatives, and if you are doing CPU-intensive operations, you need to make sure that you don't starve out other greenlets.

Yes, so benchmarks. Benchmarks that compare multiple web servers are quite complicated to do, and this one was done by someone else. It compares a really large number of Python web servers, many of them async, some of them sync, and it's a pretty well done benchmark. It shows that gevent does better than Twisted and many other popular frameworks. That's it. Can I just take any questions?

[Audience: which web server would you deploy with?] A simple one to use is Gunicorn; you can get it working quite quickly. But for performance, I think you should use uWSGI. That project is written in C and it's way faster than Gunicorn. If you want to squeeze out more performance, you should use uWSGI; it's got a lot of tunables that you can play with to get a lot of performance out of it.

[Audience: how do you share session state when you scale out?] We just store sessions in an in-memory Redis server, use a load balancer to distribute HTTP requests, and use Amazon to scale up: lots of instances behind an ELB, all talking to a fast Redis server which shares the sessions. It works quite well.

[Audience: we use Django with Celery for a lot of queuing; can gevent work as some kind of substitute for Celery?] It could, but Celery does a lot of things that you'd have to build yourself if you chose to build them with gevent. If what you're trying to do is not too complicated, I'm sure gevent can do it quite well; you'd use the shared queues and synchronization primitives we saw.

[Audience: what exactly does patch_all patch?] If you call gevent.monkey.patch_all, you can just look at the docs of that function. Basically it patches all your standard libraries; the most common ones that you'll be using are socket, subprocess and os, I guess.

[Audience: which Python versions are supported?] Python 3 is not yet supported, but I think 2.5 to 2.7 are supported, and it works.

[Audience: if we use Gunicorn's gevent worker, do we still have to call patch_all?] Yes, it is necessary to call it inside your application as well, because what Gunicorn's gevent worker does is simply create a greenlet pool and put your requests in different greenlets; to make your own code non-blocking you do have to call patch_all inside your application. Otherwise a blocking call in a request will block your web server process until it completes.

[Audience: what about MySQL DB drivers?] I don't use MySQL much, so my knowledge is limited, but there is a pure-Python MySQL driver and there is a C-based driver; the pure-Python one will be non-blocking by default once you've patched, while the C-based driver itself will block. For these kinds of things you need a good profiling tool. I've used the greenlet-profiler tool, which has worked on some projects, but it hasn't been a complete solution in my experience. Our approach is usually to look at what libraries are linked into the application and check whether those are explicitly known to be non-blocking.