Hello. Welcome. We are starting our next talk. Our next speaker is Denys Makogon, and he's going to talk about solving Python cold-start issues in cloud infrastructure, so I hope you enjoy it.

Hi. My name is Denys. So, who am I? I'm a frequent speaker, and I'm a Python open-source contributor to various Python projects. I started my work with OpenStack; if you're into Python, you've probably heard about it.

If you noticed the name of the talk, it's all about the cold-start issue. So what is a cold start? Python is an interpreted language, so we need to spend time warming up the interpreter, executing your packages and modules, and only then does it start executing your code in particular. Whenever you type python module, it starts the interpreter and executes your modules: first the imports, then the definitions of your classes, functions, and so on.

So why does the cold start happen? If you've been using and developing lots of Python libraries, you've probably noticed that some developers put a lot of code into a module's __init__ files. Basically, when you do import something or from package import something, you get that import, but along with it you get whatever is in the __init__ files.
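To make that concrete, here's a minimal sketch of the effect; the package layout is hypothetical, and asyncio just stands in for a heavy dependency:

    # mypkg/__init__.py -- hypothetical package
    import asyncio                  # stand-in for a heavy dependency; runs on ANY import from mypkg
    print("mypkg initialized")      # side effect you pay for on every cold start

    # mypkg/util.py
    def helper():
        return 42

    # main.py
    from mypkg.util import helper   # still executes mypkg/__init__.py first

Even though main.py only asks for one helper, Python executes the whole package __init__ first, heavy imports and all.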
So why should you actually care about that? Probably most of you have been working with Python as data science engineers, or as people who write scrapers and so on, but some of us do hardcore Python network programming. In particular, I'm developing a serverless platform. For our customers, we're trying to build a platform that is very efficient at running code in interpreted languages. We started with Python and Node in particular, and I'm responsible for all the bits of Python code written as part of our open-source solution. If you launch your code in the cloud, on a hosted platform, as containers, as plain VMs, on any hosted infrastructure, you actually care how long it takes to start your code within a very constrained amount of resources; I'm talking about the CPU and about the amount of RAM.

As I told you, the cold start happens when you start an interpreter, and it hurts because people don't actually pay much attention to the way they write their code. In the OpenStack organization, we created a set of extensions to flake8 that check whether your package, your library, or any bit of your code has code in __init__ files. That may be fine when you don't care where your code is being executed, but when you care about the infrastructure and the time it takes to start your code, you start considering how efficient your code is. And you should care, because if you're working at a startup you have very limited budgets, so you basically have to identify what causes your code to spike on certain commands. You need to figure out whether your code is efficient, and if it's not, figure out where that actually happens.

If you're familiar with Python 3.7: the import system in the CPython implementation has changed significantly. For development, nothing much has changed; you can still do import something or from package import something, but the import machinery changed. That's why, for instance, TensorFlow didn't work with Python 3.7 at first: it just didn't follow the new import system internally.

Also as part of Python 3.7, a reworked library called importlib was introduced; before that it was named imp, if I recall correctly. You can read about that; there's a QR code. And new things were added as part of the 3.7 release: a new tool, the import-time profiler (the -X importtime interpreter option). You can start your code and actually see how long each of your imports takes to run.
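As a quick usage sketch (the target module and file names are just examples): you run the interpreter with the flag and read the report from stderr. The little helper below is mine, not part of CPython, and it just sorts out the slowest imports:

    $ python3.7 -X importtime -c "import sanic" 2> imports.log
    $ head -1 imports.log
    import time: self [us] | cumulative | imported package

    # slowest.py -- hypothetical helper: print the ten slowest imports by self time
    rows = []
    for line in open("imports.log"):
        prefix = "import time:"
        if not line.startswith(prefix):
            continue
        self_us, cumulative, name = line[len(prefix):].split("|")
        if name.strip() == "imported package":
            continue                            # skip the header line
        rows.append((int(self_us), name.strip()))
    for self_us, name in sorted(rows, reverse=True)[:10]:
        print(f"{self_us:>10} us  {name}")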
And now the sad story. We've been developing a serverless offering, and we've been developing lots of libraries for our customers to let them write their serverless functions in various programming languages. One of the things we faced was that under very limited constraints, in particular 10% of a CPU and 128 megabytes of RAM, our library, which was built on top of one of the asynchronous HTTP frameworks, took almost eight seconds to start before serving even a single request. That was a huge problem for us, because our platform has a pretty strict limit: three seconds, period, nothing more. So we used Python 3.7 and its built-in import profiler to figure out what actually took almost eight seconds.

We needed to match the use case under which users run our serverless offering, and the minimum tier for a serverless function was a container running on those constraints; as I told you, only 10% of a CPU and 128 megabytes of RAM, including swap and kernel space. The first thing we tried was aiohttp, and it didn't work well: aiohttp by itself took almost 4.3 seconds to start. Then we decided we needed another asynchronous framework; let's use Sanic, because it's said to be a blazingly fast HTTP framework. It didn't work either: it took almost 3.7 seconds to start. I'm not blaming any framework here, but we started to check whether Python itself was taking that much time to start on those constraints. This is what we found: asyncio itself took almost 2.1 seconds to start there. We decided we wanted to build a framework that wouldn't take that long to start on such limited resources, and we found that there is a correlation between the number of modules on your sys.path and the import time of a particular module. Basically, if you put a lot of modules on your sys.path, the time to obtain a particular module instance increases significantly.

We had the limit, as I told you, of 3 seconds to start, and we made a pretty simple framework based on channels. If you've been developing with Django or any ASGI framework, you're probably familiar with what channels are; basically, it's a wrapper on top of the asyncio protocol. We were able to build a framework of this type that imports its own code and starts in 2.5 seconds, which is under 3 seconds, but still not that great.

So what causes Python itself to go slow on limited resources? I would say: stop blaming the interpreter. If it's slow, it's your code that's slow. You need to figure out whether your code is really efficient in terms of resource consumption, and please stop putting code into __init__ files, because if I want to import something, I want to do it explicitly. And here's one of the examples: aiohttp. If I want to start a simple HTTP server, I basically have to import websockets, aiofiles, and lots of other libraries that I'm not using, but all of them take time to import. That's why we decided we didn't want to work with aiohttp or similar libraries: they import so much unnecessary stuff, and that's a problem for us.

Another curious thing I mentioned: stop polluting your sys.path with unnecessary modules. Assume you have a web server that serves some business logic but continuously executes only one business path. The other paths are essentially unused, yet you still carry lots of imports for that unused logic; it's already on your sys.path, and you have to live with the time it takes to look your modules up there.

Here's a simple example of a Sanic application, a simple web server that returns Hello World as JSON, and this is what you get from the import profiler. Don't try to read it all; everything except the time is beside the point. Sanic took 3.01 seconds to actually start, and this is what Sanic imported along the way: aiofiles, websockets, and testing fixtures. So tell me, please: if I need a simple HTTP server, why should I import websockets and testing fixtures? It pollutes my sys.path. I don't want to see that unless I actually want to use it.

So how do you make your code faster? Most Python developers, and actually Python core developers, would tell you this: more RAM and more CPU and you're going to be fine. That's not the answer, I'm really sorry. No. The first thing: this is how your __init__ file is supposed to look. Oh yeah, it's an empty slide, because there's supposed to be nothing in there.

Since Python 3.7 has an import profiler, please build a set of tools to check your import time in your CI/CD; that's very important for other people. You may say that all of this is totally unnecessary, but then, if you're developing a library only for yourself, please keep it private. Don't publish it, don't say it's very useful. There are so many pitching articles on Medium saying "our library is cool", but cool in terms of what? There's no explanation. "It has high performance"; high performance at what? So many developers test their software on their own laptops. Okay, you have four cores and probably 16 gigabytes of RAM, which is not fair to the people who actually want to save some money on infrastructure. And yeah, your sys.path has to remain clean, and I'm going to show you how to make it clean. It's going to be a hack, but it actually works, and it's more efficient than the regular import system.

One more very important thing for Python developers who care about the performance of their code: test your performance under low-resource constraints, because that's the only way to get real results, not just from request I/O but from system I/O. You will see that in some places your code isn't fast, and that's probably what's going to cost you, what's going to break your business logic later.

And then: use importlib. Importlib allows you to delay the import of particular code until later, which is what keeps your business logic clean of unnecessary work. For the library we developed for our customers, we used a pretty simple thing: we use importlib to take the customer's modules, start all the necessary infrastructure pieces, and then load and run the customer's code only on the first request, not before, because our own code is supposed to be very efficient. If we're sure that our code is efficient, we can say: excuse me, my dear customer, your code is not very efficient, and here's why; and show them the statistics from the import profiler.
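Here's a minimal sketch of that pattern, not our actual platform code; the handler shape is made up, and json stands in for the customer's heavy module:

    import importlib

    _mod = None     # the customer's module, loaded on demand

    def handle_request(event):
        # pay the customer's import cost on the first request,
        # not during the platform's cold start
        global _mod
        if _mod is None:
            _mod = importlib.import_module("json")   # "json" stands in for a heavy module
        return _mod.dumps({"echo": event})

The platform process starts fast because nothing heavy is imported up front; the first request is the one that pays the import bill.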
And here's the hack I managed to figure out; at first I didn't believe it actually works better. We're used to putting all our imports at the top of the file, but we can actually wrap imports in functions. You're probably going to throw rotten tomatoes at me and say imports inside functions are bad, just go away. Actually, no, and here's the example. Here are two functions: one has a pretty simple import wrapped in a function, and the other one is additionally wrapped with lru_cache. Just to let you know, even the fairly simple implementation of lru_cache in functools works faster than the lookup through sys.path. At scale, lru_cache works faster than plain imports, and here's the visualization of that. You may notice certain spikes, but in general they're comparable. This is the experiment: 10k imports wrapped in a function versus 10k imports wrapped with lru_cache. As the result, lru_cache works up to 39% faster than regular imports; out of those 10k, almost 6.5 thousand of the imports were faster with lru_cache. And this is how it looks: if you notice, there's a pretty thin line marking where the lru_cache imports and the regular imports take almost the same time, and you can see that almost everything to the right of that line is the lru_cache imports.
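Here's a minimal reconstruction of those two functions as I understand them; json is just an example module, and the numbers you get will depend on your machine:

    import timeit
    from functools import lru_cache

    def plain_import():
        import json                 # every call re-binds via a sys.modules lookup
        return json

    @lru_cache(maxsize=1)
    def cached_import():
        import json                 # body runs once; later calls return the cached result
        return json

    # rough re-run of the 10k-call comparison from the talk
    print("plain:", timeit.timeit(plain_import, number=10_000))
    print("lru  :", timeit.timeit(cached_import, number=10_000))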
So basically, if you develop an efficient web application, or you do serverless, if you're familiar with AWS Lambda, Google Cloud Functions, Azure Functions, and so on, you need to be very careful with your Python code, because it's going to blow up your functions' start time. You basically need to make them efficient in order to save some money on execution, because every vendor charges you for execution time and the CPU and RAM allocated. You can find this whole experiment at this link. I actually finished it almost a week ago, and I was surprised to see such results with the imports.

And the final thing is a huge disclaimer. I'm not blaming anyone, not library developers, not core developers, but you need to be aware: we had a Python 2 version and a Python 3 version, and imports in Python 2 work faster than in Python 3.7, because things have changed internally, and nobody told us anything about that. Thank you.

Okay. Can you hear me all right? Good. We have some time for questions. Let's see, one here in the front. If you want to start a longer discussion, let's take it offline; I'm going to stay around for some time.

Hi. So one of the problems is that you're essentially having to parse and read all those modules when you're importing them. Once again? One of the problems is that you have to go through those modules on the sys.path, et cetera. Yeah. Could we not throw the entire thing at some sort of dead-code-elimination algorithm and just get rid of all the things we don't need? Well, once again, I just... Just throw the entire thing at a dead-code-elimination algorithm and get rid of all the things we don't need before uploading our function to a serverless platform. That could work, but it would be very specific to each application. You can't write a unicorn algorithm that works for every case. It's simpler to actually write efficient code and keep yourself away from imports you don't actually need. This is what actually happened to us: we had to rewrite the whole stack for a customer to make their code work efficiently. And some things will probably change later, beyond Python 3.7, with subinterpreters, where you can basically split your file: if you can expect that your users still follow the same development pattern, imports in the header and code afterwards, you can strip out the imports and execute them in a subinterpreter, because you have its state, which stays alive for the whole lifecycle of your application. It could change things, but it's still in alpha; we're probably going to see something new later this year, towards 3.8.

Thank you. Okay, more questions?

Yes, just to check my understanding: the thing that you're building for the customer is somewhat similar to AWS Lambda or Google Cloud Functions? So, we've built an open-source project, but not an open-source service. Our project, called the Fn Project, is totally open source, and you can just try it right away. But at Oracle, internally, folks from the services department are building a service on top of it. The part that was the meat of my question: presumably, if people are trying to upload code or Lambda functions to AWS or Google Cloud Functions, they could run into the same issue? Yeah, that's totally the same thing. Do you know how Google... It's the same for any serverless platform that runs interpreted code. It actually happens with Node.js as well, but I can't provide you results yet; I'm still trying to figure out how to make it work, because I'm not a JS guy. I'm wondering, do you have any idea whether Amazon and Google have solved it in a similar way to you? They haven't solved it. They haven't solved it at all. It's still there, because most of the Python community, for some reason, ignores the cold-start issue; it was never an issue until serverless showed up. Most of the time, Python performance has been improved by adding more RAM, basically; that's what you need to run your Python efficiently, because almost all operations inside Python are not CPU-intensive, they are RAM-intensive. So yeah, just adding more RAM mostly makes this problem go away. But still, there's a minimum tier, and there's a requirement to start code faster.

There's one more, sorry. I think there was another one here too. Oh, sorry. We're going to have some time, I promise you. So, one of the Python uses of __init__ is to expose the public interface of your package to the person who imports it. Are you saying we should stop doing that? Well, there is another, safer approach. If a user wants to import your interface, they should do it explicitly. Why should it happen implicitly? There's always a choice between an implicit approach and an explicit approach, and the implicit one hides lots of things from you as a user, as a customer, as the library user.
So you basically have to read all the source code and understand what happens internally, because in some libraries there are lots of dynamic imports happening in __init__ files. But they still can happen in... Yeah, yeah, I see you. Yeah, that's true; that happens often. In various libraries, like Sanic, when you do import sanic, you get at least 25 imports performed for one reason or another. It's not only standard-library imports; it's Sanic core imports. You can make that code more efficient, but there's still pushback against rewriting libraries to make them more efficient.

Okay, we have time for one more question. Thank you. Just to build on what you were explaining about those 25 imports: profiling such imports can be challenging. Have you come across any tools that help with that? Have you developed any tools that help you identify... I'm actually working on that right now, because it's a fairly new problem; as I told you already, most folks don't care about it. But there's still a huge community of people who want to deal with the cold start, and not just part of it: for Docker in particular there's something like a 300-millisecond delay on container start, but you get more on top of that from the interpreter warm-up and your code execution, not just the business-logic execution but the code itself.

There's one more, sorry. Yeah, I'm sorry, but we are running out of time. So if you want to ask him any questions... Yeah, just take it offline. You're going to be around, I think? Yeah, I'm here. So yeah, just take it offline. Okay, thank you. Thank you so much.