We are going to have the next session now, which is going to be Adinata, I hope I pronounced the name correctly, from Pocket Gems, talking about how to utilize serverless technology to optimize and parallelize unit tests.

Hi everyone, welcome to my talk. I assume you can see the XKCD comic on your screen; please interrupt me if you don't. This comic is a classic, though it might not seem relevant to the Python community, because we don't usually compile our Python code. But if you work at a larger company, you might know the pain of waiting for your changes to be deployed. It's pretty common in a large company for a deployment to take more than 30 minutes. Today, I want to discuss a similar yet different pain: in my company, running the full test suite can take more than 25 minutes.

Here is a screenshot of our unit tests being run locally using nose, with four processes. If you look at the user time, it's almost 6,000 seconds, or 100 minutes, roughly four times the 25-minute wall-clock time: the suite runs in parallel across four processes, which makes it four times faster. There was a great talk by Jin Huiyong yesterday in the Parotrack that explained speeding up data processing with multiple processes; you can watch that talk to understand more about how multiprocessing works. And the thing is, these unit tests already run on a top-of-the-line MacBook Pro. I'm going to share how we cut the time to run the full unit test suite by 85%, from 25 minutes to under five minutes.

My name is Adinata. I've been writing code professionally for six years. I've worked at startups in real estate, gaming, and finance. Most of my work is in backend or DevOps; I can do some front-end development, but my design skill is not up to par with my coding skill. Please say hi to me on Twitter and give feedback about my talk. Really, send anything to me; your feedback is very valuable.

This talk is based on my work at Pocket Gems. Pocket Gems is a game company. We use Python for our backend, which runs on top of Google App Engine to serve our players. It's highly scalable, and you can focus on developing features rather than scaling the infrastructure. The company has been around for more than 10 years and has accumulated more than 4,000 unit tests in nose. Previously this talk covered only nose, but the survey responses showed nose users are now overwhelmingly outnumbered by pytest users, so I added pytest material. Thanks to everyone who participated in my survey.

Our topic today is how Pocket Gems runs its unit tests on serverless infrastructure. I will share our decision making, how we solved this issue, and how we were able to complete the full test suite in four minutes.

I know this comic is an anecdote, but how do you really feel when you have to wait half an hour before you can do your job? For me, it's very frustrating. Apparently it also frustrates others, so we set out to fix the problem.

Let's try to understand what the problem is. Running the full unit test suite is slow. Is it worth solving? If we make the unit tests complete faster, we improve developer productivity and happiness, and happy developers produce more value. We always run the full unit test suite before deployment, so if we make deployment faster, we deliver value to our players faster. It sounds like a good problem to solve, so we set out to fix it. How do we make the test suite complete faster? If we have fewer things to run, we will need less time.
We could even skip running the unit tests entirely, and it would be blazingly fast. Of course, this is not scalable and not safe; please don't think I'm serious, I am joking. We need to go through all the unit tests. The only way to go faster is to divide and conquer. Each unit test is independent, so we can split the unit tests and run them individually: instead of a single process running the whole test suite, we can have multiple processes running it.

How do we parallelize test execution? We already had a glimpse at the beginning that we can run unit tests in parallel on a single machine with multiple CPUs. The limitation is how many CPU cores your workstation has. It can also paralyze your ability to work, because of the CPU consumption while the test suite is running.

The second approach is running them on multiple remote machines. In the era of cloud computing, you can have an unlimited number of CPUs; you are still limited by money, but in theory it's infinite. The drawback is that you have to maintain those machines: health checks, security checks, monitoring, alerting, auto scaling. There is also a warm-up time associated with auto scaling that can add significantly to the time to complete.

The last approach is to run them on serverless infrastructure. If we can do this, we also get unlimited CPUs, but without the need to maintain the workers at all. And because the pricing model is to pay only for what we use, it will be cheaper compared to remote machines.

Let's go into detail for each approach. First, the local runner. Test runners usually have options to configure the number of processes running the test suite: for example, pytest has the xdist plugin, which takes the -n option, and nose has the --processes option. The way it works is that the parent process spawns worker processes based on the given configuration, the worker processes collect all the tests, and the manager validates the collected tests. The manager then schedules tests to each worker. For pytest, all the communication between the workers and the manager happens via an execnet channel; nose uses multiprocessing.Queue to communicate. I'll show a quick invocation sketch at the end of this overview.

Next, multi-machine. pytest can also run tests on remote machines with the same plugin, pytest-xdist. The flow is mostly the same; the only difference is that it rsyncs the necessary source to the remote machine. There is also an alternative from the Ruby world called Knapsack Pro. The basic idea is very similar: you have a queue, each process on the CI machines reports to Knapsack Pro, the manager, and Knapsack Pro validates and schedules the tests to the workers. The workers later send the results to the server. There is also a fallback mode, which runs the tests split by file name in case the API times out and reconnecting fails. It also uses a dynamic queue to keep the completion time roughly the same across all the nodes. One main difference is that each worker can start its part without rsyncing, usually triggered by a git commit or git push. This is helpful if your CI solution manages the workers for you and you cannot SSH into the remote machines, which is what pytest-xdist depends on. Examples of such CI solutions are CircleCI and GitHub Actions.

Now let's try to design our solution. As a mobile game company, Pocket Gems focuses on delivering value to the players and creating new, innovative games. That means we won't have the engineering resources to manage workers full time.
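Here is the invocation sketch I promised. This is a minimal illustration rather than our actual setup; the package name, test directory, and SSH host are placeholders, and it assumes pytest-xdist is installed.

```python
# Minimal sketch of driving pytest-xdist programmatically; the equivalent
# command lines are shown in comments. Paths and the SSH host are placeholders.
import pytest

# Local runner: spawn four worker processes on this machine.
# CLI equivalent: pytest -n 4 tests/
pytest.main(["-n", "4", "tests/"])

# Multi-machine: distribute tests to a remote machine over SSH,
# rsyncing the source directory there first.
# CLI equivalent:
#   pytest -d --tx ssh=user@build-host//python=python3 --rsyncdir mypkg tests/
pytest.main([
    "-d",
    "--tx", "ssh=user@build-host//python=python3",
    "--rsyncdir", "mypkg",
    "tests/",
])
```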
If we used worker machines, we would have more things to do, which can be a full-time job for a team: monitoring, health checks, security, dependency updates, all to keep the CI/CD working continuously. Of course, we could invest up front to make the system more robust so it needs less hands-on maintenance.

If we look into how the Pocket Gems backend works, we can see opportunities. First, because we run on Google App Engine, none of the tests depend on external services. Second, the backend code is 100% Python and doesn't use many binary dependencies. Those conditions made serverless worth considering, so we decided to use AWS Lambda to remove the need to manage worker machines.

Here is how it works. First, we upload the code package, that is, we deploy it: we create a Lambda function that can run the whole unit test suite, including all the package dependencies, even the binary ones. Then we decide how many workers we want; let's say 10 workers. On the local machine, we start the unit tests based on the number of workers: if we have 500 unit tests and 10 workers, each worker executes 50 unit tests.

Let me show you the next slide. Here is my best illustration of what I mean. In this case, we have 12 tests, numbered 0 to 11. It's a simple modulo function: take the test index modulo the number of shards, and if the result equals your shard ID, that test is part of your group. We created a nose plugin with a custom selector, and this is the logic for which unit tests will run for a given node index and node total.

We then create multiple threads that send HTTP requests invoking the AWS Lambda function; each HTTP request talks to a single function container. Because we are using serverless, a container doesn't know its worker ID, so we send the thread index and the number of shards as part of the request. Those HTTP requests are asynchronous, so we can send all of them without waiting for them to complete. Each Lambda function then runs the unit tests selected by the thread index and the number of shards, and meanwhile our local machine blocks, waiting for the results. It's cheap to run hundreds or even thousands of threads at the same time in this case, because all the threads are doing is waiting for I/O. When a Lambda function completes, its HTTP request returns the test output to our local machine, and when all the HTTP requests complete, the local runner parses the outputs and shows them to the developer. I'll sketch both sides of this flow below.

This approach works well for our use case, of course with some gotchas. Here is what we have learned. While developing, we hit the 50-megabyte code size limit of AWS Lambda, so when uploading the source code, the local runner needs to exclude everything in the .git folder, all the .pyc files, documentation files, and the dependencies' own tests; basically, we make sure to upload only what is necessary for the tests to run.

In this design, we deploy the code base as a Lambda function. Now imagine two different machines, two different users, running the tests at the same time, which means uploading code at around the same time. How will the unit tests behave? While one set of unit tests is running, someone else is uploading different code from a different branch. It probably won't work, and it probably won't be safe. So we handle this by giving each machine its own named function, its own deployment.
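To make the flow concrete, here is a simplified sketch of the client side. This is not our actual plugin: the shard selector is just the modulo rule from the slide, and the Lambda function name and payload fields are made up for illustration.

```python
# Sketch of the local runner: select tests by modulo sharding and invoke one
# Lambda per shard from a thread pool. The function name and payload keys are
# hypothetical; boto3 handles the HTTP requests to AWS Lambda.
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

NUM_SHARDS = 10

def belongs_to_shard(test_index, shard_id, num_shards=NUM_SHARDS):
    # The selector logic from the slide: with 10 shards, test 7 goes to shard 7.
    return test_index % num_shards == shard_id

lambda_client = boto3.client("lambda")

def run_shard(shard_id):
    # Synchronous invocation: the thread simply blocks on I/O until the
    # Lambda container finishes running its share of the tests.
    response = lambda_client.invoke(
        FunctionName="unit-test-runner",       # one named deployment per user
        InvocationType="RequestResponse",
        Payload=json.dumps({"shard_id": shard_id, "num_shards": NUM_SHARDS}),
    )
    return json.loads(response["Payload"].read())

# Hundreds of threads are cheap here because they only wait for I/O.
with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
    results = list(pool.map(run_shard, range(NUM_SHARDS)))

for result in results:
    print(result.get("output", ""))            # show each shard's test output
```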
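And on the other side, here is a rough sketch of what the Lambda handler could look like. Since I added pytest material to this talk, this one is a pytest-flavored reconstruction rather than our original nose plugin: the hypothetical ShardPlugin uses pytest's collection hook to apply the same modulo rule.

```python
# Hypothetical Lambda handler: run only this shard's tests and return the output.
import contextlib
import io

import pytest

class ShardPlugin:
    """Deselect every collected test whose index is outside our shard."""

    def __init__(self, shard_id, num_shards):
        self.shard_id = shard_id
        self.num_shards = num_shards

    def pytest_collection_modifyitems(self, config, items):
        selected = [t for i, t in enumerate(items)
                    if i % self.num_shards == self.shard_id]
        deselected = [t for i, t in enumerate(items)
                      if i % self.num_shards != self.shard_id]
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected    # pytest runs only the remaining items

def handler(event, context):
    # Capture pytest's console output so it can be sent back to the caller.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exit_code = pytest.main(
            ["tests/", "-q"],
            plugins=[ShardPlugin(event["shard_id"], event["num_shards"])],
        )
    return {"exit_code": int(exit_code), "output": buffer.getvalue()}
```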
So each user has their own deployment to a different Lambda function. The third lesson is that the approach depends on your network bandwidth. Is your network fast enough to upload 50 megabytes? If you are traveling with a slow connection, this might be less useful, because for every test execution we upload the whole source code to AWS Lambda. Of the four-minute time I showed you before, one minute and 30 seconds was spent uploading the source code, so actually only two minutes and 30 seconds were needed to run the whole test suite.

In theory, we could use 1,000 parallel executions, but then the bottleneck becomes the longest-running unit test: if one test is particularly slow, the whole suite has to wait for that test to complete.

Serverless is excellent, but it has its limitations. You cannot log in to the compute instance. You have a limited timeout. You also cannot customize the runtime: if you depend on binary dependencies, you cannot install them into the runtime beforehand, which means you need to upload your binary dependencies together with your source. Nowadays, I think these are solvable problems. For example, we can use GitHub Actions; the Knapsack Pro that I showed you before actually supports running on top of GitHub Actions, and we can create a Docker container to execute the test runner in.

There were also several learnings from the project itself. An intern developed it in roughly four weeks, and we have been using it for the last two years. Running the test suite now completes in four minutes. That's 21 minutes saved on every execution, and we have more than 1,000 executions each month. Translated into engineering hours, 21 minutes multiplied by 1,000 executions and divided by 60 minutes is more than 350 engineering hours saved each month. Assuming an engineering hour costs $50 in the Bay Area, we are saving $17,500 each month. Now, the cost of operation: on average, the system costs around $120 each month based on the AWS report, which is very cheap compared to the engineering hours saved. The initial investment was four engineering weeks, less than $10,000, so the break-even point was less than a month. And we have more than 70 engineers who are happier using the tool.

Going back to the beginning, we were simply frustrated. If you look around your daily life, you might see things that frustrate you, and others might share the same frustration. If you can proactively figure out how to make it less frustrating, you might solve everyone's problem. That is the end of my session. I think I talked faster than I practiced. If you have any questions, feel free to ask. Thank you very much.

So let me play the applause for just a second. Thank you very much for your talk, that was very impressive, really. It's not only money that you saved; I think it's also lifetime, the developers' lifetime, just sitting in front of the screen waiting for stuff, which is often very frustrating. When I think about it, Python development has this problem as well: it takes ages to run the test suite for the Python source code, and that's basically running unit tests too. I believe most of those unit tests are actually independent of each other, so it would be easy to parallelize everything by uploading everything X times and scaling up that way. That's actually something that we might want to consider.
I think with GitHub Actions it has also become very easy to parallelize things a lot. The Knapsack Pro that I showed you before actually allows you to use GitHub Actions to parallelize the unit test execution.

What do you have to have? You have to have GitHub Actions, so the code has to be on GitHub, I suppose. For Python core development that's a given; we're using Actions as well. The only thing that is missing is setting up the AWS side, I guess.

If you use GitHub Actions, you don't actually need AWS, because GitHub Actions already runs in a container, so you can run the unit tests in that container.

You would have to parallelize the GitHub Actions, though.

Yes. I haven't tried it myself, but I believe Knapsack Pro uses the matrix feature to parallelize it, so you can have the same system, for example Debian Linux, run five or ten different executions.

Right, that's very interesting. One question about your use of AWS Lambda. What you're doing with AWS is essentially, for every single unit test, or set of unit tests, you upload the whole source code bundle to AWS Lambda, and then you run into these issues with the code size limitation. Would it help to put the code on S3, for example, and then load it from there? Or is this a container limitation that AWS Lambda has?

It's an AWS Lambda limitation. At the time of development, GitHub Actions didn't exist yet, I believe, so we tried AWS Lambda. And even though our infrastructure runs on Google, Google at that time didn't offer Python for serverless, so we used Lambda because it already supported Python. I think Google supports Python for serverless now, but I haven't used Google Cloud in a long time.

I know that AWS has supported Python for, I don't know how many years, and I've used it quite a bit, so I'm familiar with it. So yeah, thank you very much.