Thanks, Alex. So, yeah, as she said, I'm stepping in to give a talk about Locust. I'm going to talk briefly about my team's experience here at Bloomberg with a load testing tool that we used. It's written in Python, it's open source, and we used it to load test some of our backend services. I'm Ivan. This talk was delivered by Kuvila, my teammate, at FOSDEM this year, at the last edition. I also work at Bloomberg.

To provide a bit of context, I'm going to start by talking about our product. This is Worksheets. It's a market data monitoring tool that professionals from the finance industry use to monitor live data — you can see some of the numbers here flashing. They type in the securities they're interested in, they get the data, they can type formulas, and they can share this with their colleagues and have a collaborative editing environment. This is a new product that we're developing right now and releasing to clients. So far it's going well, and we're going to release it to even more clients, so we asked ourselves: if we grow the user base, are we going to be able to handle the load? Do we have enough capacity? Are we going to be able to provide the service to more clients, or is it going to collapse catastrophically? That's not what we want to happen, right? So that's where load testing comes in.

First of all, what is it that we want to measure here? There are certain attributes of a system that we want to assess or evaluate; I'll explain some of them. One is capacity: is the size of the infrastructure adequate for the load we're expecting? Do we have enough machines, enough bandwidth on the network, enough CPU, enough memory, and so on? Another is speed: how long does it take for a service to respond when we send a request? How long does it take for the database to retrieve the data? Is that enough in our context — and what does being "enough" even mean? That's something we need to answer. Then, is the system scalable? That is, if we have a system X and we double its size, will it be able to handle double the load? That's not necessarily true, right? It could be that with more machines, or more bandwidth, or whatever it is, it can handle more load, but not precisely double — more hardware doesn't necessarily mean the system will cope with proportionally more load. So what's the scaling factor? We would ask ourselves that. And another characteristic is the stability of the system: does it behave correctly under load? Does it become flaky when the load increases? Does it start to fall apart, or do something not quite right?

So how do we answer all of these questions? There are lots of questions, and we need answers. That's where testing comes in. There are different kinds of testing, as you all know: unit tests, integration tests, performance tests, load testing, stress testing. Here I'm going to mention three of them briefly. The meaning of these terms is not quite settled in the literature, and some authors use different terms for the same concepts, so I'm going to explain what I understand by performance testing, load testing and stress testing. You might go to other resources and find different explanations.
But pretty much, for the purposes of this talk, performance testing is a way to evaluate the performance of a system against a benchmark. We want to get the raw numbers, like the time we spend doing some operation. We don't apply particularly heavy load; we just want the data, and we fine-tune the different parts of the system to get the right numbers, basically. So the goals of performance testing are, as I said, to establish benchmarks. We're not really trying to find defects, because we have other types of testing for that goal.

Load testing is the one we're going to focus on. In this case, we feed the system a big load, keep increasing it, and see when it stops working, because at some point it won't be able to cope with it. We do this by simulating virtual users — we'll see how to do this with Locust later. This is the one we're interested in, but it also overlaps a bit with performance testing and stress testing, which is why I'm explaining all three. The goal here is to expose defects, often related to memory management — things like memory leaks that only become obvious as the load increases — and to determine the limits of our system: the database, the network, the CPU, the memory. How far can we go?

Stress testing is slightly different. In this case, instead of feeding in more load, we take resources away. We start with the system in a normal state, and then we take machines away, take network bandwidth away, bring dependencies down, and see what happens. Here the goal is to make sure that when the system fails, it can recover gracefully from those failures, or at least that there's some level of resilience to them. We want to establish what the application should do when the inevitable happens.

As I said, we're going to focus on load testing. There are some points to consider before we start, to establish whether it's the right thing to do. First of all, we need to have monitoring tools in place, because if we start load testing without the tools to see what's happening, we can't get any information out of our tests. So we should have logs, metrics, things like that, to be able to see the whole picture. Second, the risk of a service failure should be important enough: there may be services we don't want to test simply because we don't really care if they fail — they aren't critical. So we should focus on the critical parts first, which seems a bit obvious, but it's worth mentioning. It's also important to note that we will never be 100% right when we do load testing, because we can't model perfectly what the users will do with the application. To make the tests more accurate, we first need to identify usage patterns: how the users actually use our application, the most common actions they take, the workflows they follow. We also have to define the success criteria in measurable terms — basically setting our goals, saying we want to be able to respond in less than this much time, or to handle this many users. We want to set a target, so to speak. And, very importantly, we need to isolate the testing environment. This seems obvious, but it can still be dangerous, so take proactive steps to prevent accidents — one such step is sketched below.
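(As a hedged illustration of such a proactive step — this isn't from the talk — the test harness itself can refuse to point at anything that looks like production. The TARGET_HOST variable and the "prod" naming convention here are assumptions for the sketch:)

    import os
    import sys

    # Hypothetical guard: refuse to start a load test unless the target
    # looks like an isolated test environment.
    target = os.environ.get("TARGET_HOST", "")

    if not target or "prod" in target:
        sys.exit("Refusing to run: TARGET_HOST=%r does not look like an "
                 "isolated test environment" % target)

    print("Load testing against %s" % target)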
So I'll say it again: isolate your testing environment. It may seem obvious, but you can end up in situations where you don't expect to hit your production environment and you somehow have an impact on it during your load test. Make sure you don't ruin your production environment.

These are some of the metrics that we used in our team — I'm sure you can come up with more: throughput, response times, error rates, CPU usage. These are the ones we were interested in, but basically you can track whatever you need for your system.

So, Locust. This is the tool, and it's an open-source Python tool. You can specify the user behavior in code, which is very cool: you can have a class specifying what a user would do — send this request, then this other request — and you can distribute this across different machines and start sending load based on the specification you've written in Python. We'll see the example in Python code later. It's based on coroutines and an event-driven approach, which makes it scalable, and it's a nice coding pattern to follow. And we say it's battle-tested because it's been used extensively by other companies; for example, they say on the web page that the developers of the game Battlefield use it to swarm their servers with fake players.

There's a command line interface that we can use to run Locust, and it looks like this: we specify --no-web to use the command line only; the locustfile is the specification of the user actions that I mentioned earlier (we'll see an example later); then we specify the number of users we want to simulate, the hatch rate for those users, and how many requests we want to send in total. There are also options to specify how long we want Locust to run, so we could say: generate 100 users, 10 new users per second, keep that running for an hour, and see what happens.

There's also a web interface that we can use to monitor the data. It looks something like this: you enter the number of users you want to simulate and the hatch rate, and they start swarming. You get, listed here, the requests we have in our service — whatever requests you defined — with the number of requests sent, the time they take, the number of failures: statistics about the measurements. And you have the big red stop button at the top, in case something goes really wrong.

And this is what I mentioned earlier, the locustfile. This is how we define the user behavior. We have a TaskSet class, and in it we specify the tasks that users will run, using the @task decorator to mark what a client could do. In this case it's an example for a web page, so we're using the HTTP client: you can GET the root of the website, and you can GET the About page. You would expect the root, the home page, to be hit quite a lot more often than the About page, so you can pass an argument to the @task decorator to make it, for example, ten times more common than the About request: @task(10) here, @task(1) there. Based on the data you gather by analyzing how your users use your application, you can simulate that in your locustfile to get a realistic simulation. And then there's the Locust client class, where you set task_set to the TaskSet class you defined before — something like the sketch below.
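(As a rough reconstruction of what's on the slide — the talk predates Locust 1.0, so this uses the old HttpLocust/TaskSet API, and the paths and weights are illustrative:)

    from locust import HttpLocust, TaskSet, task

    class UserBehavior(TaskSet):
        @task(10)                # weight 10: the home page is hit far more often
        def index(self):
            self.client.get("/")

        @task(1)                 # weight 1: the About page is comparatively rare
        def about(self):
            self.client.get("/about")

    class WebsiteUser(HttpLocust):
        task_set = UserBehavior  # the behavior defined above
        min_wait = 5000          # each simulated user waits between 5 and 15
        max_wait = 15000         # seconds (values in milliseconds) between tasks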
You set the min_wait and max_wait times, which means every user will send a request every 5 to 15 seconds, randomly and uniformly. And this is how it works, basically.

There's also the possibility of using a different client. The case above is pretty simple because it's the built-in Locust HTTP client, but you can specify your own protocol instead of HTTP, if you're using RPC or whatever TCP protocol you have. You write a custom client in which you wrap all the requests, and you specify how those requests get reported: when a failure happens, what information you want to record about it — the name, the response time, the exception — and likewise for successes. Then, instead of using HttpLocust, you use your custom Locust class here. It has to be a subclass of Locust, and that's the way you specify your user behavior for a custom protocol.
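(A minimal sketch of that pattern, again with the pre-1.0 API and its events.request_success / events.request_failure hooks; rpc_call here is a stand-in for whatever proprietary transport you actually have:)

    import time
    from locust import Locust, TaskSet, task, events

    def rpc_call(payload):
        # Placeholder for the real proprietary-protocol call.
        return "ok"

    class RpcClient(object):
        def send(self, name, payload):
            start = time.time()
            try:
                result = rpc_call(payload)
            except Exception as e:
                # Report the failure with a name, response time and exception.
                events.request_failure.fire(
                    request_type="rpc", name=name,
                    response_time=(time.time() - start) * 1000, exception=e)
            else:
                # Report the success so it shows up in the statistics.
                events.request_success.fire(
                    request_type="rpc", name=name,
                    response_time=(time.time() - start) * 1000,
                    response_length=len(result))
                return result

    class UserBehavior(TaskSet):
        @task
        def ping(self):
            self.client.send("ping", {})

    class RpcUser(Locust):
        task_set = UserBehavior
        min_wait = 5000
        max_wait = 15000

        def __init__(self, *args, **kwargs):
            super(RpcUser, self).__init__(*args, **kwargs)
            self.client = RpcClient()  # available as self.client in the TaskSet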
To deploy this, once everything is defined, we use containers — Docker — because it's handy for tasks like this: it keeps things separate, it's easy to deploy, you specify all your dependencies, and that's it. And it makes it easy to spawn more machines when we need more simulated users.

This is roughly the architecture that we used in our team. On the left-hand side we have the cloud, with a Locust master machine that is orchestrating the swarm. The master talks to the slaves, and the slaves are the ones actually sending the requests to our test environment, which is an alpha cluster — so we're doing the load testing on alpha, with two machines. It's important that we send the requests from outside the machines that we're testing, because otherwise we'd be spending resources on generating the simulation itself and the results wouldn't be accurate; we have to do it from a different environment. In this setup we use a database as a cache: because the requests you specify need to model what your users actually do, we generate some mock data in that database so that the Locust slaves can send meaningful requests.

And I have a demo, so I'm going to show a video — not a live demo, sorry, to avoid risks. We can see the Locust interface here in action. It's the screen we saw before: at the top it shows the hatch rate and how the users are generated, the requests being sent, so these numbers keep increasing, and we can see the number of users and so on, until we hit stop and it halts completely. That's pretty much it.

So this is a bit of a success story for Locust in our team. We started experimenting with it, and we saw on the first day that many requests were being dropped unexpectedly, even with relatively small loads. Some requests were taking too long and were blocking the others, and we didn't really know why, so the first thing we did was add more instrumentation: we added more logs and more metrics to understand exactly where the problem lay. We found that it was a regression in our database access. A couple of weeks before we started using Locust, we had introduced a regression that made our database queries much slower, so we made a fix immediately and shipped it to production. You can see it here: the blue line is the dev environment — we found the issue, we fixed it, and response times immediately dropped by whole seconds.

So it was a big success, and you can see the same drop repeated in the later stages, in beta and then in production. So this is what we did at the time. As future work, we have plans to do this for all our services: we've been load testing one of our services — a couple, actually — but there are still more that we want to test, with different protocols even. There's also the possibility of running this on CI. So far we've run it manually, using either the command line or the web interface, but it would be interesting to have this running on, for example, a Jenkins instance: periodically — nightly, weekly, whatever we decide — run the Locust swarm and then compare the results, to see if we're doing better or if we've introduced some performance regression.
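(A sketch of what that nightly comparison could look like — this isn't shown in the talk. It assumes the stats were saved with Locust's --csv option, which writes per-request statistics to CSV files, and that the file has "Name" and "Average response time" columns as in the 0.x format; adjust for your version:)

    # Hypothetical CI step: compare tonight's Locust CSV stats against the
    # previous run's, and fail the build on a large regression.
    import csv
    import sys

    def load_stats(path):
        # Map each request name to its average response time in ms,
        # skipping the aggregated "Total" row.
        with open(path) as f:
            return {row["Name"]: float(row["Average response time"])
                    for row in csv.DictReader(f) if row["Name"] != "Total"}

    def main(previous_csv, current_csv, threshold=1.2):
        prev, curr = load_stats(previous_csv), load_stats(current_csv)
        regressions = [(name, prev[name], t) for name, t in curr.items()
                       if name in prev and t > prev[name] * threshold]
        for name, before, after in regressions:
            print("REGRESSION %s: %.0f ms -> %.0f ms" % (name, before, after))
        sys.exit(1 if regressions else 0)

    if __name__ == "__main__":
        main(sys.argv[1], sys.argv[2])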
So yeah, that's basically everything I had. Thanks for coming, and if you have any questions...

Awesome, thank you very much. I've got a first question over here.

Hi, thanks for the talk. A question: on one of the slides you could define max_wait, which I assume means you don't want a request to take longer than that, but you could also define min_wait. How does that work? Is it the case that sometimes you don't want something to respond too fast? I'm just curious about that.

Yeah — you don't expect real users to be sending requests constantly. It's scripted, after all, but the time it takes a user to go from clicking here to clicking there, you want to have that delay in the simulation.

Sorry, maybe you could just clarify the meaning of those min and max wait parameters?

Okay, yeah, maybe I wasn't clear enough. It means that every user Locust generates and simulates will wait at least 5 seconds between one request and the next request that it sends, and at most 15 seconds. You're randomizing the interval between requests.

Any other questions?

Thanks for the talk. So essentially this service has already been nicely segmented, and you're hitting the API — there's no coupling with the front-end layer; you're purely hitting the back end, right? And have you already instrumented the API — are you scraping the logs from it, or does this inject logging? How do you make your charts? Or is it the standard HTTP metrics, response times, 500-type stuff?

Yeah, for the HTTP case you get the standard response times. In our case we use a proprietary protocol that we have in-house at Bloomberg, so we also have our own logging systems and our own metrics infrastructure that we use. And with the custom client, you specify the request name and the timing yourself, so we report that explicitly ourselves.

Thanks for the talk. I used Locust a few years ago and I found it quite hard to produce enough load to actually hit the limit. What would your rough ratio be between the size of the load-generating cluster and the size of the machines under test — like a rough number-of-cores comparison?

Well, the alpha cluster where we're running the load test is smaller than production; its machines are representative of production machines, but there are a lot more machines in production. So it's smaller than the production environment, and we assume things scale proportionally, so we generate a proportionally smaller load from the cloud as well. It doesn't take that much to generate enough load. I'm actually not sure of the exact numbers — we certainly have several Locust slaves — we just set the number of users that we need and it manages to generate the load. But note that we're not using a full production-like environment to test; we assume it's proportional.

What are your options for exporting metrics from Locust? If you run a load test, you'll have some information coming out of it.

You get a CSV file that you can download from the web interface — it's actually shown in the video at the very end. You get a CSV file with all the data that you see on the screen, and then you can process it yourself. It's purely raw data.

Have we got any other questions in the room?

You said you were using a Bloomberg proprietary service. Were there any challenges with that? Does Locust make any assumptions that you must be hitting HTTP, or something it roughly knows the shape of?

We used the Python libraries that we have in-house to access those services with that protocol, and that was basically everything we needed.

Awesome. Thank you very much for some fantastic questions, and thank you, Ivan. Great talk.