So, welcome everyone to the session, Measuring Service Mesh Performance 101. I don't know if it's just me, but service mesh seems to be the latest cloud buzzword: it's all over Twitter and everyone is using it. That makes sense, since there are obvious benefits to using a service mesh, from managing your entire traffic, to adding configuration policies, to additional security, telemetry, and other metrics. But do these benefits come for free? We're at a point where we should be asking how adding a service mesh affects the performance of your system and of your workloads. So in this session, we'll walk through how you can measure service mesh performance and why you should, along with some tooling and a demo. We'll also be looking at some open source projects, both of which are in the Cloud Native Computing Foundation. And yep, so let's get started.

As a service mesh operator or a service mesh adopter, we often find ourselves asking: what does it cost? It sounds too good to be true, so what overhead does being on a service mesh incur? It is important to evaluate how your application, how your workload, performs when it is deployed on a service mesh. That is the first reason to measure the performance of our workloads on our service mesh. This cost shows up as added latency or added resource consumption, both of which can result in real-world business impacts: additional costs, additional infrastructure support, and so on. So service mesh operators ask themselves: do the benefits of a service mesh outweigh these costs? Service meshes are definitely helpful, but is ours hurting our performance? Is it costing too much? And if that is the case, is a service mesh right for our particular use case? So as people are trying to adopt service meshes, instead of just following the crowd, they should step back and ask themselves whether a service mesh is right for their particular use case.

So let's look at some performance testing basics. We have an application deployed on stable infrastructure on our service mesh. There are a lot of service meshes out there, and a new one is popping up every few months, I think. A funny metric here is that there are more service meshes than there are people running service meshes in production. But yep, there are a lot of service meshes out there, and you have your application deployed on top of a particular one. To run a performance test, we have a benchmarking tool. Our benchmarking tool will hit our application, hit our workload, and collect some metrics after producing this traffic. We will collect some key performance indicators, which we will look at shortly, from both the application and your infrastructure. And a key point to consider here is that the testing should be consistent and repeatable; we will look at how this is ensured with the Service Mesh Performance specification, a CNCF project, in the later slides.

So, looking at some benchmarking tools. Benchmarking tools are basically layer 4 or layer 7 performance measurement tools. Some examples are wrk2, Fortio, and Nighthawk.
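At their core, all of these tools do the same basic thing: drive HTTP requests at a controlled rate and record the latency distribution. Here is a minimal Python sketch of that loop, just to make the idea concrete. The URL, rate, and duration are placeholders, and real tools like wrk2 additionally correct for coordinated omission, which this toy closed-loop version does not.

```python
# A minimal sketch of what an L7 benchmarking tool does at its core:
# send requests at a (roughly) fixed rate and report latency percentiles.
import statistics
import time
import urllib.request

def run_load(url: str, qps: int, duration_s: int) -> None:
    interval = 1.0 / qps          # target spacing between requests
    latencies, errors = [], 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            urllib.request.urlopen(url, timeout=5).read()
            latencies.append((time.monotonic() - start) * 1000.0)  # ms
        except OSError:
            errors += 1
        # Sleep off whatever is left of this request's time slot.
        time.sleep(max(0.0, interval - (time.monotonic() - start)))
    if not latencies:
        print(f"no successful requests ({errors} errors)")
        return
    latencies.sort()
    def pct(q: float) -> float:
        return latencies[int(q * (len(latencies) - 1))]
    print(f"requests={len(latencies)} errors={errors} "
          f"mean={statistics.mean(latencies):.1f}ms "
          f"p50={pct(0.50):.1f}ms p90={pct(0.90):.1f}ms p99={pct(0.99):.1f}ms")

# Hypothetical target; point this at your own workload.
run_load("http://localhost:8080/productpage", qps=50, duration_s=10)
```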
Fortio is the official load generator for the Istio service mesh, and Nighthawk is from the Envoy team at Google. We can use these tools to run performance tests against our workloads on service meshes. A caveat here is that the performance of your system can change over time, and if you are using a benchmarking tool like wrk2, Fortio, or Nighthawk, you might need to change your test configuration over time to keep up with those changes; maybe the maximum load the system can sustain changes over time. So what we need is a solution that can figure out not only the maximum load for the test, but also how that maximum changes over time.

That brings us to adaptive load control. To solve that problem, we can automatically update our test configuration in each iteration: each time we send traffic to our workloads on a service mesh, we update the test configuration based on the metrics returned by our benchmarking tool. With this adaptive load control feature, we can measure how the performance has evolved over time. We have implemented this feature in another CNCF project called Meshery, which we will demo towards the end of the session. So basically, we have a test configuration and an adaptive load controller, which is built into Meshery. The adaptive load controller uses the benchmarking tool to send traffic to the system under test, and a monitoring tool, which might be built into the load generator itself or might be something like Prometheus, collects the metrics. Based on these metrics, we close the feedback loop and adjust our test configuration. By adjusting this on every iteration, we can find out the maximum load your system can sustain, and we can run this test over time to figure out how the performance of the system has changed, whether over time, after you make changes to your system configuration, or after any change that might affect the performance of your workloads.

Now, coming back to some of the questions we posed earlier: how long should we run the test? This is similar to how we normally run performance tests. Depending on the scenario, we can have a load test, where we test the system against a large number of users; we can test the maximum capacity of the system; we can test the system under extreme conditions; or we can even go for something like soak testing, which is closer to real-world use cases, where we run tests for a prolonged period of time. And what measurements are we interested in? What values should we be measuring, and what are the key performance indicators of a service mesh?

This brings us to the first CNCF project I would like to talk about, which is Service Mesh Performance. Service Mesh Performance is a specification, and what it does is capture the entire performance picture of a service mesh, ranging from the test metrics and latencies, to the type of service mesh you are using, to the environment configuration: the CPU, the storage of the system, and maybe the number of cores the system has. So what the SMP spec does is abstract these performance indicators out into a specification.
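To make the feedback loop concrete, here is one hedged way an adaptive load controller could be written; this is a sketch of the idea, not Meshery's actual algorithm. It binary-searches for the highest request rate whose p99 latency still meets a service-level objective, treating each probe as one benchmark iteration.

```python
# A sketch of adaptive load control as a feedback loop; not Meshery's
# actual implementation. `measure` stands in for one benchmark iteration
# (e.g. a Fortio run) that returns the observed p99 latency in ms.
from typing import Callable

def find_max_qps(measure: Callable[[int], float],
                 slo_p99_ms: float,
                 low: int = 1,
                 high: int = 10_000) -> int:
    """Binary-search the highest QPS whose p99 stays within the SLO."""
    best = low
    while low <= high:
        qps = (low + high) // 2
        p99 = measure(qps)             # one test iteration at this rate
        if p99 <= slo_p99_ms:
            best, low = qps, qps + 1   # sustained: try a higher rate
        else:
            high = qps - 1             # SLO breached: back off
    return best

# Example with a fake system whose latency degrades past ~1200 QPS.
fake = lambda qps: 20.0 + max(0, qps - 1200) * 0.5
print(find_max_qps(fake, slo_p99_ms=100.0))   # -> 1360
```

Rerunning this search periodically, or after a configuration change, is what lets you track how the maximum sustainable load evolves over time.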
What this means is that you can run consistent tests and get reproducible results with the specification, and what you see on the side here is an example of a performance test result. The specification is constantly evolving: we are continuously iterating on the spec and working with folks at the CNCF TAG Network to improve it. So this specification essentially tries to capture the entire performance test result.

Talking about running these performance tests: SMP provides the specification, but how do we actually implement it, or run tests that are compatible with it? This brings us to our next tool, another CNCF project and a completely open source one, called Meshery. Meshery is the implementation of the Service Mesh Performance specification. It lets you manage the lifecycle of service meshes, configure your service meshes, and run performance tests against them. It is essentially a service mesh management platform, and for service mesh performance it implements the SMP specification. As you can see, you can use an interface to interact with Meshery and run performance tests against the workloads on your service mesh.

So that was a lot of talk; let's look at some of this in action with an actual demo. Let me get Meshery up and running. This is my current environment: the Meshery dashboard. I have an Istio service mesh installed on my cluster, and I have also deployed Prometheus and Grafana to collect some metrics from my environment. As you can see, you can work with all the other service meshes as well; I have deployed only four of the service mesh adapters, but you can go ahead and work with any other service mesh you want. Meshery not only manages the service mesh lifecycle: you can manage your applications, validate your service mesh configuration to see if it follows best practices, or even bring in your own custom configurations and apply them.

So let's look at performance. This is the performance dashboard in Meshery. Here you can collect metrics from your provider, Prometheus and Grafana in my case, which isn't showing metrics for some reason. You can also schedule tests: you can schedule them to run at a particular point in time or on a recurring basis, and you can go back and look at the results from tests you ran before, and all of that. You can also define your performance test configuration based on the type of test you want to run. If I run a load test, just for five seconds for the demo, you get the results from the load generator, Fortio in this case, but you can also get the other metrics from your Prometheus or Grafana dashboard. The tests are actually quite configurable: you can change the number of concurrent requests, set a particular queries-per-second rate, or leave it blank to find the maximum QPS. You can also add request headers or a request body if your API needs that.
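To give a feel for what a captured result looks like, here is an illustrative sketch of a test result as a plain Python record. The field names here are hypothetical, not the canonical SMP schema (see the Service Mesh Performance repository for that); the point is the shape of the data described above: latency percentiles, the mesh under test, the load profile, and the environment it ran in.

```python
# Illustrative only: field names are hypothetical, not the canonical
# SMP schema. The point is what a result captures: latencies, the mesh
# under test, the load profile, and the environment configuration.
from dataclasses import dataclass, asdict
import json

@dataclass
class PerfTestResult:
    mesh: str                  # e.g. "istio"
    qps: int                   # sustained queries per second
    duration_s: int
    concurrent_clients: int
    latency_ms_p50: float
    latency_ms_p99: float
    env_cpu_cores: int         # environment configuration
    env_memory_gb: float

result = PerfTestResult(mesh="istio", qps=500, duration_s=300,
                        concurrent_clients=8, latency_ms_p50=4.2,
                        latency_ms_p99=27.9, env_cpu_cores=4,
                        env_memory_gb=16.0)
print(json.dumps(asdict(result), indent=2))   # a shareable, comparable record
```

Serializing results in a common shape like this is what makes runs comparable across meshes and over time, which is exactly what the comparison view in the demo relies on.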
You can also work with multiple load generators, and we are working on abstracting this out so that the user can work more from an operator's perspective instead of having to deal with the entire implementation underneath. What this also means is that a service mesh operator can deploy their application on multiple service meshes. Since Meshery lets users work with multiple service meshes, you can deploy your application across several of them and compare their performance. So if you are evaluating which service mesh to use, or evaluating how your application performs on and off the mesh, you can use Meshery to compare them. And to compare, let's see: we have these four test results, and if we compare them, you can see how one performs better or worse than the others. This works well for the tests I mentioned earlier, across multiple service meshes, or with your application deployed on a mesh and off a mesh.

So I think that's pretty much it from my side. Before I go, I also want to talk about the Meshery Service Mesh Performance GitHub Action. What we have here is a way for people to test the performance of their workloads in their CI/CD pipelines. We currently have it as a GitHub Action, but we are working on adding it to other CI/CD platforms as well. With this action, you can basically spin up a Kubernetes cluster, deploy a service mesh with Meshery, deploy your application on that service mesh with Meshery, and then run performance tests in the pipeline; it will automatically report the results back to your Meshery cloud instance.

And so that is pretty much it. I'll try to share the links while I answer some questions. And yes, feel free to ask a question; I think we have some more time. I see there's a question in the Q&A section. There are a lot of tools, yep. So Magnus asked about the tool I was referring to in the beginning. Yes, it is Envoy proxy's Nighthawk. Nighthawk is one of the popular load generators in use; it was developed by the Envoy proxy team, and the Istio team at Google actually works closely with this project as well. Yep, so that is definitely the tool. I will also share links to the projects, and you can check them out. After you check them out, you can reach out to us in our community discussion forum if you have questions. So feel free to jump in there and ask your questions on Service Mesh Performance. If you think we can improve the specification or capture something differently, feel free to mention that as well.

Thanks a lot, Navindu, for your time and your talk here. It was really a nice one. Have a nice day. I think we don't have any more questions, but folks, if you have any, Navindu is around. Yep, please feel free to ask. Thank you. Have a great day, everyone.