Okay, hello everyone. My name is Xiangfeng, and today I'll be talking to you about the overhead of Istio sidecars. This is joint work with folks at Duke University and ByteDance. First, just a few words about myself: I'm currently a graduate student at the University of Washington, where I do research on microservices, service meshes, and application networking. This is my email address and personal website, in case you want to reach out after the talk.

The performance-versus-functionality trade-off is a key concern for both Istio developers and users. This is a quote I took from the Envoy FAQ page: the performance overhead of Envoy and Istio depends a lot on the features enabled, the deployment environment, and the workload characteristics. For example, developers might want to know how to configure Istio given a performance budget, or whether they should use WebAssembly or Lua to extend their service mesh.

To answer such questions today, developers rely on ad hoc manual configuration tuning: they do black-box measurements of different configurations and workloads. This approach has a key limitation, though: the configuration space of Istio is huge. You can intercept traffic in different protocols, and there are many Istio and Envoy features you can enable. Workload characteristics like request size and rate also affect the magnitude of the overhead.

To this end, we built MeshInsight to systematically quantify Istio sidecar overhead and enable two important use cases. First, we want to help Istio users navigate the trade-off between performance and functionality. Second, we want Istio developers to be able to estimate the impact of their optimizations and make informed decisions. We consider two primary measures of overhead in this work: latency overhead and CPU usage overhead. We built MeshInsight mostly for sidecar mode, but we believe it can be extended to the upcoming ambient mesh.

MeshInsight is written in Python, and using it is pretty straightforward. First, you run the offline profiler, which generates performance profiles for Envoy on different platforms. Once you have the profiles for your deployment platforms, you can predict the end-to-end performance of specific call graphs using the online predictor. The README file in the repo has a lot more details.

Instead of using a black-box approach, we model per-sidecar overhead based on a decomposition model. The figure on the bottom shows the data path of a request. We break the data path down into several finer-grained components: the read system call; three sidecar components representing protocol parsing, filter chain processing, and baseline processing; the write system call; and finally the inter-process communication (IPC) between Envoy and the application. The nice thing about these components is that they are independent of each other, and when you compose them together, they represent the end-to-end overhead.

With the per-sidecar model, we can go ahead and predict the end-to-end overhead. MeshInsight relies on what we call an extended call graph for this. It captures the service communication pattern, the platform specifications, the sidecar configurations, and the workload information. Extended call graphs can be constructed from distributed tracing or monitoring tools like Jaeger and Grafana. Finally, given the call graph, the end-to-end overhead is just the summation of all the sidecar overheads in the extended call graph.
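To make the per-sidecar decomposition concrete, here is a minimal Python sketch of the model I just described. The component names follow the talk, but the class and the example numbers are illustrative assumptions, not MeshInsight's actual code or measured values.

```python
# Illustrative sketch of the per-sidecar decomposition model; the
# component names follow the talk, but the numbers are invented.
from dataclasses import dataclass

@dataclass
class SidecarProfile:
    """One independent data-path component per field; in a real run,
    each would be measured by the offline profiler for a platform."""
    read_syscall: float   # read() system call (microseconds)
    parsing: float        # protocol parsing inside Envoy
    filter_chain: float   # filter chain processing
    baseline: float       # Envoy's baseline processing
    write_syscall: float  # write() system call
    ipc: float            # inter-process communication with the app

    def overhead(self) -> float:
        # The components are independent, so composing them into the
        # per-sidecar overhead is a simple sum.
        return (self.read_syscall + self.parsing + self.filter_chain
                + self.baseline + self.write_syscall + self.ipc)

# Hypothetical HTTP profile; values in microseconds, made up.
http_profile = SidecarProfile(8.0, 12.0, 25.0, 10.0, 8.0, 15.0)
print(f"per-sidecar overhead: {http_profile.overhead():.1f} us")
```

In a real run, these numbers would come from the offline profiler, one profile per platform and sidecar configuration.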
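Building on that, here is an equally hypothetical sketch of the prediction step; the toy call-graph encoding below is an assumption for illustration, not MeshInsight's real input format. CPU-style overhead sums every sidecar in the extended call graph, while latency, as I'll explain next, follows only the critical path.

```python
# Hypothetical end-to-end prediction over a toy extended call graph.
# Each service's request traverses one sidecar; edges are downstream calls.
sidecar_overhead = {"frontend": 78.0, "cart": 78.0,
                    "payment": 78.0, "shipping": 78.0}
calls = {"frontend": ["cart", "payment"],  # frontend fans out in parallel
         "cart": [], "payment": ["shipping"], "shipping": []}

# CPU-style overhead: every sidecar in the extended call graph contributes,
# so the end-to-end overhead is the summation over all sidecars.
total = sum(sidecar_overhead.values())

# Latency overhead: parallel calls overlap in time, so only the critical
# (longest) path through the graph counts.
def critical_path(service: str) -> float:
    downstream = max((critical_path(c) for c in calls[service]), default=0.0)
    return sidecar_overhead[service] + downstream

print(f"CPU-style overhead (all sidecars): {total:.1f} us of CPU time")
print(f"latency overhead (critical path):  {critical_path('frontend'):.1f} us")
```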
For latency, the calculation is based only on the critical path.

Okay, let's see how MeshInsight works in practice. We applied MeshInsight to a large-scale Alibaba microservice trace to show how overhead varies across settings, and we conducted the prediction using five configurations: TCP, HTTP, gRPC, and HTTP and gRPC with a set of filters. The figure on the bottom shows the CDF of the predicted end-to-end latency overhead. The main takeaway is that the performance overhead can vary by orders of magnitude across settings. When you look at the P50 values of TCP versus gRPC with the set of filters, they can vary by up to 10x. Different call graphs and endpoints also introduce huge variation: for the same configuration, the P50 versus the tail (P99) overhead can vary by up to 50x. We have a similar observation for CPU usage overhead as well. These massive variations across settings are exactly why we need something like MeshInsight, so that developers and users can learn the overhead of their specific deployment of interest without spending precious time profiling.

With that, I will conclude my talk. MeshInsight is available on GitHub, and we published a paper about it at the Symposium on Cloud Computing (SoCC) last month; please check out the paper for details. Thanks everyone for listening. I don't think we have time for questions. Oh, yeah, sure. I should probably switch slides. Questions? Yes. Can you come to the mic, maybe?

Q: In the overhead graph that you showed, how much of it is the actual overhead of the sidecar versus how much is because of the load, you know, queuing, etc.? Do you isolate the two?

A: Right. The graph here shows the overhead of just the sidecar. For the latency measurements, we have the load generator operate at very low load, so the overhead only reflects the service time; it does not include queuing delay.

Q: I'm curious, did you look at any of the differences relating to things like TLS overhead? One thing I see a lot is people install the mesh, they enable mTLS, and then they go, whoa, resource utilization went way up. So it might be interesting to know how much of that is the cost of TLS.

A: Yeah, good question. We did not consider TLS in this work, but we definitely plan to consider it in the future. Thanks for the question.

Okay, if there are no more questions... if you're interested in the work, please come talk to me afterwards. Yes?

Q: Thank you for the talk. Do you have any solution in mind to keep the benefits of Envoy and sidecars but, at the same time, optimize the performance?

A: Yeah, the purpose of this work is to break down the overhead and see what the primary contributors are. But there have been talks at this conference on how to optimize the data path, for example using eBPF, or replacing the IPC with a lighter-weight mechanism. Also, I think Cilium has been trying to push the proxy into the kernel to save on system call and context switch overheads.

Q: Does MeshInsight give any details about the packet sizes transferred between the control plane and the sidecars?

A: No, it focuses only on the data path, so it does not involve any measurement of control plane to data plane communication. Thanks again.