So, hey everybody, thank you for coming to my talk and staying so late. Today we're going to talk about effortless profiling on Kubernetes. A little bit about myself: my name is Eden, I'm currently an engineer at Yahoo, and I've been coding for almost 15 years now. I'm really into everything observability and profiling; I think finding performance issues in production is my hobby, and I created one of the tools that I'm going to show you here today, called kubectl-flame.

So today we're first going to start with a little bit of introduction: what is profiling, how do you analyze a profile, and what's the best way to visualize a profile. Secondly, we're going to talk about the challenges of profiling in general, and then specifically of profiling on Kubernetes. And lastly we're going to do a demo: I'm going to profile multiple applications, first manually and then by using kubectl-flame.

So profiling is the act of analyzing the performance of an application in order to find poorly performing sections of code. In simple words, profiling is the process you go through when you have a slow-running application and you want to understand why it runs so slowly.

One of the most popular ways to visualize and present a profile is by generating something called a flame graph. Flame graphs look something like this: multiple stack traces stacked on top of each other. The y-axis is the stack depth, basically the stack trace itself. The x-axis is the sample population: the wider a flame is, the longer that method call took. Color is usually used to differentiate between different types of method calls; for example, green can be Java code, red can be C code, and orange can be kernel-space code. The left-to-right order is usually not that important; flame graphs are typically sorted alphabetically.

So I tried to think about why profiling is so hard, and I think it comes down to two main reasons. The first is overhead: simply the act of profiling an application makes the application run slower. This is the main reason most people try to avoid profiling in production; we don't want to hurt our production applications. The second is that an application is often not ready to be profiled yet, it's not profilable, and I need to modify it in multiple ways before I can profile it. For example, in languages like Node.js or Python you have to add some flags to the execution command, and if you are working in Go, you have to import a library like pprof into your code.

Choosing the right profiler might solve those two problems, but knowing which profiler to choose is not an easy task by itself. It's highly dependent both on the programming language and on the operating system, and it requires research.

Those two pain points only get harder when dealing with applications running on top of Kubernetes. Let's say we want to add a profiler to our container image. We need to modify the image itself to include the relevant profiler binary, and then modify the application code. After we make all those changes, we have to build the modified image, push it to some registry, and deploy it to our cluster. This very deployment may trigger an application restart.
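To give a hedged flavor of what those per-language modifications look like (the exact flags depend on which profiler you choose; these are just common built-in examples, and Go additionally needs a code change to import the pprof library):

```
# Node.js: enable the built-in V8 profiler with a flag
node --prof app.js

# Python: run the standard-library profiler as a module
python -m cProfile -o app.prof app.py
```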
And when you are dealing with a performance issue, you don't want to restart your application, because some performance issues, memory leaks for example, may disappear when the application restarts.

So we wrote a tool to try to handle all those complexities. It's a kubectl plugin called kubectl-flame. Since it's a kubectl plugin, you can easily install it via Krew. Krew is a package manager for kubectl plugins; it's a really cool project, and if you are not familiar with it, I highly recommend you check it out.

kubectl-flame aims to make profiling effortless by removing the need for code modifications, and by profiling without restarting your applications or causing any downtime. The way it works is that kubectl-flame is basically a facade for choosing the best profiler for every task. If we are trying to profile a Go application, it will choose an eBPF-based profiler. If we profile Java or anything JVM-based, it will choose async-profiler. For Python it will be py-spy, and for Ruby it will be rbspy. Many more languages are coming soon.

The way it basically works: it's a kubectl plugin, so it's a command-line application that communicates with the Kubernetes API server. It detects which node the target container is scheduled on, and it creates another container, the profiler container, which contains all the profiler binaries. Those containers share the same process ID namespace and the same file system via hostPath. This way we can inject the profiler into a running container, a running pod, without any code modifications or restarts.

So enough with presentations, let's do some profiling. In order to demonstrate, I'm going to use the microservices demo from Google Cloud. It's a demo eCommerce app which contains about ten microservices in five different programming languages, and even a Redis database. We are going to focus on two applications, one in Java and one in Python. The Java one is the ad service, the application responsible for displaying advertisements. The Python one is the recommendation service, the application responsible for showing product recommendations. Just to give you an idea of what this application is: it's simply an eCommerce site where you can buy demo products.

I already have this application deployed locally on my Minikube cluster. Let's start by doing the profiling manually, and then we'll try to improve the process using kubectl-flame.

The first thing we need to do is modify the container images to include a profiler. Let's start with the recommendation service; I'm going to go ahead and modify the image. I already did the research and chose the py-spy profiler because this is a Python application; there are many other profilers you could use. To add it to the container image, I simply add a pip install py-spy command. I already have Skaffold running in the background. Skaffold is like a local CI/CD that watches all the files that change; once I save this Dockerfile, it's going to build the image for me, push it to a registry, and deploy it to the cluster. So let's go ahead and do that. Let's watch it do its thing, wait a second, and we should have a freshly created recommendation service, as you can see by the age. So now we have the profiler installed in the recommendation service; let's go ahead and invoke it.
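Concretely, installation via Krew is a one-liner; this assumes the plugin is published on the Krew index under the name flame, as the project's README describes:

```
# Krew itself is installed separately: https://krew.sigs.k8s.io
kubectl krew install flame

# after that it behaves like any other kubectl subcommand
kubectl flame --help
```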
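To make the shared-namespace point concrete, here is a rough, hedged sketch of the kind of profiler pod that gets created; the field values are illustrative, not the exact spec kubectl-flame generates:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: flame-profiler        # illustrative name
spec:
  nodeName: node-of-target    # pinned to the node the target pod runs on
  hostPID: true               # lets the profiler see the target's processes
  containers:
  - name: profiler
    image: profiler-image     # bundles the per-language profiler binaries
    volumeMounts:
    - name: host-fs
      mountPath: /host
  volumes:
  - name: host-fs
    hostPath:
      path: /                 # share the node's file system with the profiler
```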
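The image change itself is tiny; in the recommendation service's Dockerfile it is roughly one added line (the rest of the file belongs to the demo app):

```dockerfile
# ... existing recommendation service Dockerfile ...
# add the profiler so we can invoke it inside the running container
RUN pip install py-spy
```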
So the first thing you need to do is exec into the pod and then invoke the profiler. The way you invoke py-spy is by running a command that basically says: profile process ID number one for 10 seconds and save the result as /tmp/profile.svg. So let's go ahead and invoke the profiler. Okay, great, now we have created a profile. Let's copy this file over to my machine so we can view it in the browser. We're going to use kubectl cp, from /tmp/profile.svg to here; let's call it record.svg. Now we can open our lovely flame graph, see exactly what our recommendation service is doing, and analyze it to find the bugs and the poorly performing code sections.

Now let's do the same for the Java service. Again we're going to edit the container image. For Java I already did the research and chose async-profiler. This time we edit the container image a little differently: we do a wget to download the latest release from GitHub, extract the tar.gz file, and move it to /opt/profiler. Then I save the file, and Skaffold builds the image for me and deploys it to the cluster. Let's wait a second. If we do a kubectl get pods we should see a newly created ad service with an age of eight seconds, and now we can exec into this pod again and invoke the profiler. Let's go to the profiler directory and run the command, which basically says: profile process ID number one for 10 seconds and save the result at /tmp/profile.html. Now let's do the same trick: copy it over to my machine, call it ad.html, open it, and again see what our Java application is doing and analyze the profile.

Let's recap for a minute this whole manual process. First we had to do research: we had to find the best profiler, first for Python and then for Java. The second step was to add it to the container. As you saw, every profiler is added to the container in a different way: sometimes you install it via a package manager like pip, sometimes you have to wget and extract it; there are many different ways. The third step is to actually exec into the container and invoke the profiler. There are two main issues with this step. First, exec'ing into the container and running the command assumes there is a shell in the image, which is not always true; if you are using a really tiny image like distroless, you don't always have a shell in your container image, so this whole manual process might not even work. But let's say you do have a shell. You still have to exec into the container and run some kind of command, and every profiler has a different API; you need to read the documentation and understand what the command-line arguments are and what each argument does. And after all this, you still have to copy the file over to your machine so you can examine it and visualize it, in a browser for example. So overall, not a very friendly process. Let's try to make it better.

The first thing I'm going to do is roll back the changes I made, so we remove the profiler from the images; I save the files, and Skaffold automatically redeploys for me.
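Putting the Python steps together, the manual flow looked roughly like this; the pod name placeholder is yours to fill in, and the flags are py-spy's documented record options:

```
# exec into the pod and record a 10-second profile of PID 1
kubectl exec -it <recommendationservice-pod> -- \
  py-spy record --pid 1 --duration 10 --output /tmp/profile.svg

# copy the resulting flame graph to the local machine
kubectl cp <recommendationservice-pod>:/tmp/profile.svg ./record.svg
```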
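And the Java flow, hedged the same way: the release URL and version below are illustrative (grab the latest from the async-profiler GitHub releases), while -d (duration) and -f (output file) are the documented profiler.sh arguments:

```dockerfile
# in the ad service Dockerfile: download and unpack async-profiler
RUN wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.0/async-profiler-2.0-linux-x64.tar.gz \
 && tar -xzf async-profiler-2.0-linux-x64.tar.gz \
 && mv async-profiler-2.0-linux-x64 /opt/profiler
```

```
# then, inside the running pod: 10 seconds of PID 1, HTML flame graph out
kubectl exec -it <adservice-pod> -- /opt/profiler/profiler.sh -d 10 -f /tmp/profile.html 1
kubectl cp <adservice-pod>:/tmp/profile.html ./ad.html
```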
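As a preview of the next part, the kubectl-flame equivalents of everything above collapse into single commands; the flag spellings here follow my reading of the project README, so verify against kubectl flame --help:

```
# Java ad service: pick the language, a duration, and an output file
kubectl flame <adservice-pod> -t 10s --lang java -f 2.svg

# Python recommendation service
kubectl flame <recommendationservice-pod> -t 10s --lang python -f req2.svg
```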
Right, so now let me show you how we do it with kubectl-flame. For this purpose I'm going to use two terminals side by side. The right terminal runs a watch on kubectl get pods, so we can see all the pods being created in real time; it will help us understand better how kubectl-flame works behind the scenes. And in the left one I'm going to actually execute the commands.

As I said earlier, kubectl-flame is a kubectl plugin, so after you install it you simply use it like any other kubectl subcommand. You do kubectl flame, let's start with the Java service; you specify which language, Java, for how long, 10 seconds, and save it as 2.svg. Once we run it, you can see that a new profiler pod is created on the same node. It starts running, attaches to the running ad service, invokes the profiler automatically for us, and terminates when it's finished. The file is also copied over to our machine. And the other cool thing is that the ad service, if you look at its age, did not restart at all. So now I can open the file it created, and again we have the same flame graph as earlier.

Let's do the same for the Python application, again for 10 seconds, and call it req2.svg. Again, a new profiler pod is created on the same node with the relevant profiler inside; it attaches to the recommendation service, runs the profiler, and terminates at the end. And again we can open the flame graph and understand exactly what our recommendation service is currently doing.

So this kubectl-flame plugin saves us from learning which profiler is best to use and its specific API details, from copying over the file, from having to restart our application, and from having to have a shell in our container image. All of those are handled for us by the plugin.

So that was the demo. The last thing I want to talk about is a few points on the future of profiling as I see it. There are three main developments that are going to make profiling better.

The first is a new feature in Kubernetes called ephemeral containers. It's already in alpha in version 1.22, behind a feature gate. It brings native support for attaching a container to a running pod, without all the workarounds I'm currently doing. It will make profiling much more accessible, without having to use a hostPath or shared process ID namespaces.

The second is eBPF. eBPF by itself is a huge game changer for everything related to profiling and observability, but it's a whole topic on its own; there are tons of other talks here at KubeCon specifically on eBPF. They recently added a feature called compile once, run everywhere, which makes running eBPF applications much easier: you don't have to have all those Linux headers inside your node or container.

The third point is continuous profiling tools. There are more and more tools that try to make profiling a continuously running process, with flame graphs that are generated continuously and always on. I tried a tool named Prodfiler a few days ago which looks very promising. There is a lot of work to do in this area, but once those tools become mature, they will change the way we do profiling for the better.

That's it. Thank you for listening. If you have any questions, I'll be happy to answer.