Hello, my name is Reza Ramezanpour. I'm a developer advocate here at Tigera. Tigera is the company behind the open source project Calico, but more on that later.

First, a little bit about me. I create bugs and watch them cause panics. This is because I'm always eager to learn new stuff and open to suggestions, so let's connect and exchange ideas. Just like many others, I've finished all the TV shows and movies on all the streaming services. For a change of pace, I started the Calico eBPF certification course, which led me to an interesting paper from the 90s that proposed the Berkeley Packet Filter, or BPF, which to me is the best conversation starter for any kind of party. I find it so fascinating that I decided to preach it to others, and here we are. I'll try my best to share what I've learned so far, and I hope it helps you in your journey as well.

In this video, I'm going to give you an overview of Project Calico and what we're doing at Tigera. Then I'll try my best to explain how an application works. After that, I will introduce a magical solution to most application problems and then talk a bit about BPF history and eBPF. I will demo some interesting projects that I found along the way that will help you with cluster observability, and let you know what the next step could be after you tame the BPF superpowers.

What is Project Calico? Project Calico is a community behind a pure layer 3 approach to virtual networking for highly scalable data centers. By layer 3, I mean IP and routing. Calico is an open source networking and network security solution for containers, virtual machines, and native host-based workloads. It is important to note that Calico is not just a Kubernetes CNI. In fact, Calico supports a broad range of platforms, including OpenShift, Mirantis, OpenStack, and bare metal. Project Calico is an active community focused on cloud networking and security.
Feel free to join our community using these social networking handles. You can drive the conversation where you see a need for change, or seek help on your Calico journey from the developers who are actively working on the project.

What is an application? Applications are arguably the most important part of the user experience these days. If you're looking to do something with your smartphone, PC, or smart fridge, there is an application for it in an app store somewhere on the internet. In a server environment, applications are used to create services that respond to user demands and provide meaningful experiences. An application is usually a piece of software that allows the end user to achieve goals that are not implemented in the computer itself, by extending the original capabilities of the underlying software. When you run an application, it loads some essential information into RAM and stores any additional information in some sort of data store, like a hard disk or SSD, and in some cases RAM itself. The Linux kernel is a great example of an application.

Now let's explore how an application is loaded. When you start the computer, a lot of low-level stuff happens, and somewhere along the way there's a jump to the BIOS. The BIOS checks the hardware, then attempts to start the bootloader. The bootloader at this point does a lot of magic to decompress the kernel and load it into memory. Long story short, if all the magical things happen as expected, you end up with a running kernel that provides access to the underlying hardware. At this point, other applications can use syscalls and other methods provided by the kernel to consume hardware resources for drawing or calculating data. As a real-life example, the video that you are watching is possible because your browser or player is making a lot of syscalls in order to render the picture and my voice.

Applications do not always behave like they are supposed to.
Applications can become unresponsive from time to time. An unresponsive application is usually caused by exhausting the available resources, or by unexpected behavior triggered by user interaction or a bug. If the underlying cause is not treated, it might lead to a kernel panic in the future and bring the whole system down. By the way, speaking from experience, there's a good chance that you are thinking about a new problem in your environment that just happened. And if you don't have a monitoring or observability solution, there's a good chance that it's a recurring problem that has happened before but was missed because there's no log.

Now, let's talk about the magical solution that can, at least momentarily, fix most application problems. Tech support in the British comedy TV show The IT Crowd used to say a phrase that we have all heard at least once in our lifetime: have you tried turning it off and on again? Although it was used as a joke by the show's writers, this is a real fix in many situations.

Let's explore why turning it off and on again works. If you remember, I talked about how an application stores data in RAM. When a restart or a kill signal is issued, RAM gets cleared and resources become available again, resulting in a magical fix. At this point, you might be thinking, it can't be that simple. Well, it is that simple. However, there's a huge problem with this approach. By restarting or sending a kill signal, you might destroy critical information that could help you diagnose and find the root cause of the problem.

So is there a better solution? Yes: monitoring and observability. This is not a new topic. There are countless applications, like iostat, top, iperf, you name it, that will give you a better understanding of how resources are consumed in your environment. In fact, whenever a company is on fire, someone is trying to find a new tool that can magically find and solve the problem.
I would like to tell you there is a better way, called BPF, that can help you find any issues that you might face in the future or are having at the moment. But first, a little bit about BPF itself.

In 1992, almost 30 years ago, Steven McCanne and Van Jacobson, apologies if I butchered the names, wrote a paper with some eye-catching claims. At the time, Unix was popular, and capturing network traffic with the CMU/Stanford Packet Filter, or CSPF for short, which was designed for the 64-kilobyte PDP-11, was huge. You can see the 64-kilobyte PDP-11 in the left image. While it was a pioneering work and a leap forward for its time, the growing demand for information and the old architecture of CSPF started to raise eyebrows by delivering poor performance on newer machines like the Sun 16-megabyte SPARCstation. Unfortunately, I could not find an image of the Sun 16-megabyte machine, so you end up with a Sun SPARCstation 1+ in the middle. They claimed that with BPF, capturing packets could be 10 to 100 times faster than with CSPF, and that by using a kernel agent, unwanted packets could be discarded as early as possible, resulting in a huge performance gain.

Today BPF offers way more than just capturing and filtering packets. Nowadays, it is better to think of it as a virtual machine inside the kernel that can verify instructions and run them safely, with great performance, without the need to recompile the kernel. This is a huge game changer, since writing a kernel module requires hours and hours of kernel compilation, if it works at all. eBPF comes with a development toolchain that extends the BPF capabilities and provides an easier way to run BPF programs inside the kernel.

So, demo. Full disclosure: these are interesting BPF-based projects that other people have written and maintain. Please check out their GitHub pages; the links are provided at the end of the slides. I'm just a messenger here. All right.
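As a side note on the idea of BPF as a small virtual machine that verifies instructions before running them, here is a minimal Python sketch of a toy packet filter. Everything in it is an assumption made for illustration: the three opcodes, the tuple encoding, and the names `verify`, `run`, and `IPV4_ONLY` are hypothetical and are not the real BPF instruction format. The verifier only allows forward jumps and insists that the program ends with a return, which is one simple way to guarantee that a filter always terminates.

```python
import struct

# Toy opcodes, loosely modeled on classic BPF; this is NOT the real encoding.
LDH = "ldh"  # load a 16-bit big-endian value from the packet into the accumulator
JEQ = "jeq"  # compare accumulator with a constant, then skip jt or jf instructions
RET = "ret"  # stop and return how many bytes of the packet to keep (0 = drop)


def verify(program):
    """Accept only programs that are guaranteed to terminate."""
    if not program:
        raise ValueError("empty program")
    if program[-1][0] != RET:
        raise ValueError("program must end with ret")
    for pc, (op, *args) in enumerate(program):
        if op == LDH:
            if args[0] < 0:
                raise ValueError("negative packet offset")
        elif op == JEQ:
            _, jt, jf = args
            for skip in (jt, jf):
                # Jumps may only go forward and must land inside the program.
                if skip < 0 or pc + 1 + skip >= len(program):
                    raise ValueError(f"bad jump at instruction {pc}")
        elif op != RET:
            raise ValueError(f"unknown opcode {op!r}")


def run(program, packet):
    """Run a verified program against one packet and return its verdict."""
    verify(program)
    acc, pc = 0, 0
    while True:
        op, *args = program[pc]
        if op == LDH:
            if args[0] + 2 > len(packet):
                return 0  # out-of-bounds read: drop the packet
            (acc,) = struct.unpack_from("!H", packet, args[0])
            pc += 1
        elif op == JEQ:
            k, jt, jf = args
            pc += 1 + (jt if acc == k else jf)
        else:  # RET
            return args[0]


# Keep IPv4 frames (EtherType 0x0800 at byte offset 12), drop everything else.
IPV4_ONLY = [
    (LDH, 12),
    (JEQ, 0x0800, 0, 1),
    (RET, 65535),
    (RET, 0),
]

if __name__ == "__main__":
    ipv4_frame = bytes(12) + b"\x08\x00" + bytes(20)
    arp_frame = bytes(12) + b"\x08\x06" + bytes(28)
    print(run(IPV4_ONLY, ipv4_frame))
    print(run(IPV4_ONLY, arp_frame))
```

The point of the sketch is the order of operations: the program is checked once, up front, and only then executed against each packet, so a bad filter is rejected instead of crashing or hanging the system that runs it.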
So I have a Kubernetes cluster, hopefully. Yes. My cluster runs Calico, and it uses the eBPF data plane. I also have the Google Boutique microservices application, which is free and which you can download from their GitHub.

Now, I was curious to find out more about what is happening inside the eBPF data plane. So I stumbled onto the eBPF data plane troubleshooting page on docs.projectcalico.org and found out that there is a BPF tool that can provide insight, in real time, into the BPF programs that Calico is running. Just like any kid with a new toy, I started experimenting to see what I could find.

Now, this is the BPF program in all its glory. It will show you the ARP table, the route table, the NAT table. The thing that I wanted to see was the BPF routing table, which you can see here. This is different from the Linux routing table, and it is also a BPF program that runs inside the kernel. You can also check out the NAT table, and these are the live NATs that are happening inside my cluster. This is without SSH. As you can see, I'm just using kubectl exec to run the calico-node binary with the -bpf argument. Now, there's also a delete and a set, if I remember correctly. Yes. So there's a delete and a set, which sound like they have great potential to cause panic, but I'll leave it to you to find out for yourself.

After this, I thought, BPF claims to be able to peek into everything, and I'd like to verify this claim by checking what is happening in my application. And by my application, I mean the microservices application provided by Google. So I searched a bit and came across kubectl-flame, which is another project that uses eBPF to profile an application and create a flame graph for it. Don't worry if you don't know about flame graphs. I didn't either. Just search YouTube for Brendan Gregg. I believe he's the one who popularized the flame graph, and he's also its maintainer.
So he has a lot of material about profiling and all the good stuff. And after you run the profiler, hopefully, you will find... you will find... where is... Okay, we have to wait. Okay. Now, I have an older flame graph of the frontend application that I showed you. With the flame graph, you can peek inside each function that was running in your application and see the time that it took the application to execute each function. So you could come here and find out, all right, this application uses net/http HandlerFunc.ServeHTTP to provide the HTTP service that... All right, as you can see, the flame graph is done. All right.

Now, after this, I thought, all right, this is great, but I want to create something on my own. Is there any project that can help me write my own BPF code? Let me introduce you to the BPF Compiler Collection, or BCC, a framework that can help you write and execute BPF programs and tie them to Python. For this, I created a dummy pod with an Alpine image and installed the BCC framework, which is quite easy. I will share a link at the end so you can see how to do it on your own.

As you can see, there's not much running in this pod, and we don't see any other processes, like the server process that is running inside the frontend pod. Now, if I use the BPF profiler and ask it for a profile of what is running on my node, it gives me all the information and processes that are running on the node. So if I say I want to see my server application, which is the binary that serves the frontend application, you will see these function calls that are happening inside the program and these calls that are happening inside the kernel.
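To make the link between profiler output and a flame graph more concrete, here is a minimal Python sketch of the usual intermediate step: sampled call stacks are collapsed into "folded" lines of the form `frame;frame;frame count`, and the width of a frame in the flame graph is the share of samples that contain it. The sample stacks and function names below are made up for illustration (loosely shaped like a Go HTTP server), and `fold_stacks` and `frame_share` are hypothetical helper names, not part of BCC or of any flame graph tool.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks (root frame first) into 'a;b;c' -> count."""
    folded = Counter()
    for stack in samples:
        folded[";".join(stack)] += 1
    return folded


def frame_share(folded, frame):
    """Fraction of all samples whose stack contains `frame`.

    In a flame graph this fraction is the width of the frame's box.
    """
    total = sum(folded.values())
    hits = sum(
        count for stack, count in folded.items() if frame in stack.split(";")
    )
    return hits / total if total else 0.0


# Hypothetical samples: each entry is one stack captured by a sampling profiler.
SAMPLES = [
    ["main", "ServeHTTP", "renderPage"],
    ["main", "ServeHTTP", "renderPage"],
    ["main", "ServeHTTP", "readRequest"],
    ["main", "gcBackground"],
]

if __name__ == "__main__":
    folded = fold_stacks(SAMPLES)
    for stack, count in sorted(folded.items()):
        print(stack, count)
    print(f"ServeHTTP share: {frame_share(folded, 'ServeHTTP'):.0%}")
```

With these four made-up samples, three contain `ServeHTTP`, so that frame would span three quarters of the graph. A real profile works the same way, only with thousands of samples and with kernel frames stacked on top of the user-space ones.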
I was experimenting with other tools that BCC provides and found out about this thing called tcplife, which is an awesome tool for finding out, in real time, what connections are happening on the node and how much data is being transferred. I think this is the coolest thing that I have found in my journey. Although BPF programs are written in C, as I said, BCC helps you tie them to a Python front end, which makes it easier. So you could actually take the C portion of these applications, change the output into something that works for your scenario, and hopefully share it with others. There is also a kubectl trace plugin, if I'm not mistaken, that can help you run these BPF programs directly from the kubectl command line interface, so you don't need the pod like I did.

All right, so what's next? BPF can help with tracing and profiling, as we just saw, but that is not its only use. BPF can be tamed to drive highly optimized and blazing fast network communication, for example the Calico eBPF data plane. Or it can be used for security purposes, as in the Google Android BPF loader, and for many other uses. Frameworks like BCC, with the help of BPF, can provide all sorts of fascinating information at every level of the kernel, generating real-time, invaluable insight into your services with minimal performance impact. In fact, this image taken from the BCC GitHub shows all the information that you can gather by using the different tools provided with the BCC framework.

Now, you might be wondering, what is the next step after collecting this information? Well, glad you asked. You've got the stats, so let's force an unexpected reboot to fix the issue.

If you'd like to create a demo cluster and take it for a spin, check out the GitHub link provided. I added all the things that I did to create the cluster and install every piece that you saw. If something goes wrong, don't be shy. You can yell at me at the links below. I'm also always available on the Calico Users Slack.
These are the resources that I used for this presentation, and at the bottom you can see the original BPF paper, which is a really good read. I really recommend it.

If you found this recording interesting, I would like to mention that there's also a free eBPF course at academy.tigera.io. The course goes in depth on what an eBPF program is, how you can actually write one, and how these applications interact with the kernel. The great thing about the course is that there are no special requirements to start it.

We also started a new community program called Calico Big Cats, and we are eager to learn about your experiences with Calico. You can find more information about the program and its marvelous swag on our website.

Finally, I'd like to thank you for watching and bid you farewell on your BPF journey.