Hello, everyone, and welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm Annie Talvasto. I'm a CNCF Ambassador, I lead marketing at VSHN, and I will be your host tonight. Every week we bring a new set of presenters to showcase how to work with cloud native technologies. As always, they will build things, they will break things, and they will answer all of your questions. You can join us every Wednesday, or on a couple of other days, such as today and Tuesday, to watch live. This week, we have Reza here with us to talk about troubleshooting the eBPF data plane in a Kubernetes cluster. I'm very excited for this. As always, this is an official live stream of the CNCF and, as such, it is subject to the CNCF Code of Conduct. So please do not add anything to the chat or questions that would be in violation of that Code of Conduct. Basically, please be respectful of all of your fellow participants, as well as presenters. With that done, I'll hand it over to Reza to kick off today's presentation.
Thank you, Annie. So let me find my slides somewhere here.
And I can see that Andrea asks, will this be recorded for later viewing? I can happily say that yes, this will be recorded. You can watch it, for example, on the CNCF YouTube page under the live streams tab, where you can find all of the previous Cloud Native Live episodes. So no worries. If you miss something, you can always go back to the recording and check it out.
All right. Now that we know this is being recorded, we can dive in. Hello, everyone. My name is Reza. I'm a developer advocate at Tigera, the company behind the open source project Calico, where we do everything in terms of security and networking in the cloud native scene. I'm always eager to learn new stuff and open to suggestions. So if you've got an idea or you want to explore anything, let's connect.
You can usually find me in the Calico Users Slack, and I believe if you're watching this from LinkedIn, you should be able to click on my profile as well. I'm pretty easy to find. For this presentation, I'm gonna talk a little bit about cloud networking and then give you a brief overview of Calico eBPF advantages. Then I'll talk about troubleshooting, because without these fundamentals it would be very difficult to explain why we need to build new tools to troubleshoot eBPF.
Yeah, and there was actually a quick question from the audience. Maybe you will answer this later on, but the question is, what is eBPF actually?
So eBPF stands for extended Berkeley Packet Filter. However, at this point, it's not really about packet filtering. It's sort of a virtual machine that runs inside the Linux kernel, and I guess in a couple of months or years in Windows as well, that can execute programs and interact with the kernel directly. It offers a lot of performance, and you can use it to do a lot of things that would be very expensive in terms of resources if you did them in other ways. So that's eBPF.
Project Calico: if you'd like to connect with us, these are our social handles. Project Calico is all about cloud networking and security. We've got a lot of Slack channel members and a lot of contributors. There are a lot of nodes that are powered by Calico every day, and the beauty of Calico is that it's an open source community. So you can just take the code, change it, and submit it to us, and we'll include your contribution in the code base. So let's talk about cloud networking first. Now, modern applications are mostly hosted on the cloud these days. Oh, by the way, I forgot to mention, if you've got any questions, please ask. Annie will help with the questions and read them to me in between sections. There is also a Q&A at the end.
So depending on how the stream goes and how the demo is working out, we might have time to answer your questions at the beginning, in the middle of the stream, or at the end. If I don't know the answer, which in my experience is most of the time, I'll try to find one of the Calico developers and engineers who know a lot more about this sort of stuff to answer your question. That being said, the best way to get answers about Kubernetes and everything cloud native is each project's Slack channel. There are a huge number of community members and engineers who are always present in these project Slack channels and who'd be very happy to answer all your questions. With that being said, modern applications are mostly hosted on the cloud these days. Cloud computing allows companies of all sizes to offer their services without investing in infrastructure and data centers. On top of that, cloud computing offers unparalleled flexibility and scalability that can quickly be tuned to your needs. But this magical experience is made possible by the power of cloud networking. Kubernetes is one of the main contenders for running a cloud environment. On its own, Kubernetes is a modular orchestrator that gives you the required setup to run and scale your workloads. In this presentation, our focus is going to be the networking side of Kubernetes, and at the end, we're gonna troubleshoot how this networking is actually provided by eBPF. Now, Kubernetes cloud networking has two parts: a physical networking infrastructure and a cluster networking infrastructure. The physical network infrastructure is the fundamental building block that connects your cloud resources to other networks. It could be on-prem or the internet, any other networks, or sometimes other resources inside your network. It is usually managed by the cloud provider in public clouds and by the networking department of your enterprise in private clouds.
This layer is comprised of cables, switches, routers, firewalls, you name it, it's there. The other part of cloud networking is the cluster's internal networking. In a Kubernetes cluster, this is managed by your cluster's CNI plugin. The CNI uses software-defined networking, or SDN, to provide connectivity between your workloads and either internal or external resources. While there are some standard features that most CNIs provide, each one has a unique take on networking. For example, Calico is built with a modular architecture to provide you with the most flexibility, and gives you a pluggable data plane that allows you to switch between technologies at any time you desire. So as I mentioned, the data plane for Calico is modular, and its architecture is similar to Kubernetes itself. It offers multiple choices that give you the freedom to tune your cloud environment in the way that is most optimal for your scenario. Today, Calico offers four pluggable data planes that can be switched on depending on your needs and environment. Starting with the standard data plane: it's based on iptables and provides fast networking, security, and compatibility with all environments. Calico eBPF, which is our focus for today, is another data plane created by Tigera, which uses the power of eBPF programs to provide blazing fast networking and security for your environment. The eBPF data plane offers capabilities such as complete kube-proxy replacement, source IP preservation, DSR, and a couple of others. If you're running a hybrid environment with your Kubernetes, you can use Calico for Windows, which is based on Microsoft HNS, or Host Networking Service, and can deliver networking and security to your Windows nodes. VPP is the newest data plane for Calico. It was an external contribution from Cisco, currently in its beta phase, which accelerates the networking experience by utilizing the power of user space programming. Now, why am I talking about these in an eBPF stream?
Well, this is the catch. eBPF, as I mentioned, is a technology that runs inside the kernel. So everything that you try to establish is directly injected into the kernel at the lowest level. VPP runs at the user space level, which is the highest level possible for a program to run. Now, you might think the kernel is faster, or you might think user space is faster. The thing is, depending on your scenario, each one can have different performance, and what is important is to check everything out, do your due diligence, and figure out which one works for your scenario. In fact, eBPF and HNS are some of the technologies that we use in our commercial offerings too. To keep it short, we're going to explore the standard data plane first. The standard data plane is built on the Linux networking stack. It utilizes the power of iptables to communicate with the kernel and pushes network boundaries as much as possible. The setup has been battle tested in every possible scenario, and big companies, think OpenAI or Reddit, depend on it to serve their customers. The Calico eBPF data plane is another data plane which, as I mentioned, utilizes the power of eBPF programs to directly interact with the Linux kernel and implement networking for your cluster. Now, the standard Linux data plane has high throughput and was designed for 10 gigabit links. Policy enforcement is very fast and Kubernetes services are fast in it. It is flexible, and the best part about the standard data plane is that anyone can change its behavior to some extent. Now we're getting to the parts that will be included in the troubleshooting. The standard data plane is really easy to troubleshoot. In fact, anyone who faces a problem can search for it on Stack Overflow or any other forum to figure out how it can be solved.
But with eBPF, it's a little bit more difficult because, again, everything is a binary, a program that runs inside the kernel. The eBPF data plane is designed for 40 gigabit links or more. It has higher throughput, while its resource consumption is lower than the standard data plane's. Keep in mind that this is not something that all eBPF data planes might share. Everything is different. Again, eBPF on its own is a tool, and depending on how you use it, you will get different results. In the end, you would accomplish one goal. Let's say you would move packet A from source to destination, but depending on how you program your data plane, you might get different performance and different benefits. It offers faster policy enforcement, since eBPF policies are compiled as eBPF programs and injected directly into the Linux kernel. Kubernetes services are faster, since the eBPF data plane implements them via eBPF programs inside the Linux kernel, so you don't have to go through netlink to iptables. And with the Calico eBPF data plane, you have more flexibility to control or manipulate traffic, since you're directly interacting with the kernel. So I talked about the difficulty of troubleshooting. Troubleshooting eBPF programs, as I said, can be difficult, and since eBPF is a fresh idea and lacks the amount of resources, or the number of already-asked questions, that other technologies like iptables have, you might find yourself a little bit discouraged. But don't be, because each project that uses eBPF usually ships troubleshooting tools that can peek into its eBPF programs to give you some sort of output, information, or insight. The important thing to know is that all of these eBPF programs that are injected into the kernel are binary files. So if you're trying to use eBPF, make sure that you first go to the troubleshooting page of the project that you are choosing and figure out what utility can be used.
Today, I'm gonna show you how Calico eBPF troubleshooting works. If you'd like an actual comparison in numbers about eBPF, you can check this article. It talks about the different data planes, how iptables and eBPF work together, and how iptables and eBPF compare to each other. All right, so I'm gonna skip source IP preservation.
And there's an audience question, and I think this could be a good slot for it. The question goes, will this tool help us protect against DDoS or cluster scanning in Kubernetes?
So yes, that's why I skipped the first use case and we're at the second use case. The short answer is yes, you can use eBPF to protect against DDoS. For cluster scanning, it's not that it's impossible, but again, cluster scanning would fall under your policy. So it's not mandatory to use eBPF. However, you can use it to get better results. All right, so as one of the audience members mentioned DDoS: denial of service is a type of attack where an attacker overwhelms the resources of a target system by flooding it with an enormous volume of requests. This flood of requests aims to exhaust the victim's capacity to respond, rendering their services inaccessible to legitimate clients. Now that we know how the attack works, let's explore the defense measures that we can employ with both the standard Linux data plane and the eBPF data plane. In a typical Linux environment, incoming requests are initially received by the network card. Then they proceed to the Linux kernel and the Linux networking stack. At this stage, we can leverage the iptables raw table to manipulate these packets and modify their behavior or destination. In a scenario where an attacker is targeting your environment with a DDoS attack, the earliest point at which you can deny this volume of requests is the iptables raw table.
Now, it is important to note that with this approach, depending on the size of the attack, there can be a noticeable impact on your node. This is because these packets must be processed by the Linux networking stack before they can be identified and eventually dropped. To minimize the impact of such an attack, you can leverage eBPF. For instance, Calico eBPF and its XDP integration allow you to create policies that inject XDP programs to intercept these packets at the earliest point of entry. So when a legitimate user tries to connect, the eBPF data plane will take the request and send it to the process that is running. That is the ideal case. Now, what happens when a DDoS attack starts? Well, XDP integration allows you to interact with these packets at several points in your environment. In some cases, Calico can load its eBPF programs directly into your network card and drop malicious packets before they can even hit the kernel. Keep in mind that this requires a network card that supports XDP offload, and not all network cards are capable of doing this. This method is usually referred to as offloaded XDP. In cases where offloaded XDP is not supported, the network card driver might provide the capability to load the XDP programs as part of the initial receive path of the traffic. This method is referred to as native XDP and, again, not every network card is capable of supporting it. If neither option is available in your environment, Calico will automatically use the kernel to run its XDP programs, which is a great way to run XDP programs on generic hardware that does not provide specific support for XDP. This method is referred to as generic XDP. In fact, Calico XDP integration is available on both data planes, but in the standard Linux data plane it's a little bit limited.
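The fallback order described above (offloaded, then native, then generic XDP) can be sketched as a small decision function. This is only an illustration of the ordering, not Calico's actual detection logic: the `XdpMode` enum, the `pick_xdp_mode` function, and its two boolean parameters are hypothetical names introduced for this example.

```python
from enum import Enum


class XdpMode(Enum):
    OFFLOADED = "offloaded"  # program runs on the network card itself
    NATIVE = "native"        # program runs in the driver's early receive path
    GENERIC = "generic"      # program runs in the kernel, no hardware support needed


def pick_xdp_mode(nic_supports_offload: bool, driver_supports_native: bool) -> XdpMode:
    """Return the earliest available interception point, falling back step by step."""
    if nic_supports_offload:
        return XdpMode.OFFLOADED
    if driver_supports_native:
        return XdpMode.NATIVE
    # Generic XDP is always available as the last resort.
    return XdpMode.GENERIC


if __name__ == "__main__":
    # A card without offload support whose driver supports native XDP.
    print(pick_xdp_mode(nic_supports_offload=False, driver_supports_native=True).value)
```

The earlier in the receive path a packet is dropped, the less work the node does per malicious packet, which is why the sketch prefers offloaded over native and native over generic.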
If you'd like to try this scenario, just scan the QR code and you will be presented with a live environment that allows you to fiddle around with the XDP policies. All right, so now that we've talked about these, let's go to the demo. Any other questions while I'm setting this up?
Not at the moment, but there's a lot of people saying hi, which is always lovely to see. Again, if you have any questions, just send them into the chat. Now that we are at the demo point, we will probably take the questions at the Q&A after the demo.
Thank you, Annie. I just need to figure out where the terminal is. There it is, I see it.
Oh, that might be a bit small to see.
Is it small? All right.
Yeah, with the colors and everything, but we can try it out and see how it goes as well.
Oh, I can definitely solve this problem. Let me just change it.
It's always nice when there's an easy solution.
Yeah, but it usually doesn't work. It happens. So this one is gonna be a little bit better.
Now I think it's looking better. Yes, and with a bit of zooming, I think it's looking good now. And to the audience, if you're still struggling a bit, let us know, but I think it's getting there quite nicely.
So thank you for confirming. I'm gonna use Multipass to run a local environment. At the end of the stream, I'll share all the code that I'm using. It's going to be on the GitHub page with my other presentations, and Annie will share the link with you guys. Thank you, Annie. It's gonna take some time. If you guys have any questions, let me know, as the computer is a little bit slow.
It always happens when you need it to be fast. That's for sure.
True, true, that is true.
No questions so far. Let's see if anyone has any right now. I would guess that the moment we have a question, it's all gonna move forward and then we won't have time.
That is true.
But now we have a question: can you explain Multipass?
So Multipass is an application that allows you to run virtual machines in seconds. It's something that I usually use to spin up labs to run my experiments. As you can see, it allows you to use user data, I think it was called user data or something, which is a Linux startup script for unattended installs. So you can just put everything into your script and launch a virtual machine, and the result will be something that is working. So, multipass list: those two commands gave me two virtual machines that are configured with Kubernetes. Everything is installed, the nodes are connected, everything is working, hopefully, and we can do the demo.
Perfect, and after we're done with it, we have a few questions lined up already, but we can finish the demo first.
All right, so to install Calico, the recommended way is the Tigera operator. Now, this cluster, which you have access to through the URL that Annie was kind enough to share, is configured without kube-proxy. Usually kube-proxy is what configures iptables in your Kubernetes environment so that the resources inside your cluster are able to talk to the API server and to other resources, inside or outside. So everything happens with kube-proxy. Now, if we take a look at the operator here, at some point it will tell us something is wrong. I think it's gonna take something like 30 seconds or a minute. So here, as you can see, the operator is trying to communicate with this IP address. We don't know what it is at the moment, and it can't reach it. So if I go to my services, it's like, aha, it's the API server. Now, as I mentioned, in eBPF mode the Calico data plane can fully replace kube-proxy. The way to do it is to create a ConfigMap for the Tigera operator to let it know where the Kubernetes host actually resides. So if you look here, I have an environment variable. This environment variable gives me back the IP address of my control plane. That's about it.
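The ConfigMap step above can be sketched as follows. The stream does not show the manifest, so this assumes the `kubernetes-services-endpoint` ConfigMap in the `tigera-operator` namespace as described in the Calico eBPF installation documentation; the function name `services_endpoint_configmap`, the address `192.0.2.10`, and the port `6443` are placeholders chosen for illustration.

```python
import json


def services_endpoint_configmap(host: str, port: int = 6443) -> dict:
    """Build the ConfigMap that tells the operator where the API server lives.

    Assumption: name, namespace, and keys follow the Calico eBPF install docs.
    ConfigMap data values must be strings, so the port is converted.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "kubernetes-services-endpoint",
            "namespace": "tigera-operator",
        },
        "data": {
            "KUBERNETES_SERVICE_HOST": host,
            "KUBERNETES_SERVICE_PORT": str(port),
        },
    }


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address; substitute your control plane IP.
    print(json.dumps(services_endpoint_configmap("192.0.2.10"), indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be piped into `kubectl apply -f -` in a test cluster.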
So now the operator knows what's going on. The operator will restart itself and carry on with the installation. For troubleshooting, most of the time, if nothing is coming up, just go ahead and figure out what the log is saying. It could be the Tigera operator or any sort of operator. Just skim the logs and describe the pod. It will give you a hint of where to look. Now, the first problem was easy to debug. Now that we have the Kubernetes API server configured, I can go ahead and check the logs again. This time, as you can see, after the operator finds the OS, it goes on to communicate with the API server, gets some information that it requires, and immediately stops. It stops for the installation config. The operator will continuously look for your installation resource. Now, what is an installation resource? It is the instruction that tells the operator in which way you want to install Calico and how you want to configure it. For this demo, these are the configurations. This is the pod CIDR. All my pods are gonna be in this IP CIDR. For encapsulation, I'm gonna use VXLAN. So each time a pod wants to talk to a pod in another network, and by network I mean a pod on node A wants to talk to a pod on node B and these two nodes are in different subnets, so a broadcast domain doesn't work, it can encapsulate that traffic with VXLAN and use that instead of layer 2 ARP discovery. I do have NAT outgoing enabled. That means all my pods are right now able to talk to the internet. You can change that, you can disable it, or you can create multiple IP pools and put your pods into each one depending on how you want to shape your environment. Now, next part, there is another problem. As you can see, everything is running but CoreDNS. The default configuration for Calico is the standard Linux data plane.
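The installation resource described above (a pod CIDR, VXLAN encapsulation, NAT outgoing enabled) can be sketched like this. The field names assume the Tigera operator's `Installation` API (`operator.tigera.io/v1`) as I understand it; the function name `installation_manifest` and the CIDR `10.244.0.0/16` are placeholders, since the actual value used in the demo is not stated.

```python
import ipaddress
import json

# Assumption: data plane values accepted by the operator's linuxDataplane field.
KNOWN_DATAPLANES = ("Iptables", "BPF", "VPP")


def installation_manifest(pod_cidr: str, dataplane: str = "Iptables") -> dict:
    """Build an Installation resource with one VXLAN IP pool and NAT outgoing."""
    if dataplane not in KNOWN_DATAPLANES:
        raise ValueError(f"unknown data plane: {dataplane!r}")
    # Raises ValueError if the CIDR is malformed or has host bits set.
    network = ipaddress.ip_network(pod_cidr)
    return {
        "apiVersion": "operator.tigera.io/v1",
        "kind": "Installation",
        "metadata": {"name": "default"},
        "spec": {
            "calicoNetwork": {
                "linuxDataplane": dataplane,
                "ipPools": [
                    {
                        "cidr": str(network),
                        "encapsulation": "VXLAN",  # used across subnets instead of L2 ARP
                        "natOutgoing": "Enabled",  # lets pods reach the internet
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(installation_manifest("10.244.0.0/16", dataplane="BPF"), indent=2))
```

Leaving `dataplane` at its default mirrors the behavior mentioned in the talk, where Calico starts on the standard Linux data plane unless told otherwise.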
Now, there are a couple of ways that you can change the default data plane. One is to come here, inside your installation resource, around here at the Linux data plane setting, I believe, and say BPF. That's one way. The other way, if you have already installed it, is to patch it, and everything hopefully works. You'll see. You never know with a demo. I forgot to talk about troubleshooting it before solving it. So how can you troubleshoot the previous problem? You can check the calico-node logs. calico-node is verbose to an extent, you can change this in the configuration, and it tells you a lot of information. You can grep for error and figure out what is not working. There is an explicit message that says something in the data plane is not working. But how can you figure out if BPF is running? Well, you again check the logs for calico-node, and it tells you BPF mode is enabled. You're all set. Oh, one thing to mention here. Calico disables unprivileged BPF. There was an exploit some months ago, and it could still be possible, but it depends on how you configure your BPF environment. For instance, Calico only runs privileged BPF programs, so unprivileged users can't use BPF programs to exploit your environment. This is sort of a gotcha that I wanted to talk about. Now, next part, I'm gonna deploy an application and actually go into more BPF troubleshooting. As I mentioned, BPF programs are binary. So if I go to, let's say, one of my nodes, there is almost nothing out of the box similar to, let's say, iptables -L -nv. There is no command of that sort that can give you an insight into what is happening. Now, it's not a fault or a flaw. This is because, as I mentioned, each BPF data plane or each BPF program is unique to its community, to its project.
While everybody might use the same hook, they can utilize it in different ways. So it would be pretty difficult to write a generic tool that can give you unique insights. It is possible to write something that gives you general performance or general error information. You can use tc, for instance. tc will allow you to run your generic BPF program and get some output about what is happening. tc also gives you a way to hook into interfaces, but we'll come back to that. So let's say I don't know what BPF is. I have no idea how to program. I enabled the eBPF data plane because of its performance and I want to troubleshoot it. Now, what is the solution for me? Well, shipped with your calico-node pod there is an application called calico-node, not surprisingly, and it has an option, -bpf. Now, -bpf has pre-prepared BPF programs that you can use to figure out what is going on inside your cluster. For instance, I can use the counters to figure out the ingress, egress, or XDP metrics, if I have any XDP programs. If you launch the environment that I shared with the QR code, the XDP one, you can actually use these commands to figure out what is going on and how those policies are being used, and you can use this information to your advantage, either for performance tuning or troubleshooting. One thing to mention is that there are a couple of arguments that you can use with these. iface, for instance, will give you explicit output about one interface, or you can get all of them. There are a couple of others that you can use, for instance conntrack. Now, Linux on its own, in terms of networking, is stateless. There is a component called conntrack that records where a packet or flow originated from and keeps it in mind for when you want to respond to that flow or traffic. In BPF mode, Calico establishes its own conntrack table using BPF programs.
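To make the idea of connection tracking concrete, here is a toy table that remembers outgoing flows and recognizes their replies. It is a deliberately simplified sketch: real conntrack, and Calico's BPF map-based table, also track connection state, NAT rewrites, and timeouts. The names `FlowKey` and `ConntrackTable` are hypothetical and the addresses are documentation-only examples.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The classic 5-tuple that identifies one direction of a flow."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reversed(self) -> "FlowKey":
        # A reply travels the same flow with source and destination swapped.
        return FlowKey(self.proto, self.dst_ip, self.dst_port, self.src_ip, self.src_port)


class ConntrackTable:
    """Remembers where flows originated so replies can be matched later."""

    def __init__(self) -> None:
        self._flows = set()

    def record(self, key: FlowKey) -> None:
        self._flows.add(key)

    def is_reply(self, key: FlowKey) -> bool:
        return key.reversed() in self._flows


if __name__ == "__main__":
    table = ConntrackTable()
    outgoing = FlowKey("tcp", "10.0.0.5", 40000, "198.51.100.7", 443)
    table.record(outgoing)
    print(table.is_reply(outgoing.reversed()))
```

A set lookup stands in here for the BPF map lookup; the point is only that a stateless packet path becomes stateful once flows are recorded somewhere that can be consulted per packet.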
Now, there are a couple of reasons for this, but the best one is that using BPF maps for this sort of thing is much faster than relying on other things or other sorts of hooks. Why? Because with BPF programs, as I said, the programmer can directly tap into the actual place they need to read, figure out what's going on, and manipulate the traffic or the entries as they want. There are a couple of other things. For instance, there is a NAT dump that you can do. This will show you what is happening in terms of your pods accessing the internet. By the way, if these outputs are not familiar, don't worry, I'll share a link to the documentation, where these are explained way better than I can explain them. It's written by the clever engineers who built these tools. I'm just trying to talk about it. There are routes, which let you figure out what is going on on the nodes that you have, and one thing that might be very interesting for you guys is policy. Now, if you do a policy dump, you'll get an error. This is because when you want to do a policy dump, you need to say which interface and which hook. The hook is either ingress for the incoming packets, egress for the outgoing packets, or XDP, the one that I mentioned for DDoS attacks. You can also use all, but I would say all is usually used when you just want to get a feeling of what is going on. Anyway, to show how this works, let me find an interface name first.
Yeah, no worries. We have about nine minutes left and we have about one, two, three audience questions to cover as well, just keeping time in mind here.
Oh, all right, so I'll just do this. The policy dump will tell you what is happening. It uses a tc hook, and whether it's ingress or egress, you can figure out how these policies stack up. You can see the actual low level code for BPF.
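The requirement that a policy dump names both an interface and a hook can be captured in a small helper that builds the command line. This is a sketch under assumptions: the `calico-system` namespace is the usual one for operator installs but may differ in your cluster, and the pod name `calico-node-abcde`, the interface `eth0`, and the function `policy_dump_command` are hypothetical. The helper only builds the argument list and does not run anything.

```python
from typing import List

# Hooks named in the talk: incoming, outgoing, XDP, or everything at once.
VALID_HOOKS = ("ingress", "egress", "xdp", "all")


def policy_dump_command(pod: str, interface: str, hook: str,
                        namespace: str = "calico-system") -> List[str]:
    """Build a kubectl command that runs the policy dump inside a calico-node pod."""
    if not interface:
        raise ValueError("an interface name is required")
    if hook not in VALID_HOOKS:
        raise ValueError(f"hook must be one of {VALID_HOOKS}, got {hook!r}")
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "calico-node", "-bpf", "policy", "dump", interface, hook,
    ]


if __name__ == "__main__":
    print(" ".join(policy_dump_command("calico-node-abcde", "eth0", "ingress")))
```

Validating the hook before calling out avoids the error mentioned in the talk, where a dump without an interface and hook simply fails.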
By the way, if somebody fancies a contribution to open source, we could really use your help to change this output to whatever you like and add a flag for it. I'll be happy to review it. Not that I would understand what you're doing, but I can get somebody who's really good at this sort of thing to actually vet your contribution, and that could be a good thing for a resume too. Again, it tells you what is happening, and with that, I'll just end my explanations and share the link to the mothership.
Perfect, this is the important thing now. Let's keep it there for a bit while we get to some of the questions that we have. So the first one we have: can Calico eBPF work with a normal Linux system without a Kubernetes cluster?
Yes, there is actually a way to install Calico on a standalone host.
Perfect, easy answer then. Next, a question from Riza: can I use the eBPF DDoS protection feature, and how do we actually understand what's happening?
So, eBPF doesn't understand whether an attack is happening. You can use eBPF troubleshooting tools or your monitoring platform to figure out if an attack is happening by monitoring the traffic and the bandwidth, and if something is happening and it smells, you could write some sort of automation that adds eBPF policies to just get rid of that IP address or that CIDR.
Cool, and then we have a question: which CNI tool do you prefer to use on premises?
It depends on your scenario. I would say Calico, though I may be a bit biased, but there are lots. You can use Flannel, Weave, there are a lot of CNIs. Just test everything and figure out which one works for you. Hopefully Calico.
Yeah, that would be a good one. And then we have: can I utilize eBPF for cluster mesh?
Yes, it is possible to use eBPF for cluster mesh. A cluster mesh is when you connect two clusters together. Now, there are ways to use eBPF for it, but there are other ways too.
Yes, you can definitely use eBPF to get those packets and steer them to their destinations. You can also use BIRD or BGP or any sort of protocol too. It depends on how adventurous you are.
Yeah, sounds good. And then Asif asks, is eBPF open source? Can we use it for testing?
So there are two parts to this. eBPF is open source. If you're talking about the programming part of it, or the framework that you can use to talk to the Linux kernel, that is open source. If you're talking about the Calico eBPF data plane, that is open source too. You can go change it and run it in your own environment. Hopefully you'll submit the changes to us too, so we can use your awesome contributions. And can you use it for testing? Yes, and you can use it for production too. It depends, again, on you and your environment.
Great, and then we have the last question so far. We have three minutes left, so this is also the last call for questions. If you're typing a question, send it in now while we answer this one. Do Cisco and Juniper also have a CNI solution?
I'm not the best person to answer this because I'm not that familiar with Juniper. I do know Cisco ACI has an option to install Calico, and that way you can use a CNI in their platform. But yeah, the best way would be to search their documentation.
A bit of a deep dive there, perfect. While we see if someone is typing a question, I have a question. If someone's super excited to learn more about the topic, what would be the resource you would recommend they check out next?
For the eBPF data plane, the documentation. So docs.tigera.io, as I shared with you. If somebody fancies writing code that directly interacts with the kernel, then ebpf.io is a great start. The Linux kernel GitHub repository and the Linux kernel mailing list are way more amazing than you can imagine for this sort of information.
But again, it depends on the level at which you want to interact with the technology.
Yeah, sounds good. Now we have two minutes left and we actually have two new questions, so these are the final questions that we have time for. The first: how can we use eBPF for Kubernetes security? I am very new to eBPF.
So as I mentioned, eBPF programs are directly injected into the kernel. You can use them to manipulate traffic that is facing your Kubernetes cluster, or you can actually peek into what is happening inside the packets. That would be the security view off the top of my head.
Yeah, then we have the final question for today, which is: hello, is eBPF Calico highly available? If the eBPF daemon agent crashes, and so forth. So essentially, is eBPF Calico highly available?
So the Calico eBPF data plane relies on calico-node, and calico-node is highly available. If something happens, it will not drop your connections. Everything keeps going smoothly while it tries to recover in the background and comes back. Again, BPF networking happens with BPF programs that are injected into your kernel. That's about it. If the BPF data plane agent crashes, the programs are still in there, doing their work.
Perfect, and that is all that we have time for today, but so many great questions, which is always lovely to see. So let's wrap it up. Thank you, everyone, for joining the latest episode of Cloud Native Live. It was great to have a session about troubleshooting the eBPF data plane in a Kubernetes cluster. I really loved the audience interaction today and the questions. Really great stuff, everyone. Thank you so much. As always, we bring you the latest cloud native code every week, so tune in in the coming weeks as well. Thank you for joining us today, and see you all next time.