Hi, and welcome to Understanding the Kubernetes Networking Layer. My name is Marga Mantirola. I'm Director of Engineering at Isovalent, the company behind products like Cilium, Hubble, and Tetragon. I'm here today to share with you some fundamental Kubernetes networking concepts and how they relate to the CNI, or Container Network Interface. Networking plays a central role in connecting our Kubernetes services to each other and to the world, and there's a lot that we can tweak and customize. We definitely won't be able to cover everything there is to know about Kubernetes networking here, but we should cover enough for you to have a basic understanding of what's going on inside your cluster's network. We'll first do a quick introduction of the fundamental concepts involved in Kubernetes networking, starting from the most basic components like the network namespace, how pods connect to each other, and how services work inside our clusters. Then we'll build on to some more advanced topics like network policies, ingress, service mesh, and cluster mesh. Finally, we'll look into the different options available when choosing which networking plugin to deploy and how to pick the right one for you. During this talk, I'll focus on explaining concepts, clarifying how each piece relates to the others, and ensuring they all make sense. I won't be showing you how to configure YAML files; I trust that once you know what you want to configure, you'll be able to find the necessary YAML specification. Throughout this talk, I'll be talking about features that may or may not be provided by the Kubernetes CNI. So before I jump into describing these features, let me briefly introduce you to the concept of the CNI. As I said before, CNI stands for Container Network Interface, although this doesn't tell us much. We know that it's about connecting containers on a network. And yes, it's not just about Kubernetes.
CNIs can be used to connect containers even if they are running outside of Kubernetes. Anyway, for this talk, what you need to know is that the networking plugin running in a Kubernetes cluster follows the CNI specification, and so it's usually called the CNI. There are a bunch of different CNIs available, with different goals and different features. Depending on our specific setup, we might choose the one that best fits our needs. But I'm getting ahead of myself. Let's first go through some of the features available, and then we'll discuss how to pick the one for you. One of the first concepts that we need to tackle is that of the network namespace. When we talk about namespaces in the context of containers, we mean that resources are inside some kind of bubble. The specifics of the bubble will depend on which namespace we are referring to. The basic Kubernetes namespace groups the resources present in our cluster, like pods or configs. This is not a security boundary, but rather a semantic one. On the other hand, we have namespaces that provide isolation between pods. For example, a process namespace means that the processes visible inside one pod are different from the ones visible inside another. A mount namespace means that the file systems seen inside the pod are different from those on the host. And a network namespace means that each pod has an independent networking stack, including IP addresses, routing tables, and so on. In practice, what this means is that containers that run inside the same pod can reach each other through the loopback, or localhost, interface. And to reach containers running in other pods, they need to use the cluster's network. So how does this work? How do containers running in one pod reach containers running in another? The cluster creates a virtual network to which all pods are connected with cluster-internal IP addresses.
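To make the namespace idea concrete, here's a tiny Python sketch. It's a toy model, not real kernel namespaces: the class names and IP addresses are made up for illustration. It shows that containers in the same pod share one network stack (and so one localhost), while each pod gets its own.

```python
# Toy model of pod network namespaces: every container in a pod shares
# one namespace (and therefore one loopback/localhost), while each pod
# gets its own isolated stack. Names and IPs here are illustrative,
# not a real kernel API.

class NetworkNamespace:
    def __init__(self, pod_ip):
        self.pod_ip = pod_ip          # cluster-internal IP of the pod
        self.loopback = "127.0.0.1"   # shared by all containers in the pod

class Container:
    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace    # containers in one pod share this

# One pod, two containers: they share the same namespace object,
# so "localhost" means the same thing to both of them.
pod_a_ns = NetworkNamespace(pod_ip="10.0.1.5")
app = Container("app", pod_a_ns)
sidecar = Container("logger", pod_a_ns)

# A second pod gets a completely separate stack with its own IP.
pod_b_ns = NetworkNamespace(pod_ip="10.0.2.9")

assert app.namespace is sidecar.namespace   # same pod: shared stack
assert pod_a_ns.pod_ip != pod_b_ns.pod_ip   # different pods: own IPs
```

The key point the model captures: "same pod" means "same namespace object," so loopback traffic never leaves the pod, while pod-to-pod traffic must go through the cluster-internal IPs.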
As this is a cluster-wide network, it doesn't matter whether pod A is running on the same node as pod B or on a different one. Both pods are connected to the same virtual network, and our networking layer ensures that packets can go from one to the other. With that, we've already covered the two most basic building blocks of networking inside Kubernetes clusters. First, each pod has its own network namespace, its own stack. Containers inside a pod can communicate with each other through the localhost, or loopback, interface. And second, pods can reach each other using their cluster-wide IPs, no matter which node they are running on. The next step is making it possible for our workloads to reach the pods they need to reach. Say our Kubernetes application has separate pods that deal with the frontend and the backend. To let our frontend pods connect to the backend ones, we use another entity called a service. A service groups pods in whichever way makes sense for our application. For our example, we could have a backend service that includes all pods that have the backend label. So when a frontend pod needs to reach a backend pod, it does so by going through the IP assigned to the backend service. Remember that in typical Kubernetes applications, we can have pods coming up and going away depending on different factors, like the load of the application or the availability of certain nodes in the cluster. Using services gives us an abstraction layer that allows our applications to just talk to the IP of the service without needing to care about the lifecycle of our pods. The service IP will act as a forwarding proxy, redirecting traffic to the currently active pods in a load-balanced way. This functionality is typically provided by the Kubernetes network proxy, or Kube Proxy for short. Kube Proxy is one of the native Kubernetes daemons that run in our clusters. It runs on each node, keeping track of which pods belong to which services and load-balancing requests as needed.
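Here's a minimal Python sketch of the grouping step just described: a service selecting its endpoint pods by label. The pod names, labels, and IPs are invented for the example; real selection is done by the Kubernetes control plane against the pods' metadata.

```python
# Minimal sketch of how a service groups pods via a label selector.
# Pod names, labels, and IPs are made up for illustration.

pods = [
    {"name": "backend-1", "labels": {"app": "backend"}, "ip": "10.0.1.5"},
    {"name": "backend-2", "labels": {"app": "backend"}, "ip": "10.0.2.9"},
    {"name": "frontend-1", "labels": {"app": "frontend"}, "ip": "10.0.3.2"},
]

def select_endpoints(pods, selector):
    """Return the IPs of pods whose labels match every key/value in selector."""
    return [
        p["ip"]
        for p in pods
        if all(p["labels"].get(k) == v for k, v in selector.items())
    ]

# The "backend" service targets every pod labeled app=backend,
# regardless of which node those pods run on.
backend_endpoints = select_endpoints(pods, {"app": "backend"})
print(backend_endpoints)  # ['10.0.1.5', '10.0.2.9']
```

If a backend pod dies and a new one comes up with the same label, it simply appears in the next selection; nothing about the service, or the frontend talking to it, has to change.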
There are many different ways of configuring Kube Proxy, and the configuration chosen will depend on whether we are optimizing for simplicity or scale. But no matter how we configure it, Kube Proxy will be in charge of translating a connection to a given service into a connection to a specific pod. It gives us an abstraction that hides churn by providing a stable IP address that pods connect to. This functionality is separate from the core CNI specification and is usually provided by the default Kube Proxy component. However, some networking solutions for Kubernetes come with an integrated Kube Proxy replacement that handles this behavior together with the rest of the networking stack. Now, another thing to understand when we talk about services is that there are four types of services, and the type we choose will change how applications can reach the service. The names are a bit confusing at first, until we understand what they refer to, so let me explain them one by one. The default type is ClusterIP. In this case, the IP that we use to reach the service is part of the internal cluster network. In other words, the service is only reachable from inside the cluster, and that's why it's called ClusterIP. This is usually the right choice for services that are used by other services, like the backend pods in our previous example. The Kube Proxy instance running on each node will create rules that handle traffic to this IP and redirect it to one of the pods currently alive and providing that service, no matter which node they are running on. The second type is NodePort. In this case, the internal cluster IP is still there, and on top of that, the service is also exposed as an open port on each of our nodes. This means that the service can now be reached from outside the cluster, typically by other machines in the same virtual network as the cluster nodes. As before, Kube Proxy will handle the traffic and redirect it to one of the pods, no matter the node.
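The translation that Kube Proxy performs can be sketched in a few lines of Python. This is a conceptual model only (the real thing rewrites packets with iptables, IPVS, or eBPF rules); the service and pod IPs are made up, and round-robin stands in for whatever load-balancing mode is configured.

```python
import itertools

# Sketch of the translation Kube Proxy performs: a stable service IP
# maps to whichever pod IPs currently back the service, picked in a
# load-balanced (here: round-robin) fashion. IPs are illustrative.

service_endpoints = {"10.96.0.10": ["10.0.1.5", "10.0.2.9"]}
round_robin = {ip: itertools.cycle(eps) for ip, eps in service_endpoints.items()}

def translate(dest_ip):
    """Rewrite a connection to a service IP into one to a live pod IP."""
    if dest_ip in round_robin:
        return next(round_robin[dest_ip])
    return dest_ip  # not a service IP, leave the destination alone

print(translate("10.96.0.10"))  # 10.0.1.5
print(translate("10.96.0.10"))  # 10.0.2.9
print(translate("10.96.0.10"))  # 10.0.1.5 (wraps around)
```

The application only ever sees the stable service IP; which pod actually answers is decided per connection, which is exactly the churn-hiding abstraction described above.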
So when we connect to the external IP of one node on a given port, our request may end up being served by a pod running on a different node. Next, we have the LoadBalancer type. In this case, on top of the internal cluster IP and the exposed node port, there's one external IP address, separate from the node IP addresses, that makes our service reachable from the outside, and there's a load balancer in between that distributes traffic directly to the pods providing the service. Now, this type is special because it interacts with the infrastructure where our cluster is hosted. If it's hosted on a managed platform like AKS, EKS, or GKE, having a LoadBalancer service indicates to the cloud provider that we want our service to be reachable from the outside through a public IP address. The cloud provider will allocate the IP address and then route traffic to our service through it. If we are not in a managed platform situation, we will need to run our own external load balancer to manage this. Typically, this is done using MetalLB. But didn't I say that Kube Proxy balances the load for the ClusterIP and NodePort service types as well? Yes, that is correct. No matter the service type that we choose, traffic is always load balanced. The LoadBalancer service type is used to indicate that we want an external load balancer, separate from the one provided by Kube Proxy. And the last service type is ExternalName. This is the odd one out, as it's not distributing traffic between pods but rather sending it to an external domain name. It's typically used for applications that have not been fully migrated to Kubernetes, with some services running inside the cluster and others still provided by external components that are reachable through these external names. As a side note, there are a couple of directional terms that are commonly used when talking about communication inside and outside the cluster.
I'm not using this terminology here, but you might encounter it when looking up documentation about this, so let me quickly clarify it. Communication between services that are running inside the same cluster is commonly referred to as East-West communication. This is the kind that we use when sending traffic between a bunch of different ClusterIP services. Communication from the outside world to the services inside the cluster is known as North-South communication. This is the kind that we use when using NodePort or LoadBalancer services. Now, how do our applications reach these services exactly? The simplest way is by using DNS. Every service defined in the cluster gets assigned a DNS name in the form my-service.my-namespace.svc.cluster.local, with my-service being the name of the service and my-namespace the name of the namespace where the service is defined. When any application needs to access a service, it can just use normal DNS queries to find out which IP address to connect to. The domain name translation is typically provided by another cluster component called CoreDNS. This daemon knows about the services available in our cluster and the IP addresses assigned to them, and responds to DNS queries as necessary. And as with Kube Proxy, some networking solutions might bring their own integrated DNS daemon rather than using the default one. All right, we've covered exposing and reaching our services, and it sounded pretty straightforward. For some applications, we might simply point the outside world directly to one LoadBalancer service and be done with it. But it can get a lot more complex than that. In particular, for HTTP and HTTPS applications, we'll likely want to have a more advanced setup where different services are in charge of different domain names or different paths of a given URL. This is where the ingress component comes in. It allows us to control which services serve which paths, through rules specified in the ingress resource.
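The service naming scheme is mechanical enough to sketch in Python. The helper below builds the standard cluster-internal DNS name; the record table standing in for CoreDNS, and the IP in it, are invented for the example.

```python
def service_dns_name(service, namespace):
    """Build the cluster-internal DNS name for a service:
    <service>.<namespace>.svc.cluster.local"""
    return f"{service}.{namespace}.svc.cluster.local"

# A tiny stand-in for the records a cluster DNS server like CoreDNS
# would answer; the service IP here is made up.
records = {service_dns_name("backend", "default"): "10.96.0.10"}

name = service_dns_name("backend", "default")
print(name)           # backend.default.svc.cluster.local
print(records[name])  # 10.96.0.10
```

From inside the same namespace, applications can usually use just the short service name and let the DNS search path fill in the rest; the fully qualified form shown here works from anywhere in the cluster.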
This is sometimes referred to as a layer 7 load balancer, because it needs to look into what the actual requests are in order to direct them to the right services. To make this happen, we use an ingress controller that parses incoming traffic and forwards it as needed. This controller can be yet another service running in the cluster, using open source products like NGINX, Envoy, or HAProxy, or it can be an external component provided by the managed cloud infrastructure. Using an external ingress controller is simpler, as there's one less component to take care of. But using an in-cluster ingress controller can give us more choice regarding configuration options, load balancing algorithms, and security capabilities. Which one we pick will depend on our specific needs. Now let's move on to one important security aspect. When we deploy pods in our cluster, by default all pods can talk to all other pods. The frontend can talk to the backend, but also to the database, the certificate managers, and so on. When thinking about this for the first time, it might seem just fine to allow all traffic to go anywhere, as application developers will be in charge of ensuring that their services talk to the components that they need to, right? The problem with this is that if a certain pod gets compromised due to a security vulnerability, letting it talk to all other services in the cluster increases the possible damage that an attacker might do. So we want to ensure that pods can interact only with the pods that they need to interact with and nothing else. For this, we use network policies. These are configurations that we deploy in our cluster that establish the valid origins and destinations of traffic for the selected pods. That way, only the traffic that we have allowed will be able to go through. These policies are enforced by the CNI, but not all CNIs support them. So if we want to have decent security in our cluster, we'll need to choose one that does.
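The enforcement decision itself can be sketched as a small Python function. This is a simplified model of the semantics, not a real CNI: the policy structure and labels are invented, and it captures one important subtlety, namely that a pod with no policy selecting it accepts traffic from anywhere.

```python
# Simplified model of network policy enforcement: with no policy
# selecting a pod, all traffic is allowed; once a policy selects it,
# only the listed sources get through. Labels are illustrative.

policies = [
    # "backend may only receive traffic from frontend"
    {"target": {"app": "backend"}, "allowed_from": [{"app": "frontend"}]},
]

def matches(labels, selector):
    return all(labels.get(k) == v for k, v in selector.items())

def is_allowed(src_labels, dst_labels):
    selecting = [p for p in policies if matches(dst_labels, p["target"])]
    if not selecting:
        return True  # Kubernetes default: no policy means allow all
    return any(
        matches(src_labels, sel)
        for p in selecting
        for sel in p["allowed_from"]
    )

print(is_allowed({"app": "frontend"}, {"app": "backend"}))  # True
print(is_allowed({"app": "database"}, {"app": "backend"}))  # False
print(is_allowed({"app": "backend"}, {"app": "database"}))  # True (no policy)
```

That last case is the one that surprises people: the database is still wide open here, because no policy selects it. Locking a cluster down means writing policies for every workload that should be protected.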
This includes Cilium, Calico, and Weave Net. All right, with those network policies in place, let's move on to a slightly more advanced topic. One of the networking terms that has grown in popularity over the past years is service mesh. A service mesh is a dedicated infrastructure layer that we can deploy in our clusters to add some important capabilities, like observability, traffic management, security, and resilience, without having to make changes to our own code. But wait, why do we even need this? In the cloud native world, we typically split applications into a lot of different microservices. In large deployments, each of these services can include hundreds or even thousands of pods, and there's a lot of network traffic being sent and received between these services. Managing, securing, and understanding what goes on with them is critical to the reliability of the system, but it can be quite challenging. This is where the service mesh comes in. It lets us add the necessary infrastructure to the whole cluster, rather than having to instrument it service by service, pod by pod. So what exactly are the features that we gain when deploying a service mesh? It depends a bit on the service mesh provider that we choose, but we can split them into five basic categories. First, we have resilient connectivity, which means that service-to-service communication works across boundaries like clouds, clusters, and on-premises environments, and this communication needs to be resilient and fault-tolerant. To make this happen, the service mesh might do automatic retries with backoff, or stop sending traffic to a service that is not responding, also known as the circuit breaker pattern. In other words, the service mesh infrastructure improves the overall communication between our services. Next, we have layer 7 traffic management. This means that load balancing, rate limiting, and the overall resiliency are tailored to the applications, protocols, and formats in use.
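To show what those resilience patterns amount to, here's a small Python sketch combining retries with exponential backoff and a circuit breaker. The thresholds and delays are arbitrary illustrative values; a real mesh applies this logic in its proxies, transparently to the application.

```python
import time

# Sketch of two resilience patterns a service mesh applies transparently:
# retries with exponential backoff, and a circuit breaker that stops
# sending traffic after repeated failures. Thresholds are illustrative.

class CircuitBreaker:
    def __init__(self, max_failures=3):
        self.failures = 0
        self.max_failures = max_failures

    @property
    def open(self):
        return self.failures >= self.max_failures  # open = stop sending

    def call(self, request_fn, retries=2, base_delay=0.01):
        if self.open:
            raise RuntimeError("circuit open: not sending traffic")
        for attempt in range(retries + 1):
            try:
                result = request_fn()
                self.failures = 0  # a success resets the breaker
                return result
            except ConnectionError:
                self.failures += 1
                if attempt < retries:
                    time.sleep(base_delay * (2 ** attempt))  # back off
        raise RuntimeError("request failed after retries")

# A flaky backend that fails twice, then recovers.
breaker = CircuitBreaker()
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError
    return "ok"

print(breaker.call(flaky))  # "ok" -- the two failures were retried away
```

The application that made the call never saw the two failures; once the breaker opens, further requests fail fast instead of piling load onto a dead service.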
The traffic management that we apply might be different for HTTP, REST, gRPC, WebSockets, and so on. This can also help us when deploying new applications, by allowing us to do A/B testing or canary deployments. Using the service mesh features, we can have our traffic management fully customized to our needs. Another important feature is identity-based security. As our connectivity crosses boundaries and goes through untrusted networks, we can no longer trust an endpoint just because of the network to which it's connected. Instead, services need to authenticate each other based on identities. The service mesh software helps us with this by ensuring that all traffic is encrypted and that access to our services is properly gated through the right identities. Security is a super important subject, but not all application developers are experts on how to do security properly. By having it embedded directly into the service mesh, we can improve the overall security of our clusters. Now, to make sense of all the additional layers that we are adding, we'll need to have proper observability and tracing. Having observability in the form of tracing and metrics is critical to understanding, monitoring, and troubleshooting application stability, performance, and availability. The service mesh software sees the traffic go through and captures the relevant information about it, letting us better understand what's going on. And finally, a very important factor that distinguishes a service mesh is transparency. All the functionality that we covered must be available to applications in a transparent manner, without requiring us to change the application code. This is a core characteristic of the service mesh, because it means that we can apply it in any cluster without having to modify the workloads that run in the cluster. When we talk about the service mesh, we talk about both the software that we use to create the environment and the end result of deploying that software in our cluster.
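Canary deployments, mentioned above as a layer 7 traffic management feature, boil down to weighted routing. Here's a minimal sketch; the version names and the 90/10 split are invented for the example, and a real mesh would apply the weights in its proxies rather than in application code.

```python
import random

# Sketch of weighted canary routing: send a small fraction of requests
# to the new version. Weights and version names are illustrative.

def pick_version(weights, rng=random.random):
    """Pick a backend version according to its traffic weight (0..1)."""
    r = rng()
    cumulative = 0.0
    for version, weight in weights.items():
        cumulative += weight
        if r < cumulative:
            return version
    return version  # fall through to the last entry on rounding edge cases

weights = {"v1": 0.9, "v2-canary": 0.1}  # 10% of traffic to the canary

random.seed(0)  # deterministic for the demo
sample = [pick_version(weights) for _ in range(1000)]
print(sample.count("v2-canary"))  # roughly 100 of 1000 requests
```

If the canary's error rate stays healthy, its weight gets ramped up; if not, dropping the weight back to zero rolls the release back without touching the deployment itself.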
Okay, so how does the service mesh transparently add this functionality? The classical architecture for deploying a service mesh is to add a sidecar container to every pod in the cluster. These sidecars form the data plane of the service mesh and are what allows us to add transparent instrumentation: creating and tracking cryptographic identities, capturing and exporting metrics, and so on. All traffic that is sent or received by the application is also seen by the sidecar, which captures the metrics, handles the encryption, performs automatic retries, and provides the rest of the features of the mesh. On top of that, there's also a control plane that does the higher-level management, coordinating the behavior of the sidecars and allowing us to get a view of the overall network. This is the architecture used by some popular service mesh software options like Istio, Linkerd, or Consul. One disadvantage of the sidecar model is that we might end up deploying a ton of sidecars, spending a lot of compute power on workloads that are secondary to our application. An alternative architecture is to have the data plane integrated directly into the core networking layer, the CNI, so that there's no need to deploy additional sidecars. Any necessary processing is done either by the CNI itself or by an L7 proxy that runs on every node. This has the advantage of a lot less overhead, although it does come with the cost of having the functionality of the service mesh less isolated. This is one of the options offered by the Cilium service mesh, which can be used with or without sidecars, depending on what we want to optimize for. We are almost done with our tour of Kubernetes networking concepts.
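The sidecar idea can be illustrated with a toy Python wrapper: the proxy sits in front of the application handler, sees every request, records metrics, and forwards the call unchanged. Everything here is invented for illustration; real sidecars intercept network traffic, not function calls.

```python
# Toy illustration of the sidecar pattern: a proxy wraps the application
# handler, records metrics on every request, and forwards the call
# unchanged -- the application code itself is never modified.

class SidecarProxy:
    def __init__(self, app_handler):
        self.app_handler = app_handler
        self.metrics = {"requests": 0, "errors": 0}

    def handle(self, request):
        self.metrics["requests"] += 1          # observability, for free
        try:
            return self.app_handler(request)   # forward to the real app
        except Exception:
            self.metrics["errors"] += 1        # failures are counted too
            raise

def application(request):  # unmodified application logic
    return f"handled {request}"

proxied = SidecarProxy(application)
print(proxied.handle("GET /"))      # handled GET /
print(proxied.metrics["requests"])  # 1
```

The application function has no idea it's being observed, which is the transparency property discussed above; the node-level data plane alternative moves this same wrapping out of the pod and into the shared networking layer.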
We've been going from smaller to larger, and when deployments become really large, it's getting more and more common to have them spread across multiple clusters, running in different regions, in different cloud providers, or in a hybrid mode that combines clusters running in the cloud and clusters running on-premises. This is called multi-cluster, or cluster mesh. We might want to do it to increase reliability, so that if services running on one cloud provider fail, we can fail over to those running on another. Or we could interconnect our clusters to have some specific services running on-premises, with other parts running in the cloud. Whatever the case, interconnecting these clusters securely and reliably becomes another networking challenge. We want the services in our clusters to talk to each other without opening things up to unwelcome actors. To do that, we need the networking infrastructure to handle the traffic that goes across clusters, so that services inside one cluster can connect to services inside another using the same mechanisms and policies as when connecting to services inside the same cluster. There are a bunch of different options to achieve this. If we are already using service mesh software like Istio or Linkerd, we can configure it to create a service mesh that spans more than one cluster. Alternatively, each cloud provider has its own hybrid solution, like Google Anthos, Azure Arc, or EKS Anywhere, which also allows us to combine separate clusters. Or, if using Cilium as our CNI, we can use the cluster mesh feature to interconnect separate clusters, as long as all of them are running Cilium. And then there are specific multi-cluster solutions like Submariner, which might be an option if none of the other pieces are in place. With that, we've covered the most important concepts that we need to know to understand what's going on inside our clusters. Let's now move on to picking which CNI to use. But wait, why do we even need to pick one?
One thing that can be confusing at first when starting with Kubernetes is that so many things are pluggable. Kubernetes gives us a basic architecture with some specifications that need to be followed by the different components, but also a lot of freedom to pick and choose which components to use and how to arrange them in our clusters. In the case of the network, the interconnection between our services is done by one or more networking plugins, which, as we already covered, need to follow the CNI specification. As we said in the beginning, there are only a few features that they must provide, while most other features are optional or not dictated by any standard at all. One interesting fact is that these plugins can be chained. This means that it's possible to combine plugins to get the functionality that we want. For example, if the default CNI in our managed platform doesn't support network policies, we can plug in another CNI to get network policy support while keeping the default CNI for the basic functionality. Or, if our architecture requires having more than one network interface available to our pods but our CNI doesn't support that, we can add that support by chaining Multus, a plugin that provides that specific functionality. So is there a default CNI or not? Kubernetes doesn't force any specific networking plugin on us, but when we deploy a Kubernetes distribution, it normally comes with one already installed. For example, small Kubernetes flavors like kind or minikube use kindnet, a very simple CNI that supports only the most basic functionality. Its goal is to be simple and stay out of the way. Managed platforms offered by cloud providers also include their own default CNIs. On EKS, there's the Amazon VPC CNI. On AKS, there's the Azure CNI. And on GKE, the networking layer is called Dataplane V2.
In all cases, these cloud providers have chosen a CNI for us, which will include some of the functionality that we covered earlier, but not necessarily all of it. When getting started with Kubernetes, we will likely be just fine using whatever is the default on the current platform. We don't need to start fretting about replacing it until we actually need to. So when would we need to replace it? When our architecture grows complex enough that we need better traffic management, improved security, or other performance enhancements. So let's assume we want to replace the networking plugin in our cluster. Which one should we choose? Let's have a look at four open source possibilities. The first one on our list is Flannel. Flannel was created by CoreOS, a container-optimized OS that has now been discontinued. It was initially developed to interconnect containers outside of Kubernetes, then adapted to become a Kubernetes networking plugin. It is a simple CNI, very mature and reliable. It doesn't have many features beyond the central one, but it works well and is easy to use. To connect nodes, Flannel creates an overlay network on top of the existing network. This is very simple, although not particularly performant, as packets have an extra layer of encapsulation. Flannel doesn't provide significant additional features that would compel us to replace a default CNI with it, but it might be a good choice if we are selecting a CNI for a Kubernetes platform that doesn't come with a default. Another open source CNI that was originally created to connect containers outside of Kubernetes is Weave Net, developed by Weaveworks. One of the salient characteristics of Weave Net is that it does some smart choosing of the data path used to send and receive packets between hosts. To optimize for throughput and latency, it uses Open vSwitch, a Linux kernel technology, to perform faster packet routing. One of Weave Net's goals is to be easy to use out of the box.
And so, for example, it comes with a built-in encryption layer that handles all the necessary bits for doing authentication and secure communication. It can be used as a Kubernetes networking plugin that supports network policies, but not a lot of the more specific Kubernetes features. It would be a good pick for someone who already uses Weaveworks products to connect containers outside of Kubernetes and wants to also use them in their Kubernetes clusters. While the project has been inactive for a while, there is an ongoing conversation about contributing it to the CNCF and making it a fully community-driven project, so it might pick up pace in the future. Moving on, another possible CNI is Calico. Calico uses the BGP routing protocol to route packets between hosts. This means that packets don't need to be wrapped in an extra layer of encapsulation when moving between hosts, providing better performance and simplifying troubleshooting. On top of this traditional BGP data plane, Calico has recently added a new eBPF-based data plane and a Windows data plane. Calico also ships with many additional features on top of the basic CNI requirements. It supports network policies, it allows interconnection to non-Kubernetes workloads, it can do transparent encryption of all pod traffic, and more. Not all features are available on all data planes. For example, the eBPF data plane replaces the Kube Proxy functionality, reducing the overhead of running one additional daemon, but IPv6 is not available in the eBPF data plane. Calico is an active open source project with more than 200 contributors. Tigera, the company that created it, also offers an enterprise version called Calico Cloud, with additional features and commercial support. Calico is a well-rounded CNI and can be a good choice when requiring good performance, having hybrid workloads, or connecting to Windows clusters. Finally, let me talk to you about Cilium.
Now, I said at the beginning that I work for Isovalent, which is the company that created Cilium, so I'm obviously biased, but I'm still trying to stick to the facts here. Cilium is a fully featured CNI that was created from scratch based on eBPF. eBPF is a technology provided by the Linux kernel that allows us to dynamically modify the kernel's behavior. The routing of packets is done directly in kernel mode, which makes it very fast, and the dynamic programming makes it possible to apply advanced rules. Cilium implements all the different functionality that I covered in this talk. It can replace Kube Proxy, it includes its own DNS, it has an integrated ingress controller, and it supports network policies, with some additional features that aren't part of the basic spec. It can be used as a service mesh, with or without sidecars, and it can create a cluster mesh to span multiple clusters. And more: Cilium supports transparent encryption, advanced load balancing algorithms, cluster-wide firewall rules, enhanced observability through Hubble, and a lot more features than we would have time to cover here. It's a very active open source project with over 400 contributors. It has been donated to the CNCF and is currently in the incubating stage. And Isovalent also offers its own edition of Cilium, called Cilium Enterprise, that comes with additional features and support. Cilium is a great choice of CNI for most common use cases. It's also the default choice for some cloud providers: Google uses it as the basis for GKE Dataplane V2, and Amazon picked it as the default for the EKS Anywhere offering. All right, with that, I've tried to share with you the fundamentals of Kubernetes networking. We've gone through the different features available, how things fit together, and the options that you have if you want to replace the default CNI in your Kubernetes platform. I hope this has helped you better understand this complex landscape.
Of course, there's a lot that didn't fit in this talk. So if you want to learn more, I recommend the official Kubernetes documentation. It has a lot of great content covering all the different components, as well as practical examples. You can learn more about Cilium on the cilium.io website. And if you want to try some hands-on labs, there's a bunch of super cool labs on the Isovalent website, where you can learn about service mesh, cluster mesh, network policies, observability, and security while you try out Cilium, Hubble, and Tetragon. Thanks for listening, and have fun applying all these new concepts to your Kubernetes deployments.