Hello and welcome to Inside Kubernetes Networking, a KubeCon and CloudNativeCon Europe 2021 presentation. I am Dominik Tornow, Principal Engineer at Cisco. I focus on systems modeling, specifically conceptual and formal modeling to support the development and documentation of complex software systems.

Kubernetes Networking is a core abstraction of Kubernetes. Kubernetes Networking ensures that components within cluster boundaries and components across cluster boundaries can communicate. Kubernetes Networking is split into the Kubernetes Networking specification and the Kubernetes Networking implementation. In fact, many alternative implementations, called Kubernetes network plugins, exist today. The details of the Kubernetes Networking implementation depend on the details of the network plugin, and no two are alike. Therefore, instead of discussing a complete picture based on one particular network plugin, or an incomplete picture based on the least common denominator of all network plugins, in this presentation we will discuss an idealized implementation. Accordingly, this presentation is split into two parts: the first part discusses the specification, the second part discusses an idealized implementation.

So first up, the Kubernetes Networking specification. From the point of view of the Kubernetes Networking specification, a Kubernetes cluster consists of a set of nodes. Each node hosts a set of pods, and each pod executes a set of containers. Additionally, each node hosts a set of processes called daemons. In the context of Kubernetes, network-addressable elements, that is, elements with an IP address, consist of nodes and pods. However, keep in mind that the ultimate producers and consumers of messages are not nodes and pods, but are instead containers and daemons. The Kubernetes Networking specification is a set of constraints on the message exchange between containers and containers, and between containers and daemons.
The Kubernetes Networking specification addresses three different concerns: container-to-container communication, pod-to-pod communication, and daemon-to-pod communication.

First, we will discuss container-to-container communication. The specification requires that a container C1 that is executing in the context of a pod P can communicate with any other container C2 that is also executing in the context of P via localhost or via the IP address of P. Again, represented graphically, a container C1 that is executing in the context of a pod P can communicate with any other container C2 that is also executing in the context of P via localhost or via the IP address of P.

Next, we will discuss pod-to-pod communication. The specification requires that a container C1 that is executing in the context of a pod P1 can communicate with any other container C2 that is executing in the context of any other pod P2 via the address of P2. Again, represented graphically, a container C1 that is executing in the context of a pod P1 can communicate with any other container C2 that is executing in the context of any other pod P2 via the address of P2. Please note that this requirement does not mention the node that P1 or P2 is hosted on. Therefore, P1 could be hosted on the same node as P2 or on a different node.

Next, we will discuss daemon-to-pod communication. The specification requires that a daemon D that is hosted on a node N can communicate with any container C that is executing in the context of a pod P that is also hosted on N via the address of P. Again, represented graphically, a daemon D that is hosted on a node N can communicate with any container C that is executing in the context of a pod P that is also hosted on N via the address of P. Please note that this requirement does mention the node: D and P must be hosted on the same node.
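The three requirements above can be written down as a small function. The following Python is only a sketch of the specification as stated in this talk, not of any real Kubernetes API: the class names, the function `required_addresses`, and all IP addresses are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    node: str  # name of the node that hosts the pod
    ip: str    # hypothetical pod IP address


@dataclass(frozen=True)
class Container:
    name: str
    pod: str   # name of the pod the container executes in


@dataclass(frozen=True)
class Daemon:
    name: str
    node: str  # name of the node that hosts the daemon


def required_addresses(source, target, pods):
    """Addresses via which `source` must be able to reach container `target`,
    according to the three requirements as stated in the talk."""
    target_pod = pods[target.pod]
    if isinstance(source, Container):
        if source.pod == target.pod:
            # Container-to-container: localhost or the pod's own IP address.
            return {"localhost", target_pod.ip}
        # Pod-to-pod: the target pod's IP address, regardless of node.
        return {target_pod.ip}
    if isinstance(source, Daemon):
        if source.node == target_pod.node:
            # Daemon-to-pod: only required when both are on the same node.
            return {target_pod.ip}
        return set()  # no requirement stated for a daemon on another node
    raise TypeError("source must be a Container or a Daemon")


# Hypothetical example data.
pods = {
    "P1": Pod("P1", "N1", "10.0.1.1"),
    "P2": Pod("P2", "N2", "10.0.2.2"),
}
c1, c2 = Container("C1", "P1"), Container("C2", "P1")
c3 = Container("C3", "P2")
d1 = Daemon("D1", "N1")

print(required_addresses(c1, c2, pods))  # same pod
print(required_addresses(c1, c3, pods))  # different pod
print(required_addresses(d1, c2, pods))  # daemon and pod on the same node
print(required_addresses(d1, c3, pods))  # daemon and pod on different nodes
```

Note that the last call returns an empty set: the requirement as stated here only covers a daemon and a pod on the same node.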
Additionally, although strictly speaking not part of Kubernetes Networking but provided on top of Kubernetes Networking, Kubernetes provides Kubernetes Services and Kubernetes Ingress. We will discuss Kubernetes Services, both cluster IP services and node port services, as well as Ingress when we discuss the idealized implementation of Kubernetes Networking.

Next up, the idealized Kubernetes Networking implementation. In this presentation, we will rely on a conceptual instead of an actual implementation of a Kubernetes network to reason about the life of a message in a Kubernetes cluster. So the model we are about to discuss describes a conceptual network plugin. It does not describe an actual network plugin. We use this model as an educational model.

Before we discuss a model of Kubernetes Networking in particular, we will briefly discuss a model of networking in general. A communicating system can be modeled as a graph. The nodes of the graph consist of a set of containers and a set of switches. Again, here, a switch is a conceptual component, not an actual component. A switch can be modeled as a function and a forwarding information base. A switch can match any part of a message, typically the source address, source port, target address, and target port, against its forwarding information base to determine the next action. Here, we assume that a switch may either drop a message, deliver a message to a container or daemon, forward a message to another switch, or translate a message before calling itself recursively with the translated message. The links or edges of the graph consist of links between containers and switches and links between switches and switches. Messages may only travel along links. So as long as a network of switches, their forwarding information bases, and their connections satisfy the constraints of the Kubernetes Networking specification, it is a valid Kubernetes Networking implementation. We will now discuss one possible network.
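To make the idea of a switch as a function plus a forwarding information base concrete, here is a minimal Python sketch. It is an assumption about how the conceptual model could be written down, not part of the talk: the names `Message`, `switch`, and `fib`, the action encoding, and all addresses are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Message:
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


# An action is a (kind, argument) pair:
#   ("drop", None)
#   ("deliver", endpoint_name)          hand the message to a container or daemon
#   ("forward", switch_name)            send the message to another switch
#   ("translate", (new_addr, new_port)) rewrite the target, then match again
Action = Tuple[str, object]
Entry = Tuple[Callable[[Message], bool], Action]


def switch(fib: List[Entry], msg: Message) -> Tuple[str, object, Message]:
    """Match `msg` against the forwarding information base and return
    (action kind, action argument, possibly translated message)."""
    for matches, (kind, arg) in fib:
        if matches(msg):
            if kind == "translate":
                new_addr, new_port = arg
                # The switch calls itself recursively with the translated message.
                # The entries must not translate in a cycle, or this never ends.
                return switch(fib, replace(msg, dst_addr=new_addr, dst_port=new_port))
            return kind, arg, msg
    return "drop", None, msg  # no entry matched


# Hypothetical forwarding information base with three entries.
fib = [
    (lambda m: m.dst_addr == "10.96.0.10" and m.dst_port == 80,
     ("translate", ("10.0.2.2", 8080))),
    (lambda m: m.dst_addr == "10.0.2.2", ("forward", "node switch N2")),
    (lambda m: m.dst_addr == "10.0.1.1", ("forward", "pod switch P1")),
]

translated = switch(fib, Message("10.0.1.1", 40000, "10.96.0.10", 80))
unmatched = switch(fib, Message("10.0.1.1", 40000, "203.0.113.9", 443))
print(translated)
print(unmatched)
```

The first call matches the translate entry, rewrites the target, and then matches the forward entry on the second pass. The second call matches nothing and is dropped.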
As discussed, we will reason about the Kubernetes network in terms of a network graph. But what does a network graph that is a valid Kubernetes Networking implementation, satisfying the Kubernetes Networking specification, actually look like?

From here on out, we will use a simple Kubernetes cluster as an example. The cluster consists of two nodes, N1 and N2. Each node hosts a daemon, for example the kubelet, D1 and D2. Additionally, each node hosts two pods: N1 hosts P1 and P3, and N2 hosts P2 and P4. And finally, each pod executes two containers. Again, elements with an IP address consist of nodes and pods. And finally, in this example, the first container of each pod listens on port 8080, and the second container of each pod listens on port 9090.

So for this example, what does a possible network graph look like? First, we assume that for every pod, there exists a corresponding switch, from here on out called a pod switch. For every container executing in the context of a pod, there exists a link from the container to the corresponding pod switch. Next, we assume that for every node, there exists a corresponding switch, from here on out called a node switch. For every pod hosted on a node, there exists a link from the corresponding pod switch to the corresponding node switch. Additionally, for every daemon hosted on a node, there exists a link from the daemon to the corresponding node switch. And finally, there exists a link from every node switch to every other node switch.

Now, to complete the network graph, we must determine the forwarding information base for node switches and pod switches. The forwarding information base of a pod switch contains three types of entries. First, local delivery via localhost. Next, local delivery via the pod's IP address. And finally, any other message will be forwarded to the linked node switch. Similarly, the forwarding information base of a node switch contains two types of entries.
First, messages that target a pod that is hosted on the corresponding node are forwarded to the corresponding pod switch. And next, messages that target a pod that is not hosted on the corresponding node are forwarded to the node switch corresponding to the node that does host the pod. Later on, we will expand the forwarding information base of node switches to accommodate Kubernetes Services and add rules to translate messages. This network, consisting of containers and daemons, as well as pod switches and node switches, in combination with the aforementioned forwarding information bases, constitutes a valid Kubernetes network.

In this last section, we will walk through a sequence of examples, highlighting container-to-container communication, pod-to-pod communication, daemon-to-pod communication, Services, and Ingress. A fair warning: this section is tediously repetitive. But if you follow along, you will gain a solid understanding of the underlying mechanics.

In this example, we will discuss container-to-container communication. Here, we focus on containers C1.1 and C1.2, executing in the context of pod P1. Container C1.1 will communicate with C1.2 via localhost. Recall that C1.2 is listening on port 9090. First, container C1.1 will send a message M via its edge to pod switch P1. The target address of M is localhost. The target port of M is 9090. The pod switch P1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the pod switch to deliver the message to container C1.2. Pod switch P1 will send the message M via its edge to container C1.2. And finally, the container will receive the message.

Next, in this example, we will discuss pod-to-pod communication. The pods are located on the same node. Here, we focus on container C1.1, executing in the context of pod P1, and on container C3.1, executing in the context of pod P3. Both P1 and P3 are hosted on the node N1.
Container C1.1 will communicate with C3.1 via the address of P3. Recall that C3.1 is listening on port 8080. First, container C1.1 will send a message M via its edge to pod switch P1. The target address of M is the address of pod P3. The target port of M is 8080. The pod switch P1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the pod switch to forward the message to node switch N1. Pod switch P1 will send the message M via its edge to node switch N1. The node switch N1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the node switch to forward the message to pod switch P3. Node switch N1 will send the message M via its edge to pod switch P3. The pod switch P3 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the pod switch to deliver the message to container C3.1. Pod switch P3 will send the message M via its edge to container C3.1. And finally, the container will receive the message.

Next, in this example, we will discuss pod-to-pod communication. However, this time the pods are located on different nodes. Here, we focus on container C1.1, executing in the context of pod P1 hosted on node N1, and on container C2.1, executing in the context of pod P2 hosted on node N2. Container C1.1 will communicate with C2.1 via the address of P2. Recall that C2.1 is listening on port 8080. First, container C1.1 will send a message M via its edge to pod switch P1. The target address of M is the address of pod P2. The target port of M is 8080. The pod switch P1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the pod switch to forward the message to node switch N1. Pod switch P1 will send the message M via its edge to node switch N1.
The node switch N1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the node switch to forward the message to node switch N2. Node switch N1 will send the message M via its edge to node switch N2. The node switch N2 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the node switch to forward the message to pod switch P2. Node switch N2 will send the message M via its edge to pod switch P2. The pod switch P2 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the pod switch to deliver the message to container C2.1. Pod switch P2 will send the message M via its edge to container C2.1. Finally, the container will receive the message.

Next, in this example, we will discuss daemon-to-pod communication. Here, we focus on daemon D1, hosted on node N1, and on container C1.1, executing in the context of pod P1, also hosted on N1. Daemon D1 will communicate with C1.1 via the address of P1. Recall that C1.1 is listening on port 8080. First, daemon D1 will send a message M via its edge to node switch N1. The target address of M is the address of pod P1. The target port of M is 8080. The node switch N1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the node switch to forward the message to pod switch P1. Node switch N1 will send the message M via its edge to pod switch P1. The pod switch P1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the pod switch to deliver the message to container C1.1. Pod switch P1 will send the message M via its edge to container C1.1. And finally, the container will receive the message.

Next, in this example, we will discuss Kubernetes Services.
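The four walkthroughs so far (container-to-container, pod-to-pod on the same node, pod-to-pod across nodes, and daemon-to-pod) can be replayed with a small simulation of the example cluster. This Python is a sketch under assumptions: the pod IP addresses, the function names, and the `pod:`/`node:` naming of switches are hypothetical, while the topology and the ports 8080 and 9090 follow the example in the talk.

```python
# Pod name -> (hosting node, hypothetical pod IP address).
PODS = {
    "P1": ("N1", "10.0.1.1"),
    "P3": ("N1", "10.0.1.3"),
    "P2": ("N2", "10.0.2.2"),
    "P4": ("N2", "10.0.2.4"),
}
# Port -> index of the container listening on it within each pod.
PORTS = {8080: 1, 9090: 2}


def pod_switch(pod, addr, port):
    """Forwarding information base of a pod switch: three types of entries."""
    _, ip = PODS[pod]
    if addr in ("localhost", ip):           # local delivery
        index = PORTS.get(port)
        if index is None:
            return ("drop", None)
        return ("deliver", f"C{pod[1:]}.{index}")
    return ("forward", f"node:{PODS[pod][0]}")  # anything else goes to the node switch


def node_switch(node, addr, port):
    """Forwarding information base of a node switch: two types of entries."""
    for pod, (host, ip) in PODS.items():
        if ip == addr:
            if host == node:
                return ("forward", f"pod:{pod}")   # pod is hosted on this node
            return ("forward", f"node:{host}")     # pod is hosted on another node
    return ("drop", None)


def trace(start, addr, port):
    """Follow a message from the switch `start` ('pod:P1' or 'node:N1')
    and return the list of hops, ending in a container name or 'dropped'."""
    path, current = [start], start
    while True:
        kind, name = current.split(":")
        step = pod_switch if kind == "pod" else node_switch
        action, target = step(name, addr, port)
        if action == "deliver":
            return path + [target]
        if action == "drop":
            return path + ["dropped"]
        path.append(target)
        current = target


print(trace("pod:P1", "localhost", 9090))   # C1.1 to C1.2 via localhost
print(trace("pod:P1", "10.0.1.3", 8080))    # C1.1 to C3.1, same node
print(trace("pod:P1", "10.0.2.2", 8080))    # C1.1 to C2.1, different nodes
print(trace("node:N1", "10.0.1.1", 8080))   # daemon D1 to C1.1
```

A container's message enters the graph at its pod switch, a daemon's message at its node switch, which is why the last trace starts at `node:N1`.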
Here, cluster IP services. Some understanding of Services will be beneficial. A Kubernetes Service is an anycast domain, where a message that is sent to the anycast domain is routed to an arbitrary member of the anycast domain. A cluster IP service is intended for internal communication. That is, the source of the message is a container executing in the context of a pod on the same cluster. The service is identified via a service IP address. This listing defines a service that selects all pods with the label-value pair foo: bar and maps port 80 to port 8080. Kubernetes will allocate an IP address for this service and configure the forwarding information base so that a message with a target address of the service IP address and a target port of 80 will be received by a container, listening on port 8080, of any pod matching the selector.

Here, we focus on container C1.1, executing in the context of pod P1, hosted on node N1, and on the containers C2.1 and C4.1, executing in the context of pods P2 and P4, both hosted on node N2. Note that both P2 and P4 have the label-value pair foo: bar. Container C1.1 will communicate with the service via the IP address of the service on port 80. First, container C1.1 will send a message M via its edge to pod switch P1. The target address of M is the address of the service. The target port of M is 80. The pod switch P1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the pod switch to forward the message to node switch N1. Pod switch P1 will send the message M via its edge to node switch N1. The node switch N1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the node switch to translate the target address to the address of pod P2 and the target port to 8080, or to translate the target address to the address of pod P4 and the target port to 8080.
Here, we assume the target address is translated to the address of pod P2 and the target port is translated to 8080. The node switch will call itself and match the new message against its forwarding information base. And from here, we already know what is going to happen.

Next, in this example, we will discuss Kubernetes Services. Here, node port services. Again, some understanding of Services will be beneficial. A node port service is intended for external communication. That is, the source of the message is a process outside the cluster. On the network, the service is identified via a node IP address and a port. This listing defines a service that selects all pods with the label-value pair foo: bar and maps the node port 30000 to port 8080. Kubernetes will configure the forwarding information base so that a message with a target address of any node's IP address and a target port of 30000 will be received by a container, listening on port 8080, of any pod matching the selector.

Here, the source of the message is outside the cluster, and we focus on the containers C2.1 and C4.1, executing in the context of pods P2 and P4, both hosted on node N2. Note that both P2 and P4 have the label-value pair foo: bar. Here, we start our journey with the receive event of message M at node switch N1. The target address of M is the address of the node this node switch corresponds to. The target port of M is 30000. The node switch N1 will match the message against its forwarding information base. It will find a relevant entry that instructs the node switch to translate the target address to the address of pod P2 and the target port to 8080, or to translate the target address to the address of pod P4 and the target port to 8080. Here, we assume the target address is translated to the address of pod P2 and the target port is translated to 8080. The node switch will call itself and match the new message against its forwarding information base.
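The translation entries for cluster IP and node port services can be sketched as an extension of the node switch. This Python is illustrative only and assumes hypothetical names and addresses throughout (`SERVICE_IP`, `NODE_IPS`, `node_switch_n1`, and every IP address). Only the ports 80, 8080, and 30000 and the choice of P2 and P4 as backends come from the talk. The random choice stands in for "an arbitrary member of the anycast domain".

```python
import random

# Hypothetical addresses; only the ports and the backends P2/P4 come from the talk.
SERVICE_IP = "10.96.0.10"                                  # cluster IP of the service
NODE_IPS = {"N1": "192.168.0.1", "N2": "192.168.0.2"}
POD_HOST = {"10.0.1.1": "N1", "10.0.2.2": "N2", "10.0.2.4": "N2"}  # pod IP -> node
BACKENDS = [("10.0.2.2", 8080), ("10.0.2.4", 8080)]        # P2 and P4, labeled foo: bar


def translate(addr, port, rng=random):
    """Apply the service translation entries, or return the target unchanged."""
    if (addr, port) == (SERVICE_IP, 80):                   # cluster IP service
        return rng.choice(BACKENDS)
    if addr in NODE_IPS.values() and port == 30000:        # node port service
        return rng.choice(BACKENDS)
    return (addr, port)


def node_switch_n1(addr, port, rng=random):
    """Node switch N1 with translation entries added to its two forwarding entries.
    Returns (action, next hop, target address, target port)."""
    translated = translate(addr, port, rng)
    if translated != (addr, port):
        # The node switch calls itself with the translated message.
        return node_switch_n1(translated[0], translated[1], rng)
    host = POD_HOST.get(addr)
    if host is None:
        return ("drop", None, addr, port)
    if host == "N1":
        return ("forward", f"pod switch for {addr}", addr, port)
    return ("forward", f"node switch {host}", addr, port)


print(node_switch_n1(SERVICE_IP, 80))          # message from inside the cluster
print(node_switch_n1(NODE_IPS["N1"], 30000))   # message from outside the cluster
```

In both calls the translated target is one of the two backends, and because both are hosted on N2, the next hop is node switch N2.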
And again, from here, we already know what is going to happen.

In this extensive example, we will put everything together and discuss Kubernetes Ingress. Some understanding of Ingress will be beneficial. Kubernetes Ingress is an API gateway for HTTP messages. This listing defines an Ingress that consists of two request-level routing rules. A proxy is responsible for processing these routing rules. Here, the proxy is fronted by a node port service to admit external messages. HTTP requests with a host header of example.org and a path of A will be proxied to service foo on port 8080. HTTP requests with a host header of example.org and a path of B will be proxied to service bar on port 8181.

Here, we start our journey with the receive event of message M at node switch N1. The target address of M is the address of the node this node switch corresponds to. The target port of M is 30000. Additionally, but not shown in the picture, the message is an HTTP message with a host header of example.org and a path of A. The node switch N1 will match the message against its forwarding information base. It will find a relevant entry that instructs the node switch to translate the target address to the address of pod P1, which hosts the proxy, and the target port to 8080. The target address is translated to the address of pod P1 and the target port is translated to 8080. The node switch will call itself and match the new message against its forwarding information base. It will find a relevant entry that instructs the node switch to forward the message to pod switch P1. Node switch N1 will send the message M via its edge to pod switch P1. The pod switch P1 will receive the message and match the message against its forwarding information base. It will find a relevant entry that instructs the pod switch to deliver the message to the proxy. Pod switch P1 will send the message M via its edge to the proxy. And finally, the proxy will receive the message. Next, the proxy performs request-level routing.
The proxy matches the HTTP host header and the path of the message against its decision table. It will find a relevant entry that instructs the proxy to proxy the message to service foo on port 8080. Finally, the proxy will send a message M via its edge to pod switch P1. The target address of M is the address of service foo, and the target port of M is 8080. And again, from here on out, we already know what is going to happen.

With this, we conclude today's presentation. If you would like to read up on today's material, please visit my blog post, Inside Kubernetes Networking. If you are watching this presentation during the conference, I'd be happy to answer your questions online. If you are watching this presentation after the conference, I'd be happy to answer your questions on social media. But either way, thank you for watching Inside Kubernetes Networking.