Bonjour. Hello everyone. My name is Neha Aggarwal. I'm a dev manager at Microsoft, working on Azure Container Networking, network security policies, and observability for Azure Kubernetes Service. Along with that, I'm actively involved in an ongoing discussion around a KEP for multi-network in the SIG Network multi-network working group.

Hello everyone. My name is Ardalan Kangarlou. I'm a distinguished engineer at NetApp. I've been involved with the Kubernetes community, and building software based on Kubernetes, since 2016.

So today we're going to talk about how we have implemented network isolation in a shared multi-tenant cluster. Before we start, let's quickly go through the agenda. We're going to cover the basic principles of Kubernetes multi-tenancy and the scenarios that exist today, including some of the crucial network multi-tenancy scenarios; how we have enhanced those scenarios and added network isolation on a shared multi-tenant cluster; and, last but not least, a very promising demo followed by questions and answers. With that, I give the floor to Ardalan.

Thanks, Neha. As Neha mentioned, there are different definitions of multi-tenancy depending on the context. For the purposes of this talk, we define multi-tenancy as running multiple instances of the same application, or different applications, with some level of isolation on shared infrastructure. There are two main reasons for doing that. One is reduced cost: by consolidating many applications onto fewer VMs and clusters, we can reduce costs; there are fewer VMs to manage and lower cluster management fees, which all hyperscalers charge. The second reason is scale: a larger pool of resources is available to any given application, so the potential for scale is much higher. Now, as Neha mentioned, there are two main considerations here. One is security: applications from different tenants, which sometimes don't know each other or can even be adversarial, can run on the same infrastructure. The second is that applications can adversely affect each other's performance, commonly known as the noisy neighbor problem.

As far as the state of multi-tenancy in Kubernetes, there are different implementations both within Kubernetes and outside of it. For compute, Linux namespaces and cgroups are common ways to isolate processes running on the same host. If you want even stronger isolation, you can run sandboxed pods; examples are Kata Containers, gVisor, and Hyper-V isolated containers, which provide kernel-level isolation, just like virtual machines. For storage, some objects are namespaced and some are not, and different storage protocols like NFS and SMB, and different storage platforms, have their own ways of restricting access to volumes and files. Networking is the main focus of this talk, so we'll cover that quite extensively later. And within Kubernetes, namespaces are the common construct for separating tenants, and service accounts and RBAC are the common ways to control access to Kubernetes objects.

Now, the classic example of multi-tenancy is Coke and Pepsi running on the same shared infrastructure. So for the rest of this talk, I'm going to use the hypothetical example of the Red Soda Company and the Blue Soda Company running on the same cluster, and I'll reference an application I wrote called SodaDB, which you can get from GitHub, to illustrate these concepts.
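As a concrete sketch of that namespace-plus-RBAC pattern, here is roughly what the two tenants could look like; the names and the exact permission set are illustrative, not taken from the talk:

```yaml
# Sketch: one namespace per tenant, with a service account whose rights
# stop at the namespace boundary (Roles and RoleBindings are namespaced).
apiVersion: v1
kind: Namespace
metadata:
  name: red-soda
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: red-soda-app
  namespace: red-soda
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: red-soda-dev
  namespace: red-soda
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: red-soda-dev-binding
  namespace: red-soda
subjects:
- kind: ServiceAccount
  name: red-soda-app
  namespace: red-soda
roleRef:
  kind: Role
  name: red-soda-dev
  apiGroup: rbac.authorization.k8s.io
```

The Blue Soda Company would get a mirror-image set of objects in a blue-soda namespace.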
So the three main ways to provide network multi-tenancy in Kubernetes are pod networking, network policy, and service mesh. These technologies differ in the scope of isolation they provide: for example, pod networking provides isolation within a single Kubernetes node, while network policy is associated with providing separation for pods running on the same cluster. They also sit at different layers of the stack; network policies, for example, correspond to the transport layer and the network layer, that is, port numbers and IP addresses.

Now, a quick recap of how Kubernetes networking is implemented. Kubernetes has a flat network model: any pod can talk to any other pod, any node, or the API server using its IP address. Here's an example of two SodaDB pods running on the same node. Despite both of these processes listening on port 8080, they don't interfere with each other, because the underlying processes map to different network namespaces on the Linux host. And if you want the next level of detail, these network namespaces are implemented through something called pause containers in Kubernetes. For example, in this case I can take the red SodaDB pod's network namespace, see that its owner is a pause container, and once I enter that namespace, I can see the IP address of the red SodaDB pod associated with it.
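A rough sketch of that inspection, assuming a containerd-based node where pod sandbox network namespaces are mounted under /var/run/netns; the namespace name and PID below are hypothetical:

```sh
# On the Kubernetes node: list the pod network namespaces (names are runtime-generated).
sudo ip netns list

# Find the process that owns one of them -- it should be a pause (sandbox) process.
sudo ip netns pids cni-1a2b3c4d        # hypothetical namespace name
ps -o comm= -p 4242                    # hypothetical PID from above; prints "pause"

# Enter the namespace and confirm the pod's IP address on eth0.
sudo ip netns exec cni-1a2b3c4d ip addr show eth0
```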
Now, network policies are a way to specify rules for controlling ingress and egress traffic for pods. This requires CNI plugin support and has been around since the early days of Kubernetes. An example of a network policy would be to allow communication between the front end and the back end, or to restrict communication to within the red soda namespace or within the blue soda namespace, but not across them.
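A minimal sketch of that last example, assuming a recent Kubernetes version where every namespace automatically carries the kubernetes.io/metadata.name label; the blue tenant would get a mirror-image policy:

```yaml
# Sketch: confine red-soda pods to talking only among themselves.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: red-soda-only
  namespace: red-soda
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: red-soda
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: red-soda
  # A real deployment would also need an egress rule allowing DNS
  # to kube-system, or name resolution inside the pods will break.
```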
Perhaps the most complete technology that provides network multi-tenancy in Kubernetes is the service mesh: by defining authentication policies and authorization policies, one can restrict communication between different microservices running on the same cluster.

So at NetApp, we wanted to build a storage service based on Kubernetes, and unfortunately these technologies all have some gaps. Network namespaces and cgroups, again, help with isolation within a single Kubernetes node. Network policies help with isolation between Kubernetes pods running on the same cluster, but in our use case, our server pods run on Kubernetes while the storage clients can run outside, on virtual machines. With a service mesh, one can have a mesh that consists of both Kubernetes pods and virtual machines, but that use case is limited to gRPC applications, and we wanted to support the NFS and SMB storage protocols, so that didn't work for us. So the question was really: can we do better? In the rest of the talk, Neha is going to explain how.

Before that, this is the state of the art as far as how network multi-tenancy is implemented for cloud services. Today, in GCP, in Azure, and in AWS, cloud services are built so that each tenant gets a dedicated project or subscription on the service side. Within it, there is a Kubernetes cluster, and that project or subscription gets linked with the customer's project via VNet or VPC peering. As you can imagine, this is very secure, because there is a direct link between the customer's environment and the service. But the downside is that there is really no sharing: each customer, each tenant, has their own dedicated Kubernetes cluster within the service project. So now Neha is going to tell us how we can improve on this model in the next few slides.

Thank you, Ardalan, for covering all the great options we have. As Ardalan explained on the last slide, what hosted services are doing right now is deploying customer applications that require direct connectivity to the customer's network into dedicated infra Kubernetes clusters. What does that mean? It incurs a cost for the hosted services, which in turn makes things expensive for the end customers as well. So what we really wanted to achieve is a shared Kubernetes cluster that hosts multi-network, multi-tenant workloads.

In this diagram, you can see a customer pod deployed on a Kubernetes cluster hosted in the hosted service's tenant. It has two interfaces. One is the default interface, the interface on the default network; let's call that the infra network. It routes the management traffic to the default cluster-wide network. The other NIC, eth1, the yellow NIC here, is injected from the customer network into the pod running on the multi-tenant Kubernetes cluster. That NIC has private access to the services running in the customer network, over a private IP. All of this is done on a shared cluster, which saves money for our hosted services and, in turn, for the customers, without compromising security.

Let's talk about the challenges we faced in achieving what we've just discussed. Two main things. First, the kubelet: the kubelet is not aware of multi-network pods. The kubelet internally invokes CNI; for those who do not know, CNI is the Container Network Interface, which is invoked by the container runtime to provision networking for pods. When the kubelet invokes CNI, it is only aware of a single interface: it provisions the default interface on the pod, which can have multiple IPs, but they all belong to the same interface. Similarly, the kube-scheduler does not know about the special nodes that can satisfy the needs of multi-network pods.

So let's see how we are addressing the problem. We have extended our CNI to attach secondary interfaces to multi-network pods. On the right side, you see a node. Today, a Kubernetes node is provisioned with a single interface, but we have extended the node to also attach additional interfaces, which are later provisioned individually for each different network. Extending Ardalan's example of red soda and blue soda, we have two NICs: a red soda NIC and a blue soda NIC, too. They are provisioned separately into the red soda tenant and the blue soda tenant, and when a multi-network pod, say the red soda pod, gets deployed, the NIC that was provisioned on the host gets projected into the pod. Now inside the pod you see two interfaces: one carrying the management traffic from the default network, and a second carrying the customer traffic; all the rest of the traffic goes via that second interface.
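Sketching what that looks like from inside such a pod; the interface names match the talk, but the pod name and addresses are hypothetical:

```sh
# Two NICs inside the multi-network pod, with IPs from two different networks.
$ kubectl exec -n red-soda deploy/red-sodadb -- ip -br -4 addr
eth0             UP             10.224.0.15/16    # infra NIC: IP from the AKS cluster subnet
eth1             UP             10.1.0.4/24       # customer NIC: IP injected from the red tenant's VNet

# `ip route` inside the pod would show the cluster and service ranges going
# via eth0 and the remaining traffic via eth1; exact entries are implementation-specific.
```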
Now, these red soda and blue soda pods are running trusted code, so they can be process-isolated, and there are security policies we can apply to keep them multi-tenant. If they were untrusted, they could be isolated via Kata Containers, Hyper-V, or gVisor. Later in the talk, we will also cover how we used device plugins to make these special worker nodes and their additional interfaces a first-class resource for the kube-scheduler.

Deep-diving into the details of the CNI extensions: we have leveraged a CRD-based approach. Hosted services deploy a CRD; let's call it a pod network CRD. It defines what the customer network looks like: the VNet or VPC information and the subnet that will be used to provision the IPs. Then, when the customer pod gets deployed, it is labeled with a reference to that CRD, and referencing the CRD gives the signal to the CNI to do the extra work. And our north star is to align with the ongoing KEP for multi-network.

Extending on that: eth1 is provisioned in the customer network. Our cloud, which hosts the core networking services, provisions and injects this NIC into the customer network. But let me deep-dive into what exactly happens inside the guest, inside the VM, in the CNI. When the multi-network pod gets scheduled, the kubelet invokes the CNI. The CNI learns from the Kubernetes API that this pod is labeled with the pod network CRD, so it requires an additional NIC. It also learns from the Kubernetes API which of the available interfaces on the node is associated with this network. Eventually, the CNI moves that NIC, projecting it into the pod, and now my pod has two NICs, eth0 and eth1.

Going a little deeper into the blue soda pod, inside you will see two NICs: one is your infra NIC, the management NIC, and the other is your customer NIC. Then there are the additional routes that are provisioned: management traffic, like pod traffic or service traffic, all goes via eth0, and all the rest of the traffic goes via eth1.

Going a little further, here is how our CRDs look. On the left, we have defined a pod network CRD with the VNet and subnet constructs. On the right, once you label your pod with that CRD, you have a multi-network pod.
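Since the exact Azure CNI schema isn't shown in the transcript, here is an illustrative sketch of the shape being described: a pod network object naming the customer VNet and subnet, and a deployment whose pod template is labeled to reference it. The API group, field names, and label key are assumptions:

```yaml
# Hypothetical pod network CRD instance for the red tenant.
apiVersion: acn.azure.com/v1alpha1        # illustrative group/version
kind: PodNetwork
metadata:
  name: red-soda-network
spec:
  vnet: red-soda-vnet                     # customer VNet whose NICs get injected
  subnet: red-soda-subnet                 # subnet used to provision the eth1 IPs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: red-sodadb
  namespace: red-soda
spec:
  replicas: 1
  selector:
    matchLabels:
      app: red-sodadb
  template:
    metadata:
      labels:
        app: red-sodadb
        kubernetes.azure.com/pod-network: red-soda-network   # assumed label that signals the CNI
    spec:
      containers:
      - name: sodadb
        image: ghcr.io/example/sodadb:latest                 # placeholder image
        ports:
        - containerPort: 8080
```

The demo below also creates optional pod network instance objects, which, per the Q&A at the end, can additionally pin a static IP to a pod.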
So now let's talk about the kube-scheduler part. How would the kube-scheduler know that this is my special node, the one that comes with additional NICs? Not every node in my Kubernetes cluster should have multi-network pods attached; I want to secure my system node pools, so there are only certain node pools, the ones with these additional NICs, where I can host multi-network pods. We achieved that by using device plugins. With a device plugin, we can extend the node's capability: the device plugin registers these additional interfaces as a resource and makes them first-class. It learns about the available interfaces on the node and passes that information to the kube-scheduler. Now a pod comes up requesting one of the additional NICs as a multi-network interface; the kubelet passes that information to the device plugin, the device plugin does its magic and reserves and allocates the device, and the CNI does its magic of projecting that interface into the pod. So with that, I'm going to give the stage back to Ardalan, to turn all of the work we've been doing into a real working demo.

Thanks, Neha. Next is to show you that everything we talked about is real. In this setup, we have an Azure Kubernetes Service cluster, an AKS cluster, running two instances of the SodaDB application: one corresponding to the Red Soda Company and one corresponding to the Blue Soda Company. These pods both have two IP addresses: the eth0 NIC gets its IP address from the VNet and subnet associated with the AKS cluster, and the eth1 interfaces get their IP addresses from the customers' VNets and subnets. So in this demo, we're going to show you that the VMs on the customer side, in the customer's subscription, can only access the pod provisioned for that customer and nothing else.

All right. Here I have a multi-tenant Azure Kubernetes Service cluster. This AKS cluster has a single node pool, and it's a special type of node pool; we call it a multi-tenant node pool, because its nodes have multiple NICs. In this instance, we have two nodes in the node pool, and I'm going to use one node per tenant for hosting the SodaDB pods.

The next step is to create our multi-tenant environment. First, I create two namespaces, one for each tenant: you can see there is one namespace for the Red Soda Company and one for the Blue Soda Company. Then I create the pod network custom resources that Neha just talked about: one for the Red Soda Company and one for the Blue Soda Company. If you notice, these two pod networks reference different VNets and subnets, one corresponding to each tenant. Next, I create the pod network instance objects; these are optional. Then I proceed by creating my deployments. The first deployment corresponds to the Red Soda Company, and if you notice, I'm using some special labels to associate this deployment, this pod, with the pod network I created in the step above. Then I do the same for the blue soda deployment: I create the blue SodaDB deployment and associate it with the blue pod network. One thing to note is that both SodaDB instances are listening on the eth1 interface, the IP address that comes from the tenant's environment.

So I proceed by creating these YAMLs and objects, and now we should have two pods running in two different namespaces, one for each tenant. Within a few seconds, these pods are up and running. If you notice, there are two IP addresses here; these correspond to the eth0 interfaces, coming from the AKS cluster's subnet. Now I'm going to show you the red SodaDB pod. You can see this pod is running; I exec into it and show you the two network interfaces within this pod. The second interface, eth1, is the one connected to the customer's environment, and I'm going to use that IP address to populate some records in the SodaDB database. In this case, I add four records: red classic, dyed red, sherry red, and so on. Then I do the same thing with the blue SodaDB pod. Again, this pod also has two network interfaces, and the second interface is used for connectivity to the client. I use that IP address to populate some records as well; in this instance, I add only three records for the blue SodaDB.

Now, the next step is to show you the network connectivity between the different environments. We want to confirm that the red SodaDB cannot talk to the blue SodaDB instance. I exec into the red SodaDB pod, use the blue SodaDB's IP address, and I can show you that they can't talk to each other. The same holds on the reverse path: the blue SodaDB cannot talk to the red SodaDB pod.

In the rest of the demo, I'm going to show you the connectivity from the customers' subscriptions. For that, I switch over to the Azure portal. Here's the VM, the red soda client, in the customer's environment, and you can see the IP address for this VM. Now I'll show you that this VM can actually talk to the red SodaDB instance using its eth1 IP address. I use curl to retrieve record zero, and you can see "red classic" was retrieved. Next, I'll show you that this VM cannot talk to the blue SodaDB instance running on the same cluster, and you can see it can't, because there's no route to that SodaDB instance. And the same thing works on the blue soda client VM: it can talk to the blue SodaDB instance, but it cannot talk to the red SodaDB instance.
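The connectivity checks in this part of the demo look roughly like the following; the IP addresses and the SodaDB URL path are hypothetical stand-ins:

```sh
# From the red soda client VM (inside the red tenant's VNet):
curl --max-time 5 http://10.1.0.4:8080/records/0   # red SodaDB eth1 IP -> returns "red classic"
curl --max-time 5 http://10.2.0.4:8080/records/0   # blue SodaDB eth1 IP -> fails, no route

# And the mirror image from the blue soda client VM:
curl --max-time 5 http://10.2.0.4:8080/records/0   # succeeds
curl --max-time 5 http://10.1.0.4:8080/records/0   # fails
```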
What Ardalan is showing here is that all of this connectivity is over private IPs. There is no public access, there are no load balancers in front of it, and there are no ExpressRoute gateways. It is all direct private connectivity, so it's more secure than ever.

All right, that concludes our demo. And with that said, we'll be happy to answer any questions you have.

Hello. Thanks for the talk. I wanted to ask: you showed the scenario where each pod has its own separate network interface device. So if I run five pods on a single node within the same customer network, they get five network interfaces, am I correct?

Yes, for now the implementation will put five different interfaces onto the node, but that's something you can extrapolate from as well; you could do both. You can have a single interface projected into the five customer pods, all pointing to the same NIC, or, as in the case study we did, a dedicated NIC per multi-tenant network pod. What you're asking can be achieved: you can have a dedicated NIC for each different network and multiplex it into the pods running on the same node.

Oh, I would probably like it the other way around, to just have a single network interface.

We do; I mean, that's our default implementation.

Oh, okay. Thank you. And does the KEP also support this? Because from what you have shown, there seem to be limits on the number of NICs you require per pod, if I understood correctly.

Our implementation is influenced by the KEP. What the KEP suggests is to make these CRDs, which we created, a first-class spec inside the pod spec. As of now, I'm not sure if there are any limits, but I can get back to you on that. Thank you.

Thank you for the talk. One question I have: the choice of the network interface you connect to is based on the labels we put on the pods, if I saw correctly. So how do you stop anyone, if I am the blue soda company, from connecting to the red soda network interface just by changing my labels?

Sorry, could you repeat the last part?

If I'm the blue soda company and I want to change my labels, let's say I change the label on my pods to the red soda company's, how do you check that this is not possible?

So I'm going back to the slide. You're saying: if I change something in my blue soda pod, can I get access to the red network?

On the slide you have the custom resource definition you talked about.

So the custom resource definition is managed by the hosted service; that security aspect is not owned by the customer itself. The hosted service platform that is hosting this Kubernetes cluster manages the CRDs. Ardalan, do you want to add how we secure this?

So basically, NetApp, for example, as a service provider, is the entity that provisions these storage pods, the blue soda and red soda instances, and we manage the network policies for them. When a customer comes to NetApp, we coordinate with them on what subnets they want to use for mounting their volumes. We have this contract between us, and we as the service provider define the pod networks for them, so we restrict access across different tenants but ensure that each tenant can securely and privately access only their pods and nothing else. So it's the responsibility of the service provider to manage the pod network objects and enforce multi-tenancy.

Thank you.
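One hypothetical way to back that answer with policy, since tenants hold no RBAC rights on the pod network CRD: only the provider's controller gets the verbs on those resources. The group, resource, and account names below are illustrative, carried over from the earlier sketch:

```yaml
# Sketch: only the hosted service's controller may manage pod network objects,
# so a tenant cannot point a label at another tenant's network.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: podnetwork-admin
rules:
- apiGroups: ["acn.azure.com"]            # illustrative group from the sketch above
  resources: ["podnetworks", "podnetworkinstances"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: podnetwork-admin-binding
subjects:
- kind: ServiceAccount
  name: provider-controller               # the hosted service's controller
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: podnetwork-admin
  apiGroup: rbac.authorization.k8s.io
```

A label alone does nothing unless a pod network object pointing at a tenant's VNet exists and can be referenced, so controlling who creates those objects is what enforces the boundary.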
There are two questions here at the front.

Hi. I've seen multi-network-interface pods with Multus in KubeVirt. Is this using Multus as well?

No. We have extended our own Azure CNI. The reason is that Multus is attached to a different CRD, called the network attachment definition, and like I said, our influence was the multi-network KEP. We eventually want this pod network CRD to be a first-class resource in the Kubernetes ecosystem, and that's why we went with our own Azure CNI.

What are the limitations in terms of network interface counts? I know certain SKUs in Azure can only have four network interfaces, and stuff like that. What are the limits there?

Azure today allows up to eight NICs on a given VM, but there is ongoing work to extend that.

Yeah, I would also add that the novel thing about this implementation is not so much multiple NICs per pod, but rather how these NICs are connected to hyperscaler network constructs, like VPCs and VNets, that extend into the customer environment.

Hey, good morning. This is Asant from American Airlines platform engineering. We are not using any network plugins; we are hosting thousands of apps on our platform. But maybe in the future, if we wanted to move in that direction, is there any limitation on your side when it comes to logical separation or isolation of the CIDRs, or eth0 and eth1? Let's say I have a client and they have SaaS services running on my platform, and even within the same namespace they want to isolate the network for pod security concerns. Is that possible with your network plugin? Or NIC plugins, sorry.

So, obviously, different CNI plugins have different implementations, so there is no one silver bullet or one way of doing things. In this instance, as far as isolation within a host, this is done through network namespaces, and I showed you an example of how that is set up, so that provides isolation within a host. Many CNI plugins, like Calico or the ones provided by Azure and Google, all work pretty much the same way, by setting up these network namespaces. So that's isolation within a given node, and as far as isolation beyond the node, there are the other solutions we talked about, like network policies and service mesh.

Thank you for the talk. Just one question about IP address management. I'm guessing the pod IPs are not stable. Do I have any chance to integrate something like external-dns, via a service or something, to have reliable network connectivity?

Yeah, actually, one cool thing about this implementation is that we also have the option of stable IP addresses per pod, and that's actually what NetApp wanted, because as pods move around between nodes, their IP addresses change, and for storage services it's very important to keep the IP address persistent, because every time it moves you have to do a remount. One thing we didn't quite show here is that, using the pod network instance custom resource that I didn't talk too much about, you can actually associate a fixed IP address with a given pod, and as the pod moves around, the IP address goes with it. So that was another thing that is quite different from other solutions.

Adding on to what Ardalan said, great question: we do have the capability to not make these additional interface IPs ephemeral, which means we statically bind them to the multi-network pod.

I think that covers all the questions. Thank you very much.