Hello everyone. This is the talk on federation in Kubernetes. I'm Dmitry Mishin. I work for the University of California, San Diego. This talk has two parts. In part one, I will describe our Kubernetes cluster, how we use Admiralty, and how our project came to its current state. In the second part, Adrien, the creator of Admiralty, will describe how Admiralty works. We will also have a demo of how Admiralty works.

Our project is called PRP, the Pacific Research Platform, and it grew from a project measuring network performance. The initial problem was that the University of California had multiple campuses connected with what's called a Science DMZ: a scientific network running at 10 to 100 gigabits per second, so pretty fast. Sometimes some segments of the network don't perform the way they're supposed to. PRP measured network performance by sending test data between those locations. That was done manually, and at some point there were simply more nodes than PRP could handle manually. The idea was to deploy Kubernetes on top, so that Kubernetes would orchestrate the measurement software.

Once we deployed Kubernetes, we realized that we now had a cluster of nodes that only occasionally sent test data and mostly sat idle. So we started providing those hardware resources to scientists to do their computations. We saw exponential growth of our cluster, because it was really easy to onboard onto our system: bring in a container, run it, and get free access to the resources for scientific use. We called this cluster Nautilus.

At some point, more projects started donating hardware to us. We partnered with Internet2, the Open Science Grid donated some nodes, and more networks kept coming to us. This is how it looks in the US now. We have the CENIC network in California providing our internet access, and there is a bunch of regional networks that also donate hardware to our cluster. Basically, we have a national Kubernetes cluster aggregating a bunch of computing resources. At some point, we started bringing in GPU nodes, storage nodes, and so on. Now our cluster is pretty big.

At some point it started growing internationally. On the international scale, we have nodes in Europe and in the Asia-Pacific region: Singapore, Korea, Guam, Australia. The nodes themselves are called FIONAs, Flash I/O Network Appliances. Those nodes are optimized for fast networks, so they can actually leverage those speeds. Some nodes can fit up to eight gaming GPUs; some nodes have a bunch of hard disks, which we use as storage for our Ceph cluster.

When we talk about federation, our current state is that we have our PRP Nautilus cluster, national for the US plus some international nodes. It now has 7,000 CPU cores, more than 500 GPUs, and 2.5 petabytes of Ceph storage. In addition, we deployed a couple of smaller clusters: a development K3s ARM cluster, a development Windows Kubernetes cluster, an Agones GPU cluster for gaming and 3D visualization, and we spun up a cluster in the AWS cloud. All those clusters are separate. It's really hard to integrate all that hardware into one cluster, and we need federation to work with them. We also partnered with several other smaller clusters, and there is already a federation established between those locations. So basically, all the clusters can finally talk to each other, but they are controlled by different organizations and different people.
That's the huge benefit of Admiralty, because the other federation projects we've seen treat federation as one cluster controlling the others; it's not a flat, single-level federation between them. And what we are looking forward to is that when the Expanse supercomputer is deployed, it will also have a Kubernetes cluster as part of it, and Admiralty will be used to federate Expanse with Nautilus and the other clusters. That brings a theoretical expansion to 100,000 cores. Of course, we will not get all of them, but that's a pretty big increase in what we federate with.

So how can we use resources across those clusters? As I said, most of these clusters belong to different organizations who want to set their own rules, so the federation cannot dictate policies for running workloads in other clusters. And the scientists who use those clusters usually have access to only one or a few namespaces; they, again, don't control the whole cluster. So federation should work at the namespace level, where people don't have to bug the admin to federate, but can establish federations between the pieces of the cluster that they control.

In addition to those small on-prem clusters, as I said, we federate with clouds: Amazon, Microsoft Azure, Google Cloud. If we need resources from those, for example for big workloads or specialized hardware, we can always run a temporary cluster there and establish the federation.

So how do we use federation now? GitLab CI is what we started with. It's super convenient when we need to build a container on specialized hardware, like ARM or Windows: instead of manually creating a GitLab runner somewhere else, or trying to join that node into our big cluster, we just federate. That will be in the demo. So now we federate with the Windows and ARM clusters, and our GitLab, just by tagging the job, can build the container in another cluster (a sketch of such a tagged job is shown below), and this container is stored in the GitLab registry.

Network monitoring: right now we have a bunch of monitoring pods in our large cluster, but the goal is that any other cluster that wants automated monitoring can just join our federation and get all the monitoring pods.

Job bursting: if one cluster has unused resources and another one is overloaded, it's always useful to just burst into the other cluster, so clusters can share their computing resources without actually reattaching nodes. That's super convenient.

Medical data use is a big one. Some data is highly protected and cannot leave a particular cluster, but this data can be used for computation, and the results of that computation can be shared once they are anonymized. So federation again lets you spawn your computing job in the other cluster, get the product, anonymize it, and get it back. This is super convenient.

Special devices, Internet of Things: some devices are really small and tiny, and if they just joined a big cluster, the monitoring pods alone would kill them. So again, it's super useful to federate; and some IoT devices are controlled by other people, and federation helps establish those connections without requiring access to the specialized hardware.

As for the future of federation, we are now working on a project that will allow us to create on-demand Layer 2 fast paths around the world. That's a project with SURFnet called NSI AutoGOLE, and it can already create Layer 2 connections and tear them down on request.
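As a rough illustration of the job tagging mentioned above (the tag name, image, stage, and variables here are my assumptions, not the actual PRP configuration), a .gitlab-ci.yml job routed to the federated ARM runner could look something like this:

```yaml
# .gitlab-ci.yml -- illustrative sketch only; tag, image, and variables are assumptions.
build-arm64:
  stage: build
  tags:
    - arm64                        # routes the job to the runner registered with this tag,
                                   # i.e. the federated runner in the remote ARM cluster
  image: docker:latest
  services:
    - docker:dind                  # Docker-in-Docker service used to build the image
  variables:
    DOCKER_TLS_CERTDIR: ""         # simplify TLS setup for the dind service in this sketch
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:arm64" .
    - docker push "$CI_REGISTRY_IMAGE:arm64"    # image ends up in the GitLab registry
```

The only thing that makes this job "federated" is the tag: the runner that picks it up happens to be a pod scheduled to a remote cluster through Admiralty, as the demo shows.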
What we are writing now is a special CRD and an operator that will allow a pod spawned in some remote location to create a path, for example, to storage in some other location. This path will exist while the pod is running and then disappear. Federation will help us run this across several Kubernetes clusters: if one cluster wants to talk to another cluster, it just creates a Layer 2 connection, and that is controlled by the standard Kubernetes API. So this is a really great project that we're working on.

Now for the demo on federation. In this demo, I will show how federation works in our production cluster. I will not go through the details of establishing the connection; that's all covered in the documentation, and this is supposed to be a short demo.

So now we're looking at the nodes in our production cluster, and we see that the first node has the role "cluster". This is a virtual node created by Admiralty: it represents the whole remote cluster, federated pods will run on this node, and the actual nodes are listed below it. In this demo, I will show how our GitLab can spawn pods in other clusters and get results back from them.

Here is the jobs table of our GitLab. These jobs build a container and put it in the GitLab registry. While most projects just build containers in our regular cluster, this project sets the architecture tag arm64 on its jobs, which makes them go to a special runner in GitLab.

Let's look at our GitLab namespace. We will see that it's labeled with multicluster-scheduler=enabled. This tells Admiralty to watch this namespace and wait for pods that we mark as federated. If we go into this namespace, we will see several runners already running on regular nodes. If you look at the deployments, there is already a deployment that I created called gitlab-runner-federated. If we look at it, we will see that its pod template has the multicluster.admiralty.io/elect annotation. This again tells Admiralty that pods from this deployment should be federated, and Admiralty will decide where to send them (both the namespace label and this annotation are sketched below).

Let's scale this deployment to one, and we see that the new pod is scheduled to run on the virtual node. If we look at all the pods, all of them are in our cluster, but this one pod is running somewhere else. We can look at our ARM cluster. Again, this is the GitLab namespace, but this is now the remote cluster federated with our large one, and this is the node the pod is running on. So we see that the GitLab runner actually started on the remote node, and in our large cluster we only have the proxy for this pod.

Let's switch back to our main cluster. This runner has already registered itself automatically in GitLab, and if we run one of the jobs, the runner will run it remotely. If we go to the remote cluster again, we'll see that a new job started and has just finished; this is pretty quick. And the result of this job went into the local container registry in our GitLab, so it completed successfully. This allows GitLab to run runners in federated clusters without setting them up manually, and to control them from one single location.
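Here is a minimal sketch of the two pieces of configuration shown in the demo, assuming the label and annotation keys from the Admiralty documentation; the namespace name, deployment name, and runner image are illustrative:

```yaml
# Namespace watched by Admiralty: the label tells the multi-cluster scheduler
# to consider pods in this namespace for federation.
apiVersion: v1
kind: Namespace
metadata:
  name: gitlab
  labels:
    multicluster-scheduler: enabled
---
# Deployment whose pod template carries the "elect" annotation, so its pods
# are mutated into proxy pods and scheduled to a virtual node (a remote cluster).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner-federated
  namespace: gitlab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitlab-runner-federated
  template:
    metadata:
      labels:
        app: gitlab-runner-federated
      annotations:
        multicluster.admiralty.io/elect: ""
    spec:
      containers:
        - name: runner
          image: gitlab/gitlab-runner:latest   # illustrative image
```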
What is Virtual Kubelet? This is a screenshot from the Virtual Kubelet website. Like the Kubernetes kubelet that runs on each node, a Virtual Kubelet instance presents itself as a virtual node, but instead of running containers on a local machine, it runs them on a remote system. There are 11 known providers to date, including Admiralty. In the case of Admiralty, virtual kubelets represent remote clusters.

To help us understand the concepts that I'll explain later, let's consider this example use case where a user submits jobs in their own cluster, cluster A, but the containers run in other clusters, B and C. In cluster A, there are virtual nodes that represent the other clusters. The pods created by the Kubernetes job controller are mutated by Admiralty's admission controller into what we call proxy pods. We call them that because they represent other pods, not because they proxy anything in the networking sense. Admiralty's proxy pod scheduler schedules those pods to the virtual nodes. It creates candidate pods in the other clusters, and some of those candidates become delegate pods that are bound to real nodes. Those real nodes could actually be other virtual nodes; you can imagine several levels of inception. Admiralty also includes a bunch of controllers to update pod statuses and to make config maps, secrets, and other dependencies follow pods. Notice that cluster A needs to talk to clusters B and C; we'll see the different ways to do that in a minute.

Let's go back to Virtual Kubelet and how Admiralty implements it. Virtual Kubelet has four main responsibilities. The first responsibility is to register a node object. Admiralty creates virtual nodes based on user-created Targets and ClusterTargets. Those are custom resource definitions that basically give a name to a virtual node and refer to a secret that will be used to talk to the corresponding target cluster; a sketch of one follows below. The second responsibility is the heartbeat: Kubernetes needs to know at regular intervals that a node is healthy, otherwise it will evict the pods if the node is not responding. So the virtual kubelet in Admiralty checks the health of the target clusters and updates the conditions of the virtual nodes. The third, and maybe the most important, responsibility is to handle pods, so to run the actual containers somewhere. In Admiralty, this is where most of the logic lives, with multi-cluster scheduling, pod status feedback, cross-cluster garbage collection, and so on. And of course, there are some other features like handling logs requests and exec requests; in Admiralty's case it's very simple, we just forward those requests to the target cluster's Kubernetes API.

The last three responsibilities require kubeconfigs to call the target clusters, so let's talk about that. In Admiralty, clusters are connected in the control plane in one-to-one relationships, and we say that a source and a target cluster are connected when controllers in the source cluster can call the Kubernetes API server of the target cluster. For that, we need three ingredients: routing, authentication, and authorization. Routing may require a VPN or a tunnel if the clusters aren't public or if they're in different VPCs. Authorization, the last one, is quite straightforward: it uses RBAC resources in the target cluster. And that's very important for Dmitry and his colleagues and partners: they want cluster admins to stay in control of who can do what in their clusters. Authentication can be done using different methods.
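Here is the sketch of a Target mentioned above. It follows my reading of the Admiralty open-source documentation; the names are illustrative and the exact API version and fields may differ between Admiralty releases:

```yaml
# Namespaced Target: Admiralty creates a virtual node for it and uses the
# referenced kubeconfig secret to call the target cluster's API server.
apiVersion: multicluster.admiralty.io/v1alpha1
kind: Target
metadata:
  name: cluster-b
  namespace: gitlab
spec:
  kubeconfigSecret:
    name: cluster-b-kubeconfig   # secret in the same namespace (sketched further below)
```

A ClusterTarget is the cluster-scoped equivalent, used when the federation is set up by a cluster admin rather than inside a single namespace.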
The nice thing about having one-to-one relationships, or cluster connections, is that you can build any kind of cluster topology with them. As long as it's a directed graph, it's valid. The first one that comes to mind is probably a management cluster talking to many workload clusters. Or you can do cloud bursting, where a cluster is its own target. In the case of the research platform presented by Dmitry, we have a decentralized federation where there is no leader.

I said I would talk about cross-cluster authentication. This could be a full talk just by itself, so I'll go very quickly. The simplest way to achieve it is to take a service account token from the target cluster, export it, and save it in the source cluster; a sketch of that is shown below. You could do the same thing with the certificates API: instead of a token, you would have a certificate. The problem with those two methods is that you're using the target cluster as an identity provider and not the source cluster, so you need to distribute and rotate the secrets yourself. With other methods, you can use an identity provider available in the source cluster to get your tokens or certificates and present those to the target cluster, which has to be able to recognize them. If you're in the cloud, a Kubernetes service account in the source cluster can impersonate, say, an AWS IAM role or a Google Cloud service account (the Azure equivalents go by names like workload identities or managed identities), and then use that to connect to the target cluster. If you're in control of the master nodes and can change the API server flags, that's great, because then you can use webhook token authentication or an authenticating proxy. If you need a one-size-fits-all solution, you can use an impersonating proxy that uses Kubernetes impersonation to authenticate; check out the kube-oidc-proxy project as a prime example, and Admiralty Cloud uses the same concept.

Admiralty has two schedulers: the proxy scheduler and the candidate scheduler. The proxy scheduler handles proxy pods on the source cluster side, and the candidate scheduler handles the candidate pods created by the proxy scheduler. They're both built upon the scheduler framework. The scheduler framework is great: it's a set of Go language interfaces that allows you to build your own scheduler while retaining all the features of the standard Kubernetes scheduler and adding yours at various extension points in the scheduling cycle and the binding cycle. The dots here mark where Admiralty's two schedulers extend the Kubernetes scheduler. You can, for example, add some filters, deciding how to filter nodes, or you can wait before actually binding to a node. This is useful in Admiralty's multi-cluster scheduling algorithm because the two clusters talk to each other using annotations on pod chaperons. Let's see how this works.

In this sequence diagram, I have the source cluster on one side and various target clusters on the other side; they have the same components but different timelines. When a source pod is created and annotated to use Admiralty as a multi-cluster scheduler, it is mutated. We need to make a few changes to the pod: set the scheduler name, change the scheduling constraints so that the pod can tolerate the virtual nodes, save the original scheduling constraints for later, add a finalizer for cross-cluster garbage collection, different things. Check the documentation for details.
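Here is the sketch promised above for the simplest authentication method: a service account token exported from the target cluster and stored in the source cluster as the kubeconfig secret that a Target refers to. Everything here, including the names, the server URL, and the secret key, is illustrative, not prescribed by Admiralty:

```yaml
# Source-cluster secret holding a kubeconfig built from a service account token
# exported from the target cluster. All values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: cluster-b-kubeconfig
  namespace: gitlab
stringData:
  config: |
    apiVersion: v1
    kind: Config
    clusters:
      - name: cluster-b
        cluster:
          server: https://cluster-b.example.org:6443            # target API server
          certificate-authority-data: <base64-encoded CA of cluster B>
    users:
      - name: admiralty
        user:
          token: <service account token exported from cluster B>
    contexts:
      - name: cluster-b
        context:
          cluster: cluster-b
          user: admiralty
    current-context: cluster-b
```

As noted in the talk, the drawback of this method is that the token is minted by the target cluster, so it has to be copied and rotated by hand; the other methods let the source cluster's own identity provider do that work.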
While the proxy pod is being scheduled and the virtual nodes that it tolerates are filtered, pod chaperons are created in the target clusters that correspond to those virtual nodes. The pod chaperon has two purposes. First, it is the vehicle for annotations: both schedulers annotate the pod chaperon to communicate. If they annotated one or the other pod instead, that would invalidate the scheduler's cache, so it's best to use another object for that. But also, and maybe most importantly from a feature perspective, if the source-to-target cluster connection is interrupted and a delegate pod that has been running for a while is evicted, because the real node it runs on is being cordoned, for example, or stops responding, we need a way to recreate the pod. So we need a local controller, and that's the pod chaperon: a pod chaperon is just a pod template that creates a pod.

So each pod chaperon creates a pod that looks exactly like the pod chaperon, and it includes the scheduling constraints that were saved in annotations on the proxy pod, so that the intention of the user, for example to schedule on a node that has a GPU, is met at the real node level in the target clusters. In this example, only two of the three target clusters can find a node for their respective candidates, so they annotate their pod chaperons. The proxy scheduler sees that and selects one of the two, using topology spread constraints or other proxy-pod-level scheduling constraints that you can add. When the highest-scoring node is selected, the pod chaperon in the corresponding target cluster is annotated again, this time to signal the candidate scheduler that it is allowed to bind the candidate pod, which then becomes a delegate pod. The proxy scheduler sees that and finally binds the proxy pod. The other candidate pods are deleted via the pod chaperons and garbage collection.

In summary, it is possible to build a global Kubernetes cluster; you just need super-fast networks and some custom-built nodes. But even then, Dmitry's colleagues and partners found reasons to use multiple clusters. The federation around Nautilus has over 10,000 cores currently and will soon be expanded by an order of magnitude with the addition of the Expanse supercomputer. The federation uses Admiralty, which itself uses Virtual Kubelet and the scheduler framework. If you're interested in joining the research platform around Nautilus, contact them. And if you want to build your own federation, check out Admiralty. Thank you.