So, I'm Kire. That's Vitaly. Thanks for coming. Yeah, we work for Oracle — obligatory slide here. Another obligatory slide about Kubernetes; preaching to the choir here: Kubernetes is growing. This is from the survey from the spring — production adoption is picking up, which really brings up the topic of multi-cluster more and more, especially as more production workloads go in there. Take your pick: probably one of these is the reason why you're looking at multi-cluster as well. But generally the trend is going there.

What happens when you do multi-cluster? There are lots of challenges to look at. We tried to look at it from a little bit of a different angle, a little earlier up the decision stream, if you will — in terms of how you arrive at a proper multi-cluster strategy. Literally a stream of decisions that leads to the environments you have to fulfill. The types of questions that usually come up are: how many clusters do I run? What's the best practice there? Do I run a cluster for every business unit? For every app? For every service? For every environment? All of these questions come up when you start looking at real, live deployments of Kubernetes with multi-cluster. Then you get into: how big should the clusters be? When do I resize them? How fast do I make those decisions? What type of tech do I use to manage the lifecycle of those clusters? When is that decision made? What providers do I use — whether you're running on public or even private cloud — and what type of tech stack? How do I guard myself against some of these decisions in the future, to avoid potential day-two problems later down the road? And finally, how do we deploy applications to those multiple clusters? How do we achieve higher availability of those applications, some continuity, and so forth? A lot of these questions come up in an organization. So we sat down at Oracle a few months ago and started to look into this.
We started looking at how we would approach solving these problems, both for Oracle and for our customers. But before we get into what our thoughts are on this: generally, what an organization would do is implement some kind of process. This is a very generic way of representing it, but normally there would be some kind of organizational solution to those questions, with a lot of processes and tools. You'd usually start with some decision and design to containerize applications and to use Kubernetes. From there you would meet, do some kind of review of what's current and what's the latest, and those decisions would then be delegated to other teams, like development and ops, to execute upon. A lot of communication goes back and forth; ultimately it ends up with some work being done — by the operations team, usually, to manage the clusters, and by development to deploy to those clusters — picking their own solutions, getting some reports out of it, closing the loop. You basically end up organically growing into this. If you're good, maybe sometimes you look forward and strategically implement some kind of process, but there are variations of this depending on your organization and the roles, and whether you do DevOps or not. So there's quite a bit of decision-making going on here, and one of the key points is that people usually make a lot of these decisions ahead of time: before you even get to deploying clusters, you're already deep into a bunch of decisions that have been made. As you go, you end up requiring people with certain skills — the better people you have, the better you're going to do; some companies do it better, some do it worse. But in general, what we saw is that it's really a process, organization, and people type of solution to the questions we posed at the beginning. So we started thinking: well, there must be a better way.
So we decided to take a blank slate — and obviously there's no better way to indicate a blank slate than a blank slide, I guess — but literally we wanted something that could use some tech, some software, to tackle some of those questions. Given that we're committed to the Kubernetes ecosystem, the first thing you look at when you're talking multi-cluster is Kubernetes Federation. How many of you are familiar with Federation? How many of you have tried Federation? A few hands. Federation is there to do multi-cluster, and if you go through the documentation and the setup, you'll end up doing something like this: you install a control plane — it's its own control plane, basically run separately from your clusters — you start up a federation, and then normally you go through and add some clusters; they call it "join". Generally you have to go and figure out how to create the clusters yourself; Federation just allows you to join those clusters into the federation. Once you have the clusters ready, you deploy to the federation control plane, which is pretty much the same process — you can use kubectl, it's very similar to deploying to a cluster — but you're deploying to this sort of virtual layer, which then delivers the necessary application to the multiple clusters. That's how you would end up doing multi-cluster at this point.

If you compare this to the questions we posed at the beginning, it scratches the surface on some of them, but a lot of them are left unanswered. It may start giving you a bit of an option for how to deploy apps to clusters, but it comes nowhere near making decisions about how many clusters should exist, what the size of the clusters should be, who is responsible for them — none of that is really covered by Federation. But it's a really good stepping stone. We looked at it: it's a separate control plane, a very interesting piece of the puzzle. So we started thinking: by this point it's a little too late, we've already built the clusters. If we wanted to tackle those problems — what if we remove the clusters? Has anybody tried this with Federation: deploying an app without having any clusters in it? What do you think is going to happen? Federation has its own etcd store, its own desired state and all of that, so the application is going to go in there and it's just not going to be scheduled, because it's akin to not having nodes in a cluster. You might say, well, that's useless — but it's not useless: what you just did is capture the desired state. We said, that's a great starting point for us. We have a desired state; now we have an input we can use programmatically to answer those questions. If I have an actual deployment, I basically have a programmatic way to describe the demand from my developers. What I need to do is figure out how to add something that derives the answers I want: if I know what they're trying to deploy, I may be able to figure out what clusters I need, how big they should be, and all of that.

So what we did is create a functional component that sits next to the federation. We're calling it Navarkos at this point — Navarkos being Greek for admiral. Navarkos is a controller that sits within the federation control plane and specifically listens to your deployments, and it starts making decisions about what clusters you need in the whole world. Without going into too much detail — we'll cover the details a little later — that's the general change of direction, if you will. In addition to that, we added one extra component to offload Navarkos from some of the low-level plumbing of managing clusters: another component called the cluster manager. So really what happens is: Navarkos listens to the demand and decides what supply you need, and that supply then gets orchestrated by the cluster manager. It's actually a pretty simple wrapper around things like kops, and we implemented a wrapper around the Oracle container engine. Essentially it acts as the human who used to sit there running the kops command lines — Navarkos emulates that human, basically. What you end up with is: the clusters get created and the application gets deployed. But what we really ended up with is a control plane where we can use software to tackle a lot of those questions, and that's where we're at right now. We'll get into the advantages of doing that, but what we really call this thing is application-aware infrastructure. Ideally it's all about the developers — the app is king here, and everything else should really be contextually based on the app. At this point, what we're trying to do is open up that control plane to very dynamically procure and adjust infrastructure based on the application's needs. We're going to do a really quick demo — Vitaly is going to do it — and then we'll come back and talk a little more about the advantages.

Hi, can you hear me okay? Okay. So, for the demo, here's what we're going to do. As Kire already mentioned, Navarkos runs alongside the federation. Here I already have my Kubernetes cluster up and running with the federation pods provisioned — you can see there are two pods for federation, the API server and the controller manager — and there are a couple of pods for Navarkos, namely Navarkos itself, the cluster manager, and a little helper pod that I will explain later on. So now, let's start with configuring our supplies.
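As a rough sketch of what such a "supply" object looks like: it's a generic federation Cluster resource whose provider-specific configuration rides in annotations. The annotation keys below are illustrative placeholders, not the exact Navarkos schema.

```yaml
# Sketch of an "offline supply" cluster placeholder (Federation v1 style).
# Annotation keys are hypothetical; the real Navarkos keys may differ.
apiVersion: federation/v1beta1
kind: Cluster
metadata:
  name: oke-cluster
  annotations:
    navarkos/cluster-provider: "oke"           # or "aws"
    navarkos/availability-zones: "AD-1,AD-2,AD-3"
    navarkos/worker-shape: "VM.Standard2.1"    # instance type / shape
    navarkos/workers-per-zone: "2"
    navarkos/cluster-state: "offline"          # lifecycle state placeholder
spec:
  serverAddressByClientCIDRs:
  - clientCIDR: 0.0.0.0/0
    serverAddress: ""   # filled in once the cluster is actually provisioned
```

Because the object is only a placeholder, nothing is provisioned when it's loaded; the lifecycle state stays "offline" until demand arrives.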
And now I have a clean federation with nothing configured in it, and I'm going to load the cluster configuration objects into the federation. Right now Navarkos supports two cloud providers: one is Oracle Kubernetes Engine, and the other is AWS, obviously. Let's take a look at this cluster configuration YAML definition. It's basically the generic federation cluster object, but with some additional config attributes required by the cloud provider — standard stuff like availability zones, worker node shapes, workers per availability zone, and so forth. That's what it looks like for Oracle, and these are for AWS — pretty much the same thing, just with more AWS-specific attributes: zones, AMIs, SSH keys and whatnot. As you can see, all of those configurations are put here as annotations. The key point is the lifecycle state of this configuration, which is "offline". So we load these configuration objects into our federation — again, I'm not joining any live clusters into the federation, I'm just configuring placeholders that, if demand arises, will get provisioned, and the cluster will become available for the federation to deploy to. Now we've got three clusters loaded into the system, and they're all in an offline state; nothing is going on there.

The next step is to create some demand to consume those resources. This is a simple deployment; I configured the sample to request 50 replicas. The image is on a public Docker repo — a simple Node.js app that I just want to deploy into my clusters. Let's take a look at what's going on with our cloud providers: this is Oracle Kubernetes Engine, nothing is running there, no clusters; on AWS EC2 East nothing is running, no instances; and the same thing with AWS EC2 West, there's nothing there. So let's go ahead and create this deployment in the federation. First I need to create a namespace, and then I'm going to create this dummy deployment within that namespace. We can see the deployment is created with the desired number of replicas at 50, but nothing is currently running. Navarkos, though, has already detected that I've got demand, and it's going through my available supplies. It sees three configuration items it can start provisioning, and as you can see, the Oracle cluster is already provisioning, and EC2 instances got instantiated on the east and on the west coast. The system will bring up the Kubernetes clusters, and when they become available, the federation will start reconciliation and deploy this simple app into all of those clusters, which at that point will be in a "live" state and available for the federation to deploy to. So again: this is two different zones in one public cloud provider, plus a second provider, and the deployment will get deployed to all three.

This is a little dashboard we created on top of the federation — we have a little controller running alongside that streams the federation events into Elasticsearch, which gives us a good way to visualize what's going on with the federation. As you can see, the EC2 clusters already got provisioned, and Navarkos deployed 25 pods in each one of them; the Oracle cluster just joined the federation, and as soon as the system detects that it's up and running with enough capacity there, it will redistribute the pods across all three. The system constantly monitors the state of the clusters and their available capacity, and based on that data it schedules the application replicas accordingly. It's one-level-up scheduling: the same thing Kubernetes does across worker nodes, Navarkos does at the cluster level. As soon as it becomes available — let's see, it's coming up — now we have three clusters, and Navarkos has spread the application roughly evenly: about 17 pods in each of EC2 East, Oracle, and EC2 West. That's what we call clustering Kubernetes clusters. And again, if I drop down and delete this deployment and release the resources, all of those clusters will get shut down, and then they'll wait until the next demand comes in. That's all — handing it back over to Kire.

So, we actually have one more, shorter demo later: we're going to try to show a bursting use case if we have enough time. But I wanted to cover, conceptually, what this means. That was an explanation of how it works; now I want to go back and explore the benefits and advantages of taking this type of approach. One of the reasons we originally did it with the federation was to provide some consistency across the clusters. We realized that if you don't have control over who's building the clusters, you really don't have much control over anything — does it run ingress or not, what type of networking does it run — and it gets really complicated from a federation perspective; it's like herding cats at that point. So our conclusion was: in order to run a successful multi-cluster deployment, we have to gain more control over the capabilities of those clusters, and since we're provisioning the clusters, we can ensure they have the right, or at least similar, capabilities. One way to do it is to wait for conformance and all of that, but since we have control, consistency is key: you can run the right kind of ingress, the right kind of add-ons, the right CNI, whatever it is — you have full control, so it matches what your applications really need. Manageability: once you start operating this way, you're essentially going from treating clusters as pets to treating them as cattle; they become very ephemeral. That actually creates a lot more manageability, in my opinion, because now you can do upgrades, you can move things much, much more easily than before, where you kind of
treated your clusters as your little static assets that you wanted to keep using. Obviously, portability of the application also follows: as you're moving clusters, you can start moving applications faster, so it gives you more portability. Cost is a huge thing: because we're doing just-in-time clusters, we're constantly optimizing the allocated infrastructure that you pay for to match exactly what the current demand is. This matters in a public cloud, but there are various cost use cases, and it works on a private cloud as well: you can share much more of the infrastructure, provisioning clusters dynamically for some of your apps on the same underlying infrastructure, which ultimately saves you cost because you don't have to run as much infrastructure to achieve optimal workload consumption. Compliance: one of the things we realized is that once we wedge that control plane in between — it's kind of like we put something in between dev and ops — one of the huge things you can take advantage of there is governance and compliance. You can do a significant amount in that layer before your applications hit the actual infrastructure. If you start thinking about security compliance, or even just budgets and things like that — and I'll cover some of the use cases later — there are tons of compliance use cases that can take advantage of that pattern, if you will. Another example is autoscaling: we've implemented scale-up and scale-down, but we're doing it right in the federation control plane, using abstract objects. That gives you the ability to scale up and down regardless of who your provider is, which gives you a lot more multi-cloud compatibility, because the business logic or operational logic you're putting in the system isn't specific to any provider. Global scale: when you want to light up certain regions, certain components, and so on, it should be fairly easy to extend, because everything already works at that point — it's just a matter of declaring the right policy and the right selection criteria from that central control plane. And obviously the ability to do private and public a little more easily. It's almost like a platform at that point that you can keep working on, adding more use cases.

Some example use cases we can solve: obviously, continuous delivery of an application to multiple regions. I know we didn't talk much about how the application is actually configured in that multi-region setup — we've done a bit of development on additional federated ingress controls and such, and we'll touch on that as well — but that's the primary use case: deploying apps to multiple regions and scaling up and down anytime you want. Bursting — and I don't just mean classical bursting, a single app going from private to public; there are all sorts of interesting use cases you can do with bursting by prioritizing the clusters. For example, as I said, in a private cloud you can have a set of dedicated infrastructure for certain business units and a set of shared infrastructure for anybody to burst onto. You run your base workload, and only when you need to does it burst to a set of lower-priority, shared clusters. That way you can more effectively share private infrastructure across multiple business units. Or on Amazon, if you want to use spot instances versus dedicated instances: you run on your reserved instances, they're there, but then you
can burst onto spot, for example. Or, as the demo will show, potentially — if the application allows, depending on latency and data affinity — you can burst from private to public as well. All of that is an option, because your control plane has it baked in, sort of. We talked about manageability: the ability to keep up with the Kubernetes release cadence. Everybody's stuck doing upgrades; well, when clusters are very ephemeral, you can still do in-line upgrades — depending on what underlying tech you use — but major upgrades can become a problem, and this lets you simply junk a cluster and replace it with another one without even telling anybody in development, basically. Continuous integration with temporary clusters: you deploy the app, the app deploys the cluster, you run an end-to-end test, you undeploy the app, and it shuts down the cluster — that's a use case too. And since we have this sort of in-between control plane, we can start looking at, for example, service meshes: if you want to inject things on a global scale, you can use that control plane, because you have visibility into the global desired state, if you will. We can use it for security — to insert network policies, to insert service mesh rules. All of this is possible; we've done some pilots. These aren't things we've implemented yet, but they're possible use cases we can take advantage of. Also managing budgets: if you're using multiple providers, you get stuck with the question of how much capacity to give each team in each provider. This potentially allows you to centrally control some of that, because it governs the usage of the underlying infrastructure. So — Vitaly, I think you want to show them the bursting?

Yes. For this demo, I'm simulating the use case where you have your private cloud with a cluster running in it and you want to burst out into the public cloud. I have a federation running locally on my laptop on minikube, and I have another cluster running, again on my laptop on minikube, and I joined this cluster to my federation as a managed cluster — I call this my little private cloud. On top of that, I registered my burst-out cluster object, configured to use Oracle Kubernetes Engine. Usually, when I'm playing around locally, my Oracle cluster is offline; I don't touch it. So let's take a look at how it works. As I said, I have two clusters here: the burst-out OKE (Oracle Kubernetes Engine) cluster is in an offline state, and mk1 is in a ready state. Let's look in a little more detail at how these clusters are defined. This one is the Oracle one, with a standard provider configuration, and the key difference here is this new cluster-priority attribute, which is set to two for Oracle, while for the minikube one the priority is one. The minikube one also has a limit on the number of allocatable pods, set to 30, and the used-pods count is already 25, because I already deployed my app here and it has used up some of the capacity. Back on the little dashboard, we see that pretty much 100% of this deployment is residing on mk1, which is my private cluster, and my burst-out OKE cluster is offline.

What I'm going to do now — first, I just want to demonstrate that this deployment has a desired replica count of 20, and then I'm going to scale it up to 40 and see what the system does. So I run kubectl scale and set replicas equal to 40. All right, it's scaled now, and we can see the number of desired replicas is 40 while current is 20. But Navarkos has already detected that the requested number of pods exceeds the capacity of my private cloud, so it went ahead and started provisioning my burst-out cluster in the Oracle cloud. As soon as it becomes available, the system fills up all of the available capacity of my private cloud, and whatever pods are left over burst out into my public cloud cluster. We can see my Oracle cluster is up and running, and if we go back here, you see the distribution: the blue portion is the leftover pods that burst out into the public cloud after using up all of the capacity on my minikube.

Now, the most important part of all this burst-out stuff is shrinking back: how the resources get released once your peak is over. So again, I'm going to scale my deployment from 40 replicas back to 20 and we'll see how the system reacts. All right, it's scaled; if we go back here, you see all 20 replicas are back on my minikube, and my burst-out cluster has zero pods running. The system notices the burst-out cluster has become idle — pretty much no user pods running there — and if nothing happens within an idle TTL, it shuts the cluster down, because we don't need it anymore. And that's what we see here: this cluster is already being terminated, because it's idle and we don't need those resources anymore. That's pretty much it on burst-out.

How come he gets the applause all the time? Just one more slide and we're done — we've got five minutes for questions. We open-sourced this stuff yesterday; it's currently on the Oracle GitHub account. The primary component is there, plus the cluster manager supporting component. We also made some additional enhancements and changes: we have a federated ingress controller that uses DNS as a backend — that has to do with managing the application a little better in multi-cluster; it wasn't the focus today, but there's more code out there that we didn't cover. We're going to work with SIG Multicluster to see if we can make this an incubator project; we're just starting those discussions right now. But that's pretty much it — any questions?

Yeah — great question. The question was about the underlying problem of applications running across multiple providers with stateful sets. I hinted a little at that: obviously, the runtime needs to support it. What I can tell you right now is that we're working on potentially doing a fully stateful, global application. Federation still hasn't released stateful sets, so we've done some pilots where we link up the clusters using Calico with direct connectivity, so that stateful applications — especially a stateful set running as a data application within Kubernetes — can work as sharded applications. Essentially, if you're running Cassandra or Kafka or something like that, it starts bringing its own data replicas into the other providers. It's not supported right now, but it's coming along. And as I pointed out, bursting from private to public within a single application is a complicated use case, but bursting, as I mentioned, within the same provider — sharing some of the local infrastructure, either public or private — is a valid use case you can take advantage of right now, while we wait for better support for stateful apps across a hybrid setup. But yes, that's an underlying problem that you can't solve purely with a control plane on top. What we want to do is enable compatibility: if the application supports it, your control plane will support it. It's a bit of a chicken-and-egg: even if your application supports it, you're still stuck with, well, how the heck am I going to manage all this stuff?
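As a side note, the priority-based bursting behaviour from the demo can be sketched as a simple placement loop. This is illustrative Python under assumed field names ("priority", "capacity"), not the actual Navarkos scheduler:

```python
def place_replicas(desired, clusters):
    """Fill higher-priority (lower number) clusters to capacity first,
    then burst the remainder to lower-priority clusters.

    clusters: list of dicts with hypothetical keys
              "name", "priority", "capacity".
    Returns (placement, unplaced); unplaced > 0 means Navarkos-style
    logic would have to provision additional supply.
    """
    placement = {}
    remaining = desired
    for c in sorted(clusters, key=lambda c: c["priority"]):
        take = min(remaining, c["capacity"])
        placement[c["name"]] = take
        remaining -= take
    return placement, remaining

# Mirrors the demo: a 30-pod private cluster (priority 1) plus a
# burst-out OKE cluster (priority 2), after scaling the deployment to 40.
placement, unplaced = place_replicas(40, [
    {"name": "mk1", "priority": 1, "capacity": 30},
    {"name": "oke-burst", "priority": 2, "capacity": 100},
])
print(placement)  # {'mk1': 30, 'oke-burst': 10}
```

Scaling back down to 20 with the same loop would leave the burst cluster with zero pods, which is what triggers the idle-TTL teardown described above.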
Right — this gives you the opportunity to manage it once it's ready. It's not a perfect answer, but... yeah, go ahead.

That's correct: the federation control plane is its own thing — it uses the same API machinery and state reconciliation as a primary cluster. Right now, the way it's distributed, it actually needs its own host cluster, but it's completely up to you how you run the federation control plane itself with Navarkos. We've been doing a lot of work on trying to create a highly available federation control plane, because it becomes the critical point, so we'll probably come up with some solutions soon that you can look into. But yes, it's its own control plane at this point, and I'd expect that eventually you either run that control plane on your own with some high-availability tooling, or you potentially rely on some provider to give you that control plane — but it's completely separate from the clusters.

Well, you can create that control plane on your own, because it just requires a special host cluster, so you can deploy it as a regular Kubernetes app. Obviously you then have to manage the federation control plane yourself, and that gets into a bigger discussion about how you manage a highly available federation — you end up discussing using federation to manage federation, and whatnot. Again, we have some ways to do it, and hopefully that'll come, but you can choose to run your own federation control plane, choose whether you want it containerized or not, which providers you want to run the control plane on, and whether you want multiple-provider redundancy for it. But again, that has nothing to do with all the other runtime clusters you use for your applications. It's essentially a control plane only — there are no worker nodes or anything in it; all it's doing is storing desired state. It doesn't run anything on its own; it just controls the other clusters, so it's pretty lightweight from that aspect. One more question?

Yeah, great question. So yes, they're first-class objects. We're currently borrowing the cluster spec definition that Federation came up with, which isn't really a first-class resource definition — it only describes clusters that were pre-built — but we flipped it: we're using the cluster spec as a source to manage the clusters. A new working group right now is trying to define cluster APIs and new specs for managing clusters and so on. The second part of your question, if I understood it correctly, was about the complexity of making policy-based decisions about which apps need to run on which clusters, even if it has to build the clusters. Yes — there's a feature in Federation called the cluster selector that gives you some basic segmentation, and the segmentation is not, at this point, limited to what's actually running: if the selection chooses clusters that are idle or offline, the system will start bringing them up. So you do have some capability for label-based selector decisions: you can have a bunch of clusters labeled production, or QA, or west, or east — whatever labels you want — and you can have your developers bring their constraints, basically saying: I want an app, I want it to run in Europe, I want it to be secure, I want it to be whatever. But they're never picking clusters; all they're doing is giving you conditions that need to be satisfied, and it's your responsibility, on the Navarkos side, to satisfy them. That's the whole point of this thing, really: to decouple that developer flow, if you will, from the actual infrastructure — you have a query, and you need to satisfy that query. All right, that's it. Thank you.
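For reference, the cluster-selector mechanism mentioned in that last answer looked roughly like this in Federation v1 — a sketch assuming the alpha annotation form documented at the time (annotation key and label names may differ in your version, and the labels here are illustrative):

```yaml
# Clusters are labeled, e.g.:
#   kubectl label cluster my-cluster location=europe
# A federated resource then constrains placement to matching clusters:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
  annotations:
    federation.alpha.kubernetes.io/cluster-selector: >-
      [{"key": "location", "operator": "in", "values": ["europe"]}]
spec:
  replicas: 20
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example/my-app:latest   # hypothetical image
```

The developer expresses only the constraint ("run in Europe"); deciding which clusters satisfy it — and, in the Navarkos model, whether offline ones must be provisioned first — is left to the control plane.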