All right, welcome everybody, let's get started. My name is Joaquin Rodriguez, I'm a software engineer at Microsoft, and I'm here with Alessandro. Hi, welcome. It's my first time as a speaker, so please be nice to me. And I'm Alessandro, from Solo.io.

Okay, so let's get started with a small intro: why are we doing this? It all started with punch cards, and things have changed since then. We moved to terminals and Linux, development was a lot different, and then we reached Kubernetes, where we started pushing changes directly with kubectl apply and so on. And then we reached GitOps. Just out of curiosity, by a show of hands, how many of you have used GitOps? Nice, okay. Now, what's next? We don't know yet. It could be OpenAI and ChatGPT, we don't know, but it's kind of cool to be in a place where things are changing and evolving. There's a lot of good things going on.

Just to recap the historical context like I was saying: back in the day we were doing manual deployments, doing kubectl apply, and at the time there were a lot of issues with that. There was a lot of human error and inconsistency, and things didn't scale. That led to the development of CI/CD tools like Jenkins, and it was good. It's still good even these days; a lot of people use tools like Jenkins and workflows. It helped with consistency, speed, and reliability, but it was not enough. So now we have GitOps.
We have tools like Argo and Flux, and both of them are great. You get a Git-centric approach to deployments, with collaboration, versioning, rollbacks, auditing, and logging: what you see is what you get. It's just really great. And because we now have GitOps, this has evolved into having an organization, OpenGitOps, to make sure there are standards and practices, with a community-driven approach. This is awesome.

Even with GitOps, though, you might have some challenges. Some of these challenges are related to scaling, observability, and deployment automation, and it can seem like a vicious cycle: the more you scale, the more you care about observability, and then how do you automate that? So today we're going to show a quick demo that proposes how we can scale things a lot more easily and how to manage and organize our deployments. With that, I want to pass it over to Alessandro.

Yes, so, no minions were harmed in the process of creating this presentation. But we used the paradigm where there's Gru, the manager, the person in charge who controls the minions, while the minions actually do the work. No minions die, but unfortunately we do delete clusters. We don't kill minions, but we do delete clusters. And there's this phobia of upgrading, and that's what drives us. We worked together at Microsoft back in the day, by the way, and we had a lot of customers that were really, really afraid of upgrading things: upgrading clusters, deploying new versions onto an existing cluster.
So that's the drive here. We wanted to show that you can really implement the paradigm of immutable clusters: treat them as always the same, always stamped out of a template, so you never need to update them in place, because you can just replace them with a shiny new cluster, ready to accept your workloads and serve your traffic to the world with a clean slate. It's really something that was born out of our experience with enterprise customers at Microsoft as well. I keep seeing this fear, and as much as you can persuade customers and engineers that upgrades are safe, and there are many ways to perform updates in a safe way, there's still this resistance to moving forward with an update. So let's see if we can actually realize this idea of immutable clusters.

How do we do that? There's a great project, Cluster API. I don't know if you've seen it; it's a project that was born to solve the problem of declaratively expressing the idea of a cluster. As you know, in Kubernetes a cluster was not a first-class citizen until Cluster API came around: there was no object describing the cluster. The project filled the gap of creating and understanding clusters as objects inside another Kubernetes cluster. That's what we chose to create our clusters with, because it's very popular, it's part of the Kubernetes ecosystem, and it can be used in many ways.
There are many providers: Azure, Google Cloud, OpenStack, AWS. We chose to show Cluster API for Azure (CAPZ), which is just the flavor of Cluster API that interacts with the Azure API and creates clusters there, but nothing stops you, or us, from expanding into other clouds.

We're also going to show another cool tool that a lot of people are talking about; it's called vCluster. Imagine you don't want, or don't have the resources, to spin up a full cluster in a public cloud or on premises. Then you can use this little big project called vCluster, which is very popular now. We love it, and it gets you new ephemeral clusters that live inside the management cluster and come up very quickly, so you can fulfill more use cases where provisioning time is a really crucial parameter.

Then we needed a solution to the problem of scaling observability. In this case, we really want to show how you can monitor these ephemeral clusters, clusters that come up and go down as they please, clusters that don't live long enough to accumulate many metrics. If you know the Prometheus project, there's the idea of the Thanos sidecar, which caches data for two hours; we really didn't want to do that. We wanted every cluster to come up and immediately start sending metrics. We won't talk about logs, but this is easily extended to logs and traces as well. So we want clusters that come up and ship metrics to the central management cluster, and when they go away we can still see their history and observe them even if they are gone: sort of observing the ghosts of clusters past. Clusters that are long gone, or that have just been replaced by new ones, but you can still trace them, still observe their behavior, because you want to know if they did their job, whether they were healthy, and if they were not, why.

So, some crucial technologies. I'll start and then pass it along. Of course Azure, one of the best clouds in the world; it's our playground, where we interact with the public cloud API and create clusters as we need them, and we're using Azure to create the main management cluster. So our management cluster, our Gru cluster, is actually an AKS cluster in Azure.

Argo CD, of course, the great reconciler, as I call it. It just does what it's supposed to do, which is bringing the cluster exactly, or as close as possible, to the desired state described in Git. And of course we use GitHub, where we store the state that we want: the entire management cluster state, with all the infrastructure components, and also the number and properties of the workload clusters.

One more thing: we're using ingress-nginx as the web server, plus external-dns and cert-manager. We don't like nip.io; we like to have real, meaningful DNS names. So of course we use other projects from the ecosystem, namely external-dns, cert-manager, and ingress-nginx, to provide automatic discovery and DNS names for our applications.

For observability, I think most of you are familiar with these: Grafana as the dashboard system, Prometheus to collect and ship metrics out of the workload clusters and the management cluster itself, and Thanos to receive those metrics and store them safely. Just to expand on that: in the Gru cluster, the management cluster, we have Grafana and Thanos, and Prometheus too, but then on each worker cluster we install Prometheus, and it does a remote write back to Thanos. That way we can reconcile all the metrics in one location.
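To make that remote-write setup concrete, here is a minimal sketch of the Prometheus configuration on a worker cluster. The endpoint, hostname, and cluster name are placeholders, not our actual config; the only essentials are an external label to tell clusters apart and a remote_write pointing at the Thanos Receive endpoint in the management cluster.

```yaml
# prometheus.yml fragment on a worker cluster (all names are placeholders)
global:
  external_labels:
    cluster: worker-4          # Thanos uses this label to distinguish clusters
remote_write:
  # Thanos Receive running in the central management cluster
  - url: https://thanos.example.com/api/v1/receive
```

Because the label travels with every sample, the metrics of a deleted cluster remain queryable in Thanos long after the cluster itself is gone.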
And to complete the automation, of course, there's this project which I've started to love: Kyverno. I describe it as "if this, then that" for your cluster. And we had an incredibly good experience that really showed the power of open source. Yesterday we were working on the policy that automatically creates the Argo CD secret describing a cluster that has just been created, and I couldn't make it work. And then I thought, the Kyverno creators are right here in the room. So I would like to say thank you to Jim Bugwadia, one of the creators and maintainers of Kyverno: I just walked up to him and he helped me out. It's amazing; that's why we are here, so we can just help each other. If he's in the room, I'd like to thank him publicly.

And of course Cluster API. It's also a great project, in development for a few years now, and it's really what enables the automation side of things.

And of course vClusters, because they are awesome and they are fast. They spin up quickly, and they are effectively clusters like every other cluster that you can interact with. Especially notable is the syncing between the services in the vCluster and the services outside, in the host cluster. That makes it so easy to expose the vCluster services through the ingress in the host; it's just very cool. I think they have a booth here, and of course the company behind it is Loft. Thank you for creating such a great piece of software.

And demo time, demo time! Okay, so for the demo I'm going to show you first how it is structured. We have a repo, and we will show the link in a little bit. Let me zoom in. Here we essentially walk you through how to set up this stack.
So the first thing: you have some prerequisites. Like Alessandro was saying, we're using Azure AKS for our management cluster. You can use whatever you like, Google, AWS, that's not a problem; it's just what we're using for our demo.

First we set up the management cluster with Argo, and we provide some information, like the subscription ID, the location, and so on. We create the cluster, and then we install Argo here. So the first thing we're going to do is initialize the management cluster with some tools, and those live under the gitops folder. We have gitops, and then we have a management folder: all the tools that we're going to install in the management cluster are in the management folder, and whatever we're going to deploy later to the minion clusters, the worker clusters, is in the workload folder. So we run that, we initialize Argo, and then we install some initial objects to get us started.

Anything else you want to add? Yes: this is a completely self-managed management cluster. We had to resort to a few imperative commands, but if it were up to us, we would also automate the Cluster API deployment. We couldn't so far, but the idea is that you install the initial objects and the rest is all Argo CD doing its job. Everything is under management; all of it, external-dns and the rest, is automated.

Then the next step is we install Cluster API, specifically the CAPZ provider for Azure, in the management cluster.
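For a rough idea of what CAPZ then lets you declare, a managed AKS workload cluster is described by objects along these lines. This is an illustrative sketch, not our exact manifests: the names, namespace, location, subscription ID, and Kubernetes version are all placeholders.

```yaml
# Sketch of a CAPZ-managed AKS workload cluster (all values are placeholders)
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: worker-4
  namespace: clusters
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: AzureManagedControlPlane
    name: worker-4
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureManagedCluster
    name: worker-4
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: AzureManagedControlPlane
metadata:
  name: worker-4
  namespace: clusters
spec:
  location: westeurope
  subscriptionID: 00000000-0000-0000-0000-000000000000   # placeholder
  version: v1.27.3                                       # illustrative
```

The point is that the cluster itself is now just another Kubernetes object in Git, so Argo CD can reconcile fleets of clusters the same way it reconciles deployments.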
And then, let me show you: once that's done, you can see here that we have basically everything installed, and that includes Grafana. Here in Grafana we have already preloaded a few dashboards that show us information about our clusters. Specifically, we want to focus on the IMDB app, which is one of the apps that we deployed as well. This app is just a simple in-memory web application that also has a database; it exposes metrics to Prometheus, and then we push those back into Thanos. Here you can see that we already have a few clusters preloaded: the vc clusters are the vClusters, and worker-4 and worker-5 are CAPZ clusters in AKS.

Notably, when the clusters come up, Kyverno synchronizes the secret that contains the kubeconfig of the cluster itself with Argo CD, so the references are in there and you can look them up yourself. The Argo CD secrets that describe clusters are in a special format, so that's why we use Kyverno to translate, to create a one-to-one dependency from the kubeconfig created by Cluster API, or by the vCluster, to the Argo CD secret that contains the cluster definition.

We also segregated everything into projects. One of the best practices in Argo CD is to never use the default project, so we tried as much as possible to segregate the management cluster infrastructure, the applications, and the workload cluster infrastructure, including Prometheus for example, into different projects, because that gives you more flexibility and more separation and segregation of rules.

Okay, so what we're going to do next is provision a new vCluster. It'll be a little faster than if we were to deploy a new AKS cluster from scratch.
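To make that translation step concrete: Argo CD registers external clusters as Secrets in a special, documented format, and the Kyverno rule's job is to generate one of these from the `<cluster>-kubeconfig` Secret that Cluster API writes. Below is a sketch of both sides; every name, URL, and credential is a placeholder, and the real policy needs JMESPath expressions to extract the server address and certificates from the kubeconfig, which are elided here.

```yaml
# What Argo CD expects: a Secret labeled as a cluster definition (placeholders)
apiVersion: v1
kind: Secret
metadata:
  name: worker-4
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: worker-4
  server: https://worker-4.example.com:443
  config: |
    {"tlsClientConfig": {"caData": "<b64 CA>", "certData": "<b64 cert>", "keyData": "<b64 key>"}}
---
# A skeletal Kyverno rule that generates such a Secret whenever a
# Cluster API kubeconfig Secret appears (illustrative, details elided)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: capi-cluster-to-argocd
spec:
  rules:
    - name: register-cluster-in-argocd
      match:
        any:
          - resources:
              kinds: ["Secret"]
              names: ["*-kubeconfig"]   # Cluster API names kubeconfigs <cluster>-kubeconfig
      generate:
        apiVersion: v1
        kind: Secret
        name: "argocd-{{ request.object.metadata.name }}"   # derived from the source secret
        namespace: argocd
        synchronize: true   # keep in sync, and clean up when the cluster goes away
        data:
          metadata:
            labels:
              argocd.argoproj.io/secret-type: cluster
```

With `synchronize: true`, deleting the cluster (and therefore its kubeconfig secret) also unregisters it from Argo CD, which fits the ephemeral-cluster model.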
So let me show you how we'll do that. Here, as you can see, this is an ApplicationSet for Argo, and it's nothing more than a list generator. Here we define our vClusters; we have another one for CAPZ, as you can see here. For vClusters, if I want to provision a new cluster, I just copy this line. Technically we could automate this, maybe with a Helm chart or something, but for the simplicity of the demo we just did it this way.

Okay, so I added my vc5 cluster. Typically you would create a pull request, he would approve it, and then we would merge it, but since we're doing a demo, I'm just going to push straight to main. Push. Okay.

So now if I go to Argo, this is going to take a few minutes... oops, seconds. It will come up in a little bit; it's syncing right now. Eventually you're going to see vc5 here, if the demo gods are good to us. Okay, so now we have vc5, so let's take a look at the pods. This one is crashing. Not mine, it's okay, it's all good. In a little bit the worker pods will start to come up; it takes a little while. It's syncing right now.

So, of course, this ApplicationSet creates a new Application, because we use the list generator. The Application comes up and installs. We don't use the K3s flavor; we use the proper k8s vCluster Helm chart, because that chart deploys everything properly. It then creates a secret, and Kyverno is the one responsible for moving, or translating, that secret into the proper Argo CD cluster.
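The ApplicationSet we're describing looks roughly like this. It is a sketch under stated assumptions: the project name, chart name, chart version, and cluster names are illustrative, not our exact repo contents.

```yaml
# Sketch: one Application per vCluster via a list generator (illustrative values)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: vclusters
  namespace: argocd
spec:
  generators:
    - list:
        elements:          # provisioning a new vCluster = adding one element here
          - name: vc4
          - name: vc5
  template:
    metadata:
      name: "vcluster-{{name}}"
    spec:
      project: vclusters
      source:
        repoURL: https://charts.loft.sh
        chart: vcluster-k8s        # the "proper k8s" flavor rather than k3s
        targetRevision: "*"        # pin a version in real life
      destination:
        server: https://kubernetes.default.svc   # vClusters live inside the management cluster
        namespace: "{{name}}"
      syncPolicy:
        automated: {}
        syncOptions:
          - CreateNamespace=true
```

That is why adding one line to the list and pushing to Git is all it takes for a new cluster to appear.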
So it's still not there yet, of course, but as soon as this Application is reconciled, and actually creating the cluster takes only a few seconds, Argo CD will be able to see the cluster, recognize that it needs certain things applied to it, namely Prometheus and the actual workload application, and just happily deploy them on top. Of course, this is the most critical part, but we trust that it's going to come up in a little bit. This one, you see, the vc5 job completed, so it's doing things. And then when Prometheus comes up, it's going to tag its own metrics and ship them to Thanos, and Thanos will recognize the cluster label. Okay, it's coming; let's go forward with the presentation. Yes, let me go back to the presentation and then we'll come back.

Okay, a few tricks. And we will help you if you need it; just open an issue or reach out to us. Use projects all the time; don't ever use the default project. We actually enabled Argo CD in read-only mode, so you can go and see for yourself. You cannot do anything, but it's nice to have Argo CD read-only, because operating on Argo CD through the portal is as bad as using kubectl, for us at least.

Server-side apply is actually coming as a default, but we found a lot of resources that break without it. Sometimes you generate CRDs that are so big that client-side apply will just blow up in your face. So you will see that a lot of our Argo CD Application manifests have the server-side apply strategy.

We did consider some alternatives.
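Enabling that strategy is a one-line sync option on the Application. A sketch, with chart and version as illustrative placeholders; kube-prometheus-stack is a classic example of CRDs too large for client-side apply, since client-side apply stores the full object in an annotation with a hard size limit:

```yaml
# Sketch: an Application using server-side apply (illustrative values)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack
  namespace: argocd
spec:
  project: workload-infra
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: "*"            # pin a version in real life
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated: {}
    syncOptions:
      - ServerSideApply=true   # large CRDs exceed the annotation limit with client-side apply
      - CreateNamespace=true
```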
So there's the Cluster API add-on provider for Helm, which does a similar thing: it applies a cluster add-on, in this case a Helm chart, but it could be anything else, to a cluster deployed via Cluster API. But then it's not a very generic mechanism; it's only for Cluster API, and we also wanted to use Helm to deploy the clusters themselves. And we really wanted to automate everything, including the Cluster API installation. We looked into the CAPI operator; it's actually pretty promising, but we couldn't make it work fully the way we wanted. Of course, the managed AKS support in Cluster API for Azure is also experimental, so it's moving very fast and changing; there's some more work to be done on that side.

Let's see, can you go back? Okay, so it's all good now. Let's see if this is working. Refresh the dashboard... we don't have that yet; it takes a little while. It's probably waiting on Prometheus. You can see the clusters, and because everything runs in the same cluster, you can also see the pods in the vCluster. So this is Prometheus, deployed. It's been applied, and we didn't do anything about it; there's no monkey business under the hood. Installing Prometheus is actually Argo CD targeting that vCluster and deploying whatever we need.

We also use sync waves, of course, so it installs Prometheus first and then the application. Otherwise the application, which includes a ServiceMonitor, would fail, because there's no ServiceMonitor CRD until you install Prometheus. Eventually it would all reconcile anyway, but we don't like to see red dots and all that. So this should... yeah, it's coming, it's coming, I promise. "That's what he said six seconds ago."
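The sync waves mentioned above are just annotations that Argo CD orders syncs by, lowest wave first. A minimal sketch of the idea, with app names illustrative and sources elided:

```yaml
# Wave 0: Prometheus, which brings the ServiceMonitor CRD with it
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
# spec (source/destination) elided for brevity
---
# Wave 1: the workload app, whose ServiceMonitor now has a CRD to land on
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: imdb-app
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"
# spec (source/destination) elided for brevity
```

Without the waves everything would still converge eventually through retries, but the ordering avoids the transient red dots.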
So by the time it gathers the metrics... as soon as Prometheus comes up, it starts labeling metrics and shipping them into Thanos, and Thanos aptly stores them. I think the cluster label is loaded only when you load the dashboard, so maybe... yes, let me... oh, you mean refresh the whole browser. We'll go back to the presentation for a minute and then come back to it.

Okay, so some takeaways. Of course, this is our stuff, and we know it because we've been spending some quality time troubleshooting a lot of these things. But with the right tools you can do crazy things, like ephemeral clusters and clusters-as-a-service, and all of this was practically impossible a few years ago.

Rely on open source, and rely on the people of open source. They are amazing, and they will help you out if you just ask nicely.

The next thing we want to do is to split the configuration per cluster type. vClusters, AKS clusters, and other types might need slightly different configurations. We don't do that yet, so we apply the same thing to everything, but it's probably not going to take long for us to come up with a way to customize the infrastructure and the workloads per cluster.

And of course, the next thing we want to show, maybe in a future talk, is to actually have the minion clusters in the data plane as well: using Istio multi-cluster, or some form of API gateway, to convey the traffic, and then use the minions, the workload clusters, to serve traffic across different clusters and do things like progressive delivery or A/B testing, something fancier than just having a cluster and deploying to it. We're not doing anything with the data plane as of now, but that's the future expansion of the tool.

Sounds good. Yeah, and thanks again to Jim.
That was his session from last time, and here is our repo. And yeah, drum roll, drum roll... hey, there you go. There are also some other relevant talks here at KubeCon if you want to check them out; we listed those here. Also, check out our booths, for Microsoft and Solo.io. And if you have any feedback for our talk, this is the QR code. Thank you very much.