Hello, Rian. My name is Katie Gamanji and I am currently a senior field engineer at Apple. I joined Apple last year, and in this role I am trying to bring Kubernetes and cloud native expertise to different products and services within Apple. I am also one of the TOC, or Technical Oversight Committee, members for the CNCF, the Cloud Native Computing Foundation. In this role I join ten other champions from across the industry, and together we try to provide a technical vision for the CNCF landscape. I have many other roles in the community, one of them being on the advisory board for Keptn, which is currently an incubating CNCF project, and I am the creator of the Cloud Native Fundamentals course. This is a free course that you can find on Udacity, so if you know anyone interested in pursuing a cloud native job, I definitely recommend looking at this course as well.

Today, however, I would like to talk about Bare Metal Chronicles, and more specifically the intertwinement between Cluster API, Tinkerbell and GitOps. To do so, I would first like to introduce Cluster API: a tool that provides a set of interfaces and standards to provision our infrastructure across different cloud providers. Next I am going to focus on bare metal provisioning, and here I am going to introduce a tool called Tinkerbell. Then, more importantly, I am going to focus on the combination of Cluster API and Tinkerbell, which is CAPT, the Cluster API Provider for Tinkerbell. And lastly, to introduce some automation into this architecture, I would like to look at GitOps tools such as Argo CD and Flux, and at where we can further introduce automation and parameterization in our infrastructure.

Now, before I move forward, how many of you here are familiar with Cluster API? Have you heard about it? Using it in production, maybe? Okay, some hands. How many of you have heard about Tinkerbell, and not the character? Okay, one hand. And how many of you have heard about GitOps, Argo CD, Flux, or are using them in production? Okay, that's really good. Awesome. And how many of you are actually using bare metal, or have on-prem data centers that you have to manage? Okay, some of you. I hope that by the end of this talk you will be more inspired to look into bare metal provisioning, because it is not as scary as it might seem.

Just a quick note before we continue: I realize that the screens might not be very visible, so I will try to go through everything slowly to make sure everyone can follow what is going on. If you don't understand something, please let me know and I will go over it one more time.

Now, going back to the thread of the story, there is a reason why I am giving this talk at this moment: within the cloud native landscape we have had multiple tools cross the chasm. Crossing the chasm means that we now have late adopters, and these late adopters are characterized by being very restricted and needing full ownership of their infrastructure. As such, they look for solutions for how to deploy and manage Kubernetes on bare metal. However, the story was a bit different at the beginning. To understand how we reached the point where we need to solve this problem, we need to look a bit into the past, more exactly nine years ago. Nine years ago, the container orchestrator space was heavily diversified.
We had tools such as Docker Swarm, Apache Mesos, CoreOS Fleet and Kubernetes, and all of them provided a viable solution for running containers at scale. However, Kubernetes took the lead in defining the principles of how to run containerized workloads. Nowadays Kubernetes is known for its portability and adaptability, but more importantly for its approach towards declarative configuration and automation, and we can see this in numbers as well. Based on the VMware Tanzu State of Kubernetes report, 99% of organizations see a clear benefit of using Kubernetes: the first being better resource utilization, so CPU and memory, and the second being easier application management, especially when you want to upgrade your application in the cluster.

Now, a metric which is going to be very important for this talk is that 52% of organizations still have a need to deploy their infrastructure on-prem. Very importantly, this number is declining: last year it was 55%, this year it is 52%, so we see a slow decline in the need for bare metal. But that does not dismiss the fact that half of these organizations still need to deploy on-prem, so they need a solution for that. Another metric I would like to draw your attention to is that 88% of organizations manage more than six clusters. The story here is that it is complicated and difficult to provision one cluster, but once you have done it once you can easily replicate it across different environments, for example QA, staging, production and more, and this grows exponentially if you need to deploy to different availability zones and replicate your stacks across the world.

As the community and the adoption rate for Kubernetes increased, multiple tools were built around it over time to extend its functionality, and we saw some of the interfaces we talked about this morning, such as support for different networking, runtimes, storage and more. This created what today we know as the cloud native landscape, which resides under the CNCF umbrella, and this is the landscape that we, the TOC, provide a technical vision for. We accept projects at the sandbox level and help them move through the incubation and graduation stages, so the aim is for all of these projects to reach maturity and graduate.

At this stage we know that Kubernetes is pluggable and extensible. However, with interoperability at its basis, multiple tools were built to bootstrap a cluster as well, so you might be familiar with tools such as kubeadm, Kubespray, the Tectonic Installer if you go back to the CoreOS days, and many more. If you look at all of these tools, though, it is difficult to find a common denominator. What that means is that if I use one tool to deploy my infrastructure to Azure, it is going to be very difficult for me to use the same tool to deploy my infrastructure to GCP. Usually I have to introduce a new tool, and this is not sustainable, especially if you have a multi-fleet strategy for your clusters. As such, it was clear that we needed standardization and interfaces, and this is where Cluster API was introduced. Cluster API is a set of declarative APIs for cluster creation, management and deletion, but more importantly, it does so in a unified manner across multiple cloud providers.
When we refer to Cluster API we refer to SIG Cluster Lifecycle, which had its initial release in April 2019. Since then it has had multiple releases, and this year it reached the v1beta1 API version, a very big milestone for the Cluster API team. As I mentioned, it actively integrates with multiple cloud providers: we have support for the major cloud providers such as AWS, GCP and Azure, and we have support for Chinese providers as well, such as Alibaba Cloud, Baidu and Tencent. If you deploy your infrastructure to China you will know it is quite challenging, because usually you cannot lift and shift, you actually have to use tooling inside that region, but with Cluster API at least the way you deploy your infrastructure is going to be similar and standardized. And lately we have new initiatives to bootstrap our clusters on bare metal, led by Packet, Metal³ and Tinkerbell.

Now, I have seen that a lot of people are new to Cluster API, so I will introduce it to make sure we have the base understanding and fundamental knowledge of how it works. Let's suppose we would like to deploy multiple clusters in different cloud providers and different regions. The first thing we are going to need is a Kubernetes cluster: we need a Kubernetes cluster to deploy multiple Kubernetes clusters, which is something I call kubeception. For testing purposes you can use kind, which is pretty much a Dockerized version of Kubernetes, to create this management cluster. If you want to use Cluster API in production, I do recommend using a fully fledged cluster, because it comes with a more sophisticated failover strategy.

Once we have our management cluster up and running, we will need the dependencies, or the controller managers, installed on top of it, and currently there are three types of controller managers we need to look at: the Cluster API CRD (custom resource definition) controller, the bootstrap provider controller and the infrastructure provider controller. Let's go back to the first one: Cluster API introduces five new custom resource definitions, and we need a controller to make sure that we can create, delete or reconcile any changes to these resources. The second one is the bootstrap provider, and this is the component that translates the YAML manifest into a cloud-init script and makes sure an instance from a cloud provider is attached to the cluster as a node; this capability is currently provided by kubeadm, Talos and EKS. The last component we are going to need is the infrastructure provider, and this is the component that talks to the provider APIs and produces the actual resources: think instances, VPCs, subnets, security groups and so forth. Now, with the infrastructure provider the relationship is one-to-many. You need at least one provider, so if you deploy your infrastructure to GCP you will need the infrastructure provider for GCP, if you want to deploy to Tinkerbell you will need the Tinkerbell provider, and so on; it depends on which providers you want to deploy your infrastructure to.
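Going back to the kubeception point for a second: a throwaway management cluster with kind can be created from a configuration as small as the one below. This is purely a sketch for local experimentation, and the cluster name is something I made up; as mentioned, for production you would want a fully fledged cluster instead.

```yaml
# kind-management.yaml: a minimal, single-node management cluster for trying out Cluster API
# (the name "capi-management" is just an example)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: capi-management
nodes:
  - role: control-plane
```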
So, once we have our controllers installed, we will be able to provision the target clusters. The target clusters are the ones we deliver to our application teams so they can put their services on top, and these are the clusters your customers will interact with while consuming your services.

A very important concept that Cluster API introduces is the cluster as a resource: we are able to use YAML manifests to define our infrastructure as code, and this is done through the five custom resource definitions, or CRDs, that I mentioned. I am going to introduce them shortly because they are quite relevant for our demo; hopefully the demo is going to work as well.

The first resource we are going to look at is the Cluster resource, which takes care of the main networking components for a cluster, so think about the subnets for your pods and services, DNS settings and so forth. By default in Cluster API you are going to have a control plane associated with every single cluster, and the control plane resource pretty much allows you to programmatically manage multiple machines with the control plane label, which will have all of the control plane components installed on top. A Machine you can think of as a resource very similar to an instance: here you can specify the version of Kubernetes, the instance type, any security groups, networking and so forth. So this is the vanilla provisioning for Cluster API: you just get a couple of control plane machines, that's it.

Now, if you would like to deploy any workloads, you need a data plane, and in Cluster API this is managed through a MachineDeployment. I am hoping you are familiar with Kubernetes: a MachineDeployment is very similar to a Deployment, it handles rollout strategies between different MachineSet resources. A MachineSet is very similar to a ReplicaSet: it ensures that a given number of Machine resources are up and running at all times. And a Machine, again, is just an instance, where we can specify the version of Kubernetes, the instance type and so forth; however, the label on these particular instances is going to be worker node.

These five custom resource definitions are what we use to specify our infrastructure as code, so we don't need to use Ansible or Terraform, we can use YAML, and we like YAML within cloud native. Here is where we can say: I want a cluster with ten nodes, three of them in the control plane and seven of them in the data plane; I would like this cluster to be deployed in GCP, for example, in this particular region, with these particular security groups attached. This is where we define what we want for our infrastructure.

To provide more of a visual aid of how exactly a Cluster resource looks, here I have a Cluster resource for AWS, and I am going to show it for GCP and Tinkerbell as well. What I have here is a Cluster resource with the name demo-cluster in the v1beta1 API version; in the spec section I am choosing a /16 for our pods, and towards the end you can see that we have a control plane reference attached to it, which comes by default with every single Cluster resource.
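Roughly, the manifest on the slide looks like the following. This is a sketch of the v1beta1 Cluster API and AWS provider resources from memory rather than the exact slide contents, so take the names and values as illustrative:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: demo-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]   # the /16 chosen for the pods
  controlPlaneRef:                      # attached by default to every Cluster
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: demo-cluster-control-plane
  infrastructureRef:                    # "deploy this cluster on AWS"
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: demo-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: demo-cluster
spec:
  region: eu-central-1   # AWS-specific configuration pulled in underneath
  sshKeyName: default
```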
However, I would like to draw your attention to the infrastructure reference. Here is where we say that we want this cluster to be deployed in AWS, and what happens underneath is that it pulls in configuration that is very specific to AWS, which we also define: here we say that we want the cluster to be deployed in eu-central-1 as the region, and we want to attach an SSH key with the name default to our instances. This is not an exhaustive list of variables, you have full flexibility in what you can configure, but these are just for demo purposes.

Now, very important, pay attention: if we want to deploy the same cluster to GCP, these are the changes required. On the Cluster side we just change our infrastructure reference, that is the only thing we need to do, and underneath it pulls in all of the configuration that we defined for GCP. As such, the region naming convention is very different, we deploy it to europe-west3; we have the concept of a project within GCP, so we attach our cluster to a project called capi; and we can specify the network, in this case with the name default-capi. But more importantly, what you can see here is that we have standardization: we can use and reuse our manifests, we have the building blocks, and what differs is the configuration for the cloud provider. And if we would like to deploy our cluster using Tinkerbell, again the infrastructure reference points to a TinkerbellCluster, but the configuration is very specific to Tinkerbell; here I am just specifying the base registry from which we would like to pull the images to be installed on our bare metal.

So far, with Cluster API, we know that we can deploy our infrastructure anywhere, and we can do so in a standardized manner: we have building blocks and interfaces, an open spec. Now, what happens if, as an organization, you do not want to use a cloud provider? What happens if you want to fully manage your infrastructure on bare metal? Well, in this case we have Tinkerbell to save the day. Tinkerbell is an engine for bare metal provisioning anywhere; Kubernetes is just a subset of it, it provisions bare metal anywhere. It was built by the Equinix Metal team in 2019 and was donated to the CNCF as a sandbox project in November 2020. Being a sandbox project means it is still a greenfield project; it still requires a lot of enhancements when it comes to functionality, so it is not yet at the production level or the scale that every single organization needs. But this is where I would like to invite you to contribute and actually look into the tool, especially if you have a need for bare metal. And of course, Tinkerbell aims to minimize the time for provisioning bare metal anywhere, whether that is data centers, public cloud or even edge devices.

Now let's look at how Tinkerbell works. To manage any bare metal using Tinkerbell, you need three sets of configuration: hardware, template and workflow. The hardware is your inventory, so for example "I have ten Raspberry Pi machines" or "I have this number of servers available"; you need to declaratively specify that, and you can uniquely identify every single hardware machine using the MAC address and the IP address.
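As a rough sketch of what one such inventory entry can look like as a Hardware resource, something along these lines describes a machine uniquely by its MAC and IP address. The exact schema depends on the Tinkerbell version, and the MAC, IP and hostname below are made up, so treat this as a shape rather than a copy-paste manifest:

```yaml
apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: worker-1                   # one entry per physical machine in the inventory
spec:
  interfaces:
    - dhcp:
        mac: "b8:27:eb:00:00:01"   # unique identifier for this machine
        hostname: worker-1
        ip:
          address: 192.168.1.21
          netmask: 255.255.255.0
          gateway: 192.168.1.1
      netboot:
        allowPXE: true             # allow the machine to be network-booted for provisioning
```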
After the inventory specification we need a template, and a template is a set of actions that we want to perform on top of the bare metal: think about installing an operating system, installing dependencies, any middleware, any applications, so that by the end of it you have a server in the state you want it in for production. And a workflow is pretty much the attachment of a hardware to a template. This is very useful especially if you have a multi-fleet strategy, so you can say "I want five machines installed with Linux and their respective dependencies, and I want five machines with Windows and their respective dependencies"; you define the strategy of how you want to provision your bare metal.

Once we have all of this configuration available, we can use the tink CLI to send it to the Tink server. The Tink server can run anywhere within your environment, or on your local machine if you are doing a demo. And what does it actually do? Given a hardware and a template, it takes a machine and runs all the actions in the template, so by the end you should have a server in the desired state.

Now, as I mentioned, Tinkerbell focuses on bare metal provisioning anywhere. What happens if I want to provision Kubernetes on bare metal? What happens if I want a bare metal machine to be attached to the cluster as a node? In this case we have the combination of Cluster API and Tinkerbell coming together, and this is crowned by CAPT, the Cluster API Provider for Tinkerbell. This is how it looks: quite overwhelming, but I am going to take it step by step. We need to look at three sets of configuration: what we need from the Tinkerbell side, what we need from the management cluster on the Cluster API side, and what results from it, the result being the target cluster.

Going back to the Tinkerbell side, here is where we need all of our configuration: as I mentioned, we need the three sets of configuration, hardware, templates and workflows, and you can actually see them already in this diagram. On the Tinkerbell side we need our inventory, we need to say that we have ten Raspberry Pi machines, and we need to make the Tink server aware of them.

From the management cluster side, this is Cluster API. A mini recap: we need all of our controllers, so the dependencies, and we need the infrastructure provider for Tinkerbell, so CAPT is installed by default; and what we need here is the YAML configuration for our infrastructure. As I mentioned, we use YAML, or CRDs, to define our infrastructure as code, so we need to define that we want a cluster with five nodes and so forth. Another thing we are going to have is a hardware YAML. This is an important distinction, because if we have ten Raspberry Pi machines, you might want only five of them to be part of the cluster, so you need to make Cluster API aware of what it can actually use throughout the bootstrapping process. The hardware YAML will contain, for example, the five or six machines that you want to dedicate to your infrastructure.

Now, a very important thing about the Tinkerbell provider for Cluster API is that it comes with a set of templates and workflows already available, so you don't have to rewrite them. What it actually does, if you want to provision a new bare metal machine using CAPT, is that the template will have the actions to install all the Kubernetes binaries: you are going to have the kubelet installed, kube-proxy, networking attached, so by the end of it you should have an instance with all the Kubernetes binaries that is able to be attached to the control plane and become part of the target cluster.
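Putting the Tinkerbell pieces side by side, here is a rough sketch of what a template and a workflow can look like. CAPT ships its own versions of these for Kubernetes nodes, so you would not normally write them by hand for this use case, and the action image, timeouts and reference fields below are illustrative placeholders; the exact schema varies between Tinkerbell releases:

```yaml
apiVersion: tinkerbell.org/v1alpha1
kind: Template
metadata:
  name: ubuntu-install
spec:
  data: |
    version: "0.1"
    name: ubuntu_install
    global_timeout: 6000
    tasks:
      - name: os-installation
        worker: "{{.device_1}}"
        actions:
          - name: stream-ubuntu-image     # illustrative action: write an OS image to disk
            image: image2disk:v1.0.0      # placeholder action image
            timeout: 600
            environment:
              DEST_DISK: /dev/sda
              IMG_URL: http://10.0.0.2/ubuntu.raw.gz
---
apiVersion: tinkerbell.org/v1alpha1
kind: Workflow
metadata:
  name: ubuntu-on-worker-1
spec:
  templateRef: ubuntu-install        # which set of actions to run
  hardwareMap:
    device_1: "b8:27:eb:00:00:01"    # which machine from the inventory to run them on
```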
Cool, it is a bit packed, but let's take a breather. Now I would like to take you back to the beginning of the presentation, where we saw that 88% of organizations manage more than six clusters. It is impossible to manage these clusters individually, at least sustainably, so what you need to do is introduce automation and parameterization where possible, and here is where we can use the power of GitOps.

Most of you are familiar with GitOps, but just as a very quick recap: the GitOps principle has Git repositories as the source of truth for defining the state of your application, and in our case our infrastructure. What that means is that by default we have a PR-based rollout, so the delta between our local environment and production is just one PR away. GitOps also brings automatic reconciliation as a very important capability: we have a GitOps tool that watches a repository, and if new changes are identified, they are extracted and applied to the cluster straight away. But more importantly, with GitOps we have a versioned state of our cluster; this means we have different historical data points of our infrastructure, so if you are in a red state, for example, you can very easily revert to a known green state using just a couple of git commands.

Now, this is actually a very nice announcement, and I think everyone is aware: the GitOps principle is very well represented by Argo CD and Flux, both of which graduated within the last week, and this is a very important milestone for the cloud native community. Argo CD announced its graduation yesterday, and Flux announced its graduation, I think, a couple of days ago. Being a graduated project means being a very mature project with a lot of adoption and contributions from different organizations, but more importantly, one with a sustainable roadmap of functionality that can solve the problems the tool will need to solve in the future.

With that, let's see where we can introduce automation within our infrastructure provisioning. What I have done for now is completely remove Tinkerbell from the scenario, because Cluster API standardizes the way we deploy our infrastructure anywhere, even using Tinkerbell or on bare metal. What I am going to focus on here is how we can automate our infrastructure provisioning using Cluster API and Argo CD. Going back to the fundamentals, we are going to look at everything we can have, or should have, on the management cluster, and the result will be applied to the target cluster. On the management cluster, again a mini recap, we have all of our controllers up and running, and we have our infrastructure as code, so what kind of cluster we would like to deploy. All of these YAML manifests can be stored in Git, and we can use a tool such as Argo CD to watch these manifests: if you introduce any new changes to the manifests in Git and merge your PR, Argo CD will pick up the changes from the PR and apply them to the target cluster straight away.
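As a sketch of what "Argo CD watching the cluster manifests" can look like in practice, here is an Application resource pointing at a hypothetical Git repository of cluster definitions; the repository URL, path and chart layout are made up for illustration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: capi-clusters
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/cluster-definitions   # hypothetical repository
    targetRevision: main
    path: charts/capi-cluster                                  # hypothetical chart location
  destination:
    server: https://kubernetes.default.svc   # apply to the management cluster itself
    namespace: default
  syncPolicy:
    automated:        # automatic reconciliation; omit this for manual syncs, as in the demo
      prune: true
      selfHeal: true
```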
This is optional, however I do want to outline it: you can use a template manager, because we have multiple clusters and we want to parameterize, or reuse, as much of our infrastructure as possible. Here, for example, I am introducing Helm, though we could use Kustomize as well, to parameterize the version of Kubernetes to 1.24, to parameterize the number of replicas for the control plane, in this case three, and the replicas for the worker nodes, in this case one. So here we pretty much have a cluster with four nodes on version 1.24.0. Any changes I need to introduce now will be made to the Helm chart, because Argo CD is watching the Helm chart: if I push any changes to this chart, for example a different version of Kubernetes such as 1.25, these changes will be picked up by Argo CD and applied to the target cluster straight away. And this is something that I would like to demo as well, if I am not mistaken.

So, going over here, let me showcase the setup I have at the moment. I have the management cluster deployed on my local machine using kind, and on this cluster I have all of the controllers installed. Actually, I am going to use AWS, and a very important remark here: I would love to do the demo on bare metal, however travelling with a bunch of Raspberry Pis across the world is rather challenging, so I am going to use AWS because it is more convenient for showcasing how we can automate our infrastructure provisioning. It doesn't matter, because with Cluster API the functionality is similar, so ideally you should still get a digestible understanding of how we can automate and provision our infrastructure. In this case, the controllers I have installed are for AWS, Argo CD is installed, and I have a Helm chart that manages my infrastructure. The idea is that I am going to increase the number of replicas for the worker nodes, and ideally the change should be applied to the target cluster without me doing anything. So, without further ado, let's just make sure that everything works.

Now, this is a bit overwhelming, I know these screens are a bit small so it is not very readable all the way through, but I am going to take you through it step by step to show what is actually happening on the screens. So that's not it either... cool. What I am going to do: this is my management cluster, and I am going to get all of my pods, because I would like to show all of the controllers that I have installed. This is a bit overwhelming, I don't like the view, it is a bit trimmed, but if I make it smaller no one can see it either. I am going to point your attention towards the Cluster API components. This is the bootstrap provider, and in this case I am using kubeadm to bootstrap my cluster; this one is for the control plane. We needed a controller for our CRDs, and this one is installed in the capi-system namespace, so it takes care of our CRD reconciliation. And, as I mentioned, we have an infrastructure provider, and in this case it is called CAPA: we know about CAPT, which is the provider for Tinkerbell; CAPA is the provider for AWS, the Cluster API Provider for AWS. Because I am using AWS, I have that controller installed. And we can also see that I have Argo CD already installed over here, so we will be able to connect to Argo CD as well.
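For reference, the Helm input (values) file I described a moment ago looks roughly like this. The chart and key names are my own illustration, the actual chart in the demo may name things differently, but the shape is the same: the Kubernetes version, the control plane replicas and the worker replicas are the knobs being parameterized:

```yaml
# values.yaml for a hypothetical capi-cluster chart (key names are illustrative)
kubernetesVersion: v1.24.0   # version of Kubernetes for every machine
controlPlane:
  replicas: 3                # control plane machines
workers:
  replicas: 1                # worker (data plane) machines; the value changed during the demo
```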
Now here is where I would like to show the management cluster versus the target cluster: on the top I have the management cluster, on the bottom I have the target cluster, which is the cluster we are actually going to provision. Because in AWS the VPC provisioning takes around five minutes, I have already provisioned the target cluster, just to be more resourceful with the time. What I have here is that I am getting all the Machines; the Machines are our CRDs, these are all our instances. We can see, and I am just going to try to highlight them, you see here control plane, control plane, control plane: we have three control plane machines, and you can see in our target cluster that we have three control plane nodes as well. And here, on the management cluster side, we can see that we have one worker node. This is a bit of insider knowledge, but "md" here stands for MachineDeployment, which is pretty much our data plane, and we can see that here we have one worker node as well. So what we have in our Machines matches the target cluster. The change I would like to introduce is to increase the number of replicas that we have for our cluster.

Now, before I do that, some people might be new to Helm charts, or to how we parameterize some of our variables, so here is the input file for the Helm chart that I showcased on the slides. It is pretty much the same: we parameterize, let me make this a bit bigger, the version of Kubernetes to 1.24, three replica nodes for our control plane, and one replica node for our worker plane, or data plane, so everything matches what we have in the terminal. To showcase how the Helm chart works, let's look at the templates. I am going to go to the cluster template, and here, again, everything we have seen on the slides is pretty much the same: we have a Cluster resource, we choose a /16 for our pods, and our infrastructure reference is AWS, so here is where we say that we want this cluster to be deployed in AWS. To see where the Helm chart will input variables, let's go to the machine deployment template: here is where it picks up the number of replicas for our worker nodes, and here is how it picks up the version of Kubernetes. This is just the way for Helm to pull the values that we have in the input file and recreate the manifest with the desired state that we want.
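For reference, the machine deployment template I just scrolled through looks roughly like this: a trimmed sketch with illustrative names, showing how Helm substitutes the worker replica count and the Kubernetes version from the input file (some required boilerplate, like the selector, is omitted here):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: {{ .Release.Name }}-md-0
spec:
  clusterName: {{ .Release.Name }}
  replicas: {{ .Values.workers.replicas }}        # worker node count from values.yaml
  template:
    spec:
      clusterName: {{ .Release.Name }}
      version: {{ .Values.kubernetesVersion }}    # Kubernetes version from values.yaml
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: {{ .Release.Name }}-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: {{ .Release.Name }}-md-0
```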
Actually, one more minute; hopefully, if you give me maybe two more minutes, I am going to show the demo, if everyone is patient enough, but I will try to go through this as quickly as possible. What I am going to do, the only thing I am going to do, is change the values in that demo file we saw on the slides, and I am going to increase the number of replicas from one to, let's put five. Because this is GitOps, all I need to do is use git commands to submit my changes: I am going to do a git commit with a very meaningful "demo" message, I am going to stage all my files, and I am going to do a git push straight away. I know we are over time, but I really want to show how this is going to work.

Cool. Now what we can do is look into Argo CD. This is going to be very overwhelming, but this is how we can see a visualization of all of our parameters and custom resource definitions. Argo CD checks the repository every couple of minutes, but I can force it to look for new changes. So if I hit refresh, and hopefully my internet is not going to fail me... oh goodness, this is pure pressure... let's see... okay, let me just do the port forward again for the local demo. Cool, refreshing... awesome. We can see that we are out of sync, and ideally we should see our change here: we increased the number of replicas from one to five. Now, with Argo CD you can do automatic reconciliation; I chose to do manual reconciliation just because I would like to kick off the demo myself, so I am going to synchronize manually, but this can be fully automated with Argo CD. Ideally we can see some resources showing up here, but more importantly we should be able to see them on the management cluster, where some of our machines are already being provisioned, and within a minute or so we will be able to see new machines added to the cluster. So I can come back to this after I introduce one of the last slides. Oh, pure pressure. Cool, I will come back to that; I really want to show how this happens, and it usually takes around a minute.

So, shall I stop now? Is that it? I don't have time? Cool. Unfortunately we don't have time to see the end of the demo, and another slide I wanted to show is how we can combine all of these tools together, that is, how we can use Tinkerbell, Cluster API and Argo CD to deploy our infrastructure in an automated way. I don't have any more slides; this is the overlap between the tools. More importantly, I would like to thank you very much. If you would like to see the demo, there is actually a recording of it, so you will be able to see it, and I am more than happy to show it after the talk as well. If you have any questions, reach out to me on social media such as Twitter and LinkedIn, and this is a QR code for the Cloud Native Fundamentals course; if you would like to start your career in cloud native, I recommend taking this course. This has been Katie Gamanji, and I look forward to seeing how you can shape the cloud native ecosystem. Thank you, and enjoy the rest of the conference.

...And we have new machines here, 49 seconds ago. Here it is: the demo actually worked, just not in time.