My name is Mike Goodness. I'm a systems engineer on the Kubernetes team at Ticketmaster, joined by my colleague Raphael, who is also a systems engineer on the Kubernetes team at Ticketmaster. Before we get started, I have a couple of notes. First, I want to say hi to my family and friends who are watching this on YouTube despite not having any idea what I do for a living. Second, I want to apologize again to everyone who was hoping to see some sweet Lord of the Rings memes given the title of the talk; you're going to be disappointed. Although I could say: at Ticketmaster, one does not simply kubectl create.

A little background: I have a couple of years of production experience with Kubernetes, both at Ticketmaster and at my previous company. I'm a Helm charts contributor and co-maintainer, I recently became a CNCF ambassador, and I co-organize the DevOpsDays Madison conference.

Let's see. My name is Raphael Deem. I'm an open source enthusiast; I maintain Sanic, which is a Python web framework. I'm also pretty new to Kubernetes, I probably started using it about six months ago, and this is my first time speaking in front of so many people, so be gentle.

This is what we're going to talk about. We're going to keep it nice and tight, hopefully. We're going to assume some familiarity with Helm. I'll cover a few of the basics, but I'm also going to be dissecting a few sections from our web service chart, and that will assume some knowledge of how Helm works and of Kubernetes manifests. This seems like a good crowd to have that baseline.

Anybody who's worked with Kubernetes knows that you're working with quite a bit of YAML. You want to deploy some pods, so at the very minimum you need a Deployment: YAML. You probably want to expose those pods behind a load balancer, so you have a Service: more YAML. Assuming you want to actually reach those pods from outside the cluster, you want an Ingress. Say it with me: YAML. I'm not going to show every possible resource, because as we all know there are a lot of APIs in Kubernetes now, and each one involves YAML.

Take all of the YAML that goes into deploying an application to Kubernetes and multiply it by the number of clusters in your environment. Each cluster has configuration points that differ from the others. You need to deploy different versions of the same application, and at the very least your Docker image tag is going to be different, so that's another configuration point. The problem with Kubernetes as it is today is that there's no native way to bundle all of these resources, all of these manifests, into a single unit, into what we really think of as a deployable application. We have labels, and that's the native mechanism, but there's no real enforcement. Other than manually making sure each manifest carries a label to create that association, there's no other mechanism for it.
And that's where Helm comes into play. At a high level, Helm allows you to treat your Kubernetes application as a single unit. It provides a rendering engine, mostly focused on Go templates, though that is pluggable with a great deal of difficulty. It provides a package manager, an application manager. Is that Butcher? Did I hear Butcher laugh? No? All right. So it can act as your application or package manager, akin to the way apt or apk works for your Linux distribution. And it also provides a release manager, so that if you're deploying the same application multiple times you can track each release; you can track those related manifests, those related resources, and alter them as a single unit.

Some terminology, and I should have put this bullet point last, obviously: when we refer to Helm we're generally referring to the complete application, but it's also specifically the client side. Tiller is the server-side application that actually lives in your cluster and communicates with the API server.

One of the reasons we did this is that we have a lot of clusters. Last we counted there were 15 total. We have hybrid cloud, so we have AWS and on-prem, we have different regions for each of those, and we also have multiple environments. As you can see, that adds up to quite a few clusters.

We also have one namespace per team, so every team gets their own namespace. This is facilitated through an internal tool we call namespace creator, kind of like what I heard in the keynote that GitHub has; they have something similar. A product team that wants to use a Kubernetes cluster forks a repository that holds the list of enabled namespaces and adds their product code to it. That triggers some validation; for instance, one thing we enforce is that they have a technical contact filled in in another system, so that if their stuff goes down we know who to reach out to.

The namespace creator also provides role-based access control via Active Directory groups: if you want to give somebody access to a namespace, you add them to an Active Directory group. It also enforces resource quotas on the entire namespace, and it deploys Tiller, the server side of Helm, into each namespace to isolate things; the resources and the creation of them are kept separate per team.

Finally, another detail: as a result of having the hybrid cloud, we have multiple ingress controllers. On AWS we use the application load balancer (ALB) ingress controller, which we developed jointly with CoreOS, and on-prem we just use the standard nginx ingress controller. All of these complications, all of these different details, are essentially why we developed the web service chart.
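To make that per-namespace provisioning a little more concrete, here is a minimal sketch of the kind of objects a tool like the namespace creator might lay down for a team. The namespace name, Active Directory group, and quota numbers are invented for illustration; they are not taken from Ticketmaster's actual tooling.

```yaml
# Illustrative only: roughly what per-team namespace provisioning could create.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
---
# RBAC tied to an Active Directory group (group name is hypothetical).
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-edit
  namespace: team-a
subjects:
  - kind: Group
    name: "AD-TEAM-A-K8S"
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
```

On top of objects like these, each namespace would also get its own Tiller instance (for example via `helm init --tiller-namespace team-a --service-account tiller`), so releases stay isolated per team.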
So yeah, as Raphael said, we refer to our common Helm chart as the web service chart; we really couldn't have picked a more generic title. Our original approach was that it would be just a template for teams to fork and customize as needed. That's a pattern we've seen, and used, in the community charts repo: each application has its own chart, tuned to that application's needs, and as it needs to be extended, PRs are filed against it and we integrate those changes.

There are pros and cons to that approach, as there are to everything. The single biggest pro is that the chart is tuned specifically to that application. There are no extra configuration points to confuse a team, to conflict, or to otherwise cause problems; it's very purpose-built.

The cons are that there's no commonality between those charts. When team A discovers a new way to do something, that's not easily communicated or shared with team B. There are no shared best practices, or at least not easily; it requires manual intervention, people actually talking to each other. Yeah, you're right. And the inverse is also true: when there are cluster changes, whether it's something we've done to add or remove functionality (we being the cluster ops team), we then need to communicate those changes back to the teams. Again, communication. Or when there's an upstream bug that is discovered or fixed and requires a change in chart functionality, that needs to be shared too. It's all about sharing, and the difficulty therein.

So a few months ago we decided to flip that, and as you can see from the bullet points, really flip it. Now the Kubernetes team maintains that web service chart, and we share it across all the teams. When a team deploys an application using the chart, they point to that one chart in our one Helm repository, rather than having a team A chart and a team B chart.

The pros, as you can see, are flipped as well. There's no need to worry about commonality because there is only one chart; the same values, the same options, are available to team A as are available to team B. If a team discovers a new, better way to do something, they can submit those improvements via PR. Likewise, when we change something in the cluster, or there's an upstream change, we can integrate that into the chart, cut a new version, and send it out to the teams, who then update their pipelines so they're deploying with that new version of the chart. This has worked out really pretty well so far. I don't know how long it'll work before we have some really customized applications that need special care and attention, but we're hoping this can get us pretty far.

So what I'd like to do now, a la Vic Iglesias's demo yesterday of the community charts best practices, is dive into some of the things we've done in the web service chart to support deployment to hybrid cloud and account for the differences between those environments.

Basic structure: we have the usual components of a Helm chart, our Chart.yaml with the metadata, the templates folder, and then our values file. I'm going to dig into the values file, and because I'm using a large terminal, this is going to involve lots of scrolling, as you can see. This values file is 251 lines long, 6256 characters.
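For orientation, the structure being described is the standard Helm chart layout. The specific template file names below are illustrative, not an exact inventory of the real web service chart:

```
webservice/
  Chart.yaml            # chart metadata: name, version, description
  values.yaml           # the ~250 lines of knobs and dials teams can override
  templates/
    deployment.yaml
    service.yaml
    ingress.yaml
    configmap.yaml
    hpa.yaml
    poddisruptionbudget.yaml
    _helpers.tpl        # helper templates, e.g. default FQDN and subnet lookups
```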
So to say we provide a lot of knobs and dials is putting it somewhat mildly. I'll go through a few of these options. We have things like a configurable AWS region, IAM role, and revision history limit. Let's get to some more interesting ones: affinity. Most teams have actually adopted this, and we're probably going to flip this bit, because most teams do want some anti-affinity so that their pods get spread across availability zones. Here we have it set to false just because that's not the out-of-the-box behavior, but again, we're probably going to flip that pretty soon.

A couple of things that I'll come back to in a few minutes are replica count and max replica count. These come into play when creating the deployment. Replica count is the number of static replicas you want at a minimum, and max replica count is used by our horizontal pod autoscaler component; I'll cover that in just a second. Then we have an option for max unavailable pods, which is used by a pod disruption budget, which again I'll show in a couple of minutes.

I'm really just going to scroll through the rest of these. Some standard stuff: if you've used a community chart before, you'll recognize some of these values, because we've identified those as best patterns. Being able to specify the service account name, being able to add custom pod annotations and labels. We recognize that we don't want to provide values for everything, so some of these are relatively freeform, where they're just expecting lists, whereas others are expecting actual objects. Ingress settings. So, 62% of the way through, and I've scrolled enough.

The first resource I'd like to show is our Deployment object. It's a pretty standard manifest with the addition of lots of curly braces, which really improves readability, I know. But it's a standard manifest that we've just made very configurable. One particular item of interest is what we've done with this conditional around our rolling update strategy. We've said that if we're only deploying one replica, we want to set the max unavailable to zero, so that when we do a rolling update we're sure we still have a pod running during the update. It brings a new pod up and waits until it's ready, rather than killing the old pod first, so at least one pod is running throughout the rolling update. That's actually something one of our teams discovered; they were having outages any time they deployed. Best practice would be don't run one replica, but we try to accommodate.

We've also added, let's see, IAM role. This again is a pattern that we use in the community charts repo for AWS-specific applications: being able to add the IAM role as a pod annotation. Init containers: there's a bug in versions of Kubernetes before 1.8 that ignores the standard initContainers field, so the fix, until 1.8, was to revert to the pod.beta.kubernetes.io/init-containers annotation.
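To give a feel for what is being scrolled past, here is a small hand-written excerpt in the spirit of that values file. The key names and defaults are assumptions for illustration, not a copy of the actual file:

```yaml
# Illustrative excerpt of a web-service-style values.yaml (key names and defaults assumed)
aws:
  region: us-east-1
  iamRole: ""             # added to the pod as an annotation when set
revisionHistoryLimit: 3

affinity:
  podAntiAffinity: false  # spread pods across availability zones when true

replicaCount: 2           # the static / minimum replica count
maxReplicaCount: 10       # upper bound consumed by the HorizontalPodAutoscaler
maxUnavailablePods: 1     # consumed by the PodDisruptionBudget

serviceAccountName: ""
podAnnotations: {}
podLabels: {}

jaeger:
  enabled: false          # adds the Jaeger agent as a sidecar when true

metrics:
  enabled: false          # adds Prometheus scrape annotations to the service

ingress:
  enabled: false
  hostname: ""            # overrides the default FQDN helper when set

platform: aws             # or "onprem"
```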
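And here is a minimal sketch of the single-replica rolling-update guard just described, written as a Helm template fragment. The value names follow the assumed excerpt above rather than the chart's actual source:

```yaml
# templates/deployment.yaml (fragment, illustrative)
spec:
  replicas: {{ .Values.replicaCount }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      {{- if eq (int .Values.replicaCount) 1 }}
      # With one replica, bring the new pod up and wait for it before the old one goes away.
      maxUnavailable: 0
      maxSurge: 1
      {{- else }}
      maxUnavailable: 1
      maxSurge: 1
      {{- end }}
```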
So we've accounted for that. Let's see. Here's the anti-affinity rule that I mentioned in the values file. I think what's of particular interest, and what we actually called out in the summary of the talk, is the ability to add sidecars to your application. We've recently started pursuing Jaeger for distributed tracing, so we added a jaeger.enabled flag to our values file. If you set jaeger.enabled to true, the Jaeger agent gets added as a sidecar, and then we provide other configuration points so that if a team is testing a newer or different version of the image, they can plug that in.

Resources: this is also a pattern that Vic mentioned yesterday, and just as an FYI, when that video is available you should definitely check it out, because if you contribute to the community repo we will be very grateful if you follow those patterns from day one. In this case I'm referring to the resources section, where you plug in your resource requests and limits, because different teams are going to discover different resource needs. This is how we accommodate that.

Fluentd: some of our teams are using Fluentd for log collection and forwarding, so we've provided a very simple one-off configuration point for enabling a Fluentd sidecar container. We've done the same for Splunk. But what's really interesting is that we acknowledge we can't cover all the bases, and we have no interest in covering all the bases. 251 lines in our values file is already pretty good; we don't need to add options for every possible sidecar somebody might want to deploy. So we have, right here, a plain option: if you want to provide a custom sidecar, just give us the YAML and we inject it right into the manifest, and that gets added to your pod.

There's plenty more in this deployment file that I would love to go into, but I also want to give Raphael a chance to talk at some point, so what I'll do next is show our service. Just like we want to be able to enable sidecar containers, we want to enable Prometheus metrics scraping with one touch. So we have a metrics.enabled value: you set that to true, Helm adds these annotations to the service, and then our standard Prometheus configuration starts scraping the metrics off those services automatically.

The other thing I wanted to point out in the service is right here: we wrap our service type in a conditional. When we deploy to AWS we set the platform value to aws, and if we're requesting an ingress we set ingress.enabled to true. The way the ALB ingress controller works, it requires a NodePort to be exposed; it attaches to the nodes' port. So we have this conditional that says if those two values are set, the service is going to be of type NodePort. If they're not set, the type doesn't get set explicitly, and the default Kubernetes behavior is to provision a ClusterIP service. We've since discovered that it's not quite a binary thing, so this is probably something we need to revisit soon: we have services that are not deployed to AWS and do need to be NodePorts, and they're not always going to be ClusterIP services even when these conditions aren't met.
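Going back to the sidecar options described above, here's a rough sketch of how that injection can look in the deployment template. The value names (`jaeger.image`, `jaeger.tag`, `sidecars`) are assumptions; the `toYaml` pass-through for free-form sidecars is the pattern being described:

```yaml
# templates/deployment.yaml (containers fragment, illustrative)
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          resources:
{{ toYaml .Values.resources | indent 12 }}
        {{- if .Values.jaeger.enabled }}
        # Optional Jaeger agent sidecar for distributed tracing.
        - name: jaeger-agent
          image: "{{ .Values.jaeger.image }}:{{ .Values.jaeger.tag }}"
        {{- end }}
        {{- if .Values.sidecars }}
        # Escape hatch: any container YAML a team provides is injected as-is.
{{ toYaml .Values.sidecars | indent 8 }}
        {{- end }}
```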
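And here is the corresponding sketch of the service template: Prometheus annotations gated on `metrics.enabled`, plus the NodePort conditional for the AWS-with-ingress case. The helper template names and the `metrics.port` value are assumptions:

```yaml
# templates/service.yaml (illustrative sketch)
apiVersion: v1
kind: Service
metadata:
  name: {{ template "webservice.fullname" . }}
  {{- if .Values.metrics.enabled }}
  annotations:
    # Picked up automatically by the standard Prometheus scrape configuration.
    prometheus.io/scrape: "true"
    prometheus.io/port: "{{ .Values.metrics.port }}"
  {{- end }}
spec:
  {{- if and (eq .Values.platform "aws") .Values.ingress.enabled }}
  # The ALB ingress controller attaches to node ports, so expose one on AWS.
  type: NodePort
  {{- end }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.targetPort }}
  selector:
    app: {{ template "webservice.name" . }}
```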
So again, that's something we learn, and as we do, we make changes and everybody benefits. Speaking of ingress, let's take a look. Here the entire manifest is wrapped in a conditional, so if an ingress isn't required, we don't bother creating the resource; pretty straightforward. But here again, if we're deploying to AWS, then we need to add a set of annotations that the ALB ingress controller uses to provision those application load balancers, and there are quite a few of them. AWS provides a wealth of options and configurability, and the ALB ingress controller really does a good job of supporting them all. If you have a TLS certificate that you want to attach to the ALB, you can provide its ARN here. You set the health check, and so on. Some of these options are sub-templated; for example, the security groups. We have a helper template that lists our subnet names based on the type of ALB we're asking for. If we're asking for an internet-facing ingress, it puts the security groups and subnets that correspond to our public-facing infrastructure into those annotations; if it's an internal-only ALB, we use a different set of security groups and subnets.

So that covers how we handle the AWS case. If we're deploying to our on-prem clusters, as Raphael mentioned, we use a shared nginx ingress controller in those clusters, so we don't need the AWS annotations. We just need one annotation that sets the ingress class to, in our case, shared-nginx. And then here we have a few of the usual suspects: annotations, and we can set the hostname. We actually have another helper template that provides a default fully qualified domain name, which can be overridden by teams if they want to use their own custom hostname.

As promised, I'm not going to go into config maps; they're really pretty bare-bones for the most part. Fluentd just gets a default configuration config map. The config map template is basically empty: you feed in the key, whatever your config map key should be, then you feed in the data, and we stick it right into the manifest. It's super elegant.

What I will show, though, because I mentioned it earlier, is the horizontal pod autoscaler. This is where those two values, replica count and max replica count, come into play. If you provide both of them, and you specify a max that's higher than, well, "min", I guess we could have called it min, then it will create this manifest and use those values in the appropriate fields: in the spec you have maxReplicas and minReplicas. Anybody who knows the HPA realizes that if you have the custom metrics API enabled, you can scale based on Prometheus metrics, for example, and that's something we'd really like to offer to our dev teams soon. They're keenly interested in being able to scale based on things like requests per second, being able to spin up new pods when you hit a threshold. For now, though, we're using just the base functionality, which is CPU. What the HPA does is look at the average CPU usage across pods, and when it exceeds a given threshold, it creates more pods, up to your maximum, until that threshold is no longer being exceeded. So, a little bit of bonus Kubernetes information there.
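A condensed sketch of the ingress template just described: the whole manifest behind `ingress.enabled`, ALB annotations on AWS, and the shared nginx class on-prem. The annotation keys are the ALB ingress controller's, trimmed to a few examples; the helper template names (`webservice.subnets`, `webservice.securityGroups`, `webservice.defaultFqdn`) are assumptions:

```yaml
# templates/ingress.yaml (illustrative sketch)
{{- if .Values.ingress.enabled }}
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: {{ template "webservice.fullname" . }}
  annotations:
    {{- if eq .Values.platform "aws" }}
    # Consumed by the ALB ingress controller when it provisions the load balancer.
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: {{ .Values.ingress.scheme }}    # internet-facing or internal
    alb.ingress.kubernetes.io/certificate-arn: {{ .Values.ingress.certificateArn }}
    alb.ingress.kubernetes.io/subnets: {{ template "webservice.subnets" . }}
    alb.ingress.kubernetes.io/security-groups: {{ template "webservice.securityGroups" . }}
    {{- else }}
    # On-prem clusters share one nginx ingress controller.
    kubernetes.io/ingress.class: shared-nginx
    {{- end }}
spec:
  rules:
    - host: {{ default (include "webservice.defaultFqdn" .) .Values.ingress.hostname }}
      http:
        paths:
          - backend:
              serviceName: {{ template "webservice.fullname" . }}
              servicePort: {{ .Values.service.port }}
{{- end }}
```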
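And a sketch of the autoscaler manifest: only rendered when a max is provided and it exceeds the minimum, scaling on CPU only for now. The `targetCPUUtilizationPercentage` value name is an assumption:

```yaml
# templates/hpa.yaml (illustrative sketch)
{{- if and .Values.maxReplicaCount (gt (int .Values.maxReplicaCount) (int .Values.replicaCount)) }}
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webservice.fullname" . }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: {{ template "webservice.fullname" . }}
  minReplicas: {{ .Values.replicaCount }}
  maxReplicas: {{ .Values.maxReplicaCount }}
  # Base functionality only for now: scale on average CPU utilization across pods.
  targetCPUUtilizationPercentage: {{ .Values.targetCPUUtilizationPercentage }}
{{- end }}
```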
The last one I'm going to show is the pod disruption budget. This is a relatively new-ish resource that we're also finding pretty convenient. It's for when we're doing manual node maintenance, say we need to take a node down for whatever reason. When you do a kubectl drain, it first looks at your pod disruption budget and makes sure the scheduler affects no more than your max unavailable pods, that number of pods, during the maintenance.

This is again something that was highlighted by an application team. They were wondering why all of their pods were returning 500s at the same time we were taking a node down for maintenance, and it's because of a few things. All of their pods got co-scheduled onto the same node, which, oops. But something like this, the PDB, is meant to address that, so that even if your pods do get co-scheduled, it makes sure that X number of pods will be up and running before it kills the rest of them. So it comes in very handy.

You'll notice we have a conditional around this manifest as well: .Release.IsInstall is a Helm condition that says we're only going to create this resource on an initial installation. The reason is that the PDB, at least as we understand it, is currently immutable. If you try to change a pod disruption budget after it's been created, Helm is going to flip out a little bit; you'll get an error message that the resource already exists, or that the field cannot be changed. There are a couple of open issues I know about on this, so hopefully that gets addressed soon.
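As a rough illustration of that install-only guard, here's what the conditional can look like around the PDB manifest. `.Release.IsInstall` is the built-in Helm flag mentioned above; the value name and label selector are assumptions consistent with the earlier values walkthrough:

```yaml
# templates/poddisruptionbudget.yaml (illustrative sketch)
{{- if .Release.IsInstall }}
# Only rendered on the initial install: the PDB can't be updated in place,
# so re-rendering it on upgrade makes Helm error out.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: {{ template "webservice.fullname" . }}
spec:
  maxUnavailable: {{ .Values.maxUnavailablePods }}
  selector:
    matchLabels:
      app: {{ template "webservice.name" . }}
{{- end }}
```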
I am running super long, so at this point I'm going to hand it over to Raphael to try to squeeze into the last five minutes.

All right, can you guys hear me? All right. So I'm just going to run really quickly through an example of deploying an application with the web service chart. If you're interested, the code is available on my GitHub, r0fls/hello-kubecon. It's pretty minimal; it doesn't actually include the web service chart or anything, but if you want to look at the files in more detail than what I'm pulling up, that's where it is.

What the example does: it's a basic Go application, hello world. It's going to run the Go tests, build the application, push the container image to our container registry, which in our case is Amazon ECR, and then finally deploy the application using the web service chart and Helm.

So this is the app in question; as you can see, it just says hello kubecon. I'll show you the files here really quickly: a simple Go app. Here's the test; my hands are shaking like I was hoping they wouldn't. It's just checking that the status code is 200 and that the message is as intended.

Let's see, what else. This values file is a more minimal one; you don't have to fill in a lot of those values, they're already defaulted in the chart. So here's a kind of minimal example of what you would need to deploy this. The important stuff here is that I'm telling it which application image to use, where I'm going to push the container. That's probably the most important thing.

This is the shared values file; it will be used both for AWS and on-prem. If we look at one of the others, you'll see I've also filled in the fully qualified domain name, and the platform is on-prem. So yeah, you can specify multiple values files for Helm, and they can override each other; these two are completely non-overlapping, so that's not really a consideration here anyway.

So we can go to the website. I already deployed this so we wouldn't have to wait for it. Whoops, how do I do that, three fingers? Yes, yep. Whoops, I also need the FQDN. Oops. So, the whoops: these are internal-only, so you have to be on the VPN. Yeah, we intended to do that earlier. So yeah, "hello kubecon" would be showing up there, but it's not exposed externally; we would have had to configure that separately, and I did not put that annotation on there. So we won't be able to see the hello kubecon, but you can use your imagination. It's just regular text.

What we were going to do next, and I can still do it, but it will be far less exciting, is modify that message or the FQDN and then do a git push. So, go over here. Just imagine that we had seen the hello kubecon in the web browser; it's really there, I promise you. Let's see, what else? Oh yes, the test will not be happy if we don't change that as well. But the point is that once we do this... I didn't actually go over what GitLab runners are, but you're probably familiar with them in some flavor or another. It's a lot like Jenkins or Travis CI; essentially we're just running some code in the container of our choice. And so, whoops. Oh yeah, we need to be on the VPN to push as well. There we go. But yeah, next you would have seen that the message changed after a moment.

So now, if you have any questions about any of this, you can contact us there: Mike's on Twitter, or you can see my email address.
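For readers following along later, here is a hand-written sketch of roughly what the demo's two values files could look like. The file names, image path, port, and hostname are placeholders, not the actual repository contents:

```yaml
# values/shared.yaml (illustrative; used for both AWS and on-prem deploys)
image:
  repository: 123456789012.dkr.ecr.us-east-1.amazonaws.com/hello-kubecon   # hypothetical ECR path
  tag: latest
service:
  port: 8080
---
# values/onprem.yaml (illustrative; only the platform-specific bits)
platform: onprem
ingress:
  enabled: true
  hostname: hello-kubecon.internal.example.com   # made-up internal FQDN
```

At deploy time the pipeline would pass both files to Helm, for example `helm upgrade --install hello-kubecon <repo>/webservice -f values/shared.yaml -f values/onprem.yaml`, with later `-f` files overriding earlier ones when keys overlap.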