We've talked a lot about platforms here at Commit, and one of the consistent themes has been simplicity. With fewer integrations to maintain and more visibility across the life cycle, because you have a single source of truth, we can bring more people into the process and extend their reach so they become more productive and do more things. And that's great: when you pair simplicity with scope, it's very, very powerful. But with that power comes the ability to introduce new complications into the system. One area where that can happen very easily is cloud native workloads. It's very, very easy to overcomplicate things, especially if you ignore simplicity and manageability and focus exclusively on scaling. So our next speaker, Ravi Lachhman, will go into a lot more detail about some of the traps you might encounter on your Kubernetes journey and provide some tips on keeping things simple enough to help you maintain productivity, which is the whole reason you're trying to do this in the first place. Let's listen.

Well, hey everybody, and welcome back to another GitLab Commit. I'm super excited that you're here for my talk and couldn't be happier to chat with you during it. My talk is called "Scaling Simplicity: Idea to Production in Kubernetes," and we're going to go through a little bit of the journey we had here at Harness while we were migrating to Kubernetes. A little bit about me: I'm Ravi Lachhman, an evangelist at Harness. You can reach me at my GitLab handle, ravilach, or my Twitter handle, ravilach, if you want to connect afterwards.

So what are we going to talk about today? Let's actually talk about a time before Kubernetes. Yes, there was a time before Kubernetes, and workloads were distributed long before it. Also, as you go through your own Kubernetes journey, simplicity has its virtues; one of the classic design principles is KISS, keep it simple, stupid.
That's very important, and we'll get into it. And during your journey, scale will come, so don't worry if you're not using the latest and greatest, which is kind of a hard thing to say in Kubernetes; Kubernetes is kind of mainstream now, and if you're not on the bleeding edge, that's okay. And then we're also going to talk about continuous delivery, which is your conduit from idea to production.

So, a time before K8s. Let's take a look at a distributed system architecture right before Kubernetes. This is actually the architecture I was most familiar with before Kubernetes came into my background. I've been a J2EE, Java Enterprise Edition, software engineer for a long time, and I've had lots of workloads off of Kubernetes and lots of workloads on Kubernetes. Going through this particular distributed architecture, looking from the user down to, let's say, some tier of persistence: the first thing is you get load balanced, and then you might have more than one application server, right?
At the time before Kubernetes, it was normal to have, let's say, one application per application server container. And if you had a stateful application, those application servers had to communicate, so you were looking at some sort of in-application or out-of-application clustering. For example, on the Java stack you might be using a particular in-memory cache like Infinispan, or clustering and consensus algorithms and platforms such as JGroups. And then finally, you don't live in outer space or in the ether; at some point you have to write something down. So the database tier might be some sort of streaming or messaging stack like Hadoop, or writing to a columnar database like Cassandra, or even our good old friend Postgres here. And potentially, before Kubernetes, there was a big investment in platform as a service, or PaaS. So you might be using a Cloud Foundry rendition, be that Pivotal Cloud Foundry, IBM Bluemix, or open source Cloud Foundry itself, or you might be looking at something like Mesosphere DC/OS. So there was help to get this architecture out there and to facilitate each one of these parts.

But like any distributed system, you can look at this through the lens of the fallacies of distributed computing, which were developed by some engineers at Sun Microsystems. No matter whether you have a distributed platform or distributed computing, these things are fallacies: first of all, your bandwidth is not infinite, right?
You have bandwidth caps. A big one here is latency: no matter how many Kubernetes clusters or how many nodes you have, there's still going to be latency across the distributed system, or between the nodes. Or, looking at modern cloud architecture, if you have different availability zones, there's going to be latency between the AZs. And again, a distributed system is not 100% reliable; the more parts you have, potentially the more failures you have. The old adage: do you want a plane with four engines or a plane with two engines? The more parts you have, the more failure there is. And, this is actually very true in Kubernetes, the assumption that there is going to be one and only one admin is not true. The beauty of Kubernetes is that the more people who have kubectl access, which, if you're not familiar with it, is the command line for Kubernetes, the more people have the ability to change the cluster.

What you should take away here, and I'm not trying to rain on Kubernetes too much, is that Kubernetes didn't automatically solve these problems. You still have latency: if your nodes are too distributed, they might get marked as unhealthy if the latency is too high, if the kubelet can't connect or talk to the different nodes. Reliability: like any system, you're not automatically reliable. The platform itself needs management, and your workloads are not automatically more reliable because of Kubernetes; the same network I/O that bound you without Kubernetes hasn't increased unless your network I/O has increased. And going back to one admin: Kubernetes is actually designed to have multiple admins, so you might have tasks that get overwritten, and you have to keep those people in line. But what do you get with Kubernetes, right?
So what do you get? Well, Kubernetes is really, really good at lots of things. You get, for example, parity: if you have something running on your local machine, like K3s or Minikube, there's some expectation that that workload will run similarly in a production cluster. You get scale: adding more pods, more replicas, and even more worker nodes is trivial; you can scale your cluster to support the workload of your choice. Upgrades: look at what it took to orchestrate a container workload without an orchestrator. If you had some sort of patch or upgrade, which in the container world, since images are immutable, means making a new version of the image, it's actually pretty easy with Kubernetes; you can say, hey, redeploy this version, and Kubernetes will start taking care of that for you. And with all these benefits you get more agility: if your operations team, your development team, and your application infrastructure team are all speaking kind of the same language with Kubernetes, you get harmony and agility.

So let's talk about what I call the virtues of simplicity, and why simplicity is so important, especially with a distributed system. Taking a look at your particular firm, you have internal versus external customers. Let's take this in terms of being a platform engineer: your internal customers could be your development teams, right? And the thing with internal customers is, they don't have any choice.
What you give them is what they get. Now, there's a whole notion of shadow IT, but for the most part they're not going to be rocking the boat; they don't have a choice. Versus your external customers, or quote-unquote your actual customers of the business: they have almost infinite choice, lots and lots and lots of choice. And so the adage that your external customers don't care how you did something is actually pretty true; they care about the results. Versus your internal customers, who absolutely care how you do something, because they're subject to it. And if your internal customers are happy, they produce the best work for your external customers. This can become a very introspective talk, but really, you have to support both customers at the same time, because your external customers don't care if you have something overly complicated, and it's detrimental to your internal customers. Why is this?
Well, eventually, in times that are good and in times that are bad, we move on. As people, we're very rarely, I would say almost never, at the same company for 30 years anymore, and we're especially not on the same project for 30 years, and all the institutional knowledge we have kind of goes. When times are good, you get a new job; when times are bad, yeah, you might get a new job, or change project teams, or change tribes if you're on the Spotify model. But that's it: eventually people move on or move up, and the next generation comes in and gives a fresh pair of eyes to the problems. And if you have something that's overly complex, you're limiting the number of people who have knowledge, and the learning curve will be extremely high when they join the team if a lot of what isn't industry convention is leaning on your particular team.

And lastly, one of the last things you don't want to be doing, and this would be a guiding principle if you're an architect, is troubleshooting at the bleeding edge. There's that famous meme: "I only test in production," or "Some people just like to watch the world burn." Use items that are tried and true. There are kind of two parts to that.
One, there's operational maturity: the more mature something is, the more operational maturity exists around it, so that if there is a problem, there's more of a prescription of what to do. Versus the bleeding edge, where it's the first time; if you're using something in alpha or beta, there's not much use at scale. Maybe one or two Silicon Valley firms invented it, and they're operating it at scale, but they have 40 people using it, versus the two people at your firm who might have a little bit of exposure from DZone or somewhere like that. You don't want to be troubleshooting the bleeding edge when all eyes are on you or on the team; that's the last thing you want to do. You might run into edge cases or a race condition that you just haven't thought about before. So that's it: you don't want to be troubleshooting on the bleeding edge.

Let's talk about "scale will come." Eventually, yes, you start to incorporate new technology; especially on our journey, we started to incorporate new technology in buckets, depending on the functionality that we needed. But scale will come. The first thing you want to do when you're building out your journey is focus in on why you are migrating, and on this particular trifecta. If you've seen any of the DevOps mantras, people, process, technology, this is it; this is the DevOps mantra. But I paint a little bit different of a picture here: I call it the confidence trifecta, and I ask the question, where do you have the least confidence? That's where you're going to start building first.
I always like to say computers are easy but people are hard. So, technology: if you're the least confident in, let's say, your infrastructure technology, that's easy to solve for. For example, say for some odd reason we're having maintenance trouble in our application and we don't want to bring it down. Okay, why don't we run five nodes of the application? Because with five, you can take two concurrent failures: one for maintenance or upgrade and one for actual failure. And you can be very confident having five nodes of a particular resource or service.

Process-wise: okay, sure, we don't have proper test coverage, so we need someone with a lot of domain experience to review the changes, especially if you're in a regulated industry or one that takes a lot of domain expertise, like insurance, healthcare, or especially banking. It might take someone who knows the domain; to solve for that, we can put that person, or people, into the process.

Versus people: if you're the least confident in your people, it's guardrail city, right? You're building all sorts of guardrails in between. But even more so, people are naturally curious, especially software engineers. They want to try new stuff out, they want to build things, and this huge push into cloud native architecture rests on two terms that are actually going against each other. So let me start with this basis: idempotency plus ephemerality equals cloud native. Idempotency is actually a mathematical construct: no matter how many times you evaluate the same equation, you get the same result. A good example is asking about the weather. I live in Alpharetta, Georgia; outside it's kind of sunny and humid today.
So if I ask, hey, what is the weather in Alpharetta? It's sunny and humid, no matter how many times I ask that question, until the weather changes; I get the same answer from each piece of the distributed system. Versus ephemerality: being ephemeral means you're short-lived. Containerized workloads are meant to die; if you have something that's super long-running, it might not be wise to put it in a single container. So on one hand you have something that needs to be consistent, and on the other hand something that will die all the time. Those two things are actually competing with each other, and you might think to yourself, hmm, there are probably like three things on earth that can actually do this. Which is actually incorrect. Take a look at the choice overload, the common thing pundits point at: the Cloud Native Computing Foundation, the CNCF, landscape. There are over a thousand, around 1,500, cards there. There's so much choice, so many things vying for incremental improvements with very granular functionality, that you get inundated with choice. So how can you decipher whether something is bleeding edge versus mature? Do you have to be using several, or dozens, or all of these technologies at one time to be confident? No, you don't. You need to start your journey somewhere, build that operational expertise and operational excellence, and change will come.

So let's actually take a look at our journey. It's been going on for two years now that we've been migrating our workloads to purely Kubernetes. On day zero, I'll call it, when we started porting our workloads, we were using very vanilla instances of GKE, right?
So our cloud provider in this case is Google, and we were trying to stay as out-of-the-box as possible without writing tons of custom resources and operators and whatnot. We said, hey, if the functionality is given to us by default, or is something easily injectable by Google, let's go ahead and take that. And change will come; you don't have to do it all at one time. The shiny penny a year ago might have been the service mesh, like Istio; even more current, from last year, you might be hearing about something called OPA, Open Policy Agent. We're leveraging those things now, but we weren't leveraging them two years ago. We weren't really invested that much in a service mesh years ago, and OPA is a newer way of doing things that we hadn't invested in that early on. And again, understand that change will come.

Because if you try to do too much change at one time... there's this concept of shifting left. You might have heard several talks going shifting left, shifting left, shifting left, and this actually introduces complexity. Computer science is like an abacus: complexity never goes away, you're just moving complexity around, and anybody with experience will tell you that; you're just moving it from left to right on the abacus. The more we shift left, the more the control, the prescription, and the day-to-day demands that used to sit with the DevOps or platform team get shifted to the developer. If you take too much of a shift left, not only does the developer have to write the feature, they're also shipping route changes with Istio, right?
And they're also saying, you know what, we need auth, and because we're using OPA, the author can also be the enforcer of the system. And so one design consideration we had to think about as we shifted more items: what is the role of the platform engineering team? Do we just make sure the Istio service is up and let people make changes, versus having another expert, say a network infrastructure engineer, take a look? Do we have an AppSec engineer take a look at some of the OPA changes? Where is that balance? Do we put all that emphasis on the developer, which can be fairly difficult? And again, that's just two technologies. What will the future hold? As my father would say, if I could predict the future, I'd go buy a lottery ticket; I would buy a couple of other tickets and be set for life. But unfortunately we can't predict the future. Taking a look at some other things coming down the line: leveraging something like the extended Berkeley Packet Filter, eBPF, is something we're looking into now. There are dangers to this: at the time we're recording this talk, there has been a pretty big Linux kernel eBPF CVE that came out, some sort of incorrect bounds calculation. I'm not a kernel engineer, I couldn't tell you exactly what that means, but because that's out there, it comes back to: how do we keep complexity in check? Like, going back to it: hey, I'm not a kernel engineer.
I'm a software engineer, or a platform engineer, and that low-level CVE shows that as we continue to push more complexity down, we need people with the expertise. We used to have an RHEL engineer who managed the Linux distributions; those are kind of a rarity these days. Where do we have the person, or people, with the expertise as we keep pushing more and more complexity? I'm not a kernel engineer, and I'm also not a network engineer; Istio was a new one to me, with its networking rules. And not only do developers have to author multiple things, they have n number of custom resource definitions going into the deployment at this point, and it just keeps going on and on. So be wary of that; Kubernetes is not the solve-all.

Now let's be more positive. Let's actually talk about getting your idea into production, and why some of those caveats are important to understand as you're marching towards production. The first thing, as I call it in a talk: I participate in an organization called the Continuous Delivery Foundation, or CDF, a space I'm very passionate about, and I give a talk called "How Much Should You Deploy?" Well, "how much should you deploy" is kind of an open-ended question. Let's say you're curious for the first time, you grab your copy of Lean Enterprise, and you Google it, and you see, oh, Amazon deploys every 11 seconds. I'm pretty sure that number is even lower now. You might say, okay, we're not deploying every 11 seconds; we're no good.
Well, hold on. The argument is that not everybody is Amazon. Let's actually unpack what your deployments used to look like before Kubernetes. Who remembers these deployments, right? Deploying prior to Kubernetes, or at least prior to a mature Kubernetes setup: you might be using a platform as a service, in this case, let's say, Mesosphere DC/OS, so that's authored in Marathon, which is JSON. You might be modifying your load balancer rules, if the PaaS isn't doing it, and those are in a particular proprietary format. Wiring the application servers together, like how do I wire multiple WildFlys together, that's all XML. And then the in-application clustering portion, the in-memory grid stuff, that's all configured in XML for JGroups and Infinispan. But each one of these was a separate language; each one had separate notation and separate ways to go about deploying. How I deploy a JGroups change is different from how I deploy a WildFly change, versus how I would deploy a load balancer change or a PaaS change, and so on. Each one of these was separate and took expertise.

Versus, you know what, you can wrap all of these into one very big Kubernetes manifest, or multiple manifests, and Kubernetes will literally apply them all, as they have here. Because of advancements like Helm, a package manager, and Kustomize, a configuration manager, you're able to all speak the same language: you're speaking YAML. Unless you write an operator or a controller, which would be in Go or another language, you're all speaking the same language.
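As a tiny, hypothetical illustration of that "one language" point, the load balancing and application-server wiring concerns above all collapse into the same notation (the names, image, and ports here are made up for the sketch):

```yaml
# Hypothetical app: the application servers (a Deployment) and the load
# balancing in front of them (a Service), both in the same YAML notation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
spec:
  replicas: 3                  # more than one application server
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
      - name: orders
        image: example.com/orders:1.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service                  # replaces the proprietary load balancer rules
metadata:
  name: orders
spec:
  selector:
    app: orders
  ports:
  - port: 80
    targetPort: 8080
```

One `kubectl apply -f` covers what used to be spread across Marathon JSON, load balancer config, and XML wiring, each with its own deployment procedure.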
There's nothing stopping you from writing one super-lengthy manifest, one big YAML called mega-all, and applying it, and Kubernetes is off to the races. Now, beyond understanding those two particular things, also understand your Kubernetes cluster architecture. As an engineer, or developer, I always like taking the path of least resistance. The great thing about Kubernetes is that the adage "don't let license costs determine your topology" is actually very true here: Kubernetes is technically free, it's an open source project, from your local machine using Minikube or K3s on up. Depending on how you architect your estate, it could be one very large cluster where you separate groups by namespaces, or lots of little clusters; clusters are cheap, right? But again, it all comes down to a management philosophy and diagnosing what the patient, a certain type of workload, needs.

One of the things you get out of the box with Kubernetes, and where it gets a little tricky as your workloads start to age, even though it's very easy at the beginning, is the out-of-the-box functionality called a rolling upgrade, or rolling update. There are two strategies: there's Recreate, which will kill everything and then just restart everything, and there's RollingUpdate, which works exactly like rolling updates did in previous application stacks; it's the same in Kubernetes. There's a number of running instances; you define how many you need at any given time and how many you can go over, and Kubernetes will say, okay, Ravi said he wanted two of these particular "chunky orders" pods and can go two over, so the first pass would be, okay, replace two, and then you go back to two total.
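That "how many you need, how many you can go over" pairing maps to two fields on the Deployment spec. A sketch, as a fragment of a Deployment (the counts mirror the two-and-two example above):

```yaml
# Fragment of a Deployment spec: the rolling update knobs.
spec:
  replicas: 2                 # how many you need at any given time
  strategy:
    type: RollingUpdate       # the default; Recreate kills everything first
    rollingUpdate:
      maxSurge: 2             # how many you can go over during the roll
      maxUnavailable: 0       # never dip below the desired count
```

With these numbers, Kubernetes brings up two new pods alongside the two old ones, waits for them to pass health checks, then scales the old set down to zero.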
It would have made more sense with four, four and two, right? But I was running out of room on the slide here; you get the gist of it. It makes sure that something starts and is healthy, and keeps going, keeps going, until it's fulfilled. Which works, kind of. But if you start having workloads of substance, a rolling update can be either slow, or actually not as safe. The whole point of continuous delivery is that you're supporting safety and iteration. Software is an iterative sport: you have to be able to prototype and then make incremental changes to release, and if you need to go back, you need to be able to fail fast. So there's this deployment paradigm called a canary release. How do you support not only incremental releases, lots of them, but also getting feedback quickly if something isn't working? It's a little different from a rolling update or upgrade, though a canary, or an Istio-managed canary, will underneath the covers be facilitating those same mechanisms that are out of the box in Kubernetes. If you're unfamiliar with what a canary release is, this is one we do all the time. It's an incremental release: you're releasing to a small portion. Here we have a 33% split. Say "chunky orders" version 1.0 is the stable version of the application and version 1.1 is the canary. You would deploy version 1.1 and 1.0 at the same time, so two versions are running, and you would stagger the traffic, splitting it at the load balancer level. Then you say, okay, things look stable for 1.1, let's move on, and you promote the canary to stable: the old stable version gets replaced, and version 1.1 is now stable. And that's it, right?
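If you happen to be routing through Istio, as mentioned earlier, that 33% split might look something like this hypothetical sketch (the service name and subset labels are made up, and the `stable`/`canary` subsets would need to be defined in a companion DestinationRule):

```yaml
# Hypothetical Istio VirtualService splitting traffic 67/33
# between the stable version and the canary.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: chunky-orders
spec:
  hosts:
  - chunky-orders
  http:
  - route:
    - destination:
        host: chunky-orders
        subset: stable        # version 1.0
      weight: 67
    - destination:
        host: chunky-orders
        subset: canary        # version 1.1
      weight: 33
```

Promoting the canary is then just a matter of shifting the weights to 0/100 once you've validated it.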
So as you make that judgment call and validate, you say, okay, shift the traffic over to 100%, and that's it. That's basically a very safe way of releasing.

All of this can be orchestrated in a pipeline. If you're unfamiliar with what a continuous delivery pipeline looks like, this is typically what one looks like, segregated by different functions. In this model, and I'm actually a big fan of it, what I call the tightening-of-the-screw, or tightening-of-the-rope, type of model, as you go from dev to QA to production, you get more strict in your rule sets. In development, less strict tasks: you might be doing things like unit tests, code coverage, or static scans. Going towards QA, you start introducing infrastructure, so you're doing things like perf tests or smoke tests. And when you're getting into production, you're looking at how to have a safe release, like some sort of canary, and even continuing to run tests after deployment. The going joke is, when is your deployment over? Well, it's over when the next one goes in. So make sure you're staying on it, and make sure your external customers have a positive experience.

Lastly, I'll leave you with this as we're coming up on time: you only need the level of automation that you like. Some of these topics might be too much automation for your particular group, and that's okay; it depends where you are on your journey. Any sort of improvement is an improvement; you're helping better the craft. With that, thank you so much for catching my talk. I'm Ravi Lachhman, you can catch me on Twitter at ravilach, and until next time, cheers everybody.