Welcome to our session today, GitOps as a Service. My name is Andrew Block. I'm a distinguished architect at Red Hat. I specialize in a number of different areas, everything from cloud native architectures to CI/CD, GitOps, and security. I'm an open source maintainer on the Helm project as well as ORAS, which is OCI Registry As Storage, and I'm also an author. I've written two books in the cloud native space, one on Kubernetes secrets and one on Helm.

My name is Gerald Nunn. I am the OpenShift GitOps technical marketing manager for Red Hat. I've been with Red Hat since 2016. Before my new role, which I started in January, I was a solution architect for many years. I live not too far from here, near Victoria, British Columbia, with my wife, son, and three very slightly annoying cats. We'll try to stay away from being...

So today we're going to talk about a number of different areas. First of all, what is this concept of GitOps as a service? Then all the considerations you need to think about when deciding whether you want to run GitOps as a service: whether to buy or to build one of your own, some common principles and personas that come with GitOps as a service, some common service models, considerations for isolating workloads, some deployment methods, and of course everyone's favorite, standards.

So, GitOps as a service. Everything these days is being offered as a service. You go to AWS, you go to Azure, you go to Google: everything's a service, database as a service, you name it. Now GitOps can also be run as a service, and the question is, do you want to or do you not want to? It comes down to whether you want to buy a managed service or build one yourself. In addition to that, you need to think about all the trade-offs when it comes to building versus buying. Building gives you that flexibility.
You can do whatever you want, to your heart's content. However, it does require you to have quite a bit more technical acumen than if you just click a button and have something available for you. Now, if you do go down the route of clicking the button and getting GitOps as a service, a managed service offering, well, yes, you can spin up resources relatively easily. But unfortunately you're kind of locked into what that service provides. You don't have as much flexibility to customize it. Those are some of the trade-offs you need to think about when determining whether you want to use a service that's provided for you, the buying option, or build it yourself. Being a technical person, I always say I can do it myself, but not everyone wants to build everything and maintain it themselves.

So in the GitOps world there are, in many cases, two different personas we need to cater to. One of them is your platform operations team. They're the ones that manage Kubernetes clusters and dole those clusters out to individual application teams. And then you have your developers. These are typically your application users: everyone from "I'm a core developer, all I do is code, and I might play with Kubernetes" all the way to those who are doing testing, CI/CD, and everything else, but not at a cluster level.

So what are some of the responsibilities of these two users? Developers, as I mentioned, will typically just be responsible for maintaining applications. They know how to deploy applications. They know what an application is. They may be familiar with Kubernetes. They typically work within one or more individual namespaces. They do not have a view of an entire Kubernetes cluster and, most importantly, because they only manage a set number of namespaces, they typically have fewer permissions at a cluster level.
They're not able to see everything. They can just see their own little slice of the world. The platform team, by contrast, is responsible for everything. They can see all the different namespaces on the cluster. They can see who's out there and what resources are available, and they can define new custom resource definitions, everything to their heart's content. What they need to do is consider the balance between what development teams need and what they themselves need from a platform management capability. Those are the two personas we typically see when working with a GitOps as a service paradigm.

So what are some of the common paradigms we see when it comes to GitOps in general? Number one: if you're just getting into GitOps, the first thing you're going to do is deploy the actual tool, whether it be Flux, whether it be Argo, or not even Kubernetes at all if you want to use GitOps in a different context. As we talked about earlier this morning, GitOps is not only Kubernetes; it's really anything that can reconcile state. But in most cases, and especially for this talk, we're going to talk in the Kubernetes world. So the first one is bring your own GitOps: I want to install it and maintain it myself. The developer does it. Great.

The next one is where you have a platform-installed GitOps solution, but your developers are the ones who manage its day-to-day life. The platform team is basically just doling it out from a platform standpoint and then giving some autonomy to the development teams. But once again, you still give them a small slice of the world that they can then manage themselves.

And then finally, the last one is where the platform team owns and manages everything. It has the entire ecosystem to deal with. As someone who lives on the platform side, it's a lot to take on, I'm not gonna lie.
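The difference between the two personas described above can be made concrete with Kubernetes RBAC. The following is a minimal sketch and was not shown in the talk: the group names (`team-a`, `platform-ops`) and namespace names are hypothetical, while `edit` and `cluster-admin` are the built-in user-facing ClusterRoles that ship with Kubernetes. A developer group is bound to `edit` inside individual namespaces only; the platform group is bound to `cluster-admin` across the whole cluster.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def developer_binding(group: str, namespace: str) -> dict:
    """Namespace-scoped access: bind a (hypothetical) team group to the
    built-in `edit` ClusterRole inside one namespace only."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-edit", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "ClusterRole", "name": "edit", "apiGroup": RBAC_GROUP},
    }


def platform_binding(group: str) -> dict:
    """Cluster-wide access: bind the (hypothetical) platform group to the
    built-in `cluster-admin` ClusterRole."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{group}-cluster-admin"},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin", "apiGroup": RBAC_GROUP},
    }


if __name__ == "__main__":
    manifests = [
        developer_binding("team-a", "team-a-dev"),
        developer_binding("team-a", "team-a-test"),
        platform_binding("platform-ops"),
    ]
    # Kubernetes accepts JSON manifests as well as YAML.
    print(json.dumps(manifests, indent=2))
```

The developer bindings are `RoleBinding` objects, so they grant nothing outside the named namespace, which is the "own little slice of the world" the speakers describe.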
So these last two boxes here, these two paradigms, are what we think of as GitOps as a service. It's not the bring-your-own GitOps solution. It's the one that some team manages and then provides to others, or at least uses for other use cases.

One of the challenges with these models is how much complexity you push onto your development teams and your platform teams. When you have a bring-your-own GitOps solution, you're putting more of the onus, and more of the technical acumen required, on the developer side of the world. As I mentioned earlier, they need to know how to manage a GitOps server. In many cases developers just know how to manage applications. They barely even know what GitOps is. You have to teach them. You have to say, okay, maybe they can deploy on day one, but how do they manage actual platform infrastructure? From a platform perspective, there's the day 2 care and feeding of your server: potentially backup and restore, operationalizing notifications, monitoring, things that you don't think about on day one. But now it becomes a developer problem, when in most cases all they care about is writing code.

On the platform side, if you give the platform team all the power to manage everything, you're putting all the onus on them. They have full control, but once again they need to manage every aspect themselves. And as someone who works with a lot of application teams: application teams, I'm sorry to say, you're needy. You're really needy. I want this, I want that, especially when it comes to cluster-level resources like CRDs. I need the CRD.
Oh, my app's not gonna work, I promise.

So really, if you want to choose between these service models, it depends on where you are within your organization. Bring your own GitOps is great when you have maybe a small number of application teams and the developers are very 10x. You know, they can do everything. They don't need a lot of care and feeding. Maybe you have small clusters. If you want more of a platform-managed side of the world, that's when you have developers who are just getting into Kubernetes. They may not be very familiar with Kubernetes, so you need the platform team to handle more of the day-to-day care and feeding of the GitOps solution. And then finally, platform installs, platform manages is for when you have developers who really have no Kubernetes experience and you need to think hard about security. Because security, as you know from the talks today, and especially when I was at KubeCon two weeks ago... I guess, was that only two weeks ago? Oh, geez. All the talk was security, security, security. We're going to talk about security in a moment. But that's where it's really important when you think about running GitOps at scale: you understand that security is baked in. It's fundamental. If you don't think about it, you're going to hurt yourself in the long run.

So I mentioned isolation...

I'm going... oh, I think he's doing a great job of covering my slides.

I'm sorry about that.

No, that's all right, stuff happens. You're on a roll. I didn't want to stop you. It's like getting in front of a moving train. I'm not Superman.
I'm not gonna stand in front of that train when it's on a roll. I'm just gonna tackle isolation, and you can just go power through it.

Sounds good.

So, isolation. As I mentioned, security is incredibly important, security in all aspects. I actually gave a talk on this at GitOpsCon two years ago, give or take, around securing GitOps. It depends. It depends on your organization and your team structure. Are you able to satisfy your organization's requirements? Many organizations have very different requirements, everything from regulatory compliance to different principles and practices within your own security teams. And once you look beyond the organizational requirements, how good are your individual application teams? Do they trust each other? Can you trust them? What is the autonomy between them? How much technical acumen do they have? That's something you need to think about when you want to isolate workloads, because if you have application teams who have no idea about Kubernetes, are you really going to give them access to kubectl to delete things across different resources? That's just a recipe for very, very bad outcomes. There are multiple isolation models that you can consider. I'm going to turn this one over to you for the isolation models.

Sure. Why not? Yeah. So in terms of isolation there are a variety of things, and just to build on what Andrew was saying, it's not only about security. It's also about misconfiguration and the human mishaps that happen. Who here has accidentally deleted a database in production?
Just me? Not the only one. You know, people make mistakes. It's not unusual for two people to stomp on each other, or to interact with each other in ways you don't intend in a system, and isolation is important, along with security, to prevent those mishaps.

So in terms of isolation models: from a full isolation perspective, everybody gets their own GitOps instance, right? As Oprah would say, here's a GitOps instance for you, here's a GitOps instance for you, and off you go to the races. But that can be somewhat expensive. The next model, and it's probably the most common one we see, is partial isolation around boundaries. These can be team boundaries. These can be situational boundaries, i.e. different use cases. And then finally, no isolation: everybody's just sharing the same instance, and off they go to the races.

When you look at these isolation models, you can see there's a relationship between the safety that the isolation model provides and the resource utilization. By resource utilization I'm not just referring to memory and compute resources but also to the human resources involved in managing it, because more isolation means more instances, which means more things that people need to manage and maintain. So as you increase the amount of isolation, you increase the amount of safety you have, from both a security and a mishap perspective, but you also increase the amount of resources that are going to be used, right? Just curious, has anybody here operated a Kubernetes cluster? Anybody done Jenkins on Kubernetes? You've probably seen how many resources handing out a Jenkins instance to everybody takes, right? It's the same idea.

Our sustainability friends would not be happy with all that, running Jenkins at scale.
Yeah, exactly. So that leads us to logical topologies. From a logical topology point of view, these are the three topologies we typically see.

No isolation is where everybody's sharing a single GitOps instance in a cluster, for both the cluster configuration use case and the team use case. The cluster configuration use case is somewhat special, because in order to configure that cluster I'm typically going to need either cluster-admin or near cluster-admin privileges. And I'm going to share that same instance with teams that are deploying applications. So for the minor security guy that's in me, the little security guy (I'm not a big security guy, I'm a developer), that gives me a bit of the heebie-jeebies, just the fact that if something goes wrong in how I've configured the RBAC and the permissions in my GitOps instance, somebody could do something on that cluster that they shouldn't be doing.

Partial isolation is really designed to address that. We separate it out by use case: the cluster configuration runs in its own GitOps instance with its own set of privileges, and the teams get their own GitOps instance, but with a much reduced level of permissions, right? They can operate in certain namespaces.
They can perform operations on namespace-scoped resources, but they can't really do anything at the cluster level.

Now, in situations where you need maximum isolation (maybe you're in an industry that has high regulatory requirements, maybe you don't trust your teams, maybe your teams aren't particularly technically sophisticated, or you're worried about different teams stomping on each other, even inadvertently), you can break them out into completely separate instances and manage things that way as well.

Okay, so those are the logical topologies. But there are also the physical topologies that come into play in how we deploy this. We saw a little bit of this with Dan, I think, earlier, where he had his four topologies. For me, I like looking at this as two topologies, and I really go by intent. What is your intent when you're defining your topology? Is your intent centralized, where I'm going to have a centralized GitOps instance that's managing a bunch of different clusters? Or is my intent distributed, where I'm going to have separate GitOps instances running on each individual cluster and managing it that way?

Centralized, as Dan alluded to in his presentation, is great from the point of view that you get that single pane of glass. I can see what's going on across my whole fleet, no problem at all. The downside is that it's also a big fat single point of failure. If you lose that cluster, or you lose that instance, or the networking to it goes out, you've lost manageability across all your different clusters. Distributed, on the other hand, addresses that single point of failure. It solves that problem, but you lose that single pane of glass. Now, like I said, I define these by intent, but there are variations, right?
There are a lot of different ways to do your topology model. A common one that we see I like to call the Intuit model, because they were the ones I first saw do it a couple of years ago. I think it was at one of the ArgoCons, 2020 I think it was, where they outlined how they do this. I look at this as more of a variation of centralized. You could maybe argue it's a variation of distributed; I'm happy to argue it over a beer with you at the pub afterwards. But for me it's centralized, because it's centralizing access based on a different boundary: not the cluster boundary but the team boundary, right? So I'm going to have a single instance of GitOps running where a team can see across the different clusters, say a non-prod cluster and a prod cluster, see their applications, and get that single pane of glass. But I avoid that single point of failure in the sense that if team A loses their instance, it doesn't impact team B or team C. Now, if these are all running on a central hub cluster and I lose the hub cluster, I've still blown up everything. But that's another way that people tackle that.

The other topology we often see is the control plane topology, and Dan alluded to this as well in his presentation. Essentially you have a hub or management type of cluster that has something running on it, and it is managing your GitOps across that fleet. This could be a distributed architecture, as I've got pictured here, where you've got GitOps running on all the different clusters, and this is my preferred architecture for a hub. But you could also have a hub that's using a centralized kind of model while providing some extra capabilities over the normal centralized model. The nice thing with the hub is that when you're doing this distributed, it avoids that single point of failure. Yes, if I lose my hub, I lose some manageability in the sense that I can't make changes, but those instances that are running on all of those other
clusters are running just fine. They're managing it, keeping things in sync, and everybody's hunky-dory.

So from a topology recommendations point of view: I am a fan of not using the same GitOps instance for cluster configuration as for teams. I like keeping those two use cases separate and running them in separate instances, for the reasons I gave earlier. But having said that, choose the topology that works best for you. Every organization is different, and every organization has different requirements. There's not a right answer that fits everybody, so pick the thing that's going to work for the particular use case you're trying to solve. Do not have a separate GitOps instance for each and every application. Don't go the Jenkins model, where you're handing them out like candy, just chewing up resources and requiring more management. Align the number of instances you need along your team trust boundaries, right? If you have different teams that work well together and are part of the same broader organization, that's an opportunity to aggregate them into a single instance. And then finally, Dan had this earlier in his slides too, I think: use GitOps to deliver your GitOps. That's what it's there for, right? Don't try to manually provision all these GitOps instances yourself and manage them. Have a central GitOps somewhere, an Argo of Argos, or other tools like OCM that we saw earlier, push those out and manage those for you.

So when we talk about all these recommendations and all these paradigms, where does that lead us? How do we actually do this within an enterprise organization? What I do, and what Gerald also looks at, is ask: where do you start? You start with some industry-accepted principles.
A good one is the principles promoted by OpenGitOps, because they're a good North Star to begin with: everything from making sure your manifests are versioned, to making sure they're declarative, to making sure they're continuously reconciled. That's a great place to start. Then on top of that you have your organizational constraints, everything you need to comply with, whether it be regulatory compliance or just "I need to make sure that every fifth Friday of the year we do X, Y, and Z." Inject those in. And then finally you combine some of the best concepts from the community, the ones that are out there in terms of how we manage GitOps at scale. Combine those with your organizational constraints, and they become the established processes that you can implement within your own organization. You bring these together, and then you refine them and see how they actually work, because no one gets it right. Nobody gets it right. A lot of us are just guessing. I'll be frank, we're guessing. But that's where you start. You think we did GitOps perfectly the first time? No, of course not. We work with organizations across the globe, and I can tell you that they ask us, how do we run GitOps? And we say, here's where you start. It's a journey. We'll go on the journey together and learn. And what I always recommend is, after you learn, share it, because you're guaranteed not to be the only one that runs into those problems and challenges. That's what the open source community is meant for.

So as I mentioned, there is no best practice. As an architect, people ask me, oh, what's your best practice? I'm like, I don't know.
I'm just a guy.

Number two: Conway's law is always going to prevail. Your GitOps processes are always going to be a reflection of your organization. That's why we brought in number two, though to me it's really still part of number one. Just know that that's why every use case is different, because you have to sprinkle in those organizational constraints.

Then finally, harking back to the OpenGitOps principles: make sure these don't become paper standards. Make sure you document them, make sure they're versioned, make sure they're well established and declarative, so others will know about them. It's like if you have a Jenkins server sitting underneath your desk. It's just there. It's not centrally managed. No one knows about it. What do you call that, where you're kind of running your own thing? There's a name for it; it'll come to me later. But anyway, make sure you comply with these standards, and then you'll be able to actually be successful in your GitOps journey.

So one of the big... there we go, sorry. One of the big ones that people get hung up on is repository and directory standards. And I'm not here today to lay out the one, you know, commandments-in-stone style "here are my standards, off you go to the races," because again, there's no one standard that works for everybody. But a lot of organizations, when they're starting with GitOps and trying to get GitOps as a service going, get really hung up on: what are our standards? How are we going to do it? I think it's really important to look at your organization's operations model, right?
So if you're a traditional enterprise and you have a traditional developer team and a traditional ops team, and the ops team does things separately from the developer team and controls certain things that the developer team never sees and never has any access to, that's going to be a very different model, in terms of how you set things up, than a fully matrixed DevOps team where the devs and ops are working together in a single team, deploying the application across all the different environments.

Avoiding analysis paralysis is another common one. I see it with a lot of our customers, where it just goes down a rabbit hole, or around in a big circle, trying to figure out what their standard is. Start small and start high level. You don't need to lay out your directory and repo structures to the nth degree or the ninth level of traversal. You know: here's my high-level standard, we're going to go with that, we're going to start with that, we're going to see how it works. And then be agile: iterate, iterate, iterate.

Don't be afraid to change standards. The name kind of implies that it's not something that changes. These change, and it's okay that they change. As you learn, as you understand your organization's needs and requirements, it's very natural to evolve your standards to better adapt to the organization's needs and where it's going.

Achieving consensus with development teams: as a platform team trying to put in GitOps standards, one of the challenges you might have is that you've got 10, 20, 30 different development teams, and they may not agree on what those standards are, right? So seek input from those developer teams. They will have a lot of good ideas that will have an impact on those standards and be useful. But do not get into an endless loop of discussion with the teams about what those standards should be. At some point a decision has to be made and everybody has to go forward with it. And that's really where
management buy-in is critical to break those logjams and move forward. You really need to have management on board and in line with you in order to make those decisions and make things happen.

And then finally, it's 2023. I'd be remiss if I didn't include platform engineering in any talk these days. It's what we're all hearing: platform engineering. Now, there is a good alignment between platform engineering and GitOps as a service. In many cases they're going to be very complementary to each other. Both attempt to standardize and enable developer productivity, because you want to make it easier for developers to build and do good things. GitOps has so many benefits, and being able to provide that streamlines how you deliver applications all the way to production. Now, I'm going to be honest: in many cases a platform engineering team at one point ran a GitOps as a service platform, because platform engineering is just yet another extension of what GitOps as a service can provide. So, especially in large organizations, remember that GitOps is just one aspect of delivering applications to production. It's just one of the services that you would then hook into platform engineering. Having developers be able to easily consume resources provided by a GitOps as a service capability is just one more component of a golden path that makes it easier for them to get to production.

So we have a number of different resources here.
Gerald, do you want to walk through the resources?

Hello? Yeah, there we go. So these are just some resources you may find useful. Some of these were referenced for these slides. Akuity did a great blog on Argo CD and how many instances you need if you're using Argo, and Flux as well has a great section in their Git repo on multi-tenancy. For repositories and directory structures, there's a lot of prior art that you can look at and investigate. The Argo CD Autopilot comes to mind; it allows you to bootstrap projects and get them up and running quite easily and quickly. Flux as well documents its recommended repository structures, if you're on the Flux path. Because I'm doing the presentation, I reserve the right to promote my own standards, so I've got a link in there for those as well. Take them for what they're worth.

The other thing that comes up for us a lot, both of us being from Red Hat and using Argo CD every day with our customers, is: how do I manage RBAC for my tenants, i.e. the developers that are leveraging it? There's a great blog that one of our colleagues wrote on how to do RBAC in Argo CD. It shows you exactly how to configure your RBAC for the different tenants to achieve the goals that most platform teams, I think, are looking for in terms of the isolation and accessibility they're providing on the platform.

So that brings us to the end. Any questions? If you don't have any now, Gerald and I can certainly address them afterwards. I'm not going anywhere yet.

So when you give your teams or your developers a GitOps environment, do they manage it themselves? Do they create a PR against the central GitOps repository?
What typically happens, especially in a lot of organizations I work with, is there's an ITSM, like a ServiceNow, that provides them access. It basically just makes a commit to a more centralized GitOps as a service repository, which then makes whatever GitOps solution you have available to them, and they manage it from there.

It really goes back to those service models again, right? There are different levels of service, and it really depends on what you want to offer. If you want to offer the service to the point where application developers just say, here's my set of manifests, or even, here's my source code, and you take care of it all, that's one level of service. Another level of service is more what you're alluding to: here's a GitOps instance for you to use, and you can manage it. Now, whether that management happens through PRs or they manage it directly, again, it's kind of up to you to figure out what works for you from a service perspective. A lot of factors go into that. Are my developers sophisticated enough to manage it? And platform team-wise, do I have the resources to do the full-blown GitOps as a service? Do I have the ability and bandwidth to create that solution, versus maybe something that's a little further to the left on that spectrum?

Yeah, thank you.

Anyone else? Thanks a lot, everyone. Thank you.
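As an illustration of the onboarding flow described in the answer above, here is a minimal sketch of what an automated request (for example from an ITSM workflow) might commit to a central GitOps as a service repository. It was not shown in the talk. It renders an Argo CD `AppProject` that limits a tenant to its own namespaces and, assuming Argo CD's default of denying cluster-scoped resources that are not whitelisted, leaves the cluster-scoped whitelist empty. The `tenants/<team>/` layout, the `argocd` namespace, the team name, and the repository URL are all hypothetical; only the `AppProject` field names come from Argo CD.

```python
import json

# Address Argo CD uses for the cluster it is running in.
IN_CLUSTER = "https://kubernetes.default.svc"


def tenant_project(team, namespaces, repo_url, argocd_namespace="argocd"):
    """Build an Argo CD AppProject that confines a tenant to its namespaces.

    `argocd_namespace` is an assumption; use whichever namespace your
    Argo CD instance actually runs in.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AppProject",
        "metadata": {"name": team, "namespace": argocd_namespace},
        "spec": {
            "description": f"Tenant project for {team}",
            # Only this repository may be used as an application source.
            "sourceRepos": [repo_url],
            # Only these namespaces on the local cluster may be deployed to.
            "destinations": [
                {"server": IN_CLUSTER, "namespace": ns} for ns in namespaces
            ],
            # Empty whitelist: no cluster-scoped resources for tenants.
            "clusterResourceWhitelist": [],
        },
    }


def tenant_commit(team, namespaces, repo_url):
    """Return the (path, content) pair an onboarding workflow would commit.

    The tenants/<team>/ directory layout is hypothetical.
    """
    path = f"tenants/{team}/appproject.json"
    content = json.dumps(tenant_project(team, namespaces, repo_url), indent=2) + "\n"
    return path, content


if __name__ == "__main__":
    path, content = tenant_commit(
        "team-a",
        ["team-a-dev", "team-a-prod"],
        "https://git.example.com/team-a/apps.git",
    )
    print(path)
    print(content)
```

Whether tenants then change this file through pull requests or manage their instance directly is the service-level decision discussed in the answer; the sketch only covers the initial provisioning commit.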