Hello, and welcome to Managing Developer Workflows with the Kubernetes API. My name is Colin Murphy. I'm an engineering manager at Adobe. First I'd like to cover why we chose to write our own Kubernetes client, and then I'll show you how we did it. As I said, I'm an engineering manager; my current responsibility is the infrastructure for DocumentCloud, both the architecture and the software.

So, why write your own Kubernetes client? You could certainly write a bunch of bash scripts around kubectl, throw them into Jenkins, and get 80% of what you could hope to achieve with your own client. Is it really worth all the effort for that extra 20%?

For us, the overriding reason is that, as with anything at a large company like Adobe, you're never in a green field. You always have lots of other things to consider. All of these applications were already deployed, either on DC/OS or on another home-spun container orchestration system, and we needed to move those configurations onto Kubernetes seamlessly.

The first reason is really that we want to prevent outages. We need to exert greater control over what our application teams can do than just saying: here's kubectl, here are some YAML files, go have at it. Do that and you're going to have outages, because people simply don't understand Kubernetes, and I'll get to that in a second. You also want to limit the Kubernetes objects teams can create, potentially apply labels, and keep objects from interfering with each other. I should point out that this is not a substitute for RBAC, network segmentation, Open Policy Agent, or the other controls you'd want to implement on the control plane side to protect the cluster. It's a supplement to those, not a replacement.

Also, as I mentioned, this wasn't a green field. We had an existing deployment system called Moonbeam, and that actually made the problem statement a little easier, because a deployment was just a Git repo and a SHA. That made it a nice place to start. But we couldn't just use Spinnaker, because we already had a bunch of what Spinnaker provides, and likewise for other tooling out there.

The other big thing: we want to reduce cognitive load. Application engineers have deep technical knowledge. They understand how to manipulate a PDF, or an EchoSign agreement, or maybe some sort of machine learning. They don't really have time to do Kubernetes too; there's no such thing as a person who fully understands everything required in a web application. Then we have our site reliability engineers, and theirs is really the same story: there's a huge legacy application that needs constant attention, and they don't have time to learn this whole other system. We also have compliance work and tickets on top of that. So re-educating hundreds of people on a new system just wasn't an option.

Another reason is secret management. We use HashiCorp Vault at Adobe, and some of our clusters have very complex requirements. We have FedRAMP clusters coming online, and we already have PCI and HIPAA. We can't always have a connection from a cluster back to a secrets management system.
So we really had to find a way to package up all the secrets an application could need while inside our internal network, and push that safely out to a Kubernetes cluster in a way that allows for rollback and guarantees one release's secrets can't interfere with the next release's, that kind of thing.

And then lastly, it's really just not that hard. It's fairly simple and fairly well documented. If you're familiar with the concepts of Kubernetes, you can write a Kubernetes client. Go has a bit of a learning curve, and of course you don't have to write your client in Go, but it was probably the best-supported SDK, at least at the time I wrote the client.

OK, before we continue, I just want to point out that the client has been open sourced. It's on GitHub at adobe/porter2k8s.

Moving on to the requirements, which I've already pretty much touched on. The first is that we were migrating from two other systems onto a single system, so the way secrets were retrieved and the way environment variables were set had to stay the same, and the application code had to stay the same too; we couldn't change anything about how any of that was formatted.

In addition, we needed teams to migrate right away, or at least to migrate without our talking to each and every one of them and treating each as a special little snowflake. So we did a lot of work around automatically generating the right files for teams, and that's still true now: our onboarding system stubs out the required directories, and teams can go from there. As I pointed out, we don't have a team that can meet with every new service and figure out what it needs, unless it needs something new and different that we don't support.

One really key thing, and a real improvement over our previous systems, is the segmentation of responsibilities. We allow teams to do a lot within their repository: generating service instances, generating operator instances that let them create MySQL databases, DynamoDB tables, or various Azure backing stores, things like that. That's all their responsibility, and this tool lets them do it through Kubernetes objects. We'll touch on that a little more, because it does cause some problems, and we certainly don't allow them to do whatever they want, but it's a lot more freedom than they had before. They don't have to learn some other system of CloudFormation templates or an internal system that generates infrastructure code. It's all completely in their control.

Also, in a previous system, if you wanted to make a certain kind of change to your service, you'd have to go to some other internal web UI, click through it, and wait. That was another thing we really wanted to get rid of. There's a single deployment system, as I showed a few slides back, and you can make any change to your service using it. And for change management, we rely on that system to verify that the proper approvals and reviews have been made.

There's one more concept, and this is a little bit of a stretch.
It's the concept of hermetic builds from the SRE book, which hopefully most of you already follow: no build dependencies on the build system. Usually the way to do that is to build in Docker. But you can stretch the idea a little to cover deployments: once you roll a service back, you don't want it to behave differently than it did when it was at that previous version. So we really want to tie the secrets and environment variables for a specific Git SHA of the service to that deployment in Kubernetes. Every deployment the client performs is labeled with the SHA, and we inject a ConfigMap and a Kubernetes Secret carrying that SHA. If we ever roll back, even if that secret was changed somewhere in HashiCorp Vault, we still get the original secrets. We keep up to five, so if you roll back more than five deployments you could potentially pick up a secret change, but the point is that a rollback always takes you to a known-safe place. There's a small sketch of this idea at the end of this section.

One last thing: we deploy to multiple cloud providers, and some applications actually deploy to data centers, and we wanted to abstract that away from the service team. We didn't want a team to have to say: I need a different domain name because I'm in this region, or this cluster is HIPAA compliant, or it's FedRAMP, so my Kubernetes configuration has to be a little different. The idea is that the configuration lives in each Kubernetes cluster (actually, it's per-namespace configuration), and the client reads that information as it deploys and modifies the objects as required. It sounds complicated, and I'm sure we could have gone crazy with it, but it's actually not that hard. For the most part it's really just the Docker registry, whether the cluster uses Istio or Contour, and the domain names. I'm sure it will expand as we go, but it's a nice idea: all the configuration common to the applications just lives in the namespace, and that saves us a lot of effort.

OK, moving on to how porter2k8s works. We take the SHA, we clone the service repository, and we go from there. We read in a bunch of environment variables and secret references (I'll show you what those look like in a second), then we read the cluster configuration from the cluster itself, compare it against what we have, and update the Kubernetes objects as required.

I'll show you quickly, because I couldn't fit it into a slide, what these configuration files look like. It's a nested directory structure, and we try to keep it really DRY, so the files reference each other. We need at least one for every environment, for every region, but if there's one that's common to all of AWS, or one that's common to all of Azure, the others can reference it. It's all about scalability; that's the name of the game. It looks a little intimidating, since it's a lot of files, but it's fairly straightforward.

In those files we have four different types of configuration entries. The first is the most straightforward: an environment variable and its value. We also have, as I've mentioned many times now, secret references, which are Vault paths.
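Since secret references just came up again, here's a minimal sketch of the SHA-pinning idea from a moment ago. The package, function name, and label keys are mine for illustration, not the actual porter2k8s conventions, and it assumes a recent client-go.

```go
// Hypothetical sketch: pin a release's secrets to its Git SHA so a rollback
// re-reads the snapshot taken at deploy time, not whatever Vault holds now.
package deploy

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createPinnedSecret writes the values fetched from Vault into a Secret
// whose name carries the Git SHA. The pod spec for the same deployment
// references the SHA-suffixed name, so old ReplicaSets keep old secrets.
func createPinnedSecret(ctx context.Context, cs kubernetes.Interface,
	ns, service, sha string, data map[string][]byte) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      service + "-" + sha, // e.g. myservice-a1b2c3d (hypothetical scheme)
			Namespace: ns,
			Labels:    map[string]string{"app": service, "sha": sha},
		},
		Data: data,
	}
	// Pruning would go here: list Secrets by the "app" label, sort by
	// creation time, and delete all but the five most recent.
	_, err := cs.CoreV1().Secrets(ns).Create(ctx, secret, metav1.CreateOptions{})
	return err
}
```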
We also have the ability to override cluster settings. Say you need a different domain name or TLS certificate, or you don't want to use the same Istio gateway or Contour ingress as the other services in your namespace, that kind of thing. And it's a constant fight; we're always getting requests for more features. We'll probably have to expand it a little, with some kind of annotations structure or something like that, because it gets very complicated with multiple ingress controllers.

So I wanted to show this; this is, I guess, the first code. I was surprised how difficult it is to show code in a presentation. I want to credit Katherine Cox-Buday: I read her book Concurrency in Go when I got started, and she describes this concept of pipelines, which I had never seen in another language. Maybe you're all familiar with it, but basically you're passing a channel through a nested function structure. It's efficient and clever; it is parallel computing, but it's not obviously, embarrassingly parallel. You pass each region through individually, in a fairly elegant way.

Here we're creating a struct for each region, each cluster, each namespace that's going to be deployed to; for us, that typically maps to regions. For each one we get the cluster configuration, fetch the secrets, read in the objects, and copy the objects, doing a deep copy because we're going to modify them. That was a lesson learned: if you hold a pointer to a struct and modify it for one region, you modify it for every region. Hence the DeepCopy method. And then we end up with this big struct that has all the information we need to do our deployment.

The way we handle kubeconfigs: we have a file, common to all of our services, that lists all of the kubeconfigs and their locations in Vault. We use AWS IAM authentication for our EKS clusters and Azure Active Directory authentication for our Azure clusters, so the kubeconfigs aren't really secrets, but keeping them in Vault makes all the security people happy; it gives them a warm, fuzzy feeling. All of our kubeconfigs live in Vault and get pulled down at deploy time for every service deployment.

The next thing I want to talk about is typed client sets versus the dynamic client. I really like the typed client sets, and I really like the schemes you find for all the built-in Kubernetes objects. Recently, I think in the last year or so, Istio put out its own client library, as they call it, and the service catalog we use has its own as well. Unfortunately, the newer stuff doesn't: the Azure Service Operator and the AWS Controllers for Kubernetes don't provide these, so there we have to go dynamic. I'll talk about that in a bit; it's an upcoming challenge. For the typed path, we just use the universal deserializer to read in the YAML. It's really straightforward, really easy. Then we have a struct holding all of the possible Kubernetes objects that we can apply, and there's only a finite number of those, as opposed to custom operators, where it really expands.
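To make that typed path concrete, here's a minimal sketch of reading a manifest with client-go's universal deserializer and switching on the typed result. The function is mine, not porter2k8s code, and only kinds registered in the client-go scheme decode this way.

```go
// Sketch: decode YAML/JSON into typed Kubernetes objects via the scheme.
package deploy

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes/scheme"
)

func decodeManifest(manifest []byte) error {
	obj, gvk, err := scheme.Codecs.UniversalDeserializer().Decode(manifest, nil, nil)
	if err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	// A finite set of supported kinds, so a type switch covers them all.
	switch o := obj.(type) {
	case *appsv1.Deployment:
		// DeepCopy before mutating per region; a shared pointer would
		// leak one region's changes into every other region.
		regional := o.DeepCopy()
		fmt.Println("deployment:", regional.Name)
	case *corev1.ConfigMap:
		fmt.Println("configmap:", o.Name)
	default:
		return fmt.Errorf("unsupported kind %q", gvk.Kind)
	}
	return nil
}
```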
At some point we'll have to find a new way to do things, but for right now this is what works and this is what we do.

The nice thing about the schemes and the typed client sets is that checks like these become really easy. For instance, if somebody puts a replica count in their deployment, which we typically discourage because we want people to use horizontal pod autoscalers, and they do have a horizontal pod autoscaler, we want to make sure a deploy doesn't reset the number of replicas. Something like that is really basic: we don't have to traverse a map of string to interface; we just spell out the whole path. Actually, sorry, in this case the check is making sure the horizontal pod autoscaler references the deployment that's actually being deployed. We don't want somebody's HPA to modify someone else's deployment, and that happens: people copy from one another, as much as we try to stub things out automatically. It's just a reality when you have hundreds of engineers and over a hundred services, and as we expand this out across Adobe, we're talking about thousands of engineers and potentially thousands of services. So these little guard rails are really easy to do with the typed client set; there's a sketch of both checks right after this section.

Cluster settings: these hold the cluster-specific information I mentioned before. We've done a reasonably good job of limiting it, because you could really go crazy here, but these are the obvious things. First, the registry. We have registries in every region, because we don't want an outage of, say, AWS eu-central-1 knocking out our Azure clusters in Singapore, so our deployment strategy always uses a regional registry. That has to change per deployment, and it's not the application engineer's problem: in their deployment.yaml they just put a placeholder registry and the name of their Docker image, and porter2k8s overwrites it. Then there's whether the cluster has Istio or not, the domain name of the services in that namespace, and the Istio gateway, if there is Istio. There's also a minimum replica count for the HPAs: in production you need at least two for redundancy, sometimes three for heavier-use clusters, and some services need a lot more than that. And finally, what the cloud is. Is it Azure, is it AWS, and what does the cloud provider call the region? That's typically used for service instances, for setting the location in, say, an Azure Service Operator resource or the AWS equivalents.

And then, updates. This is how the updates happen. If you're familiar with Go, this is probably not that interesting, but if you're not, this is basically how we do it.
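First, going back to the typed client set example from a moment ago, here's roughly what those two guards could look like. This is a sketch with assumed names, not the actual porter2k8s checks.

```go
// Sketch: guard rails that are easy with typed client-go objects.
package deploy

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	autoscalingv1 "k8s.io/api/autoscaling/v1"
)

// validateHPATarget rejects an HPA that scales anything other than the
// deployment shipped in the same repository.
func validateHPATarget(hpa *autoscalingv1.HorizontalPodAutoscaler, dep *appsv1.Deployment) error {
	ref := hpa.Spec.ScaleTargetRef
	if ref.Kind != "Deployment" || ref.Name != dep.Name {
		return fmt.Errorf("HPA %q targets %s/%s, not deployment %q",
			hpa.Name, ref.Kind, ref.Name, dep.Name)
	}
	return nil
}

// preserveReplicas keeps the live, HPA-managed replica count instead of
// resetting it to whatever the manifest happens to say.
func preserveReplicas(incoming, live *appsv1.Deployment) {
	incoming.Spec.Replicas = live.Spec.Replicas
}
```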
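And here's the rough fan-out/fan-in shape of the updater, which I'll walk through next. The types here are hypothetical; the real thing defines one update function per Kubernetes object kind.

```go
// Sketch: run one update function across every target concurrently and
// fan the results back in on a single channel.
package deploy

import "sync"

type result struct {
	region string
	err    error
}

func updateAll(regions []string, update func(region string) error) <-chan result {
	out := make(chan result, len(regions))
	var wg sync.WaitGroup
	for _, r := range regions {
		wg.Add(1)
		go func(region string) {
			defer wg.Done()
			out <- result{region: region, err: update(region)}
		}(r)
	}
	// Close the channel once every region has reported, so the caller can
	// simply range over the results before moving to the next object type.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```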
There's an update function defined for every type of Kubernetes object, because we have those typed client sets. We basically make a WaitGroup, run the update function for each target, and once they're all done, the updater returns the channel with the results. Then it gets called again in a for loop for the next update function, and it just iterates through. Every namespace can be updated simultaneously; it's an embarrassingly parallel operation. So there's nothing too clever there, just fan out, fan in, and check the results.

OK, so to wrap up, I just want to go over some of the things we're working on now. The first is that we're working on supporting the AWS Controllers for Kubernetes and the Azure Service Operator, and they come with a whole mess of CRDs. This is going to be a common problem: if you want to write your own Kubernetes client, you're going to run into this issue unless you're very specific about what you want to do. If you want to support hundreds of engineers, you'll hit the point where there are just too many CRDs. And unfortunately, ACK and ASO don't come with client sets or schemes, so you can either generate them yourself, which sounds like not a lot of fun, or use the dynamic client. As I mentioned previously, the dynamic client has a few drawbacks, because you don't have an actual struct you can pin down and compare against; it's a map of string to interface. But we're going to work through that and see what we can do. Right now we're just reading the YAML into the Kubernetes API's unstructured type and then using the Go client to apply it. We'll see; it's not quite done, so I don't really want to show it in action, especially if it turns out not to work at all.

The other thing is that some teams don't want to deploy every object to every namespace, or they want to use a certain type of ingress instead of every ingress that's available; it's about how teams can fine-tune things. As I mentioned before, we have the ability to override the porter2k8s cluster configuration, the region, the registry, those settings, but that's a bit too blunt an instrument. So we're going to work on that, probably some type of annotation or label that people put on the Kubernetes objects in the k8s directory inside their repository. It looks like that's what we'll do, but we haven't decided yet.

So I'd like to thank you all for coming, and thank you for bearing with the lighting, which is super inconsistent: I don't have any good lights, so I'm using the sun, and the sun is fickle. If you want to reach out, ask me anything on the Kubernetes Slack; I'm there. I don't use Twitter; I try to maintain my sanity. And please try it out. As I said, it's all on GitHub. It's meant to be a reference for other people to use, but you could also use it directly; we use it here at Adobe for lots of heavily used services. So please take a look. Thanks, bye.